S4.04 - Task and domain label
flowchart LR
A["Vague AI use descriptions<br/>mixed tasks and domains"] --> B["RAIDT<br/>run-level evidence framework"]
A2["Generic governance claims<br/>without situated context"] --> B
B --> C[["Task and domain label"]]
H["Task type<br/>summarisation, explanation, triage, Q&A"] --> C
I["Domain<br/>healthcare, finance, policy, cybersecurity"] --> C
J["Risk band and reviewer expertise"] --> C
C --> D["Evidence pack<br/>right evidence for the right run"]
C --> E["Score profile<br/>context-sensitive pillar scoring"]
C --> F["Reviewer reconstruction<br/>clearer audit trail"]
C --> G["Governance readiness<br/>reviewability, contestability, learning"]
Star S4 - Evidence Architecture and Artefacts
Star context: Specifies the concrete fields and artefacts that make a run record inspectable, including the classification cues that determine which evidence, thresholds, and review expectations apply to a given GenAI run.
Academic picture
Definition / background
The task and domain labels classify the run, for example as healthcare summarisation, finance explanation, policy Q&A, or cybersecurity triage. In RAIDT terms, this means recording both what kind of work the GenAI system was being used to perform and the substantive domain in which that work took place.
Conceptually, the task label answers the question, "What is the system being asked to do in this run?" The domain label answers the question, "In what organisational, professional, or sectoral setting is that task being performed?" These are related but not identical. A summarisation task may appear in healthcare, law, education, or public administration, yet the governance implications change because the stakes, norms, and harm pathways differ by domain.
This distinction matters in GenAI governance because many apparent performance issues are actually context issues. The same model output may be acceptable in a low-stakes brainstorming context and unacceptable in a regulated or safety-critical context. RAIDT therefore treats task and domain labels as part of the evidence architecture of a run, not as optional descriptive metadata. Without them, a reviewer cannot know which rubric anchors, escalation thresholds, or evidence requirements should apply.
Within RAIDT, task and domain labels belong inside run-level evidence because RAIDT governs configured use, not abstract models. They help connect the evidence pack to the score profile across Responsibility, Auditability, Interpretability, Dependability, and Traceability. In effect, they are part of the classification layer that makes later judgement operationally meaningful.
Why this concept matters
Task and domain labels solve a basic but often neglected governance problem: organisations cannot review a run properly if they do not know what kind of activity the run represented. Without this classification, reviewers are forced to apply generic expectations to context-specific work. That weakens comparison across runs, distorts scoring, and makes it harder to justify why one case triggered stronger review than another.
The concept also prevents a common confusion between system capability and situated use. A model may be capable of many things, but RAIDT evaluates one configured run for one concrete task in one context. The task and domain labels make that unit of analysis visible. This is essential for moving from broad principles such as fairness, accountability, or safety toward operational governance that can be audited, challenged, and improved.
If the label is missing, several risks follow: sector-specific harms may be missed; inappropriate benchmarks may be used; reviewers with the wrong expertise may be assigned; and organisations may overstate governance maturity because they appear to have controls that are not actually matched to the run under review. In practice, the label is a small field with large downstream consequences.
Key idea: Task and domain labels matter because they tell RAIDT which governance expectations belong to a specific run, turning generic oversight into context-sensitive evidence and review.
What this item captures
- The type of work performed in the run, such as summarisation, explanation, classification, drafting, triage, or question answering.
- The substantive domain or sector in which the work occurs, such as healthcare, finance, law, education, social care, public services, or cybersecurity.
- The context needed to select appropriate evidence expectations, review criteria, and escalation thresholds.
- The basis for comparing like with like across runs rather than mixing dissimilar use cases into one governance category.
- The conditions under which pillar scoring should be interpreted differently because domain stakes and task demands vary.
- The organisational framing needed for reviewer assignment, policy mapping, and learning across repeated deployments.
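The fields listed above can be sketched as a small typed record. This is a minimal illustration only: RAIDT as described here does not prescribe a schema, so the class names, enum members, and the `risk_band` values are hypothetical assumptions drawn from the examples in this note.

```python
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    # Hypothetical controlled vocabulary, taken from the examples in this note.
    SUMMARISATION = "summarisation"
    EXPLANATION = "explanation"
    CLASSIFICATION = "classification"
    DRAFTING = "drafting"
    TRIAGE = "triage"
    QUESTION_ANSWERING = "question_answering"


class Domain(Enum):
    HEALTHCARE = "healthcare"
    FINANCE = "finance"
    LAW = "law"
    EDUCATION = "education"
    SOCIAL_CARE = "social_care"
    PUBLIC_SERVICES = "public_services"
    CYBERSECURITY = "cybersecurity"


@dataclass(frozen=True)
class TaskDomainLabel:
    task: TaskType
    domain: Domain
    risk_band: str  # assumed values: "low", "medium", "high"
    note: str = ""  # free-text clarification for reviewers

    def key(self) -> tuple[str, str]:
        """Return a (task, domain) pair for grouping like with like."""
        return (self.task.value, self.domain.value)


label = TaskDomainLabel(
    task=TaskType.SUMMARISATION,
    domain=Domain.HEALTHCARE,
    risk_band="high",
    note="Referral letters summarised before clinician review",
)
print(label.key())
```

Making the record immutable (`frozen=True`) is one way to reflect that a label, once attached to a run, should be changed only through an explicit and inspectable correction.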
Practical example / likely audience question
Audience question
Why classify tasks?
Answer
The concern behind the question is usually that classification sounds bureaucratic or redundant, as though a reviewer can simply inspect the prompt and output and work out what happened. In reality, that assumption breaks down quickly in organisational settings. A prompt may look similar across runs, while the governance significance differs because the intended task and domain differ.
The direct answer is that scoring anchors and risk thresholds depend on domain and task type. A model-generated summary for internal meeting notes is not governed in the same way as a model-generated summary of a patient record. Likewise, explanatory output in a finance setting may need to support stricter documentation and review than explanatory output in a low-stakes training context.
RAIDT handles this better than generic AI governance approaches because it binds the classification to a run-level evidence record. Instead of saying, "Our organisation uses GenAI in healthcare," RAIDT can say, "This run was healthcare summarisation, performed at this time, with this prompt lineage, this evidence pack, and these applicable review expectations." That shift makes the governance claim inspectable.
Practical example in RAIDT terms
Consider a hospital team using a GenAI assistant to summarise referral letters before a clinician review. The use case is summarisation, but the domain is healthcare, not generic office productivity. At run level, the issue is that the output may omit a clinically salient symptom, medication conflict, or safeguarding concern even if the summary appears fluent.
In RAIDT terms, the run record should therefore label the task as clinical summarisation and the domain as healthcare. The evidence needed includes the prompt version, retrieved context if any, output hash, reviewer notes, and a domain-appropriate judgement of completeness and risk. Responsibility is affected because accountability for use must be clearly allocated. Dependability is affected because the run must perform consistently under clinical expectations. Auditability and Traceability are affected because a reviewer must be able to reconstruct why this run was handled under healthcare controls rather than a generic summarisation rubric. The label improves governance readiness by ensuring that the run enters the right evidence pathway from the start.
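A run record for the referral-letter example might look like the sketch below. The field names, the `run-0001` identifier, the `referral-summary-v3` prompt version, and the summary text are all placeholders invented for illustration; only the kinds of evidence (prompt version, retrieved context, output hash, reviewer notes) come from the description above.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    run_id: str
    task: str
    domain: str
    prompt_version: str
    retrieved_doc_ids: list[str]
    output_hash: str
    reviewer_notes: list[str] = field(default_factory=list)


def hash_output(text: str) -> str:
    """SHA-256 hex digest of the output text, used as a tamper-evident reference."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Placeholder output; a real run would hash the actual generated summary.
summary = "Placeholder summary text for illustration."

record = RunRecord(
    run_id="run-0001",                     # hypothetical identifier
    task="clinical summarisation",
    domain="healthcare",
    prompt_version="referral-summary-v3",  # hypothetical prompt version
    retrieved_doc_ids=[],                  # no retrieval in this example
    output_hash=hash_output(summary),
)
record.reviewer_notes.append(
    "Checked against source letter for omitted medication conflicts."
)
```

Storing the hash rather than relying on the output text alone lets a later reviewer confirm that the summary under review is the one the run actually produced.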
Detailed link to RAIDT
The task and domain label links to RAIDT in four ways.
First, it links to RAIDT's core idea that governance should apply to situated runs rather than to abstract claims about a model or product.
Second, it links directly to run-level evidence because it classifies the specific use that the evidence record is meant to describe.
Third, it links to both the evidence pack and the score profile because task and domain determine which evidence is expected and how pillar judgements should be interpreted.
Fourth, it links to reviewability, contestability, audit readiness, and organisational learning because the classification makes runs comparable, challengeable, and reviewable across repeated use.
Task and domain label -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness
In short, the item helps RAIDT decide what kind of run is being governed before asking whether that run is responsible, auditable, interpretable, dependable, or traceable.
Link to the five RAIDT pillars
Responsibility
Task and domain labels support Responsibility by clarifying the context in which obligations apply. A run in a high-stakes domain may require stricter approval, stronger human oversight, or clearer role allocation than a similar task in a low-stakes setting.
Example evidence / implication:
- The run is labelled as healthcare summarisation, triggering clinician review rather than general administrative sign-off.
- Domain classification justifies why additional safeguards were required before operational use.
Auditability
This item strongly affects Auditability because reviewers need to know what category of work they are auditing. Without the label, it is difficult to judge whether the right standards, controls, and comparison cases were applied.
Example evidence / implication:
- Audit logs show that the run was reviewed against a domain-specific checklist rather than a generic output-quality rubric.
- Cross-run analysis can compare similar finance explanation runs with one another instead of mixing them with unrelated tasks.
Interpretability
The effect on Interpretability is moderate but important. Knowing the task and domain helps explain what kinds of output characteristics matter and why certain explanation demands exist.
Example evidence / implication:
- An explanation judged sufficient for enterprise productivity may be judged insufficient for legal or clinical use.
- Reviewers can interpret model behaviour relative to the expectations of the specific task class.
Dependability
This item strongly affects Dependability because reliability must be judged against task-specific and domain-specific expectations, not generic notions of acceptable performance.
Example evidence / implication:
- A cybersecurity triage run can be assessed for consistency, false reassurance, and escalation reliability within that task domain.
- Failure patterns can be tracked by labelled category, revealing where dependable use is or is not being achieved.
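Tracking failure patterns by labelled category can be sketched as a simple grouped rate. The run data below is invented purely to show the mechanics, and the `failed` flag is an assumed stand-in for whatever failure judgement a reviewer records.

```python
from collections import defaultdict

# Invented toy data for illustration; not real results.
runs = [
    {"task": "triage", "domain": "cybersecurity", "failed": False},
    {"task": "triage", "domain": "cybersecurity", "failed": True},
    {"task": "summarisation", "domain": "healthcare", "failed": False},
    {"task": "triage", "domain": "cybersecurity", "failed": False},
]


def failure_rate_by_label(runs):
    """Return {(task, domain): failure rate} so only like-labelled runs are pooled."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for run in runs:
        key = (run["task"], run["domain"])
        totals[key] += 1
        failures[key] += int(run["failed"])
    return {key: failures[key] / totals[key] for key in totals}


rates = failure_rate_by_label(runs)
for (task, domain), rate in sorted(rates.items()):
    print(f"{domain} {task}: {rate:.2f}")
```

Without the labels, the same four runs would collapse into a single pooled rate that hides where failures concentrate.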
Traceability
Task and domain labels strongly support Traceability by making the purpose and context of the run explicit in the evidence trail. They allow downstream artefacts to be interpreted in relation to the correct use case.
Example evidence / implication:
- The label explains why certain prompts, retrieval sources, and reviewer roles appear in the run record.
- Investigators can trace how a particular class of run accumulated governance controls over time.
Why this item is more than a generic concept
In general AI governance, task or domain labels may be treated as broad categories used for portfolio reporting, project scoping, or policy statements. In RAIDT, they mean something more operational: they are run-level classification fields that determine how a specific evidence record should be interpreted and reviewed.
The RAIDT meaning is more operational because it is tied to run-level evidence. The label is not merely descriptive; it helps configure oversight. It influences what evidence is collected, which reviewers are appropriate, how scoring anchors are interpreted, and how the resulting case contributes to organisational learning. That is why this item belongs in the evidence architecture rather than only in a high-level taxonomy.
Common misunderstanding
Misunderstanding
A task and domain label is just an administrative tag added for filing convenience.
Correction
In RAIDT, the label is not just for filing. It helps decide which governance pathway the run enters. For example, two runs may both involve summarisation, but one may summarise workshop notes and the other may summarise legal witness material. Treating them as equivalent because they share a task verb would hide important differences in evidential burden, reviewer expertise, and acceptable failure tolerance.
Boundary and limitation
Task and domain labels do not prove that a run is safe, lawful, accurate, fair, or effective. They also do not replace substantive evaluation, human oversight, or domain expertise. A label can still be wrong, overly coarse, inconsistently applied, or strategically chosen to understate risk.
The item works best when the labelling scheme is defined clearly enough to support consistent use, but flexible enough to reflect real organisational variation. RAIDT handles this limitation by treating the label as one part of a broader evidence pack. The label guides review; it does not settle the review outcome. If a classification is contested, that itself becomes an inspectable governance issue.
Implementation levels
Manual implementation
A researcher, practitioner, or small team can apply this item manually by recording the intended task type and domain for each run in a structured note or evidence template. A short controlled vocabulary and a short explanation field are usually sufficient at this level.
Semi-automated implementation
Semi-automated implementation can use form fields, templates, dropdown taxonomies, or lightweight review workflows so that operators choose from standardised task and domain categories while still being able to add clarifying notes. This improves consistency and supports later aggregation across runs.
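A dropdown taxonomy with a free-text note can be approximated by validating operator input against a controlled vocabulary. The vocabularies and the `validate_label` function below are hypothetical; a real deployment would define its own categories.

```python
# Hypothetical controlled vocabularies; each organisation would define its own.
ALLOWED_TASKS = {"summarisation", "explanation", "triage", "question_answering"}
ALLOWED_DOMAINS = {"healthcare", "finance", "policy", "cybersecurity"}


def validate_label(task: str, domain: str, note: str = "") -> dict:
    """Normalise operator input and reject values outside the vocabulary."""
    task_norm = task.strip().lower()
    domain_norm = domain.strip().lower()
    errors = []
    if task_norm not in ALLOWED_TASKS:
        errors.append(f"unknown task: {task!r}")
    if domain_norm not in ALLOWED_DOMAINS:
        errors.append(f"unknown domain: {domain!r}")
    if errors:
        raise ValueError("; ".join(errors))
    return {"task": task_norm, "domain": domain_norm, "note": note.strip()}


checked = validate_label(" Triage ", "Cybersecurity", note="SOC alert queue")
print(checked)
```

Rejecting unknown values, rather than silently accepting them, is what keeps later aggregation across runs meaningful.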
Fully automated implementation
At scale, a governance wrapper, orchestration layer, or platform can require task and domain labels before a run is executed or promoted. The system can then route the run into domain-specific logging, scoring, escalation, reviewer assignment, and dashboard views. In this form, the label becomes a control point inside the operational governance pipeline.
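The control-point idea can be sketched as a gate that blocks unlabelled runs and routes labelled ones. The routing table, reviewer roles, and rubric names are illustrative assumptions, not part of RAIDT; sending unmapped labels to an escalation route is likewise one possible design choice.

```python
# Hypothetical routing table keyed by (task, domain).
ROUTES = {
    ("summarisation", "healthcare"): {
        "reviewer": "clinician",
        "rubric": "clinical-summary",
    },
    ("explanation", "finance"): {
        "reviewer": "compliance_officer",
        "rubric": "finance-explanation",
    },
}

# Assumed fallback: labels without a mapped route are escalated, not waved through.
UNMAPPED_ROUTE = {"reviewer": "governance_lead", "rubric": "unmapped-label-escalation"}


def gate_run(run: dict) -> dict:
    """Refuse runs without labels, then attach the review route for the label."""
    task, domain = run.get("task"), run.get("domain")
    if not task or not domain:
        raise ValueError("run blocked: task and domain labels are required")
    route = ROUTES.get((task, domain), UNMAPPED_ROUTE)
    return {**run, **route}


routed = gate_run({"run_id": "run-0001", "task": "summarisation", "domain": "healthcare"})
print(routed["reviewer"], routed["rubric"])
```

Raising an error for a missing label makes the requirement enforceable before execution instead of leaving it to be reconstructed at audit time.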
Practical use in the RAIDT project
Within the RAIDT project, this item is useful in several ways. In Paper 08 Foundations, it helps justify why run-level governance must classify use context before evaluating evidence quality. In Paper 09 Empirical Validation, it supports analysis of whether reviewers score runs differently when task and domain are specified clearly. In Paper 10 Policy Pathways, it helps translate abstract policy language into concrete administrative fields that organisations can actually record.
It is also useful in sector playbooks because each sector can define its own task families, domain sensitivities, and escalation expectations. In the evidence pack, the item helps structure comparability and reviewer briefing. In the scoring rubric, it supports context-sensitive anchors. In viva defence or supervision discussion, it is a strong example of RAIDT's central move from general principles to inspectable operational metadata.
Key audience questions to prepare for
Q1. Why not infer the task and domain from the prompt and output afterwards?
Inference after the fact is possible, but it is weaker than explicit capture. Prompts can be ambiguous, and outputs do not always reveal the stakes or intended use context. Explicit recording improves comparability, review assignment, and audit defensibility.
Q2. Isn't this just another taxonomy exercise?
Only if it stays abstract. In RAIDT, the value comes from linking the classification to run-level evidence, scoring anchors, review pathways, and governance decisions. That makes it operational rather than purely descriptive.
Q3. What happens when one run spans multiple tasks or domains?
The run should either record a primary label with secondary qualifiers or be decomposed into more precise sub-runs if governance consequences differ materially. RAIDT benefits from granularity when different controls would apply.
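One way to represent a primary label with secondary qualifiers, and to flag when decomposition may be warranted, is sketched below. The types, the `controls_for` mapping, and the rule "decompose when labels map to different control sets" are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Label:
    task: str
    domain: str


@dataclass
class MultiLabelRun:
    run_id: str
    primary: Label
    secondary: list[Label] = field(default_factory=list)

    def all_labels(self) -> list[Label]:
        return [self.primary, *self.secondary]


def controls_for(label: Label) -> str:
    """Hypothetical mapping from a label to the control set that governs it."""
    return "clinical" if label.domain == "healthcare" else "standard"


def needs_decomposition(run: MultiLabelRun) -> bool:
    """True when the run's labels fall under more than one control set."""
    return len({controls_for(label) for label in run.all_labels()}) > 1


mixed = MultiLabelRun(
    run_id="run-0002",  # hypothetical identifier
    primary=Label("summarisation", "healthcare"),
    secondary=[Label("drafting", "public_services")],
)
print(needs_decomposition(mixed))
```

A run whose secondary labels share the primary label's control set can stay as one record; the flag only fires when the governance consequences actually diverge.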
Q4. Could rigid labels oversimplify messy real-world use?
Yes, if implemented poorly. That is why the scheme should support controlled categories plus explanatory notes. The goal is structured clarity, not false neatness.
Q5. Why is this important for PhD-level argument rather than just system administration?
Because it demonstrates that governance quality depends on how the unit of analysis is specified. The item helps show that responsible AI is not only about values or model properties, but also about how situated use is classified and rendered reviewable.
Suggested citation concepts to support this item
- task taxonomy in AI governance
- context-sensitive evaluation of generative AI systems
- domain-specific risk assessment for foundation model use
- sociotechnical classification in AI accountability
- metadata standards for AI audit trails
- human oversight requirements in high-stakes AI domains
- run-level evidence and AI assurance
- sector-specific governance controls for generative AI
- auditability of AI decision-support workflows
- operationalising responsible AI through use-case classification
Short explanation for presentation
The task and domain label is the field that tells RAIDT what kind of run is being governed. It records both the activity being performed, such as summarisation or triage, and the domain in which it occurs, such as healthcare, finance, or cybersecurity. That matters because governance expectations are not the same across contexts. In RAIDT, the label is not just descriptive metadata. It determines which evidence is relevant, which reviewers are appropriate, how scoring anchors should be interpreted, and what level of scrutiny is justified. The concept therefore helps move governance away from generic claims about AI use and toward run-level evidence that is inspectable, comparable, and contestable.
One-line takeaway
The task and domain label is the run-level classification of what the GenAI system was doing and where; RAIDT needs that context to attach the right evidence, scoring logic, and governance review.
Related items in evidence architecture and artefacts
- S4.01 - run_id
- S4.02 - Timestamp
- S4.03 - User role / operator role
- S4.05 - Prompt registry
- S4.06 - Prompt ID and version
- S4.07 - Prompt hash
- S4.08 - Model/provider/version identifier
- S4.09 - Decoding parameters
- S4.10 - Retrieval query and index ID
- S4.11 - Retrieved document IDs and hashes
- S4.12 - Tool-chain trace
- S4.13 - Adapter ID / PEFT lineage
- S4.14 - Alignment policy ID
- S4.15 - Output hash
- S4.16 - Review decision and reviewer notes
- ... and 1 more
Anchored questions
No anchored questions were present in the source item.