S4.04 - Task and domain label

flowchart LR
    A["Vague AI use descriptions<br/>mixed tasks and domains"] --> B["RAIDT<br/>run-level evidence framework"]
    A2["Generic governance claims<br/>without situated context"] --> B
    B --> C[["Task and domain label"]]
    H["Task type<br/>summarisation, explanation, triage, Q&A"] --> C
    I["Domain<br/>healthcare, finance, policy, cybersecurity"] --> C
    J["Risk band and reviewer expertise"] --> C
    C --> D["Evidence pack<br/>right evidence for the right run"]
    C --> E["Score profile<br/>context-sensitive pillar scoring"]
    C --> F["Reviewer reconstruction<br/>clearer audit trail"]
    C --> G["Governance readiness<br/>reviewability, contestability, learning"]

Star S4 - Evidence Architecture and Artefacts

Star context: Specifies the concrete fields and artefacts that make a run record inspectable, including the classification cues that determine which evidence, thresholds, and review expectations apply to a given GenAI run.


Academic picture
Definition / background

The task and domain labels classify the run, for example as healthcare summarisation, finance explanation, policy Q&A, or cybersecurity triage. In RAIDT terms, this means recording both the kind of work the GenAI system was used to perform and the substantive domain in which that work took place.

Conceptually, the task label answers the question, "What is the system being asked to do in this run?" The domain label answers the question, "In what organisational, professional, or sectoral setting is that task being performed?" These are related but not identical. A summarisation task may appear in healthcare, law, education, or public administration, yet the governance implications change because the stakes, norms, and harm pathways differ by domain.

This distinction matters in GenAI governance because many apparent performance issues are actually context issues. The same model output may be acceptable in a low-stakes brainstorming context and unacceptable in a regulated or safety-critical context. RAIDT therefore treats task and domain labels as part of the evidence architecture of a run, not as optional descriptive metadata. Without them, a reviewer cannot know which rubric anchors, escalation thresholds, or evidence requirements should apply.

Within RAIDT, task and domain labels belong inside run-level evidence because RAIDT governs configured use, not abstract models. They help connect the evidence pack to the score profile across Responsibility, Auditability, Interpretability, Dependability, and Traceability. In effect, they are part of the classification layer that makes later judgement operationally meaningful.
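As an illustration only, the two-part label described above could be represented as a small typed record. The sketch below assumes a closed vocabulary taken from the examples in this note. The names `TaskType`, `Domain`, `TaskDomainLabel`, and `key` are hypothetical and are not part of any published RAIDT schema.

```python
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    """What the system is being asked to do in this run (hypothetical vocabulary)."""
    SUMMARISATION = "summarisation"
    EXPLANATION = "explanation"
    TRIAGE = "triage"
    QA = "q&a"


class Domain(Enum):
    """The sectoral setting in which the task is performed (hypothetical vocabulary)."""
    HEALTHCARE = "healthcare"
    FINANCE = "finance"
    POLICY = "policy"
    CYBERSECURITY = "cybersecurity"


@dataclass(frozen=True)
class TaskDomainLabel:
    """Run-level classification: task and domain are recorded separately."""
    task: TaskType
    domain: Domain
    note: str = ""  # optional free-text clarification

    def key(self) -> str:
        # A compact identifier that downstream lookups could use.
        return f"{self.domain.value}/{self.task.value}"


label = TaskDomainLabel(TaskType.SUMMARISATION, Domain.HEALTHCARE)
print(label.key())  # healthcare/summarisation
```

Keeping task and domain as separate fields, rather than one combined string, reflects the point that the same task can appear in several domains with different governance implications.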

Why this concept matters

Task and domain labels solve a basic but often neglected governance problem: organisations cannot review a run properly if they do not know what kind of activity the run represented. Without this classification, reviewers are forced to apply generic expectations to context-specific work. That weakens comparison across runs, distorts scoring, and makes it harder to justify why one case triggered stronger review than another.

The concept also prevents a common confusion between system capability and situated use. A model may be capable of many things, but RAIDT evaluates one configured run for one concrete task in one context. The task and domain labels make that unit of analysis visible. This is essential for moving from broad principles such as fairness, accountability, or safety toward operational governance that can be audited, challenged, and improved.

If the label is missing, several risks follow: sector-specific harms may be missed; inappropriate benchmarks may be used; reviewers with the wrong expertise may be assigned; and organisations may overstate governance maturity because their controls appear adequate but are not matched to the run under review. In practice, the label is a small field with large downstream consequences.

Key idea: Task and domain labels matter because they tell RAIDT which governance expectations belong to a specific run, turning generic oversight into context-sensitive evidence and review.

What this item captures
Practical example / likely audience question

Audience question

Why classify tasks?

Answer

The concern behind the question is usually that classification sounds bureaucratic or redundant, as though a reviewer can simply inspect the prompt and output and work out what happened. In reality, that assumption breaks down quickly in organisational settings. A prompt may look similar across runs, while the governance significance differs because the intended task and domain differ.

The direct answer is that scoring anchors and risk thresholds depend on domain and task type. A model-generated summary for internal meeting notes is not governed in the same way as a model-generated summary of a patient record. Likewise, explanatory output in a finance setting may need to support stricter documentation and review than explanatory output in a low-stakes training context.

RAIDT handles this better than generic AI governance approaches because it binds the classification to a run-level evidence record. Instead of saying, "Our organisation uses GenAI in healthcare," RAIDT can say, "This run was healthcare summarisation, performed at this time, with this prompt lineage, this evidence pack, and these applicable review expectations." That shift makes the governance claim inspectable.
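The dependence of scoring anchors and risk thresholds on task and domain can be sketched as a simple lookup. Everything in the block is an assumption made for illustration: the `office` domain, the risk bands, the review flags, and the function name `expectations_for` do not come from RAIDT itself. The fallback to mandatory review for unknown combinations is likewise one possible design choice, not a stated rule.

```python
# Hypothetical review expectations keyed by (domain, task).
# The values are illustrative, not RAIDT-defined thresholds.
REVIEW_EXPECTATIONS = {
    ("healthcare", "summarisation"): {"risk_band": "high", "human_review": True},
    ("finance", "explanation"): {"risk_band": "high", "human_review": True},
    ("office", "summarisation"): {"risk_band": "low", "human_review": False},
}

# Assumed fallback: an unrecognised combination is flagged and sent for review.
DEFAULT_EXPECTATION = {"risk_band": "unclassified", "human_review": True}


def expectations_for(domain: str, task: str) -> dict:
    """Return the review expectations that apply to a labelled run."""
    return REVIEW_EXPECTATIONS.get((domain.lower(), task.lower()), DEFAULT_EXPECTATION)


# Same task, different domain, different expectations.
patient_summary = expectations_for("healthcare", "summarisation")
meeting_summary = expectations_for("office", "summarisation")
print(patient_summary["human_review"], meeting_summary["human_review"])  # True False
```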

Practical example in RAIDT terms

Consider a hospital team using a GenAI assistant to summarise referral letters before a clinician review. The use case is summarisation, but the domain is healthcare, not generic office productivity. At run level, the issue is that the output may omit a clinically salient symptom, medication conflict, or safeguarding concern even if the summary appears fluent.

In RAIDT terms, the run record should therefore label the task as clinical summarisation and the domain as healthcare. The evidence needed includes the prompt version, retrieved context if any, output hash, reviewer notes, and a domain-appropriate judgement of completeness and risk. Responsibility is affected because accountability for use must be clearly allocated. Dependability is affected because the run must perform consistently under clinical expectations. Auditability and Traceability are affected because a reviewer must be able to reconstruct why this run was handled under healthcare controls rather than a generic summarisation rubric. The label improves governance readiness by ensuring that the run enters the right evidence pathway from the start.
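A minimal sketch of how the label might drive an evidence completeness check for this example follows. The field names, the `REQUIRED_EVIDENCE` table, and both helper functions are hypothetical. Retrieved context is treated as optional because the text lists it as "if any". SHA-256 is an assumed choice of hash; the source does not specify one.

```python
import hashlib

# Hypothetical evidence requirements keyed by (domain, task).
REQUIRED_EVIDENCE = {
    ("healthcare", "summarisation"): [
        "prompt_version",
        "output_hash",
        "reviewer_notes",
        "completeness_judgement",
    ],
}


def output_hash(text: str) -> str:
    """Fingerprint the model output so a reviewer can verify it later."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def missing_evidence(record: dict) -> list[str]:
    """List required evidence fields that are absent or empty for this label."""
    required = REQUIRED_EVIDENCE.get((record["domain"], record["task"]), [])
    return [name for name in required if not record.get(name)]


record = {
    "domain": "healthcare",
    "task": "summarisation",
    "prompt_version": "v3",
    "output_hash": output_hash("Referral summary text"),
    "reviewer_notes": "",  # not yet reviewed
}
print(missing_evidence(record))  # ['reviewer_notes', 'completeness_judgement']
```

The same record labelled under an unlisted domain would report nothing missing in this sketch, which shows why a wrong label can silently weaken the evidence pathway.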

Detailed link to RAIDT

The task and domain label links to RAIDT in four ways.

First, it links to RAIDT's core idea that governance should apply to situated runs rather than to abstract claims about a model or product.
Second, it links directly to run-level evidence because it classifies the specific use that the evidence record is meant to describe.
Third, it links to both the evidence pack and the score profile because task and domain determine which evidence is expected and how pillar judgements should be interpreted.
Fourth, it links to reviewability, contestability, audit readiness, and organisational learning because the classification makes runs comparable, challengeable, and reviewable across repeated use.

Task and domain label -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

In short, the item helps RAIDT decide what kind of run is being governed before asking whether that run is responsible, auditable, interpretable, dependable, or traceable.

Link to the five RAIDT pillars

Responsibility

Task and domain labels support Responsibility by clarifying the context in which obligations apply. A run in a high-stakes domain may require stricter approval, stronger human oversight, or clearer role allocation than a similar task in a low-stakes setting.

Example evidence / implication:

Auditability

This item strongly affects Auditability because reviewers need to know what category of work they are auditing. Without the label, it is difficult to judge whether the right standards, controls, and comparison cases were applied.

Example evidence / implication:

Interpretability

The effect on Interpretability is moderate but important. Knowing the task and domain helps explain what kinds of output characteristics matter and why certain explanation demands exist.

Example evidence / implication:

Dependability

This item strongly affects Dependability because reliability must be judged against task-specific and domain-specific expectations, not generic notions of acceptable performance.

Example evidence / implication:

Traceability

Task and domain labels strongly support Traceability by making the purpose and context of the run explicit in the evidence trail. They allow downstream artefacts to be interpreted in relation to the correct use case.

Example evidence / implication:

Why this item is more than a generic concept

In general AI governance, task or domain labels may be treated as broad categories used for portfolio reporting, project scoping, or policy statements. In RAIDT, they mean something more operational: they are run-level classification fields that determine how a specific evidence record should be interpreted and reviewed.

The RAIDT meaning is more operational because it is tied to run-level evidence. The label is not merely descriptive; it helps configure oversight. It influences what evidence is collected, which reviewers are appropriate, how scoring anchors are interpreted, and how the resulting case contributes to organisational learning. That is why this item belongs in the evidence architecture rather than only in a high-level taxonomy.

Common misunderstanding

Misunderstanding

A task and domain label is just an administrative tag added for filing convenience.

Correction

In RAIDT, the label is not just for filing. It helps decide which governance pathway the run enters. For example, two runs may both involve summarisation, but one may summarise workshop notes and the other may summarise legal witness material. Treating them as equivalent because they share a task verb would hide important differences in evidential burden, reviewer expertise, and acceptable failure tolerance.

Boundary and limitation

Task and domain labels do not prove that a run is safe, lawful, accurate, fair, or effective. They also do not replace substantive evaluation, human oversight, or domain expertise. A label can still be wrong, overly coarse, inconsistently applied, or strategically chosen to understate risk.

The item works best when the labelling scheme is defined clearly enough to support consistent use, but flexible enough to reflect real organisational variation. RAIDT handles this limitation by treating the label as one part of a broader evidence pack. The label guides review; it does not settle the review outcome. If a classification is contested, that itself becomes an inspectable governance issue.

Implementation levels

Manual implementation

A researcher, practitioner, or small team can apply this item manually by recording the intended task type and domain for each run in a structured note or evidence template. A short controlled vocabulary and a short explanation field are usually sufficient at this level.

Semi-automated implementation

Semi-automated implementation can use form fields, templates, dropdown taxonomies, or lightweight review workflows so that operators choose from standardised task and domain categories while still being able to add clarifying notes. This improves consistency and supports later aggregation across runs.
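A semi-automated setup of this kind might validate operator choices against a controlled vocabulary and then aggregate across runs. The sketch below assumes the vocabulary is just the examples named in this note. `TASKS`, `DOMAINS`, `validate_label`, and `count_by_label` are hypothetical names.

```python
from collections import Counter

# Assumed controlled vocabulary, mirroring a dropdown taxonomy.
TASKS = {"summarisation", "explanation", "triage", "q&a"}
DOMAINS = {"healthcare", "finance", "policy", "cybersecurity"}


def validate_label(task: str, domain: str) -> list[str]:
    """Return a list of problems; an empty list means the label is acceptable."""
    errors = []
    if task not in TASKS:
        errors.append(f"unknown task: {task!r}")
    if domain not in DOMAINS:
        errors.append(f"unknown domain: {domain!r}")
    return errors


def count_by_label(runs: list[dict]) -> Counter:
    """Aggregate validly labelled runs by (domain, task)."""
    return Counter(
        (run["domain"], run["task"])
        for run in runs
        if not validate_label(run["task"], run["domain"])
    )


runs = [
    {"task": "summarisation", "domain": "healthcare", "note": "referral letters"},
    {"task": "summarisation", "domain": "healthcare", "note": "discharge notes"},
    {"task": "brainstorming", "domain": "finance", "note": "not in vocabulary"},
]
print(count_by_label(runs))
```

The free-text `note` field is carried alongside the controlled categories, so operators can still add clarification without breaking aggregation.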

Fully automated implementation

At scale, a governance wrapper, orchestration layer, or platform can require task and domain labels before a run is executed or promoted. The system can then route the run into domain-specific logging, scoring, escalation, reviewer assignment, and dashboard views. In this form, the label becomes a control point inside the operational governance pipeline.
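The control-point idea can be sketched as a wrapper that refuses to execute an unlabelled run and attaches a review route to the result. This is a simplified illustration under stated assumptions: `governed_run`, `MissingLabelError`, and the queue names are hypothetical, and `run_fn` stands in for whatever function actually calls the model.

```python
from typing import Callable, Optional


class MissingLabelError(ValueError):
    """Raised when a run is attempted without a task and domain label."""


# Hypothetical routing table from domain to review queue.
ROUTES = {
    "healthcare": "clinical-review-queue",
    "finance": "finance-review-queue",
}
DEFAULT_ROUTE = "general-review-queue"


def governed_run(
    run_fn: Callable[[str], str],
    prompt: str,
    task: Optional[str],
    domain: Optional[str],
) -> dict:
    """Execute run_fn only if both labels are present, then attach routing."""
    if not task or not domain:
        raise MissingLabelError("task and domain labels are required before execution")
    output = run_fn(prompt)
    return {
        "task": task,
        "domain": domain,
        "review_queue": ROUTES.get(domain, DEFAULT_ROUTE),
        "output": output,
    }


# A stub stands in for the model call in this example.
result = governed_run(lambda p: f"summary of: {p}", "referral letter", "summarisation", "healthcare")
print(result["review_queue"])  # clinical-review-queue
```

Checking the label before calling `run_fn` means an unlabelled run never produces output that would then need retrospective classification.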

Practical use in the RAIDT project

Within the RAIDT project, this item is useful in several ways. In Paper 08 Foundations, it helps justify why run-level governance must classify use context before evaluating evidence quality. In Paper 09 Empirical Validation, it supports analysis of whether reviewers score runs differently when task and domain are specified clearly. In Paper 10 Policy Pathways, it helps translate abstract policy language into concrete administrative fields that organisations can actually record.

It is also useful in sector playbooks because each sector can define its own task families, domain sensitivities, and escalation expectations. In the evidence pack, the item helps structure comparability and reviewer briefing. In the scoring rubric, it supports context-sensitive anchors. In viva defence or supervision discussion, it is a strong example of RAIDT's central move from general principles to inspectable operational metadata.

Key audience questions to prepare for

Q1. Why not infer the task and domain from the prompt and output afterwards?

Inference after the fact is possible, but it is weaker than explicit capture. Prompts can be ambiguous, and outputs do not always reveal the stakes or intended use context. Explicit recording improves comparability, review assignment, and audit defensibility.

Q2. Isn't this just another taxonomy exercise?

Only if it stays abstract. In RAIDT, the value comes from linking the classification to run-level evidence, scoring anchors, review pathways, and governance decisions. That makes it operational rather than purely descriptive.

Q3. What happens when one run spans multiple tasks or domains?

The run should either record a primary label with secondary qualifiers or be decomposed into more precise sub-runs if governance consequences differ materially. RAIDT benefits from granularity when different controls would apply.
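One way to sketch the primary-plus-qualifiers option is shown below. The risk bands, the `office` domain, and the rule that the strictest band across all labels applies are assumptions made for the example, not RAIDT requirements. The test for "governance consequences differ materially" is approximated here by differing risk bands, which is only a rough proxy.

```python
from dataclasses import dataclass, field

# Hypothetical ordering and domain-to-band mapping.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}
DOMAIN_RISK = {"healthcare": "high", "finance": "high", "policy": "medium", "office": "low"}


@dataclass
class MultiLabel:
    """A primary (domain, task) label with optional secondary qualifiers."""
    primary: tuple[str, str]
    secondary: list[tuple[str, str]] = field(default_factory=list)

    def all_labels(self) -> list[tuple[str, str]]:
        return [self.primary, *self.secondary]


def risk_bands(label: MultiLabel) -> list[str]:
    # Unknown domains are assumed to be high risk until classified.
    return [DOMAIN_RISK.get(domain, "high") for domain, _task in label.all_labels()]


def effective_risk_band(label: MultiLabel) -> str:
    """Assumed rule: the strictest band among all labels governs the run."""
    return max(risk_bands(label), key=RISK_ORDER.__getitem__)


def needs_decomposition(label: MultiLabel) -> bool:
    """Rough proxy: suggest sub-runs when the labels imply different bands."""
    return len(set(risk_bands(label))) > 1


mixed = MultiLabel(("policy", "q&a"), [("healthcare", "summarisation")])
print(effective_risk_band(mixed), needs_decomposition(mixed))  # high True
```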

Q4. Could rigid labels oversimplify messy real-world use?

Yes, if implemented poorly. That is why the scheme should support controlled categories plus explanatory notes. The goal is structured clarity, not false neatness.

Q5. Why is this important for a PhD-level argument rather than just system administration?

Because it demonstrates that governance quality depends on how the unit of analysis is specified. The item helps show that responsible AI is not only about values or model properties, but also about how situated use is classified and rendered reviewable.

Suggested citation concepts to support this item
Short explanation for presentation

The task and domain label is the field that tells RAIDT what kind of run is being governed. It records both the activity being performed, such as summarisation or triage, and the domain in which it occurs, such as healthcare, finance, or cybersecurity. That matters because governance expectations are not the same across contexts. In RAIDT, the label is not just descriptive metadata. It determines which evidence is relevant, which reviewers are appropriate, how scoring anchors should be interpreted, and what level of scrutiny is justified. The concept therefore helps move governance away from generic claims about AI use and toward run-level evidence that is inspectable, comparable, and contestable.

One-line takeaway

The task and domain label is the run-level classification of what the GenAI system was doing and where. RAIDT needs that context to attach the right evidence, scoring logic, and governance review.

Related items in evidence architecture and artefacts
Anchored questions

No anchored questions were present in the source item, so this note preserves that absence rather than inventing new anchor prompts.
