C0.02 - Run
flowchart LR
A[Background: model-level policy and documentation are too general] --> B[RAIDT: run-level evidence framework]
B --> C[[Run: one configured GenAI use in context]]
C --> D[Run-level evidence]
D --> E[Evidence pack]
E --> F[Reviewer reconstruction]
C --> G[Score profile]
G --> H[Governance readiness]
F --> H
I[Healthcare drafting] --> C
J[Finance reporting] --> C
K[Public-service correspondence] --> C
L[Education support] --> C
M[Enterprise productivity] --> C
Star C0 - RAIDT Core, Definition, Values, Claims and Innovation
Star context: Defines the project identity of RAIDT by specifying the unit that governance actually attaches to: one concrete GenAI use in organisational work, examined through run-level evidence rather than abstract model claims.
Definition / background
In RAIDT, a run is one configured use of a generative AI system for a specific task, at a specific time, in a specific organisational context. It is not merely the final answer produced by the model. A run includes the task framing, the prompt or instruction, the model and tool configuration, any retrieved or supplied context, the generated output, the human or automated checks that followed, and the organisational setting in which the result was used.
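As a rough illustration, the components named in this definition could be held together in a single record type. The sketch below is a minimal one under stated assumptions: the class name `RunRecord`, every field name, and the example values are hypothetical and are not prescribed by RAIDT.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class RunRecord:
    """One configured GenAI use in context. All field names are illustrative."""

    run_id: str
    task: str            # task framing
    prompt: str          # prompt or instruction
    model_config: dict   # model and tool configuration
    context: tuple       # retrieved or supplied context
    output: str          # generated output
    checks: tuple        # human or automated checks that followed
    setting: str         # organisational setting in which the result was used
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Hypothetical example values for a discharge-summary drafting task.
run = RunRecord(
    run_id="run-0001",
    task="Draft a discharge summary from clinician notes",
    prompt="Summarise the notes below.",
    model_config={"model": "example-model", "version": "1.0"},
    context=("clinician note A", "medication chart B"),
    output="Draft discharge summary text",
    checks=("clinician edit", "clinician approval"),
    setting="Hospital ward, discharge process",
)

# The output is only one of several components of the run.
component_names = [f.name for f in fields(run)]
print(component_names)
```

The record is frozen so that a captured run is not silently reassigned after the event; whether immutability is appropriate in a real deployment is a design choice left open here.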
Conceptually, the term matters because governance problems tend to arise at the level of actual use events. A policy may be well written and a model family may be well documented, yet a harmful, misleading, or non-compliant outcome usually emerges through one concrete execution in context. RAIDT therefore treats the run as the unit at which evidence should be gathered and governance should be assessed.
This distinguishes a run from related terms. It is narrower than a workflow, which may contain many AI and non-AI steps. It is more situated than a model card, which describes a system in general terms. It is more governable than a vague reference to "an AI use case", because it identifies a single reviewable instance. In RAIDT, that precision is what allows run-level evidence to be assembled into an evidence pack and translated into a score profile across the five pillars.
The run belongs centrally inside RAIDT because RAIDT's core claim is methodological: responsible governance becomes more credible when it is anchored in evidence about what was actually done. By defining the run clearly, RAIDT establishes the boundary of what is being evidenced, reviewed, contested, and improved.
Why this concept matters
The concept of the run solves a common governance failure in GenAI adoption: organisations talk about systems at too high a level of abstraction. They may know which model they procured and which policy they approved, but still be unable to explain what happened in a contested case, why a particular output was produced, or whether the right checks occurred before action was taken.
Treating the run as the unit of governance avoids several confusions. It prevents people from equating the model with the event, the output with the whole process, or policy compliance with evidential sufficiency. It also makes it possible to compare similar uses over time, identify recurring failure points, and show whether governance quality is improving across actual practice rather than rhetoric.
Without a clear concept of the run, GenAI governance becomes difficult to operationalise. Evidence collection becomes inconsistent, audit trails remain partial, and accountability is blurred across human decisions, prompts, tools, and model behaviour. RAIDT uses the run to move governance from principles and assertions towards evidence, reviewability, contestability, and continuous improvement.
Key idea: The run matters because it is the smallest meaningful unit of GenAI use that can be governed with evidence rather than assumption.
What this item captures
- The bounded unit of GenAI use that RAIDT evaluates.
- The relationship between prompt, configuration, context, output, and post-output checks.
- The organisational circumstances that give a model interaction governance significance.
- The point at which evidence can be collected, reconstructed, contested, and scored.
- The conceptual basis for connecting technical execution to organisational accountability.
- The starting unit from which patterns, risks, and readiness can later be analysed across many cases.
Practical example / likely audience question
Audience question
Why is RAIDT organised around the run rather than around the model, the policy, or the business process?
Answer
The concern behind this question is usually that the run may look too narrow. A supervisor, reviewer, or manager may worry that one isolated event cannot say much about governance quality. The direct answer is that RAIDT uses the run not because broader levels are irrelevant, but because governance failures become visible only when broad arrangements are instantiated in practice.
For example, an organisation might say that it uses an approved large language model under a responsible AI policy. That statement remains too general to resolve a dispute about one problematic summary, recommendation, or drafted response. To review the case properly, a reviewer needs to know which prompt was used, which context was retrieved, which model settings applied, what output was produced, what checks followed, and who acted on the result. That bounded event is the run.
RAIDT handles this better than a generic AI governance approach because it gives the reviewer an operational unit that can be evidenced. A generic approach may confirm that a policy exists or that a model was approved. RAIDT asks whether this particular use was sufficiently evidenced and governable. That is a much stronger basis for accountability, audit, and learning.
Practical example in RAIDT terms
Consider a healthcare trust using a GenAI assistant to draft discharge summaries from clinician notes. One clinician runs the system for a patient leaving hospital after a medication change. The model produces a discharge summary that omits a dosage adjustment. The governance issue is not simply that "the AI system was used in healthcare". The issue is that one specific run generated one specific clinical draft in one specific context.
The evidence needed includes the prompt template, the patient-note context supplied to the system, the model and version used, any retrieval or tool calls, the resulting draft, the clinician's edits, the approval step, and the final version sent onward. The most affected RAIDT pillars are Responsibility, because a clinical actor must remain accountable for the use; Dependability, because omission risk directly affects reliability; and Traceability, because the run must be reconstructable after the event. Auditability also matters because a reviewer may need to test whether the omission arose from prompt design, missing context, or inadequate checking.
By defining the run clearly, RAIDT improves governance readiness. The organisation can review the exact event, identify where the failure entered the process, adjust templates or checks, and demonstrate a defensible improvement path rather than relying on generic assurances about safe deployment.
Detailed link to RAIDT
The run links to RAIDT in four ways.
First, it gives RAIDT its core unit of governance: one actual GenAI use in organisational context rather than a broad statement about a model or policy.
Second, it provides the boundary within which run-level evidence is gathered and interpreted.
Third, it is the object from which an evidence pack can be assembled and a score profile can be justified.
Fourth, it supports reviewability, contestability, audit readiness, and organisational learning because specific uses can be reconstructed, compared, and improved.
Run → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness
This chain is central to RAIDT's design. The run is the starting point; without a clearly bounded run, the later governance outputs become weaker, more subjective, and harder to defend.
Link to the five RAIDT pillars
Responsibility
The run is where responsibility becomes concrete. It identifies the task, the actor, the intended use, and the point at which human judgement should frame, review, or approve GenAI outputs.
Example evidence / implication:
- The run record shows who initiated the task and under what organisational role.
- The run clarifies where human approval or override was expected before consequential use.
Auditability
Auditability depends on being able to inspect one use event in enough detail to understand what happened and whether process requirements were followed.
Example evidence / implication:
- The run record contains prompt text, configuration details, timestamps, and review actions.
- A reviewer can reconstruct the path from instruction to output instead of relying on recollection alone.
Interpretability
Interpretability is supported when the run captures the context needed to explain why an output was plausible, surprising, or problematic in that case.
Example evidence / implication:
- The run shows what source material or retrieved context shaped the output.
- Reviewers can distinguish a model limitation from a poor prompt, missing context, or unclear task framing.
Dependability
Dependability is affected because repeatability, robustness, and quality assessment all depend on how a specific run was configured and checked.
Example evidence / implication:
- The run record reveals whether the system behaved consistently under defined settings.
- Failures can be traced to unstable prompting, weak validation, or unsuitable use conditions.
Traceability
Traceability is the pillar most directly enabled by the run concept. If the run is not clearly bounded and recorded, later tracing becomes fragmented or impossible.
Example evidence / implication:
- The run provides a unique event that can be linked to inputs, outputs, reviewers, and downstream actions.
- Organisational learning becomes possible because similar runs can be grouped and analysed over time.
The run affects all five pillars, but it has especially strong consequences for Auditability and Traceability because both depend on a clearly defined reviewable unit.
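The example evidence listed under each pillar can be read as a checklist against a run record. The following is a minimal sketch of that idea, assuming a run is stored as a plain dictionary. The mapping `PILLAR_EVIDENCE`, the field names, and the function `evidence_gaps` are hypothetical illustrations drawn from the examples above, not a RAIDT-defined schema.

```python
# Hypothetical mapping from each pillar to the run-record fields that evidence it.
PILLAR_EVIDENCE = {
    "Responsibility": ["initiator", "role", "approval"],
    "Auditability": ["prompt", "configuration", "timestamp", "review_actions"],
    "Interpretability": ["retrieved_context"],
    "Dependability": ["configuration", "validation_checks"],
    "Traceability": ["run_id", "inputs", "output", "downstream_actions"],
}


def evidence_gaps(run_record: dict) -> dict:
    """Return, per pillar, the expected evidence fields that are absent or empty."""
    gaps = {}
    for pillar, expected in PILLAR_EVIDENCE.items():
        missing = [name for name in expected if not run_record.get(name)]
        if missing:
            gaps[pillar] = missing
    return gaps


# Hypothetical, partially captured run: no approval, review, validation,
# or downstream action was recorded.
record = {
    "run_id": "run-0001",
    "initiator": "clinician-17",
    "role": "ward doctor",
    "prompt": "Summarise the notes below.",
    "configuration": {"model": "example-model"},
    "timestamp": "2024-01-01T09:00:00Z",
    "inputs": ["clinician note A"],
    "output": "Draft discharge summary text",
    "retrieved_context": ["medication chart B"],
}

gaps = evidence_gaps(record)
for pillar, missing in gaps.items():
    print(pillar, "is missing:", ", ".join(missing))
```

In this sketch a pillar with no gaps is simply left out of the result, so an empty dictionary would mean every expected field was captured.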
Why this item is more than a generic concept
In general AI or software discussions, a "run" may simply mean a system execution or a completed invocation. In RAIDT, the term is more operational and governance-oriented. It means a bounded unit of organisationally meaningful GenAI use for which evidence can be collected, reviewed, challenged, and scored.
That difference matters. A generic concept of execution is technically useful, but it does not by itself support accountability. The RAIDT meaning is stronger because it ties the run to evidence requirements, review processes, and governance outcomes. The run is therefore not just an event in a system log; it is the foundational object of responsible oversight.
Common misunderstanding
Misunderstanding
A run is just the model's output.
Correction
A run includes the output, but it is not reducible to the output. It also includes the prompt or task framing, the relevant context, configuration choices, and the checks or decisions that follow. For example, if a GenAI tool drafts a procurement summary, the summary alone does not tell a reviewer whether the system was given incomplete source material, whether retrieval failed, or whether a human verified the recommendation before action. RAIDT treats the whole use event as the run because governance depends on understanding the conditions that produced the output, not merely the output itself.
Boundary and limitation
The run does not by itself prove that a system is safe, fair, lawful, or effective in all circumstances. It provides the bounded unit on which those questions can be examined. A single well-documented run cannot substitute for wider evaluation across many tasks, users, or organisational settings.
The run also depends on capture quality. If prompts are not logged, context is missing, tool calls are opaque, or human review steps are undocumented, then the run may remain only partially reconstructable. RAIDT handles this limitation by making evidential completeness itself visible. Weakly documented runs should lead to weaker claims, lower scores, and clearer prioritisation for governance improvement.
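One way to make evidential completeness visible is to report which expected parts of a run were never captured. The sketch below assumes a run is stored as a dictionary and weights all fields equally. The field list `EXPECTED_FIELDS`, the function name `completeness`, and the equal weighting are assumptions for illustration only; the source does not specify how RAIDT turns completeness into a score.

```python
# Hypothetical list of fields a fully captured run would contain.
EXPECTED_FIELDS = (
    "task", "prompt", "context", "tool_calls", "output", "review", "decision",
)


def completeness(run_record: dict) -> tuple:
    """Return (fraction of expected fields captured, list of missing fields).

    A field counts as captured when its key is present and not None, so an
    explicitly recorded empty value (for example, no tool calls) still counts.
    """
    missing = [
        name for name in EXPECTED_FIELDS
        if run_record.get(name) is None
    ]
    fraction = 1 - len(missing) / len(EXPECTED_FIELDS)
    return fraction, missing


# Hypothetical run in which context, review, and decision were not logged.
partially_logged = {
    "task": "Draft procurement summary",
    "prompt": "Summarise the attached tender documents.",
    "tool_calls": [],   # recorded: no tools were called
    "output": "Draft summary text",
    "review": None,     # not recorded
}

fraction, missing = completeness(partially_logged)
print(f"captured {fraction:.0%} of expected fields; missing: {missing}")
```

Distinguishing "recorded as empty" from "not recorded" is a deliberate choice in this sketch: only the latter is treated as an evidential gap.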
Implementation levels
Manual implementation
A researcher, policy team, or small operational unit can define a run manually by recording a structured case note after each significant GenAI use. This can include the task, prompt, context, output, reviewer, decision, and observed issue. Manual implementation is labour-intensive, but it is sufficient to establish the governance logic and support early empirical work.
Semi-automated implementation
A semi-automated approach uses templates, metadata capture, form-based review, and standardised evidence fields. Prompts, timestamps, model names, and reviewer checkpoints can be captured automatically, while human judgement is added through structured annotations. This approach improves consistency without requiring full technical integration.
Fully automated implementation
At scale, the run can be implemented through a platform wrapper, orchestration layer, governance pipeline, or logging architecture that treats each GenAI invocation as a governed event. Inputs, outputs, tool usage, retrieved context, approvals, and downstream actions can be linked automatically to create machine-supported evidence packs, dashboard views, and comparative readiness analysis across many runs.
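A platform wrapper of this kind might look roughly like the sketch below, which wraps any text-generation callable so that each invocation is appended to a log as one run record. It is an assumption-laden illustration: `governed`, `stub_model`, the record fields, and the in-memory list used as a log are all hypothetical, and a real deployment would write to durable storage and call an actual model.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable


def governed(generate: Callable[[str], str], model_name: str, run_log: list):
    """Wrap a text-generation callable so every call is appended to run_log."""

    def wrapper(prompt: str, actor: str, task: str) -> str:
        started_at = datetime.now(timezone.utc).isoformat()
        output = generate(prompt)
        run_log.append(
            {
                "run_id": str(uuid.uuid4()),
                "started_at": started_at,
                "actor": actor,
                "task": task,
                "model": model_name,
                "prompt": prompt,
                "output": output,
                "approval": None,  # to be filled in by a later human review step
            }
        )
        return output

    return wrapper


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return f"DRAFT: {prompt}"


run_log = []
draft = governed(stub_model, "example-model", run_log)
text = draft(
    "Summarise the notes below.",
    actor="clinician-17",
    task="discharge summary",
)
print(len(run_log), "run(s) recorded; approval pending:", run_log[0]["approval"] is None)
```

The wrapper records the approval field as empty rather than omitting it, so a later review step has an explicit place to record who approved the output and when.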
Practical use in the RAIDT project
Within the RAIDT project, this item is foundational for explaining the framework's conceptual architecture in papers, supervision meetings, and viva defence. In Paper 08 Foundations, the run helps justify why RAIDT adopts a unit of analysis centred on actual organisational use rather than only principle-level governance language. In Paper 09 Empirical Validation, the run provides the case boundary for collecting evidence, comparing governance quality, and testing whether reviewers can reconstruct what occurred. In Paper 10 Policy Pathways, the run supports arguments about how policy can become more operational by specifying what organisations should retain and review.
The concept is also useful for sector playbooks, evidence-pack design, scoring rubrics, and governance interventions. When explaining RAIDT to supervisors or journal reviewers, the run provides a precise answer to the question, "What exactly is being governed?" That clarity strengthens positioning against broader responsible AI frameworks that remain less operational at the point of use.
Key audience questions to prepare for
Q1. Why not make the model, rather than the run, the main unit of governance?
Because models are used across many tasks and contexts, while governance failures usually arise in one specific use event. RAIDT still acknowledges model-level documentation, but it treats the run as the unit where governance claims become testable.
Q2. Is one run too small a unit to support meaningful governance?
No. One run is the minimum reviewable unit, not the only unit of analysis. RAIDT starts there so that larger patterns can later be built from defensible case-level evidence.
Q3. Does a run include human review, or only the AI interaction itself?
In RAIDT it includes the post-output checks and decisions that make the AI use organisationally consequential. Excluding review would weaken accountability and leave the governance record incomplete.
Q4. How is a run different from a workflow?
A workflow may contain many linked tasks, systems, and human decisions. A run is one bounded GenAI use within that wider workflow. RAIDT chooses this narrower unit because it is easier to evidence, compare, reconstruct, and score.
Q5. What happens if important parts of a run are not captured?
Then the governance claim should weaken accordingly. Missing run data is itself an evidential finding, indicating lower auditability, weaker traceability, and reduced governance readiness.
Suggested citation concepts to support this item
- unit of analysis in AI governance
- event-level accountability in sociotechnical systems
- AI audit logs and reconstruction of decision events
- documentation of human-AI interaction in organisational work
- responsible AI operationalisation at point of use
- provenance and traceability in generative AI systems
- process-based accountability for automated decision support
- empirical governance of large language model deployments
- human oversight in high-stakes AI-enabled workflows
- evidential approaches to trustworthy AI assurance
Short explanation for presentation
A run is the basic unit of governance in RAIDT. It means one specific use of a generative AI system for one task, at one time, in one organisational context. That matters because most governance disputes are not really about a model in the abstract; they are about a concrete event involving a prompt, a configuration, some context, an output, and whatever checking followed. By defining the run clearly, RAIDT can collect run-level evidence, assemble an evidence pack, and generate a score profile that supports reviewability, contestability, and audit readiness. In effect, the run is the bridge between technical system behaviour and organisational accountability. Without a clear run concept, governance remains too general to explain what actually happened in practice.
One-line takeaway
The run is the bounded unit of real GenAI use: RAIDT governs concrete, evidence-bearing events, not abstract claims about systems.