S3.01 - Run_as_unit_of_governance

S3.01 - Run as unit of governance

flowchart LR
    A["Background problem:<br/>model-level assurance is too abstract for case review"] --> B["RAIDT:<br/>run-level evidence framework"]
    P[Policy principles and supplier claims] --> B
    B --> C[[Run as unit of governance]]
    C --> D[Run-level evidence pack]
    C --> E[RAIDT five-pillar score profile]
    C --> F[Reconstructable organisational case]
    D --> G[Reviewer reconstruction and challenge]
    E --> H[Governance readiness]
    F --> H
    I["Prompt, retrieval, tools,<br/>user role, timestamp, output"] --> C
    J[Healthcare, finance, public services] --> C

Star S3 - Run-Level Evidence Logic

Star context: Explains why RAIDT governs the concrete run rather than the model in abstraction, so that evidence can support reconstruction, comparison, challenge, and accountable organisational review.

Academic picture

Definition / background

In RAIDT, the run is the primary unit of governance because governance questions usually arise around one specific use of a generative AI system for one task, at one time, in one organisational context. A run is not merely the model invocation in isolation. It is the configured socio-technical event that includes the model or service used, the prompt or instruction structure, any retrieved or attached materials, tool calls, system settings, user role, workflow position, output, and relevant oversight conditions.
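The components listed above can be pictured as one record per run. The following is a minimal sketch only: the class name `RunRecord`, every field name, and all example values are illustrative assumptions, not a schema defined by RAIDT.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RunRecord:
    """One governed run: the configured use event, not only the model call.

    Field names are hypothetical; they mirror the components named in the text.
    """

    run_id: str
    model: str                  # model or service used
    prompt: str                 # prompt or instruction structure
    user_role: str              # role of the person who initiated the run
    workflow_position: str      # where the run sits in the wider workflow
    output: str                 # what the system produced
    timestamp: datetime         # when the run took place
    retrieved_materials: tuple[str, ...] = ()   # retrieved or attached materials
    tool_calls: tuple[str, ...] = ()            # tools invoked during the run
    settings: dict[str, str] = field(default_factory=dict)  # system settings
    oversight: str = "none recorded"            # relevant oversight conditions


# Illustrative values only; none of these come from a real deployment.
example_run = RunRecord(
    run_id="run-0001",
    model="example-model-v1",
    prompt="Summarise the attached notes for internal review.",
    user_role="analyst",
    workflow_position="first draft, before manager review",
    output="Draft summary text",
    timestamp=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    retrieved_materials=("notes-2024-01.txt", "policy-extract.txt"),
    settings={"temperature": "0.2"},
    oversight="manager sign-off required",
)
```

Keeping every component on one record is the design point: the model name is only one field among several, so two runs with the same `model` value remain distinguishable.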

This matters because most governance failures do not emerge at the level of the model in abstraction. They emerge when a model is used in a concrete setting to draft, classify, advise, summarise, or recommend. The practical question in review is rarely "what can this model do in theory?" It is usually "what happened in this case, under these conditions, and can we justify it?" RAIDT therefore treats the run as the bounded proof-object around which governance evidence should be organised.

Conceptually, this distinguishes RAIDT from governance approaches that focus primarily on model cards, supplier claims, or enterprise AI principles. Those artefacts can still be useful, but they do not by themselves explain a contested organisational event. By centring the run, RAIDT links governance to what is reviewable in practice: a reconstructable use episode with evidence attached.

Within RAIDT, this item sits close to run-level evidence, evidence objects, reconstructability, replayability, audit trail, and minimum metadata. The run is the organising frame that makes the evidence pack coherent and makes the five-pillar score profile meaningful. Without an agreed unit of governance, evidence remains fragmented and scoring becomes too detached from the operational reality of use.

Why this concept matters

Treating the run as the governance unit solves a recurring problem in generative AI oversight: organisations often govern at the wrong level of abstraction. They may approve a model, publish a policy, or document a supplier assurance, yet still be unable to explain why a particular output was produced for a particular case. That gap is where operational risk, contestation, and audit difficulty appear.

The concept prevents confusion between model capability and situated use. A capable model may be used safely in one run and poorly in another because retrieval content, prompt framing, tool integration, time pressure, or human review differed. If governance ignores that variation, it cannot reliably diagnose failure, assign responsibility, or improve practice.

For organisations using GenAI, this matters because governance needs to support action after deployment. Reviewers need a unit they can inspect. Managers need a unit they can compare. Policy teams need a unit they can align with controls. RAIDT uses the run for all three purposes, which is why it shifts governance from principle statements towards evidence-backed operational judgement.

Key idea: RAIDT governs the run because risk, accountability, and evidence all materialise in the concrete use event rather than in the model alone.

What this item explains

Why the governed object in RAIDT is a bounded run rather than the model in abstraction.
How prompts, retrieved sources, tool calls, settings, timing, user role, and oversight become governance-relevant as parts of one run.
Why run-level evidence can be assembled into an evidence pack that supports reconstruction, review, and challenge.
How the five-pillar score profile becomes more defensible when it is anchored to a specific run.
Why comparison across runs is possible even when the underlying model remains the same.
How organisational learning improves when governance findings can be traced back to concrete use episodes.

Practical example / likely audience question

Audience question

Why is the run the best governance unit in RAIDT rather than the model, the user, or the whole workflow?

Answer

The concern behind this question is usually that the run may seem too narrow, while the model or workflow may seem more important. RAIDT's answer is that the run is the smallest unit that still contains the factors needed for meaningful governance judgement. A model is too broad because the same model can behave very differently across prompts, retrieved evidence, tools, and operational settings. A user is an incomplete unit for a similar reason: the same user can initiate very different runs, so the user role is better treated as one recorded part of the run. A whole workflow is often too broad because it mixes multiple steps, actors, and controls, making it harder to identify where responsibility and evidence actually attach.

Consider a finance team using the same foundation model in two separate runs. In one run, the model drafts an internal summary using approved data and a standard prompt. In another, a user asks for a risk analysis, adds an external spreadsheet, and activates a tool that generates numerical commentary. The model may be identical, but the governance situation is not. The second run requires different evidence, different review expectations, and potentially a different RAIDT score profile.
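The two finance runs can be compared factor by factor to show what changes while the model stays fixed. This is a rough sketch under stated assumptions: the function `differing_factors`, the dictionary keys, and all values are hypothetical stand-ins for the example in the text.

```python
def differing_factors(run_a: dict, run_b: dict) -> list[str]:
    """Return the names of run factors whose values differ between two runs."""
    keys = sorted(set(run_a) | set(run_b))
    return [key for key in keys if run_a.get(key) != run_b.get(key)]


# Illustrative stand-ins for the two finance runs described above.
summary_run = {
    "model": "foundation-model-x",
    "task": "internal summary",
    "prompt": "standard summary prompt",
    "data_sources": ("approved finance data",),
    "tools": (),
}
risk_run = {
    "model": "foundation-model-x",          # same model as the first run
    "task": "risk analysis",
    "prompt": "ad hoc user request",
    "data_sources": ("approved finance data", "external spreadsheet"),
    "tools": ("numerical commentary generator",),
}

changed = differing_factors(summary_run, risk_run)
```

Here `changed` contains `data_sources`, `prompt`, `task`, and `tools` but not `model`, which is the point of the example: the governance situation differs even though the model does not.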

RAIDT handles this better than generic AI governance because it does not stop at supplier-level assurance or policy language. It asks what happened in this configured use, what evidence exists for it, and whether that evidence is sufficient for reconstruction, comparison, challenge, and organisational action. That makes governance more precise, more reviewable, and more usable when questions are raised after deployment.

Practical example in RAIDT terms

A hospital uses a generative AI assistant to draft discharge summaries from clinician notes, recent observations, and medication records. The GenAI use case appears straightforward, but the run-level issue is that each draft depends on a specific prompt template, a particular patient record snapshot, current retrieval results, clinician edits, timing, and the version of the system in use at that moment.

If a discharge summary omits a medication warning, the governance question is not only whether the model is generally safe. The question is what happened in that run. RAIDT would therefore expect evidence such as the prompt template used, the retrieved record excerpts, the timestamp, the user role, the model and tool configuration, the output draft, and the human approval step. That evidence would support review of whether the omission arose from retrieval failure, prompt weakness, ambiguous source notes, or inadequate oversight.
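The evidence expected for the discharge-summary run can be treated as a checklist applied to an evidence pack. The sketch below is one possible way to express that check; the key names in `REQUIRED_EVIDENCE`, the function `missing_evidence`, and the sample pack are assumptions made for illustration, not a RAIDT-mandated format.

```python
# Evidence items named in the discharge-summary example (key names are hypothetical).
REQUIRED_EVIDENCE = (
    "prompt_template",
    "retrieved_record_excerpts",
    "timestamp",
    "user_role",
    "model_and_tool_configuration",
    "output_draft",
    "human_approval_step",
)


def missing_evidence(pack: dict) -> list[str]:
    """List required evidence items that are absent from the pack or empty."""
    return [item for item in REQUIRED_EVIDENCE if not pack.get(item)]


# Illustrative pack for one run; all values are invented for the example.
discharge_pack = {
    "prompt_template": "discharge-summary-template",
    "retrieved_record_excerpts": [],  # nothing was captured from retrieval
    "timestamp": "2024-01-15T09:30:00Z",
    "user_role": "clinician",
    "model_and_tool_configuration": {"model": "assistant-model"},
    "output_draft": "Draft discharge summary text",
    # "human_approval_step" is deliberately left out
}

gaps = missing_evidence(discharge_pack)
```

For this sample pack, `gaps` holds `retrieved_record_excerpts` and `human_approval_step`, which would direct a reviewer towards the retrieval-failure and inadequate-oversight explanations before the others.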

The RAIDT pillars most affected would be Responsibility, Auditability, Dependability, and Traceability, with Interpretability also relevant if clinicians need to understand why the warning was missed. By treating the discharge-summary instance as the unit of governance, the organisation becomes more governance-ready: it can reconstruct the case, justify decisions, identify control gaps, and improve subsequent runs rather than relying on generic assurances about the model family.

Detailed link to RAIDT

Run as unit of governance links to RAIDT in four ways.

First, it expresses RAIDT's core idea that generative AI governance should centre on evidence-bearing use events rather than high-level assertions about models or principles.
Second, it anchors RAIDT directly to the run, which is the level at which evidence can be collected, reviewed, challenged, and compared.
Third, it provides the organising object for both the run-level evidence pack and the five-pillar score profile, ensuring that both outputs refer to the same bounded case.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning because governance findings can be attached to a reconstructable event rather than to a vague system description.

Run as unit of governance → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness

This chain is central to RAIDT because each step depends on the previous one. If the run is not treated as the unit of governance, evidence collection becomes inconsistent, evidence packs become partial, score profiles lose interpretive grounding, and governance readiness is weakened.
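The dependency between the steps can be sketched as an ordered chain in which a later stage counts as supported only if every earlier stage holds. This is an illustrative simplification: the function `supported_stages` and the idea of marking stages as "established" are assumptions, not part of RAIDT itself.

```python
# Stages of the chain, in the order given in the text.
CHAIN = (
    "run as unit of governance",
    "run-level evidence",
    "evidence pack",
    "RAIDT score profile",
    "governance readiness",
)


def supported_stages(established: set[str]) -> list[str]:
    """Return the stages that hold, stopping at the first one that is missing."""
    supported = []
    for stage in CHAIN:
        if stage not in established:
            break  # later stages lose their grounding once one link is missing
        supported.append(stage)
    return supported


# An organisation that collects evidence and scores it, but has no agreed run unit.
without_run_unit = {"run-level evidence", "evidence pack", "RAIDT score profile"}
result = supported_stages(without_run_unit)
```

In this sketch `result` is an empty list: without the first link, none of the later outputs count as supported, however much evidence has been gathered.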

Link to the five RAIDT pillars

Responsibility

Treating the run as the governance unit clarifies where responsibility attaches in practice. It allows reviewers to ask who initiated the run, under what authority, using which approved materials and controls.