S3.01 - Run as unit of governance

flowchart LR
    A[Background problem:<br/>model-level assurance is too abstract for case review] --> B[RAIDT:<br/>run-level evidence framework]
    P[Policy principles and supplier claims] --> B
    B --> C[[Run as unit of governance]]
    C --> D[Run-level evidence pack]
    C --> E[RAIDT five-pillar score profile]
    C --> F[Reconstructable organisational case]
    D --> G[Reviewer reconstruction and challenge]
    E --> H[Governance readiness]
    F --> H
    I[Prompt, retrieval, tools,<br/>user role, timestamp, output] --> C
    J[Healthcare, finance, public services] --> C

Star S3 - Run-Level Evidence Logic

Star context: Explains why RAIDT governs the concrete run rather than the model in abstraction, so that evidence can support reconstruction, comparison, challenge, and accountable organisational review.


Academic picture
Definition / background

In RAIDT, the run is the primary unit of governance because governance questions usually arise around one specific use of a generative AI system for one task, at one time, in one organisational context. A run is not merely the model invocation in isolation. It is the configured socio-technical event that includes the model or service used, the prompt or instruction structure, any retrieved or attached materials, tool calls, system settings, user role, workflow position, output, and relevant oversight conditions.
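As a minimal sketch, the components of a run listed above could be held in one record. The class name `RunRecord`, every field name, and the example values are illustrative assumptions, not a schema defined by RAIDT.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record for one configured use of a GenAI system."""
    run_id: str
    model: str                 # model or service used
    prompt: str                # prompt or instruction structure
    user_role: str
    workflow_position: str     # where the run sits in the wider workflow
    output: str
    timestamp: datetime
    retrieved_materials: tuple[str, ...] = ()  # retrieved or attached materials
    tool_calls: tuple[str, ...] = ()
    system_settings: dict = field(default_factory=dict)
    oversight: str = "none recorded"           # relevant oversight conditions


# Illustrative values only.
example_run = RunRecord(
    run_id="run-0001",
    model="example-model",
    prompt="Summarise the attached notes.",
    user_role="analyst",
    workflow_position="first draft, before human review",
    output="(draft text)",
    timestamp=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    retrieved_materials=("notes.txt",),
)
```

Keeping the record immutable (`frozen=True`) is one possible design choice: it treats the run as a fixed object of review rather than something edited after the fact.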

This matters because most governance failures do not emerge at the level of the model in abstraction. They emerge when a model is used in a concrete setting to draft, classify, advise, summarise, or recommend. The practical question in review is rarely "what can this model do in theory?" It is usually "what happened in this case, under these conditions, and can we justify it?" RAIDT therefore treats the run as the bounded proof-object around which governance evidence should be organised.

Conceptually, this distinguishes RAIDT from governance approaches that focus primarily on model cards, supplier claims, or enterprise AI principles. Those artefacts can still be useful, but they do not by themselves explain a contested organisational event. By centring the run, RAIDT links governance to what is reviewable in practice: a reconstructable use episode with evidence attached.

Within RAIDT, this item sits close to run-level evidence, evidence objects, reconstructability, replayability, audit trail, and minimum metadata. The run is the organising frame that makes the evidence pack coherent and makes the five-pillar score profile meaningful. Without an agreed unit of governance, evidence remains fragmented and scoring becomes too detached from the operational reality of use.

Why this concept matters

Treating the run as the governance unit solves a recurring problem in generative AI oversight: organisations often govern at the wrong level of abstraction. They may approve a model, publish a policy, or document a supplier assurance, yet still be unable to explain why a particular output was produced for a particular case. That gap is where operational risk, contestation, and audit difficulty appear.

The concept prevents confusion between model capability and situated use. A capable model may be used safely in one run and poorly in another because retrieval content, prompt framing, tool integration, time pressure, or human review differed. If governance ignores that variation, it cannot reliably diagnose failure, assign responsibility, or improve practice.

For organisations using GenAI, this matters because governance needs to support action after deployment. Reviewers need a unit they can inspect. Managers need a unit they can compare. Policy teams need a unit they can align with controls. RAIDT uses the run for all three purposes, which is why it shifts governance from principle statements towards evidence-backed operational judgement.

Key idea: RAIDT governs the run because risk, accountability, and evidence all materialise in the concrete use event rather than in the model alone.

What this item explains
Practical example / likely audience question

Audience question

Why is the run the best governance unit in RAIDT rather than the model, the user, or the whole workflow?

Answer

The concern behind this question is usually that the run may seem too narrow, while the model or workflow may seem more important. RAIDT's answer is that the run is the smallest unit that still contains the factors needed for meaningful governance judgement. A model is too broad because the same model can behave very differently across prompts, retrieved evidence, tools, and operational settings. The user alone is not a sufficient unit either, because the same user can initiate runs with different prompts, sources, and tools. A whole workflow is often too broad because it mixes multiple steps, actors, and controls, making it harder to identify where responsibility and evidence actually attach.

Consider a finance team using the same foundation model in two separate runs. In one run, the model drafts an internal summary using approved data and a standard prompt. In another, a user asks for a risk analysis, adds an external spreadsheet, and activates a tool that generates numerical commentary. The model may be identical, but the governance situation is not. The second run requires different evidence, different review expectations, and potentially a different RAIDT score profile.
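The contrast between the two finance runs can be sketched as a field-by-field comparison. The dictionary keys, the placeholder model name, and the helper `differing_fields` are assumptions made for illustration only.

```python
def differing_fields(run_a: dict, run_b: dict) -> list[str]:
    """Return the sorted names of fields whose values differ between two runs."""
    keys = set(run_a) | set(run_b)
    return sorted(k for k in keys if run_a.get(k) != run_b.get(k))


summary_run = {
    "model": "foundation-model-x",  # placeholder name
    "task": "internal summary",
    "prompt": "standard prompt",
    "data_sources": ("approved data",),
    "tools": (),
}
risk_run = {
    "model": "foundation-model-x",  # same model as above
    "task": "risk analysis",
    "prompt": "user request for risk analysis",
    "data_sources": ("external spreadsheet",),
    "tools": ("numerical commentary tool",),
}

changed = differing_fields(summary_run, risk_run)
print(changed)  # ['data_sources', 'prompt', 'task', 'tools']
```

The model does not appear in the list of differences, yet four other fields do, which is the sense in which the governance situation changes while the model stays the same.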

RAIDT handles this better than generic AI governance because it does not stop at supplier-level assurance or policy language. It asks what happened in this configured use, what evidence exists for it, and whether that evidence is sufficient for reconstruction, comparison, challenge, and organisational action. That makes governance more precise, more reviewable, and more usable when questions are raised after deployment.

Practical example in RAIDT terms

A hospital uses a generative AI assistant to draft discharge summaries from clinician notes, recent observations, and medication records. The GenAI use case appears straightforward, but the run-level issue is that each draft depends on a specific prompt template, a particular patient record snapshot, current retrieval results, clinician edits, timing, and the version of the system in use at that moment.

If a discharge summary omits a medication warning, the governance question is not only whether the model is generally safe. The question is what happened in that run. RAIDT would therefore expect evidence such as the prompt template used, the retrieved record excerpts, the timestamp, the user role, the model and tool configuration, the output draft, and the human approval step. That evidence would support review of whether the omission arose from retrieval failure, prompt weakness, ambiguous source notes, or inadequate oversight.
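A reviewer's first step in such a case might be to check which of the expected evidence items are actually present. The following is a sketch under stated assumptions: the key names mirror the evidence listed above, but the names themselves, the function `missing_evidence`, and the sample pack are hypothetical.

```python
EXPECTED_EVIDENCE = (
    "prompt_template",
    "retrieved_record_excerpts",
    "timestamp",
    "user_role",
    "model_and_tool_configuration",
    "output_draft",
    "human_approval_step",
)


def missing_evidence(pack: dict) -> list[str]:
    """List expected evidence items that are absent or empty in the pack."""
    return [name for name in EXPECTED_EVIDENCE if not pack.get(name)]


# Illustrative pack: retrieval excerpts were logged as empty and no approval was recorded.
pack = {
    "prompt_template": "discharge-summary-template",
    "retrieved_record_excerpts": [],
    "timestamp": "2024-01-15T09:30:00+00:00",
    "user_role": "clinician",
    "model_and_tool_configuration": {"model": "example-model"},
    "output_draft": "(draft text)",
}

gaps = missing_evidence(pack)
print(gaps)  # ['retrieved_record_excerpts', 'human_approval_step']
```

A gap in the pack does not show what caused the omission, but it tells the reviewer which explanations (here, retrieval failure or inadequate oversight) cannot yet be ruled in or out.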

The RAIDT pillars most affected would be Responsibility, Auditability, Dependability, and Traceability, with Interpretability also relevant if clinicians need to understand why the warning was missed. By treating the discharge-summary instance as the unit of governance, the organisation becomes more governance-ready: it can reconstruct the case, justify decisions, identify control gaps, and improve subsequent runs rather than relying on generic assurances about the model family.

Detailed link to RAIDT

Run as unit of governance links to RAIDT in four ways.

First, it expresses RAIDT's core idea that generative AI governance should centre on evidence-bearing use events rather than high-level assertions about models or principles.
Second, it anchors RAIDT directly to the run, which is the level at which evidence can be collected, reviewed, challenged, and compared.
Third, it provides the organising object for both the run-level evidence pack and the five-pillar score profile, ensuring that both outputs refer to the same bounded case.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning because governance findings can be attached to a reconstructable event rather than to a vague system description.

Run as unit of governance -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

This chain is central to RAIDT because each step depends on the previous one. If the run is not treated as the unit of governance, evidence collection becomes inconsistent, evidence packs become partial, score profiles lose interpretive grounding, and governance readiness is weakened.

Link to the five RAIDT pillars

Responsibility

Treating the run as the governance unit clarifies where responsibility attaches in practice. It allows reviewers to ask who initiated the run, under what authority, using which approved materials and controls.

Example evidence / implication:

Auditability

Auditability improves because the run defines a bounded case that can be inspected after the fact. Without that boundary, audit trails become diffuse and difficult to interpret.

Example evidence / implication:

Interpretability

Interpretability benefits because explanations can be linked to a specific output and its surrounding conditions. This is more useful than attempting to explain the entire model in the abstract for every governance question.

Example evidence / implication:

Dependability

Dependability becomes more measurable when repeated or comparable runs can be assessed under known conditions. A run-level focus helps distinguish stable performance from case-specific failure.

Example evidence / implication:

Traceability

Traceability is especially strengthened by this concept because the run provides the traceable object linking inputs, actions, outputs, and later review.

Example evidence / implication:

This item affects all five pillars, but it is especially foundational for Auditability and Traceability because both depend on having a clearly defined unit to inspect.

Why this item is more than a generic concept

In general AI governance, the idea that context matters is often acknowledged, but it frequently remains rhetorical. Governance may still be organised around policies, principles, supplier documentation, or model-level assessments. Those are useful, but they do not always identify the precise object to be reviewed when something goes wrong.

In RAIDT, run as unit of governance is more operational because it defines the exact evidential object around which governance work is performed. It determines what should be logged, what should be compared, what should be scored, and what should be contested. The concept therefore moves from a general statement about situated use to a practical method for building evidence packs, score profiles, and review procedures.

Common misunderstanding

Misunderstanding

If the organisation has already assessed or approved the model, then governance at run level is unnecessary duplication.

Correction

Model-level assessment and run-level governance answer different questions. Model assessment may tell you something about capabilities, broad risks, supplier assurances, or baseline controls. It does not tell you enough about one specific use event in which a particular prompt, retrieved document set, tool chain, and human workflow produced an output with consequences.

For example, two teams may use the same approved model. One uses a locked template and approved sources; the other uses ad hoc prompting and external documents. Treating both uses as governance-equivalent would hide meaningful risk differences. RAIDT therefore treats model-level assessment as relevant background, but the run remains the unit that supports operational accountability.

Boundary and limitation

This item does not claim that governing the run is sufficient on its own. It does not replace model evaluation, supplier assurance, policy design, or broader workflow governance. Some risks still sit above the run, such as procurement choices, training-data concerns, enterprise access controls, and strategic deployment decisions.

It also does not guarantee that every important feature of a run can be perfectly captured. In practice, evidence may be incomplete, tool integrations may be opaque, or retrospective reconstruction may be limited by poor logging. High-volume environments may also generate too many runs for exhaustive manual review.

RAIDT handles these limitations by treating the run as the core operational unit while still allowing model-level and system-level evidence to sit around it. The framework can combine manual review, selective sampling, structured metadata, and automation so that run-level governance remains practical without pretending to be the only layer that matters.
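Selective sampling can be made reproducible by deriving the selection from the run identifier rather than from a random draw. This is one possible approach, not a method prescribed by RAIDT; the function name and the use of SHA-256 are assumptions.

```python
import hashlib


def selected_for_review(run_id: str, sample_rate: float) -> bool:
    """Deterministically select roughly `sample_rate` of runs for manual review."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


run_ids = [f"run-{n:05d}" for n in range(10_000)]
sampled = [rid for rid in run_ids if selected_for_review(rid, 0.10)]
```

Because the decision depends only on the identifier, an auditor can later confirm that a given run was, or was not, part of the review sample.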

Implementation levels

Manual implementation

A researcher or small team can apply this concept by defining the run explicitly and recording the minimum evidence for each important use. That may include the prompt, task, input sources, user role, output, date and time, and any review decision. Even a spreadsheet or structured note template can establish the run as the governance object.
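For a spreadsheet-style record, a sketch using Python's standard `csv` module might look like this. The column names follow the minimum evidence listed above but are otherwise assumptions, as are the helper name and the sample row.

```python
import csv
import io

FIELDS = [
    "run_id", "task", "prompt", "input_sources",
    "user_role", "output", "recorded_at", "review_decision",
]


def runs_to_csv(rows: list[dict]) -> str:
    """Serialise run records to CSV text with a fixed header row."""
    buffer = io.StringIO()
    # Missing fields are written as empty cells; unknown keys raise ValueError.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = runs_to_csv([
    {
        "run_id": "run-0001",
        "task": "draft internal summary",
        "prompt": "standard prompt",
        "input_sources": "approved data",
        "user_role": "analyst",
        "output": "(draft text)",
        "recorded_at": "2024-01-15T09:30:00+00:00",
        # review_decision left blank until a reviewer records one
    },
])
```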

Semi-automated implementation

Semi-automated implementation adds templates, metadata capture, and structured review forms. Prompt wrappers, evidence forms, and run IDs can make it easier to collect comparable records across teams. This level supports routine evidence-pack assembly and more consistent RAIDT scoring.
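A prompt wrapper of this kind could be sketched as follows. The function `governed_call`, its parameters, and the record keys are hypothetical; `generate` stands in for whatever model call an organisation actually uses.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable


def governed_call(
    generate: Callable[[str], str],
    prompt: str,
    user_role: str,
    template_id: str,
) -> dict:
    """Call `generate` and return a run record with basic metadata attached."""
    started = datetime.now(timezone.utc)
    output = generate(prompt)
    return {
        "run_id": str(uuid.uuid4()),   # random identifier for this run
        "template_id": template_id,
        "user_role": user_role,
        "prompt": prompt,
        "output": output,
        "timestamp": started.isoformat(),
    }


# Stand-in for a real model call, used here only to make the sketch runnable.
def fake_generate(prompt: str) -> str:
    return f"DRAFT: {prompt}"


record = governed_call(fake_generate, "Summarise Q3 figures", "analyst", "tmpl-01")
```

Routing every call through one wrapper is what makes the resulting records comparable across teams, since each carries the same fields.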

Fully automated implementation

At scale, the concept can be implemented through orchestration layers, platform logging, governance dashboards, and policy-aware middleware. The system can automatically assign a run identifier, capture prompts and outputs, record tool calls and retrieval traces, store timestamps and versions, and route selected runs for review. In that form, the run becomes the atomic object inside a broader governance pipeline.
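The routing step could be expressed as a simple policy function. The rule below (flag runs that used tools or drew on unapproved sources) is an invented example policy, and the field names are assumptions; a real deployment would define its own criteria.

```python
def needs_review(run: dict) -> bool:
    """Hypothetical policy: flag runs that used tools or unapproved sources."""
    uses_tools = bool(run.get("tool_calls"))
    has_unapproved_source = any(
        not source.get("approved", False) for source in run.get("sources", [])
    )
    return uses_tools or has_unapproved_source


def route(runs: list[dict]) -> tuple[list[str], list[str]]:
    """Split run IDs into a review queue and an archive-only list."""
    review_queue, archive_only = [], []
    for run in runs:
        target = review_queue if needs_review(run) else archive_only
        target.append(run["run_id"])
    return review_queue, archive_only


runs = [
    {"run_id": "r1", "tool_calls": [], "sources": [{"name": "policy.pdf", "approved": True}]},
    {"run_id": "r2", "tool_calls": ["calculator"], "sources": []},
    {"run_id": "r3", "tool_calls": [], "sources": [{"name": "external.xlsx"}]},
]

review_queue, archive_only = route(runs)
print(review_queue)   # ['r2', 'r3']
print(archive_only)   # ['r1']
```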

Practical use in the RAIDT project

This item is foundational for Paper 08 Foundations because it defines the primary object that RAIDT governs. It gives the framework conceptual precision by explaining why the run, rather than the model alone, is the correct level for evidence, review, and challenge.

It is equally important for Paper 09 Empirical Validation because empirical comparison requires stable units of analysis. If cases are scored or reviewed at inconsistent levels, findings become difficult to compare. The run-level framing supports repeatable evidence collection, clearer coding, and more defensible interpretation of results.

For Paper 10 Policy Pathways, the item helps translate abstract policy principles into operational governance requirements. It also supports sector playbooks, evidence-pack design, scoring rubrics, influence methods, and governance interventions by giving supervisors, reviewers, and practitioners a concrete answer to a basic question: what exactly is being governed?

In viva defence and journal positioning, this note helps articulate RAIDT's distinct contribution. The originality is not just that evidence matters, but that evidence is organised around the configured use event where accountability questions actually arise.

Key audience questions to prepare for

Q1. Why is the run a better governance unit than the model alone?

Because the run contains the situated configuration in which organisational consequences occur. The model alone does not capture the prompt, retrieval context, tool use, user role, timing, or oversight conditions that shape a specific outcome.

Q2. Does this mean model-level governance is no longer needed?

No. Model-level governance remains important, but it is insufficient for explaining or contesting concrete use events. RAIDT adds run-level governance so that operational accountability is possible.

Q3. Why not govern the whole workflow instead of individual runs?

Workflow governance is useful, but it can be too coarse for diagnosis. The run provides a bounded evidential object within the workflow, allowing more precise reconstruction, scoring, and remediation.

Q4. What makes a run sufficiently defined for governance purposes?

A run is sufficiently defined when the task, context, configuration, inputs, outputs, timing, and relevant oversight conditions can be identified well enough to support reconstruction and judgement.

Q5. How does this improve organisational readiness rather than just documentation?

It improves readiness by creating inspectable cases. Those cases support audits, internal reviews, contestation handling, training improvement, control redesign, and policy refinement based on evidence rather than assumption.

Suggested citation concepts to support this item
Short explanation for presentation

RAIDT treats the run as the unit of governance because that is where generative AI risk becomes concrete. A run is one configured use of a GenAI system for a specific task, at a specific time, in a specific organisational context. It includes not just the model, but also the prompt, retrieved materials, tools, settings, user role, output, and oversight conditions. This matters because most disputes and audits are about a particular event, not about the model in abstraction. By governing the run, RAIDT can assemble a defensible evidence pack, generate a meaningful five-pillar score profile, and support reconstruction, contestation, and organisational learning. The concept therefore turns governance from principle-level assurance into operational, reviewable judgement tied to actual use.

One-line takeaway

Run as unit of governance is RAIDT's core framing that treats each configured GenAI use event as the evidential object on which review, scoring, and accountability depend.
