C0.03 - Run-level evidence

flowchart LR
    A[Traditional governance artefacts<br/>model cards, policy, supplier assurances] --> B[RAIDT<br/>run-level evidence framework]
    H[Practical run fields<br/>prompt, inputs, settings, outputs, review notes] --> C[[Run-level evidence<br/>reconstructable proof of one run]]
    B --> C
    C --> D[Evidence pack]
    C --> E[RAIDT score profile]
    D --> F[Reviewer reconstruction and contestability]
    E --> G[Governance readiness and organisational learning]

Star context: Defines the project identity of RAIDT by showing that responsible governance of GenAI in organisational work depends on evidence from the level of the individual run, not only from model descriptions or high-level policy claims.

Definition / background

Run-level evidence is the recorded proof needed to reconstruct, review, and evaluate one specific use of a generative AI system. In RAIDT, this means evidence tied to a single run: one configured use of GenAI for a defined task, at a particular time, in a particular organisational context. The concept matters because many governance artefacts describe systems in general terms, whereas governance failures, disputes, and improvements usually arise from what happened in a specific instance of use.

Conceptually, run-level evidence sits between raw technical logging and broad governance documentation. It is more specific than a policy, model card, or risk register, because it captures the actual circumstances of use rather than only the intended design or declared controls. At the same time, it is more governance-relevant than a narrow system log because it can include contextual, procedural, and human-review information needed for organisational accountability.

Within RAIDT, run-level evidence is foundational. It provides the material from which a run-level evidence pack can be assembled and from which a five-pillar RAIDT score profile can be justified. Without this evidence layer, scoring risks becoming impressionistic, governance claims remain difficult to test, and post hoc review becomes weak or incomplete.

This item therefore belongs in RAIDT Core because it defines the evidential basis of the whole framework. RAIDT does not primarily ask whether an organisation has a principle, nor whether a model provider has published a general description. It asks whether the organisation can show, for one real run, what happened, under what conditions, with what trace, and with what basis for review.

Why this concept matters

Run-level evidence addresses a central governance problem in generative AI: organisations often know that they should govern AI use, but they lack a reliable unit of proof for examining an actual use event. When a questionable output appears, when a decision must be justified, or when a reviewer asks how a result was produced, abstract policy language is insufficient. A governance framework needs evidence that is granular enough to support reconstruction.

The concept also prevents a common confusion between system-level assurance and use-level accountability. A model may be documented and approved at a high level, yet still be used badly, inconsistently, or inappropriately in a particular run. Run-level evidence makes it possible to distinguish between what the system is said to be capable of and what was actually done with it in a real organisational setting.

If run-level evidence is missing, several risks follow: weak auditability, superficial assurance, poor contestability, limited learning from incidents, and difficulty defending practice to supervisors, regulators, clients, or internal governance bodies. RAIDT uses this concept to move governance from broad principles to operational scrutiny.

Key idea: Run-level evidence matters because responsible GenAI governance depends on being able to inspect one real use event rather than relying only on general descriptions or policy assertions.

What this item captures

The specific task, purpose, and context of one GenAI run.
The configured conditions of use, including relevant system settings, prompts, inputs, and constraints.
The output or outputs generated during that run.
Human actions around the run, such as review, editing, approval, escalation, or override.
The trace needed for later reconstruction, explanation, challenge, or audit.
The evidential basis for scoring the run across Responsibility, Auditability, Interpretability, Dependability, and Traceability.
The link between an individual use event and wider organisational governance readiness.
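
The items listed above can be pictured as a single record per run. The following is a minimal sketch in Python, not a RAIDT-defined schema: the class names, field names, action labels, and example values are all hypothetical and chosen only to mirror the list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class HumanAction:
    """One human action around the run (hypothetical structure)."""
    actor: str      # who acted
    action: str     # e.g. "review", "edit", "approval", "escalation", "override"
    note: str = ""  # free-text review note


@dataclass
class RunEvidence:
    """Illustrative record for one GenAI run; all field names are assumptions."""
    run_id: str
    started_at: str            # ISO 8601 timestamp of the run
    task: str                  # the specific task
    purpose: str               # why the run was carried out
    context: str               # organisational context
    prompt: str
    inputs: List[str]
    settings: Dict[str, str]   # relevant system settings
    constraints: List[str]
    outputs: List[str]
    human_actions: List[HumanAction] = field(default_factory=list)

    def actors(self) -> Set[str]:
        """Everyone recorded as having acted on this run."""
        return {a.actor for a in self.human_actions}

    def was_reviewed(self) -> bool:
        """True if at least one review or approval action is recorded."""
        return any(a.action in {"review", "approval"} for a in self.human_actions)


# Example values are invented for illustration only.
run = RunEvidence(
    run_id="run-001",
    started_at="2024-01-01T09:00:00Z",
    task="Draft patient follow-up letter",
    purpose="Outpatient follow-up communication",
    context="Outpatient clinic",
    prompt="Draft a follow-up letter from the consultation notes.",
    inputs=["consultation notes"],
    settings={"tool_version": "example-1"},
    constraints=["no identifiers beyond those in the notes"],
    outputs=["draft letter text"],
)
run.human_actions.append(HumanAction("clinician-a", "review", "checked against record"))
```

Keeping the human actions in the same record as the prompt, inputs, and outputs is a design choice made here so that one object carries what a reviewer would need; an organisation could equally store these parts separately and link them by a run identifier.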

Practical example / likely audience question

Audience question

Why is run-level evidence needed if an organisation already has model documentation, AI policy, and standard operating procedures?

Answer

The concern behind this question is understandable: if governance artefacts already exist, why add another layer? The direct answer is that model documentation and policy documents usually describe the system or the organisation in general, whereas run-level evidence shows what occurred in one actual use event. Those are not interchangeable forms of assurance.

For example, a hospital may have an approved policy for using a GenAI drafting assistant and may rely on a vendor's technical documentation. Yet if a discharge-summary draft contains a misleading statement, the key governance question is not only whether the model was approved in principle. The question is what prompt was used, what patient information was supplied, what output was produced, who reviewed it, what changes were made, and whether the run met organisational safeguards. That requires run-level evidence.

RAIDT handles this better than a generic AI governance approach because it treats the run as the unit of governance. Instead of stopping at principles or provider claims, it asks whether the organisation can reconstruct and assess the exact event under review. This makes governance more operational, more reviewable, and more useful for learning and accountability.

Practical example in RAIDT terms

Consider a healthcare setting in which a clinician uses a GenAI system to draft a patient follow-up letter after an outpatient consultation. The GenAI use case is legitimate and time-saving, but the run-level issue is whether the generated letter accurately reflects the consultation, protects sensitive information, and was appropriately reviewed before being sent.

The evidence needed includes the task definition, the prompt template, any source notes used as input, the model or tool version, the generated draft, the clinician's edits, the final approved version, and a record of whether the output was checked against the patient record.

Responsibility is affected because the organisation must show who was accountable for checking the draft. Auditability is affected because a reviewer must be able to reconstruct the run. Interpretability is affected because reviewers need to understand how the draft emerged from the prompt and source material. Dependability is affected because output quality and process reliability matter in patient communication. Traceability is affected because the run must be linked to time, actor, and artefacts.
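
One way to make this example concrete is a simple gap check that reports, for each pillar, which evidence items are missing from a run. This is a sketch under stated assumptions: the item keys are hypothetical labels for the evidence named above (with `timestamp` and `actor` added to stand for the time and actor link), and the mapping from pillar to items is an illustrative reading of the paragraph, not a mapping defined by RAIDT.

```python
from typing import Dict, List

# Assumed mapping from each pillar to the evidence items it relies on.
# Both the keys and the groupings are illustrative assumptions.
PILLAR_NEEDS: Dict[str, List[str]] = {
    "Responsibility": ["clinician_edits", "final_version", "record_check"],
    "Auditability": ["task_definition", "prompt_template", "source_notes",
                     "generated_draft", "final_version"],
    "Interpretability": ["prompt_template", "source_notes", "generated_draft"],
    "Dependability": ["final_version", "record_check"],
    "Traceability": ["timestamp", "actor", "model_version"],
}


def evidence_gaps(pack: Dict[str, str]) -> Dict[str, List[str]]:
    """Return, per pillar, the needed items that are absent or empty in the pack."""
    return {
        pillar: [item for item in needed if not pack.get(item)]
        for pillar, needed in PILLAR_NEEDS.items()
    }


# Hypothetical pack: everything recorded except the check against the patient record.
example_pack = {
    "task_definition": "Draft follow-up letter",
    "prompt_template": "follow-up-letter template",
    "source_notes": "consultation notes",
    "model_version": "example-1",
    "generated_draft": "draft text",
    "clinician_edits": "two sentences corrected",
    "final_version": "approved text",
    "timestamp": "2024-01-01T09:00:00Z",
    "actor": "clinician-a",
}
gaps = evidence_gaps(example_pack)
```

With this pack, the missing record check surfaces under both Responsibility and Dependability, which illustrates how a single absent artefact can weaken the justification for more than one pillar.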

In governance-readiness terms, run-level evidence improves the organisation's position because it allows a disputed output to be examined as a concrete case rather than as an anecdote. It supports internal assurance, supervisory review, incident analysis, and practical refinement of workflow controls.

Detailed link to RAIDT

Run-level evidence links to RAIDT in four ways.

First, it gives operational form to the RAIDT core idea that governance should be based on evidence from actual organisational use, not only on high-level claims.

Second, it is inseparable from the concept of the run, because the run is the unit of governance and run-level evidence is the proof attached to that unit.

Third, it provides the raw material for the run-level evidence pack and the justification for the RAIDT score profile across the five pillars.

Fourth, it supports reviewability, contestability, audit readiness, and organisational learning by making individual GenAI events reconstructable and examinable.

Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness

Link to the five RAIDT pillars

Responsibility

Run-level evidence supports Responsibility by showing who initiated, reviewed, approved, or relied upon a GenAI run, and under what organisational purpose or authority.