S5.02 - Auditability

flowchart LR
    A["Background problem:<br/>generic logging without reconstruction"] --> B["RAIDT<br/>run-level evidence framework"]
    B --> C[["Auditability<br/>reconstructable and reviewable run"]]
    C --> D["Evidence pack<br/>prompt, settings, output, review record"]
    C --> E["RAIDT score profile<br/>Auditability pillar score"]
    D --> F["Reviewer reconstruction<br/>independent inspection"]
    E --> G["Governance readiness<br/>contestability and audit readiness"]
    H["Healthcare, finance, public services"] --> C
    I["Run IDs, retrieval snapshots, output hashes"] --> C

Star S5 - RAIDT Pillars and Scoring

Star context: Positions auditability as one of the five RAIDT governance pillars and explains how scoring turns reviewability into a measurable property of a specific GenAI run rather than a general organisational claim.

Academic picture

Definition / background

Auditability is the RAIDT pillar concerned with whether a GenAI run can be reconstructed and independently reviewed using retained evidence. In ordinary governance language, the term often refers loosely to logging, record keeping, or the existence of an audit trail. In RAIDT, the meaning is more precise. A run is audit-ready only when enough evidence has been retained, in a complete and usable form, to let another party understand how the run was configured, what information shaped it, what it produced, and how the organisation handled the result.

This matters because RAIDT treats the run, not the model in the abstract, as the unit of governance. A model may be well documented at supplier level and still be poorly governable in practice if a particular organisational use cannot be reconstructed afterwards. Auditability therefore sits at the point where technical traces, procedural records, and review decisions come together. It is closely related to traceability, but the two are not identical. Traceability is about being able to follow links across components, actors, and stages; auditability is about whether those links and records are sufficient for independent review.

Within RAIDT, auditability supports both practical outputs of the framework. It strengthens the run-level evidence pack by specifying what evidence must be retained, and it strengthens the five-pillar score profile by making one dimension of governance readiness explicit and comparable across runs. In that sense, auditability is both a governance concept and a scoring dimension: it defines what evidence quality should look like and enables structured judgement about whether that standard has been met.

Why this concept matters

Auditability solves a common governance failure in organisational GenAI use: people can describe what they think a system does, but they cannot reliably reconstruct what happened in a particular run. Without that reconstruction, post hoc review becomes weak, disputes become harder to resolve, quality failures become difficult to diagnose, and improvement depends on memory or anecdote rather than evidence.

The concept also avoids a frequent confusion between "we have logs" and "we can audit the run". Generic logs may capture timestamps or user activity, but still omit the prompt, model version, retrieval context, temperature setting, output variant, reviewer decision, or retention status. When those elements are missing, an organisation may appear compliant while remaining unable to explain or defend a consequential AI-supported action.
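The gap between "we have logs" and "we can audit the run" can be made concrete with a small check. The sketch below is illustrative only: the field names are hypothetical labels for the elements listed in the paragraph above, and the sample log entry uses made-up values. It is not a RAIDT-defined schema.

```python
# Elements the text names as commonly missing from generic logs.
# The key names are hypothetical; RAIDT does not prescribe them here.
REQUIRED_RUN_FIELDS = (
    "prompt",
    "model_version",
    "retrieval_context",
    "temperature",
    "output_variant",
    "reviewer_decision",
    "retention_status",
)


def is_blank(value) -> bool:
    """Treat a missing value or an empty string as absent evidence."""
    return value is None or value == ""


def missing_audit_fields(record: dict) -> list[str]:
    """Return the required run-level fields that a record does not supply."""
    return [name for name in REQUIRED_RUN_FIELDS if is_blank(record.get(name))]


# A typical operational log entry (illustrative values only).
generic_log_entry = {
    "timestamp": "2024-01-01T10:43:00Z",
    "user_id": "analyst-17",
    "event": "tool_opened",
}

gaps = missing_audit_fields(generic_log_entry)
print(f"{len(gaps)} of {len(REQUIRED_RUN_FIELDS)} run-level fields missing: {gaps}")
```

A log entry of this kind records that something happened, yet it supplies none of the elements a reviewer would need to reconstruct the run. Note that a temperature of 0.0 counts as present, because only a missing value or an empty string is treated as absent.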

For organisations using GenAI in real work, auditability is what makes oversight credible. It allows managers to review exceptions, lets assurance teams test whether governance controls actually operated, supports appeals and incident investigation, and creates a basis for learning across repeated runs. RAIDT uses auditability to move governance away from broad principles and towards operational evidence that can be inspected, challenged, and improved.

Key idea: Auditability matters because governance cannot be reviewed or defended if a specific GenAI run cannot be reconstructed from retained evidence.

What this item enables

Reconstruction of a specific GenAI run after the event, including how it was configured and what it produced.
Independent review by supervisors, auditors, compliance teams, or external stakeholders.
Distinction between superficial logging and genuinely audit-ready evidence.
Inclusion of run IDs, prompts, versions, settings, retrieval snapshots, output hashes, review records, and retention metadata in the evidence pack.
Scoring of governance readiness on the basis of evidential completeness rather than assertion.
Organisational learning from incidents, appeals, exceptions, and repeated runs.
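One possible shape for such an evidence pack is sketched below. This is a minimal illustration, not a RAIDT specification: the class name `EvidencePack`, the helper `capture_run`, the field names, and the choice of SHA-256 for the output hash are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, replace


def sha256_hex(text: str) -> str:
    """Hash text so that later changes to it can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class EvidencePack:
    """Hypothetical container for the run-level evidence items listed above."""
    run_id: str
    prompt: str
    model_version: str
    settings: dict
    retrieval_snapshot_ids: tuple
    output_text: str
    output_hash: str
    review_record: dict
    retention: dict

    def output_is_intact(self) -> bool:
        """Check that the stored output still matches the hash taken at capture."""
        return sha256_hex(self.output_text) == self.output_hash


def capture_run(output_text: str, **fields) -> EvidencePack:
    """Build a pack, computing the output hash at capture time."""
    return EvidencePack(
        output_text=output_text,
        output_hash=sha256_hex(output_text),
        **fields,
    )


# Illustrative values only.
pack = capture_run(
    "Draft summary text.",
    run_id="run-0001",
    prompt="Summarise the attached notes.",
    model_version="example-model-v1",
    settings={"temperature": 0.2},
    retrieval_snapshot_ids=("snap-001",),
    review_record={"reviewer": "supervisor-3", "decision": "approved"},
    retention={"policy": "example-policy", "status": "retained"},
)

altered = replace(pack, output_text=pack.output_text + " (changed later)")
print(pack.output_is_intact(), altered.output_is_intact())
```

Storing the hash alongside the output lets a reviewer confirm that the text under inspection is the text the run produced. The hash alone does not make a run auditable; it supports the reconstruction that the other fields make possible.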

Practical example / likely audience question

Audience question

If we already keep system logs, why does RAIDT treat auditability as a separate pillar rather than assuming the logs are enough?

Answer

The concern behind that question is the common assumption that any technical record counts as governance evidence. RAIDT rejects that assumption because most logging is designed for operations, debugging, or security, not for independent review of a specific socio-technical run. A timestamp and user ID do not explain what prompt was used, which model version generated the output, whether retrieved material shaped the answer, what settings were active, what the output looked like at the time, or who reviewed and approved its use.

The direct answer is that auditability requires evidence that is reconstructive, not merely incidental. For example, if a compliance analyst uses a GenAI assistant to draft a suspicious activity report summary, an auditor later needs more than proof that the tool was opened at 10:43. They need the run identifier, the prompt or template version, the model and provider version, relevant retrieval or source snapshots, the generated output, any edits made by the analyst, the review outcome, and the applicable retention rule.

RAIDT handles this better than a generic AI governance approach because it defines the run as the unit that must be auditable. Instead of assuming that enterprise logging will somehow answer all later questions, RAIDT specifies the evidence pack needed for a particular use at a particular time and then scores whether that evidence is good enough for review. That makes the governance claim testable rather than rhetorical.
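To illustrate what "scoring whether that evidence is good enough" could look like, the sketch below rates each evidence item from the suspicious activity report example against three criteria taken from the definition of auditability: complete, usable, and retained. This section does not state RAIDT's scoring rubric, so the equal weighting, the fraction-based score, and all names here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class EvidenceItem:
    """One evidence item and a reviewer's judgement of it (hypothetical model)."""
    name: str
    complete: bool
    usable: bool
    retained: bool

    @property
    def review_ready(self) -> bool:
        # An item supports independent review only if all three criteria hold.
        return self.complete and self.usable and self.retained


def auditability_score(items: list[EvidenceItem]) -> float:
    """Fraction of evidence items that are review-ready (assumed equal weights)."""
    if not items:
        return 0.0
    return sum(item.review_ready for item in items) / len(items)


def evidence_gaps(items: list[EvidenceItem]) -> list[str]:
    """Names of the items that would block reconstruction of the run."""
    return [item.name for item in items if not item.review_ready]


# Evidence items from the example above; the judgements are illustrative.
items = [
    EvidenceItem("run_identifier", True, True, True),
    EvidenceItem("prompt_or_template_version", True, True, True),
    EvidenceItem("model_and_provider_version", True, True, True),
    EvidenceItem("retrieval_or_source_snapshots", True, True, True),
    EvidenceItem("generated_output", True, True, True),
    EvidenceItem("analyst_edits", False, True, True),
    EvidenceItem("review_outcome", True, True, True),
    EvidenceItem("retention_rule", True, True, False),
]

print(auditability_score(items))  # 0.75
print(evidence_gaps(items))       # ['analyst_edits', 'retention_rule']
```

Reporting the gaps next to the score keeps the judgement inspectable: a reviewer can see which missing items lowered the score, not only the number itself.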

Practical example in RAIDT terms

Consider a hospital department using a GenAI assistant to draft discharge summaries from structured notes and local policy guidance. In one run, the generated draft omits a medication warning that should have appeared because the patient has a relevant allergy history. The immediate governance question is not simply whether the model is generally safe; it is whether that single run can be reconstructed and reviewed.

For RAIDT purposes, the evidence needed would include the run ID, clinician or staff role, prompt template version, model version, inference settings, source-note snapshot identifiers, any retrieved guideline excerpts, the draft output, the human edits made before sign-off, reviewer comments, and the retention record showing whether the evidence was preserved appropriately. Auditability is central here, but the case also affects Responsibility because someone must own the review decision, Interpretability because reviewers need to understand what shaped the draft, Dependability because missing warnings raise reliability concerns, and Traceability because the pathway from source data to output must be followed.

In governance-readiness terms, strong auditability improves the organisation's ability to investigate the omission, explain what happened to clinical governance staff, refine the prompt or workflow, and demonstrate to supervisors that the system is being governed at the level where harm can actually occur: the run.

Detailed link to RAIDT

Auditability links to RAIDT in four ways.

First, it operationalises RAIDT's core idea that governance should attach to a specific configured use of GenAI rather than to broad claims about a tool or provider.
Second, it focuses attention on the run itself by asking whether that run can be reconstructed from usable evidence after the event.
Third, it shapes the content and quality threshold of the evidence pack and provides one pillar in the RAIDT score profile.
Fourth, it supports reviewability, contestability, audit readiness, and organisational learning because evidence can be inspected, challenged, and compared across runs.

Auditability -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

In practical terms, this means that auditability is one of the mechanisms by which RAIDT converts governance from principle statements into reviewable records and actionable oversight.

Link to the five RAIDT pillars

Responsibility

Responsibility asks who is answerable for the run and who must act when a problem is identified. Auditability strengthens responsibility because accountability is weak if nobody can inspect what happened.