S5.02 - Auditability

flowchart LR
    A[Background problem:<br/>generic logging without reconstruction] --> B[RAIDT<br/>run-level evidence framework]
    B --> C[[Auditability<br/>reconstructable and reviewable run]]
    C --> D[Evidence pack<br/>prompt, settings, output, review record]
    C --> E[RAIDT score profile<br/>Auditability pillar score]
    D --> F[Reviewer reconstruction<br/>independent inspection]
    E --> G[Governance readiness<br/>contestability and audit readiness]
    H[Healthcare, finance, public services] --> C
    I[Run IDs, retrieval snapshots, output hashes] --> C

Star S5 - RAIDT Pillars and Scoring

Star context: Positions auditability as one of the five RAIDT governance pillars and explains how scoring turns reviewability into a measurable property of a specific GenAI run rather than a general organisational claim.


Academic picture
Definition / background

Auditability is the RAIDT pillar concerned with whether a GenAI run can be reconstructed and independently reviewed using retained evidence. In ordinary governance language, the term often refers loosely to logging, record keeping, or the existence of an audit trail. In RAIDT, the meaning is more precise. A run is audit-ready only when the retained evidence is complete and usable enough to let another party understand how the run was configured, what information shaped it, what it produced, and how the organisation handled the result.

This matters because RAIDT treats the run, not the model in the abstract, as the unit of governance. A model may be well documented at supplier level and still be poorly governable in practice if a particular organisational use cannot be reconstructed afterwards. Auditability therefore sits at the point where technical traces, procedural records, and review decisions come together. It is closely related to traceability, but the two are not identical. Traceability is about being able to follow links across components, actors, and stages; auditability is about whether those links and records are sufficient for independent review.

Within RAIDT, auditability supports both practical outputs of the framework. It strengthens the run-level evidence pack by specifying what evidence must be retained, and it strengthens the five-pillar score profile by making one dimension of governance readiness explicit and comparable across runs. In that sense, auditability is both a governance concept and a scoring dimension: it defines what evidence quality should look like and enables structured judgement about whether that standard has been met.

Why this concept matters

Auditability solves a common governance failure in organisational GenAI use: people can describe what they think a system does, but they cannot reliably reconstruct what happened in a particular run. Without that reconstruction, post hoc review becomes weak, disputes become harder to resolve, quality failures become difficult to diagnose, and improvement depends on memory or anecdote rather than evidence.

The concept also avoids a frequent confusion between "we have logs" and "we can audit the run". Generic logs may capture timestamps or user activity, but still omit the prompt, model version, retrieval context, temperature setting, output variant, reviewer decision, or retention status. When those elements are missing, an organisation may appear compliant while remaining unable to explain or defend a consequential AI-supported action.

For organisations using GenAI in real work, auditability is what makes oversight credible. It allows managers to review exceptions, lets assurance teams test whether governance controls actually operated, supports appeals and incident investigation, and creates a basis for learning across repeated runs. RAIDT uses auditability to move governance away from broad principles and towards operational evidence that can be inspected, challenged, and improved.

Key idea: Auditability matters because governance cannot be reviewed or defended if a specific GenAI run cannot be reconstructed from retained evidence.

What this item enables
Practical example / likely audience question

Audience question

If we already keep system logs, why does RAIDT treat auditability as a separate pillar rather than assuming the logs are enough?

Answer

The concern behind that question is the common assumption that any technical record counts as governance evidence. RAIDT rejects that assumption because most logging is designed for operations, debugging, or security, not for independent review of a specific socio-technical run. A timestamp and user ID do not explain what prompt was used, which model version generated the output, whether retrieved material shaped the answer, what settings were active, what the output looked like at the time, or who reviewed and approved its use.

The direct answer is that auditability requires evidence that is reconstructive, not merely incidental. For example, if a compliance analyst uses a GenAI assistant to draft a suspicious activity report summary, an auditor later needs more than proof that the tool was opened at 10:43. They need the run identifier, the prompt or template version, the model and provider version, relevant retrieval or source snapshots, the generated output, any edits made by the analyst, the review outcome, and the applicable retention rule.
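The evidence items listed in this example can be pictured as a single structured record per run. The sketch below is illustrative only: the class name `RunEvidence`, its field names, and all values are assumptions made for this note, not a schema defined by RAIDT.

```python
from dataclasses import dataclass, asdict
from typing import List
import json


@dataclass
class RunEvidence:
    """Hypothetical run-level evidence record; field names are illustrative."""
    run_id: str
    prompt_template_version: str
    model_version: str
    provider: str
    retrieval_snapshot_ids: List[str]
    generated_output: str
    analyst_edits: str
    review_outcome: str
    retention_rule: str


# Placeholder values, not real data.
record = RunEvidence(
    run_id="run-0001",
    prompt_template_version="sar-summary-template-v3",
    model_version="example-model-v1",
    provider="example-provider",
    retrieval_snapshot_ids=["snapshot-001", "snapshot-002"],
    generated_output="Draft summary text as generated.",
    analyst_edits="Second paragraph shortened by the analyst.",
    review_outcome="approved",
    retention_rule="retain-per-local-policy",
)

# Serialising the record gives an auditor one self-describing artefact per run.
print(json.dumps(asdict(record), indent=2))
```

Keeping every item in one record, rather than spread across separate operational logs, is what makes the evidence reconstructive rather than incidental.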

RAIDT handles this better than a generic AI governance approach because it defines the run as the unit that must be auditable. Instead of assuming that enterprise logging will somehow answer all later questions, RAIDT specifies the evidence pack needed for a particular use at a particular time and then scores whether that evidence is good enough for review. That makes the governance claim testable rather than rhetorical.

Practical example in RAIDT terms

Consider a hospital department using a GenAI assistant to draft discharge summaries from structured notes and local policy guidance. In one run, the generated draft omits a medication warning that should have appeared because the patient has a relevant allergy history. The immediate governance question is not simply whether the model is generally safe; it is whether that single run can be reconstructed and reviewed.

For RAIDT purposes, the evidence needed would include the run ID, clinician or staff role, prompt template version, model version, inference settings, source-note snapshot identifiers, any retrieved guideline excerpts, the draft output, the human edits made before sign-off, reviewer comments, and the retention record showing whether the evidence was preserved appropriately. Auditability is central here, but the case also affects Responsibility because someone must own the review decision, Interpretability because reviewers need to understand what shaped the draft, Dependability because missing warnings raise reliability concerns, and Traceability because the pathway from source data to output must be followed.

In governance-readiness terms, strong auditability improves the organisation's ability to investigate the omission, explain what happened to clinical governance staff, refine the prompt or workflow, and demonstrate to supervisors that the system is being governed at the level where harm can actually occur: the run.

Detailed link to RAIDT

Auditability links to RAIDT in four ways.

First, it operationalises RAIDT's core idea that governance should attach to a specific configured use of GenAI rather than to broad claims about a tool or provider.
Second, it focuses attention on the run itself by asking whether that run can be reconstructed from usable evidence after the event.
Third, it shapes the content and quality threshold of the evidence pack and provides one pillar in the RAIDT score profile.
Fourth, it supports reviewability, contestability, audit readiness, and organisational learning because evidence can be inspected, challenged, and compared across runs.

Auditability → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness

In practical terms, this means that auditability is one of the mechanisms by which RAIDT converts governance from principle statements into reviewable records and actionable oversight.

Link to the five RAIDT pillars

Responsibility

Responsibility asks who is answerable for the run and who must act when a problem is identified. Auditability strengthens responsibility because accountability is weak if nobody can inspect what happened.

Example evidence / implication:

Auditability

Auditability is the direct focus here: whether the run can be reconstructed for independent review using retained, usable evidence rather than fragmentary system traces.

Example evidence / implication:

Interpretability

Interpretability concerns whether stakeholders can understand the basis of the output well enough to use or review it responsibly. Auditability supports interpretability by preserving the materials needed to explain what shaped the run.

Example evidence / implication:

Dependability

Dependability concerns whether the run performs consistently and robustly in the relevant setting. Auditability does not guarantee dependable performance, but it makes failures diagnosable and repeatability testable.

Example evidence / implication:

Traceability

Traceability follows the links between actors, components, data sources, and outputs. Auditability depends heavily on traceability, but it goes further by asking whether the resulting record is sufficient for review.

Example evidence / implication:

This item bears most directly on the Auditability and Traceability pillars, but its practical value depends on how well it connects with the other three.

Why this item is more than a generic concept

In general AI governance, auditability may mean little more than keeping records that show a system was used. In RAIDT, it means that one specific run can be reconstructed in a form that supports independent review. That is a stricter and more operational standard.

The RAIDT meaning is more useful because it is tied to run-level evidence. Instead of asking whether an organisation claims to log activity, RAIDT asks whether the evidence pack for a run would let another person understand the context, inspect the inputs and outputs, review the decisions taken, and test whether the governance process actually worked.

Common misunderstanding

Misunderstanding

Auditability is just another word for logging.

Correction

Logging is only one possible ingredient of auditability. A system may log API calls, timestamps, and user actions while still failing to preserve the evidence needed for meaningful review. For instance, if a public-sector caseworker uses GenAI to draft a citizen response and the platform keeps only an access log, the organisation cannot later determine which prompt template was used, what policy excerpts were retrieved, what draft was generated, or how the human reviewer changed it. RAIDT therefore treats auditability as evidence adequacy for reconstruction, not as the bare existence of logs.

Boundary and limitation

Auditability does not prove that a run was correct, fair, lawful, or wise. A run can be perfectly auditable and still produce a poor or harmful result. What auditability provides is the evidential basis for reviewing that result and deciding what happened, who was involved, and what should change.

It also does not remove practical constraints. Some evidence may be sensitive, retention periods may be limited, provider platforms may not expose all relevant metadata, and organisations may not be able to preserve full inputs in raw form. RAIDT handles these limitations by making evidence requirements explicit, documenting gaps, and scoring the run on the basis of what review is realistically possible rather than pretending complete reconstruction is always available.
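One way to picture "documenting gaps" is to record a status and a short note for each evidence item, then summarise how much of the run is realistically reviewable. The sketch below is a rough illustration only: the three status labels, the weights, and the function `summarise_evidence` are assumptions made here and do not reproduce the RAIDT scoring rubric.

```python
from typing import Dict, List, Tuple

# Illustrative weights only; the RAIDT scoring rubric is not reproduced here.
STATUS_WEIGHT = {"retained": 1.0, "partial": 0.5, "unavailable": 0.0}


def summarise_evidence(items: Dict[str, Tuple[str, str]]) -> Tuple[float, List[str]]:
    """Return (coverage between 0 and 1, list of documented gaps).

    `items` maps an evidence item name to a (status, note) pair.
    """
    if not items:
        return 0.0, []
    total = sum(STATUS_WEIGHT[status] for status, _ in items.values())
    gaps = [
        f"{name}: {status} ({note})"
        for name, (status, note) in items.items()
        if status != "retained"
    ]
    return total / len(items), gaps


# Placeholder evidence inventory for one run.
evidence = {
    "prompt": ("retained", ""),
    "model_version": ("retained", ""),
    "raw_input": ("partial", "redacted for sensitivity"),
    "provider_metadata": ("unavailable", "not exposed by platform"),
}

coverage, gaps = summarise_evidence(evidence)
print(coverage)
for gap in gaps:
    print(gap)
```

The point of the design is that a gap is written down with its reason rather than silently dropped, so a reviewer can see both what is missing and why.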

Implementation levels

Manual implementation

A researcher or small team can implement auditability manually by assigning a run ID and storing the prompt, date, model name, settings, output, reviewer notes, and retention decision in a structured template. Even a disciplined spreadsheet-plus-folder approach can create a basic audit trail if it is consistent and reviewed.
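The structured template can be as simple as one row per run in a CSV register. The following is a minimal sketch, assuming a hypothetical file name `run_register.csv` and illustrative column names taken from the items listed above; all values are placeholders.

```python
import csv
import os
import tempfile

# Columns mirror the items named in the paragraph above; names are illustrative.
FIELDS = ["run_id", "date", "model_name", "settings", "prompt",
          "output", "reviewer_notes", "retention_decision"]


def append_run(path: str, row: dict) -> None:
    """Append one run to the register, writing the header if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


# A temporary directory keeps the example self-contained.
register = os.path.join(tempfile.mkdtemp(), "run_register.csv")
append_run(register, {
    "run_id": "run-0001",
    "date": "2024-01-15",
    "model_name": "example-model-v1",
    "settings": "temperature=0.2",
    "prompt": "Summarise the attached notes.",
    "output": "Draft summary text.",
    "reviewer_notes": "Checked against source notes.",
    "retention_decision": "retain-per-local-policy",
})

# Reading the register back shows what a later reviewer would see.
with open(register, newline="", encoding="utf-8") as fh:
    rows = list(csv.DictReader(fh))
print(rows[0]["run_id"])
```

Long prompts and outputs would more likely live in a folder keyed by run ID, with the register holding only the reference.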

Semi-automated implementation

Semi-automated implementation adds structured metadata capture, templated evidence forms, version-controlled prompts, and lightweight review workflows. A wrapper, form, or notebook can automatically capture parts of the run while humans still add contextual notes, approval records, and exception handling.
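A wrapper of this kind might look like the sketch below. It assumes a hypothetical `generate` callable standing in for whatever model client is in use, and an in-memory list standing in for a real evidence store; the function `with_evidence` and all field names are illustrative.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable, Dict, List

EVIDENCE_LOG: List[Dict] = []  # stand-in for a real evidence store


def with_evidence(generate: Callable[[str], str], *, model_version: str,
                  prompt_template_version: str,
                  settings: Dict) -> Callable[[str], str]:
    """Wrap a generation function so each call leaves an evidence entry."""
    def wrapped(prompt: str) -> str:
        output = generate(prompt)
        EVIDENCE_LOG.append({
            # Captured automatically at run time.
            "run_id": str(uuid.uuid4()),
            "timestamp_utc": datetime.now(timezone.utc).isoformat(),
            "model_version": model_version,
            "prompt_template_version": prompt_template_version,
            "settings": dict(settings),
            "prompt": prompt,
            "output": output,
            # Left empty for a human to complete during review.
            "reviewer_notes": None,
            "approval": None,
        })
        return output
    return wrapped


def fake_model(prompt: str) -> str:
    """Placeholder for a real model call."""
    return f"DRAFT: {prompt}"


draft = with_evidence(
    fake_model,
    model_version="example-model-v1",
    prompt_template_version="template-v2",
    settings={"temperature": 0.2},
)
result = draft("Summarise the case notes.")
print(EVIDENCE_LOG[-1]["run_id"], EVIDENCE_LOG[-1]["approval"])
```

The empty `reviewer_notes` and `approval` fields mark the boundary between what the wrapper captures and what still depends on human input.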

Fully automated implementation

At scale, auditability is implemented through a platform layer that records run metadata automatically, stores hashes and snapshots, links prompts to version histories, captures retrieval context, routes outputs for review, applies retention rules, and surfaces the resulting evidence in dashboards or governance pipelines. In this form, auditability becomes a continuous organisational capability rather than an ad hoc documentation exercise.
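Storing hashes alongside snapshots lets a later reviewer check that a retained output is the one actually produced. The sketch below uses SHA-256 from the Python standard library; the choice of algorithm and the helper names `sha256_hex` and `matches_stored_hash` are assumptions for illustration, not a RAIDT requirement.

```python
import hashlib
import hmac


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def matches_stored_hash(stored_hash: str, retained_text: str) -> bool:
    """Check that retained text still matches the hash recorded at run time."""
    return hmac.compare_digest(stored_hash, sha256_hex(retained_text))


# At run time: record the hash of the generated output.
output_at_run_time = "Draft discharge summary text."
stored = sha256_hex(output_at_run_time)

# At review time: confirm the retained copy is unchanged.
print(matches_stored_hash(stored, output_at_run_time))
print(matches_stored_hash(stored, output_at_run_time + " (edited)"))
```

A hash shows only that the content is unchanged; the snapshot itself still has to be retained for the run to be reconstructed.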

Practical use in the RAIDT project

Within the RAIDT project, auditability is important across theory, empirical work, and policy translation. In Paper 08 Foundations, it helps define why run-level governance is needed and why claims about responsible AI remain weak without reconstructable evidence. In Paper 09 Empirical Validation, it provides a dimension that can be operationalised, observed, and scored across real runs. In Paper 10 Policy Pathways, it translates into a concrete governance requirement that organisations and regulators can understand: if a run cannot be reconstructed, oversight is structurally weakened.

The concept also supports the evidence pack design, the scoring rubric, and any sector playbooks because it clarifies what evidence has to be retained in healthcare, finance, education, public services, or enterprise productivity settings. For supervision and viva defence, auditability is especially useful because it gives a clear answer to the question of what makes RAIDT more than another principle framework: RAIDT demands evidence that a run can be reviewed after the fact.

Key audience questions to prepare for

Q1. Is auditability mainly a technical logging issue?

No. Technical logging helps, but RAIDT treats auditability as a socio-technical property of the run. The evidence must support review of configuration, context, output, human handling, and retention, not only system events.

Q2. Why is auditability a separate pillar if traceability already exists?

Traceability follows links across the workflow. Auditability asks whether the resulting record is adequate for independent review. A process can be partly traceable without being genuinely auditable.

Q3. Can a run be auditable even if the model is a black box?

Yes, to a degree. RAIDT does not require full access to internal model weights. It requires enough run-level evidence to reconstruct use, inspect context, review outputs, and document limits. Black-box models can still be more or less auditable depending on what evidence is retained.

Q4. What is the minimum evidence for an audit-ready run?

The minimum depends on context, but typically includes a run identifier, prompt or task instruction, model and version, relevant settings, output record, contextual or retrieval references where applicable, reviewer action, and retention metadata.
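The minimum set described in this answer can be turned into a simple completeness check. This is a sketch under stated assumptions: the key names and the function `missing_evidence` are hypothetical, and the required list would need adjusting to context.

```python
from typing import Dict, List

# Illustrative keys for the typical minimum listed above.
REQUIRED = ["run_id", "prompt", "model_version", "settings",
            "output", "reviewer_action", "retention"]


def missing_evidence(record: Dict, uses_retrieval: bool) -> List[str]:
    """List required evidence items that are absent or empty in the record."""
    required = list(REQUIRED)
    if uses_retrieval:
        # Retrieval references are required only where applicable.
        required.append("retrieval_references")
    return [key for key in required if record.get(key) in (None, "", [], {})]


# A record built from an access log alone: it shows the tool was used, little else.
access_log_only = {"run_id": "run-0017", "timestamp": "10:43", "user": "analyst-1"}
print(missing_evidence(access_log_only, uses_retrieval=True))
```

Run against a record derived only from an access log, the check returns almost every required item, which is the gap between having logs and being able to audit the run.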

Q5. What happens if auditability is weak but the output seems acceptable?

The immediate output may still look fine, but governance readiness is low. If a complaint, incident, or policy challenge arises later, the organisation may be unable to explain what happened or improve the process with confidence.

Suggested citation concepts to support this item
Short explanation for presentation

Auditability in RAIDT asks whether a specific GenAI run can be reconstructed and independently reviewed after it happens. That is more demanding than saying a system is logged. For RAIDT, the key question is whether the evidence pack contains enough usable material to show how the run was configured, what information shaped it, what output it produced, who reviewed it, and how long the record is retained. This matters because governance failures usually emerge in particular runs, not in abstract model descriptions. By making auditability one of the five pillars, RAIDT turns reviewability into something that can be evidenced and scored. That helps organisations move from broad responsible-AI claims towards concrete audit readiness, contestability, and learning from real uses.

One-line takeaway

Auditability is the RAIDT pillar that asks whether a specific GenAI run can be reconstructed and independently reviewed, because governance becomes credible only when evidence is attached to the run.

