S2.05 - Reviewability

flowchart LR
    A[Policy claims without inspectable evidence] --> B[RAIDT<br/>Run-level evidence framework]
    A2[Weak logging and missing run context] --> B
    H[Healthcare, finance, public services, legal review, enterprise productivity] --> C
    H2[Prompt capture, metadata, approval records, review notes] --> C
    B --> C[[Reviewability<br/>Later inspection of a specific run]]
    C --> D[Evidence pack]
    C --> E[RAIDT score profile]
    C --> F[Reviewer reconstruction]
    C --> G[Complaint handling and organisational learning]
    D --> I[Governance readiness]
    E --> I
    F --> I
    G --> I

Star S2 - Governance Meaning and Problem Context

Star context: Clarifies governance as oversight, control, accountability, reviewability and continuous improvement rather than a vague ethics label. In RAIDT, reviewability makes governance inspectable at the level of the individual run rather than leaving oversight at the level of policy aspiration.

Academic picture

Definition / background

Reviewability means that a run can be examined later by a person who was not present when it occurred, using sufficient evidence to understand what was done, under what conditions, and with what consequences. In governance terms, it is the capacity for retrospective inspection. In RAIDT, this matters because run-level governance only becomes credible if a later reviewer can inspect a run without relying on memory, informal explanation, or unverified assurance.

Conceptually, reviewability sits close to auditability, traceability, accountability, and reconstructability, but it is not identical to any of them. Traceability helps link artefacts across a run. Reconstructability helps rebuild the sequence and context of events. Auditability helps support formal assurance processes. Accountability assigns responsibility for what happened. Reviewability is the practical capability that lets those governance functions be exercised by an independent reviewer or by anyone examining the run after the event.

This concept belongs inside RAIDT because RAIDT is explicitly designed to move governance from principles and assertions towards inspectable evidence. A run-level evidence pack provides the materials that make review possible, while the RAIDT score profile summarises how well the run met the framework's expectations across Responsibility, Auditability, Interpretability, Dependability, and Traceability. Reviewability therefore links the capture of evidence to the actual exercise of governance.

Reviewability also matters because GenAI systems often operate in fluid, prompt-driven contexts where decisions, outputs, and intermediate judgements can shift rapidly. Without disciplined evidence capture at run level, later inspection becomes partial, selective, or impossible. RAIDT addresses this problem by treating reviewability as a design requirement rather than an afterthought.

Why this concept matters

Reviewability solves a basic governance problem: organisations often discover the need for scrutiny only after a problem has already occurred. If a run cannot be reviewed later, incident investigation becomes weak, audit sampling becomes superficial, complaint resolution becomes contested, and process improvement becomes guesswork. Governance then remains performative rather than evidential.

The concept also prevents a common confusion in AI governance. Many governance programmes focus on policies, principles, and high-level controls but do not ensure that a specific use of a system can be examined after the event. Reviewability closes that gap. It turns governance into something that can be demonstrated through artefacts, not merely described in governance documents.

For organisations using GenAI, this matters because outputs are often generated quickly, used by different staff, and embedded into wider workflows. The risks are therefore not only technical but organisational. A system may produce a problematic answer, a user may rely on it inappropriately, or a process may fail to record key contextual information. Reviewability provides the mechanism for understanding which of these occurred and what should change next.

Key idea: Reviewability matters because responsible GenAI governance is only credible if a later reviewer can inspect a specific run using evidence rather than relying on assertion.

What this item enables

later inspection of a run by a supervisor, auditor, investigator, or researcher who was not present at the time
comparison between what policy requires and what actually occurred in a specific GenAI-supported task
more defensible incident review, complaint handling, and exception analysis
evidence-based justification for RAIDT evidence packs and score profiles
cross-run learning about recurring weaknesses in prompts, controls, user behaviour, or workflow design
stronger contestability because challenged outputs can be examined rather than merely defended
continuous improvement because past runs become a source of organisational learning

Practical example / likely audience question

Audience question

What fails without reviewability?

Answer

The concern behind this question is that organisations may believe they have governed GenAI adequately when in fact they have only documented intentions. The direct answer is that without reviewability, incident investigation, audit sampling, complaint handling, and process improvement all become weak because later reviewers cannot inspect what actually happened in a specific run.

A practical example is a staff member using a GenAI system to draft a summary for a citizen complaint. If the summary later appears misleading or biased, a reviewer needs to know the task framing, prompt, source material, model version, user role, output, edits, approval path, and any warning or validation checks. If those materials were not captured, the organisation cannot determine whether the problem arose from the model, the prompt, the user, the workflow, or the policy environment.
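The evidence items listed in this example can be sketched as a simple record with a completeness check. This is a minimal illustration only: the `RunEvidence` class, its field names, and the `missing_evidence` helper are hypothetical and are not part of any published RAIDT schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RunEvidence:
    """Hypothetical run-level record; field names are illustrative only."""
    task_framing: Optional[str] = None
    prompt: Optional[str] = None
    source_material: Optional[str] = None
    model_version: Optional[str] = None
    user_role: Optional[str] = None
    output: Optional[str] = None
    edits: Optional[str] = None
    approval_path: Optional[str] = None
    validation_checks: Optional[str] = None


def missing_evidence(record: RunEvidence) -> List[str]:
    """Return the names of fields that were not captured (None or empty)."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# A run where only part of the context was recorded.
complaint_run = RunEvidence(
    task_framing="Summarise citizen complaint",
    prompt="Summarise the attached complaint in 150 words.",
    user_role="caseworker",
    output="Draft summary text",
)

gaps = missing_evidence(complaint_run)
print(gaps)
```

Any field returned by `missing_evidence` marks a question the later reviewer cannot answer from the record, which is the situation the example describes.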

RAIDT handles this issue better than a generic AI governance approach because it makes the run the unit of analysis. Instead of asking only whether the organisation has an AI policy, RAIDT asks whether this specific use at this specific time in this specific context is reviewable through evidence. That makes governance materially stronger and more defensible.

Practical example in RAIDT terms

Consider a healthcare administration team using a GenAI tool to draft discharge communication for patients after a complex hospital stay. One run produces a summary that omits an important follow-up instruction, and the omission is later raised in a complaint.

The run-level issue is not only whether the model produced a weak output, but whether the organisation can review the run properly. The evidence needed includes the prompt, the patient-information source set provided to the model, the model and version used, the user role, the time of generation, the output presented, any edits by staff, any review step before release, and the final communication sent.

The most affected RAIDT pillars are Auditability and Traceability, but Responsibility, Interpretability, and Dependability are also implicated. If the run is reviewable, the organisation can determine whether the omission was caused by incomplete source material, poor prompting, weak human review, over-trust in the tool, or unreliable system behaviour. Reviewability therefore improves governance readiness by allowing the organisation to investigate the complaint, explain its process, and redesign the workflow using evidence.

Detailed link to RAIDT

Reviewability links to RAIDT in four ways.

First, it supports RAIDT's core idea that governance should be grounded in evidence about actual runs rather than abstract statements about responsible AI.

Second, it depends on the run as the unit of inspection. A run can only be reviewed if its context, configuration, actions, outputs, and surrounding controls are captured with enough fidelity to support later scrutiny.

Third, it strengthens both RAIDT outputs. The evidence pack provides the documentary basis for review, while the RAIDT score profile indicates how robustly the run performed across the five pillars and where weaknesses may need follow-up.
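As a rough sketch of how a score profile might point a reviewer towards follow-up, the snippet below flags pillars that fall under a threshold. The source does not define a scoring scale, so the 0 to 4 scale, the threshold of 2, and the `weak_pillars` helper are all assumptions made for illustration.

```python
from typing import Dict, List

# The five pillars named in the framework.
PILLARS = (
    "Responsibility",
    "Auditability",
    "Interpretability",
    "Dependability",
    "Traceability",
)


def weak_pillars(profile: Dict[str, int], threshold: int = 2) -> List[str]:
    """Return pillars scoring below the threshold (assumed 0-4 scale)."""
    absent = [p for p in PILLARS if p not in profile]
    if absent:
        # An incomplete profile is itself a reviewability gap.
        raise ValueError(f"profile is missing pillars: {absent}")
    return [p for p in PILLARS if profile[p] < threshold]


# Hypothetical scores for a single run.
profile = {
    "Responsibility": 3,
    "Auditability": 1,
    "Interpretability": 2,
    "Dependability": 3,
    "Traceability": 1,
}

follow_up = weak_pillars(profile)
print(follow_up)
```

The profile only summarises; the reviewer would still return to the evidence pack to understand why a flagged pillar scored poorly.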

Fourth, it directly supports contestability, audit readiness, and organisational learning. Reviewability enables challenged outcomes to be examined, governance decisions to be defended, and repeated failure patterns to be identified across runs.
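Identifying repeated failure patterns across runs can be illustrated with a simple count of evidence gaps. This is a sketch under stated assumptions: the run identifiers, the gap labels, and the `recurring_gaps` helper are hypothetical, and a real review would draw on the actual evidence packs.

```python
from collections import Counter
from typing import Dict, List


def recurring_gaps(runs: Dict[str, List[str]], min_runs: int = 2) -> Dict[str, int]:
    """Count how many runs share each evidence gap; keep those seen in at least min_runs runs."""
    # set() ensures a gap is counted once per run even if listed twice.
    counts = Counter(gap for gaps in runs.values() for gap in set(gaps))
    return {gap: n for gap, n in counts.items() if n >= min_runs}


# Hypothetical gaps recorded during review of three runs.
runs = {
    "run-001": ["model_version", "approval_path"],
    "run-002": ["approval_path"],
    "run-003": ["source_material", "approval_path", "model_version"],
}

patterns = recurring_gaps(runs)
print(patterns)
```

A gap that recurs across many runs suggests a workflow or control weakness rather than a one-off lapse, which is the kind of organisational learning described above.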

Reviewability → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness

Link to the five RAIDT pillars

Responsibility

Reviewability supports Responsibility by making it possible to examine who initiated, checked, approved, or relied on a run. It helps distinguish tool behaviour from human judgement and organisational process.