S2.05 - Reviewability

flowchart LR
    A[Policy claims without inspectable evidence] --> B[RAIDT<br/>Run-level evidence framework]
    A2[Weak logging and missing run context] --> B
    H[Healthcare, finance, public services, legal review, enterprise productivity] --> C
    H2[Prompt capture, metadata, approval records, review notes] --> C
    B --> C[[Reviewability<br/>Later inspection of a specific run]]
    C --> D[Evidence pack]
    C --> E[RAIDT score profile]
    C --> F[Reviewer reconstruction]
    C --> G[Complaint handling and organisational learning]
    D --> I[Governance readiness]
    E --> I
    F --> I
    G --> I

Star S2 - Governance Meaning and Problem Context

Star context: Clarifies governance as oversight, control, accountability, reviewability and continuous improvement rather than a vague ethics label. In RAIDT, reviewability makes governance inspectable at the level of the individual run rather than leaving oversight at the level of policy aspiration.


Academic picture
Definition / background

Reviewability means that a run can be examined later by a person who was not present when it occurred, using sufficient evidence to understand what was done, under what conditions, and with what consequences. In governance terms, it is the capacity for retrospective inspection. In RAIDT, this matters because run-level governance only becomes credible if a later reviewer can inspect a run without relying on memory, informal explanation, or unverified assurance.

Conceptually, reviewability sits close to auditability, traceability, accountability, and reconstructability, but it is not identical to any of them. Traceability helps link artefacts across a run. Reconstructability helps rebuild the sequence and context of events. Auditability helps support formal assurance processes. Accountability assigns responsibility for what happened. Reviewability is the practical capability that lets those governance functions be exercised by an independent reviewer, or by any reviewer who examines the run after the event.

This concept belongs inside RAIDT because RAIDT is explicitly designed to move governance from principles and assertions towards inspectable evidence. A run-level evidence pack provides the materials that make review possible, while the RAIDT score profile summarises how well the run met the framework's expectations across Responsibility, Auditability, Interpretability, Dependability, and Traceability. Reviewability therefore links the capture of evidence to the actual exercise of governance.

Reviewability also matters because GenAI systems often operate in fluid, prompt-driven contexts where decisions, outputs, and intermediate judgements can shift rapidly. Without disciplined evidence capture at run level, later inspection becomes partial, selective, or impossible. RAIDT addresses this problem by treating reviewability as a design requirement rather than an afterthought.

Why this concept matters

Reviewability solves a basic governance problem: organisations often discover the need for scrutiny only after a problem has already occurred. If a run cannot be reviewed later, incident investigation becomes weak, audit sampling becomes superficial, complaint resolution becomes contested, and process improvement becomes guesswork. In that condition, governance remains performative rather than evidential.

The concept also prevents a common confusion in AI governance. Many governance programmes focus on policies, principles, and high-level controls but do not ensure that a specific use of a system can be examined after the event. Reviewability closes that gap. It turns governance into something that can be demonstrated through artefacts, not merely described in governance documents.

For organisations using GenAI, this matters because outputs are often generated quickly, used by different staff, and embedded into wider workflows. The risks are therefore not only technical but organisational. A system may produce a problematic answer, a user may rely on it inappropriately, or a process may fail to record key contextual information. Reviewability provides the mechanism for understanding which of these occurred and what should change next.

Key idea: Reviewability matters because responsible GenAI governance is only credible if a later reviewer can inspect a specific run using evidence rather than relying on assertion.

What this item enables
Practical example / likely audience question

Audience question

What fails without reviewability?

Answer

The concern behind this question is that organisations may believe they have governed GenAI adequately when in fact they have only documented intentions. The direct answer is that without reviewability, incident investigation, audit sampling, complaint handling, and process improvement all become weak because later reviewers cannot inspect what actually happened in a specific run.

A practical example is a staff member using a GenAI system to draft a summary for a citizen complaint. If the summary later appears misleading or biased, a reviewer needs to know the task framing, prompt, source material, model version, user role, output, edits, approval path, and any warning or validation checks. If those materials were not captured, the organisation cannot determine whether the problem arose from the model, the prompt, the user, the workflow, or the policy environment.

RAIDT handles this issue better than a generic AI governance approach because it makes the run the unit of analysis. Instead of asking only whether the organisation has an AI policy, RAIDT asks whether this specific use at this specific time in this specific context is reviewable through evidence. That makes governance materially stronger and more defensible.

Practical example in RAIDT terms

Consider a healthcare administration team using a GenAI tool to draft discharge communication for patients after a complex hospital stay. One run produces a summary that omits an important follow-up instruction, and the omission is later raised in a complaint.

The run-level issue is not only whether the model produced a weak output, but whether the organisation can review the run properly. The evidence needed includes the prompt, the patient-information source set provided to the model, the model and version used, the user role, the time of generation, the output presented, any edits by staff, any review step before release, and the final communication sent.
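The evidence list above can be sketched as a simple record with a completeness check. This is a minimal illustration, not a RAIDT-defined schema: the class `RunEvidence`, its field names, and the helper `missing_evidence` are hypothetical, and the sample values are invented.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional


@dataclass
class RunEvidence:
    """Hypothetical record for one GenAI run; field names are illustrative only."""
    prompt: Optional[str] = None
    source_set: Optional[list] = None        # patient-information sources given to the model
    model_version: Optional[str] = None
    user_role: Optional[str] = None
    generated_at: Optional[datetime] = None
    output_presented: Optional[str] = None
    staff_edits: Optional[str] = None        # "" records "no edits"; None means "not captured"
    review_step: Optional[str] = None
    final_communication: Optional[str] = None


def missing_evidence(run: RunEvidence) -> list:
    """Return the names of evidence items that were never captured."""
    return [f.name for f in fields(run) if getattr(run, f.name) is None]


# Invented example: a run where only part of the evidence was kept.
run = RunEvidence(
    prompt="Draft a discharge summary from the attached notes.",
    model_version="example-model-1.0",
    user_role="ward administrator",
    generated_at=datetime(2024, 1, 15, 10, 30),
    output_presented="(draft text)",
)
print(missing_evidence(run))
# ['source_set', 'staff_edits', 'review_step', 'final_communication']
```

In this sketch `None` is kept distinct from an empty value, so that a reviewer can tell "nothing was edited" apart from "nobody recorded whether anything was edited".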

The most affected RAIDT pillars are Auditability and Traceability, but Responsibility, Interpretability, and Dependability are also implicated. If the run is reviewable, the organisation can determine whether the omission was caused by incomplete source material, poor prompting, weak human review, over-trust in the tool, or unreliable system behaviour. Reviewability therefore improves governance readiness by allowing the organisation to investigate the complaint, explain its process, and redesign the workflow using evidence.

Detailed link to RAIDT

Reviewability links to RAIDT in four ways.

First, it supports RAIDT's core idea that governance should be grounded in evidence about actual runs rather than abstract statements about responsible AI.

Second, it depends on the run as the unit of inspection. A run can only be reviewed if its context, configuration, actions, outputs, and surrounding controls are captured with enough fidelity to support later scrutiny.

Third, it strengthens both RAIDT outputs. The evidence pack provides the documentary basis for review, while the RAIDT score profile indicates how robustly the run performed across the five pillars and where weaknesses may need follow-up.

Fourth, it directly supports contestability, audit readiness, and organisational learning. Reviewability enables challenged outcomes to be examined, governance decisions to be defended, and repeated failure patterns to be identified across runs.

Reviewability → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness

Link to the five RAIDT pillars

Responsibility

Reviewability supports Responsibility by making it possible to examine who initiated, checked, approved, or relied on a run. It helps distinguish tool behaviour from human judgement and organisational process.

Example evidence / implication:

Auditability

Reviewability has its strongest direct connection to Auditability because later inspection is a precondition for meaningful audit work. If a run cannot be reviewed, it cannot be audited in any substantive sense.

Example evidence / implication:

Interpretability

Reviewability supports Interpretability by preserving the materials needed to understand how an output was framed, generated, and used. It does not guarantee full model transparency, but it improves practical understanding of the run.

Example evidence / implication:

Dependability

Reviewability supports Dependability by revealing patterns of failure, inconsistency, or workflow weakness across runs. A dependable process is easier to improve when problematic runs can be reviewed closely.

Example evidence / implication:

Traceability

Reviewability depends heavily on Traceability because later review requires linked artefacts and a clear record of what happened. Traceability provides the connective structure that makes review feasible.

Example evidence / implication:

Reviewability is therefore strongest when Auditability and Traceability are both mature, but it has practical implications for all five pillars.

Why this item is more than a generic concept

In general AI governance, reviewability may simply mean that a system or process can be looked at later in some broad sense. In RAIDT, it has a more operational meaning. It means that a specific run has sufficient evidence to support later inspection by someone who was not present, and that this inspection can feed directly into an evidence pack, a score profile, and a governance judgement.

The RAIDT meaning is therefore more disciplined than a generic governance slogan. It asks not whether review is desirable in principle, but whether review is actually possible in practice for a concrete run. That makes the concept more demanding, more testable, and more useful for governance design.

Common misunderstanding

Misunderstanding

Reviewability means keeping a log somewhere so that the organisation can say it recorded the use of AI.

Correction

A log alone is not enough. Reviewability requires enough structured evidence for a later reviewer to understand what happened, why it happened, how the output was used, and whether controls were followed. For example, a timestamp and model name do not by themselves explain why a flawed output was accepted into a workflow. RAIDT corrects this by tying reviewability to run-level evidence, contextual metadata, and governance interpretation rather than to bare logging alone.
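The gap between a bare log and reviewable evidence can be made concrete with a small check. The sketch below is an assumption-laden illustration: the mapping `REVIEW_QUESTIONS`, its field names, and the function `answerable_questions` are hypothetical and are not part of RAIDT.

```python
# Hypothetical mapping from the reviewer's four questions to the evidence
# fields that would be needed to answer each one.
REVIEW_QUESTIONS = {
    "what happened": {"prompt", "output", "model_name", "timestamp"},
    "why it happened": {"task_context", "source_material"},
    "how the output was used": {"edits", "downstream_use"},
    "whether controls were followed": {"approval_record", "validation_checks"},
}


def answerable_questions(record: dict) -> dict:
    """For each question, report whether every field it needs was captured."""
    captured = {name for name, value in record.items() if value is not None}
    return {q: needed <= captured for q, needed in REVIEW_QUESTIONS.items()}


# A bare log entry: only a timestamp and a model name.
bare_log = {"timestamp": "2024-01-15T10:30:00", "model_name": "example-model"}
print(answerable_questions(bare_log))
```

With only a timestamp and a model name, none of the four questions can be answered, which is the point of the correction above.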

Boundary and limitation

Reviewability does not prove that a run was correct, fair, lawful, or safe. It only makes later scrutiny possible. A run may be highly reviewable and still reveal poor judgement, weak controls, or harmful outcomes.

It also does not replace real-time oversight, human competence, or domain-specific assurance. If evidence capture is selective, inconsistent, or badly governed, reviewability will be partial. Confidentiality, retention limits, and access controls can also constrain what can be examined later.

RAIDT handles this limitation by treating reviewability as one governance capability within a wider evidence framework. It must work alongside accountability, contestability, traceability, and dependable workflow design to produce meaningful assurance.

Implementation levels

Manual implementation

A researcher or small team can implement reviewability manually by using a structured run template, preserving prompts and outputs, recording task context and user role, and attaching brief review notes after consequential runs. Even a spreadsheet or markdown-based evidence log can improve reviewability if the fields are consistent.
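A spreadsheet-style evidence log with consistent fields might look like the following sketch. The column names in `FIELDS` and the helper `append_run` are illustrative assumptions, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Illustrative column set; a real template would be agreed by the team.
FIELDS = ["run_id", "date", "task_context", "user_role",
          "prompt", "output", "review_notes"]


def append_run(stream, row, write_header=False):
    """Write one run to a CSV evidence log, rejecting missing or extra fields."""
    if set(row) != set(FIELDS):
        raise ValueError(f"row fields must be exactly {FIELDS}")
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerow(row)


log = io.StringIO()  # stands in for an open file such as a shared CSV log
append_run(log, {
    "run_id": "run-001",
    "date": "2024-01-15",
    "task_context": "complaint summary",
    "user_role": "case officer",
    "prompt": "Summarise the attached complaint",
    "output": "(draft summary)",
    "review_notes": "checked against source letter",
}, write_header=True)
print(log.getvalue())
```

Rejecting rows with missing or extra fields is one simple way to keep the fields consistent when the log is filled in by hand.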

Semi-automated implementation

Semi-automated implementation adds templates, metadata capture, version fields, linked evidence folders, and standard review prompts. In this model, parts of the record are generated automatically, while a human still completes contextual judgements, risk notes, and approval information.
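The split between automatically generated fields and human-completed fields can be sketched as below. The names `HUMAN_FIELDS`, `start_record`, `complete_record`, and `outstanding` are hypothetical, as are the specific fields chosen.

```python
import uuid
from datetime import datetime, timezone

# Fields a person is expected to complete; the names are illustrative.
HUMAN_FIELDS = ("contextual_judgement", "risk_note", "approval")


def start_record(prompt, output, model_version):
    """Create a record whose machine-capturable fields are filled automatically."""
    return {
        "run_id": str(uuid.uuid4()),
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "output": output,
        "model_version": model_version,
        **{name: None for name in HUMAN_FIELDS},
    }


def complete_record(record, **human_input):
    """Return a copy of the record with human-supplied fields added."""
    unexpected = set(human_input) - set(HUMAN_FIELDS)
    if unexpected:
        raise ValueError(f"not a human-completed field: {sorted(unexpected)}")
    return {**record, **human_input}


def outstanding(record):
    """List the human fields that are still empty."""
    return [name for name in HUMAN_FIELDS if not record[name]]


record = start_record("Summarise the complaint.", "(draft)", "example-model-1.0")
record = complete_record(record, risk_note="low", approval="team lead")
print(outstanding(record))  # ['contextual_judgement']
```

A check such as `outstanding` gives a reviewer or workflow a simple way to see which human judgements are still missing before a run is treated as fully evidenced.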

Fully automated implementation

At scale, reviewability can be implemented through a platform wrapper, orchestration layer, logging pipeline, or governance dashboard that captures run metadata, prompt and output snapshots, model identifiers, workflow status, reviewer actions, and exception flags automatically. This creates a reliable basis for evidence packs, score calculation, sampling, and trend analysis across many runs.
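One way to picture a platform wrapper is a decorator that records every call to a generation function. This is a minimal sketch under stated assumptions: `capture_run`, `RUN_LOG`, and `generate` are hypothetical names, the in-memory list stands in for a real logging pipeline, and the placeholder function stands in for a real model call.

```python
import functools
from datetime import datetime, timezone

RUN_LOG = []  # in-memory stand-in for a logging pipeline


def capture_run(model_id):
    """Decorator that records prompt, output, model id, and exception flag per run."""
    def decorator(generate_fn):
        @functools.wraps(generate_fn)
        def wrapper(prompt, **kwargs):
            entry = {
                "model_id": model_id,
                "prompt": prompt,
                "started_at": datetime.now(timezone.utc).isoformat(),
                "output": None,
                "exception_flag": False,
            }
            try:
                entry["output"] = generate_fn(prompt, **kwargs)
                return entry["output"]
            except Exception as exc:
                entry["exception_flag"] = True
                entry["error"] = repr(exc)
                raise
            finally:
                # The entry is recorded whether the call succeeded or failed.
                RUN_LOG.append(entry)
        return wrapper
    return decorator


@capture_run(model_id="example-model-1.0")
def generate(prompt):
    # Placeholder for a real model call.
    return f"DRAFT: {prompt}"


generate("Summarise the complaint.")
print(RUN_LOG[0]["output"])  # DRAFT: Summarise the complaint.
```

Recording the entry in the `finally` clause means failed runs are captured as well as successful ones, which matters because failed runs are often the ones a reviewer most needs to examine.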

Practical use in the RAIDT project

Within RAIDT, reviewability helps explain why the run is the correct unit of governance in Paper 08 Foundations: governance claims become stronger when they can be tested against inspectable runs rather than organisational rhetoric. In Paper 09 Empirical Validation, it provides a practical criterion for assessing whether RAIDT produces evidence that independent reviewers can actually use. In Paper 10 Policy Pathways, it supports the argument that policy should encourage evidence-bearing governance processes, not only broad ethical declarations.

Reviewability is also useful across sector playbooks because different fields will ask different questions of the same basic capability. In healthcare, the concern may be complaint investigation. In finance, it may be audit sampling or model-risk review. In public services, it may be case defensibility and procedural fairness. In viva discussion and journal positioning, reviewability helps articulate what RAIDT adds beyond generic AI governance frameworks: it operationalises governance through inspectable run-level evidence.

Key audience questions to prepare for

Q1. Is reviewability just another word for auditability?

Not quite. Auditability is the broader capacity to support formal assurance and audit processes. Reviewability is the more immediate practical condition that a specific run can be inspected later by someone who was not present. In RAIDT, reviewability is one of the conditions that makes auditability real.

Q2. Why does RAIDT focus on reviewability at run level rather than system level?

Because many governance failures emerge in situated use rather than in abstract system descriptions. A system may appear compliant at a high level, while a specific run is poorly evidenced and impossible to inspect. RAIDT addresses this gap by treating the run as the actionable unit of governance.

Q3. Does reviewability require storing everything?

No. It requires storing enough relevant evidence to support later scrutiny in a proportionate and governed way. Good reviewability depends on disciplined evidence design, not indiscriminate retention. Retention, confidentiality, and minimisation still matter.

Q4. What is the difference between reviewability and reconstructability?

Reconstructability is the stronger ability to rebuild the sequence and context of a run in detail. Reviewability is the broader capacity to inspect the run meaningfully after the fact. Reconstruction often strengthens review, but the concepts are not identical.

Q5. Why is reviewability important for organisational learning?

Because without reviewable runs, organisations cannot reliably identify recurring weaknesses in prompts, controls, workflows, or user practice. Reviewability turns past runs into evidence for improvement rather than leaving failures as isolated anecdotes.
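As a small illustration of how reviewed runs become evidence for improvement, review findings can be tallied across runs. The data below is invented, and the function `recurring_findings` and the `finding` field are hypothetical.

```python
from collections import Counter

# Invented review outcomes; "finding" is None where the review found no issue.
reviewed_runs = [
    {"run_id": "r1", "finding": "incomplete source material"},
    {"run_id": "r2", "finding": "weak human review"},
    {"run_id": "r3", "finding": "incomplete source material"},
    {"run_id": "r4", "finding": None},
]


def recurring_findings(runs, min_count=2):
    """Return findings that appear in at least min_count reviewed runs."""
    counts = Counter(r["finding"] for r in runs if r["finding"])
    return {finding: n for finding, n in counts.items() if n >= min_count}


print(recurring_findings(reviewed_runs))  # {'incomplete source material': 2}
```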

Suggested citation concepts to support this item
Short explanation for presentation

Reviewability is the ability to inspect a specific GenAI run after the event using enough evidence to understand what happened, under what conditions, and with what consequences. In RAIDT, this is critical because governance is anchored at run level rather than at the level of abstract policy. If a later reviewer cannot examine the prompt, context, model choice, output, human checks, and downstream use, then responsible AI claims remain largely rhetorical. Reviewability therefore connects run-level evidence to practical governance outcomes: incident investigation, audit readiness, complaint handling, contestability, and continuous improvement. It is one of the clearest ways RAIDT shifts governance from principle to proof, because it asks whether a concrete organisational use of GenAI can actually be examined and defended.

One-line takeaway

Reviewability is the capacity to inspect a specific GenAI run after the fact. It is central to RAIDT because RAIDT ties governance to run-level evidence rather than organisational assertion.

Anchored questions

Audience question: What fails without reviewability? Answer: incident investigation, audit sampling, complaint handling and process improvement all become weak.
