S4.16 - Review decision and reviewer notes

flowchart LR
    A["Traditional limitation:<br/>output retained but review reasoning missing"] --> B["RAIDT:<br/>run-level evidence framework"]
    B --> C[["S4.16 Review decision<br/>and reviewer notes"]]
    H["Decision category<br/>reviewer role<br/>timestamp<br/>rationale note<br/>escalation link"] --> C
    C --> D[Evidence pack]
    C --> E[RAIDT score profile]
    C --> I[Reviewer reconstruction]
    D --> F[Reviewability and contestability]
    E --> G[Governance readiness]
    I --> J[Organisational learning and policy alignment]

Star S4 - Evidence Architecture and Artefacts

Star context: Defines the concrete review fields and human oversight artefacts that make a RAIDT run record inspectable, challengeable, and governable rather than merely logged.

Academic picture

Definition / background

Review decision and reviewer notes record the formal human judgement applied to a generated output at the point of oversight. The item captures both the decision itself, such as accepted, accepted with edits, rejected, escalated, or linked to an incident, and the reviewer notes that explain the grounds for that decision. In governance terms, this is the evidential bridge between model behaviour and organisational action.

Conceptually, this item draws on established practices from quality assurance, editorial control, safety review, audit trails, and professional sign-off. Those traditions distinguish between what a system produced and what an authorised human decided to do with it. RAIDT brings that distinction into generative AI governance by making the review outcome a first-class part of the run record rather than a separate, informal, or forgotten step.

This item is not the same as the generated output, the output hash, or the final business outcome. The output shows what the model produced. The review decision shows how that output was treated. The reviewer notes explain why. That distinction matters because governance failures often arise not only from poor outputs, but also from weak review, inconsistent judgement, or absent documentation of human intervention.

Within RAIDT, the item belongs inside run-level evidence because the framework treats each run as the unit that must be reviewable, contestable, and reconstructable. It strengthens the evidence pack by showing how oversight operated in practice, and it informs the score profile by indicating whether responsibility, auditability, and traceability were actually supported by recorded human judgement.
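The fields this item relies on (decision category, reviewer role, timestamp, rationale note, escalation link) can be pictured as a small typed record. The following is a minimal sketch rather than a RAIDT-defined schema: the class names, the `run_id` field, and the validation rules (a non-empty note, a timezone-aware timestamp, and a link whenever the decision is an escalation or incident linkage) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ReviewDecision(Enum):
    # Categories come from the definition above; the identifiers are illustrative.
    ACCEPTED = "accepted"
    ACCEPTED_WITH_EDITS = "accepted_with_edits"
    REJECTED = "rejected"
    ESCALATED = "escalated"
    LINKED_TO_INCIDENT = "linked_to_incident"


@dataclass(frozen=True)
class ReviewRecord:
    run_id: str                          # hypothetical identifier of the reviewed run
    decision: ReviewDecision
    reviewer_role: str
    reviewed_at: datetime
    rationale_note: str
    escalation_link: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed rule: a decision without a rationale is not usable as evidence.
        if not self.rationale_note.strip():
            raise ValueError("rationale_note must not be empty")
        # Assumed rule: timestamps are timezone-aware so runs can be ordered reliably.
        if self.reviewed_at.tzinfo is None:
            raise ValueError("reviewed_at must be timezone-aware")
        # Assumed rule: escalations and incident linkages must point somewhere.
        needs_link = self.decision in (
            ReviewDecision.ESCALATED,
            ReviewDecision.LINKED_TO_INCIDENT,
        )
        if needs_link and not self.escalation_link:
            raise ValueError("escalation_link is required for this decision")


example = ReviewRecord(
    run_id="run-001",
    decision=ReviewDecision.ACCEPTED_WITH_EDITS,
    reviewer_role="policy editor",
    reviewed_at=datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc),
    rationale_note="Restored an omitted legal exception before approval.",
)
```

Making the record immutable (`frozen=True`) is a design choice in this sketch: it mirrors the idea that a review decision, once recorded, is evidence rather than a working note that can be silently revised.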

Why this concept matters

Organisations often claim that a human reviewed AI output, but without a recorded decision and notes that claim remains weak. It is difficult to distinguish genuine oversight from a nominal sign-off, difficult to understand why an output was allowed through, and difficult to learn from near misses or harmful failures. This item solves that problem by turning review from an assertion into evidence.

It also avoids a common confusion in GenAI governance: the assumption that keeping the model output is enough. In reality, oversight is not visible from the output alone. A clean final document may conceal substantial reviewer edits, a serious concern, or an escalation path. Review records therefore preserve the organisational reasoning that sits between machine generation and operational use.

If this item is missing, several risks appear at once: inconsistent approval decisions across teams, poor defensibility during audit, weak contestability when users challenge outcomes, and limited organisational learning after incidents or complaints. RAIDT addresses these risks by locating review decisions inside the run record itself, where they can be connected to the wider evidence architecture.

Key idea: Review decision and reviewer notes matter because RAIDT treats human oversight as evidence that must be inspectable at run level, not as an informal claim made after the fact.
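One way to make the "missing item" risk concrete is to scan run records for reviews that lack either a decision or a rationale note. The sketch below assumes runs are stored as dictionaries with hypothetical keys `run_id`, `review`, `decision`, and `rationale_note`; none of these names are prescribed by RAIDT.

```python
from typing import Iterable, List


def runs_missing_review(runs: Iterable[dict]) -> List[str]:
    """Return the ids of runs whose review decision or rationale note is absent."""
    missing = []
    for run in runs:
        review = run.get("review") or {}
        decision = review.get("decision")
        note = (review.get("rationale_note") or "").strip()
        # A decision with no explanation is treated as a nominal sign-off.
        if not decision or not note:
            missing.append(run["run_id"])
    return missing


sample_runs = [
    {"run_id": "r1", "review": {"decision": "accepted", "rationale_note": "Checked against policy."}},
    {"run_id": "r2", "review": {"decision": "accepted", "rationale_note": "  "}},
    {"run_id": "r3"},
]
flagged = runs_missing_review(sample_runs)
```

In this illustrative data, `r2` is flagged because its note is blank and `r3` because no review was recorded at all.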

What this item captures

The formal review outcome applied to a specific generated output.
The reviewer notes explaining the rationale, concerns, or edits behind that outcome.
Whether the output was accepted, modified, rejected, escalated, or linked to an incident.
The practical basis for later reconstruction of human judgement during audit or dispute.
Evidence of how organisational policy, risk tolerance, or domain standards were applied in the run.
Signals for scoring RAIDT pillars, especially responsibility, auditability, and traceability.
Inputs for continuous improvement, such as recurring error patterns, reviewer disagreement, or escalation hotspots.
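The continuous-improvement use in the last point can be sketched as a simple aggregation over recorded reviews. This is an illustrative assumption about how such signals might be derived, not a RAIDT-specified method; the dictionary keys and the decision strings are hypothetical.

```python
from collections import Counter
from typing import Iterable


def summarise_reviews(reviews: Iterable[dict]) -> dict:
    """Count decisions overall and escalations per reviewer role."""
    reviews = list(reviews)
    decisions = Counter(r["decision"] for r in reviews)
    escalations_by_role = Counter(
        r["reviewer_role"] for r in reviews if r["decision"] == "escalated"
    )
    return {
        "decisions": dict(decisions),
        "escalations_by_role": dict(escalations_by_role),
    }


summary = summarise_reviews([
    {"decision": "accepted", "reviewer_role": "editor"},
    {"decision": "escalated", "reviewer_role": "pharmacist"},
    {"decision": "escalated", "reviewer_role": "pharmacist"},
    {"decision": "rejected", "reviewer_role": "editor"},
])
```

A role that accounts for most escalations may indicate an escalation hotspot worth examining, although the counts alone do not explain why it occurs; the reviewer notes do that.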

Practical example / likely audience question

Audience question

Why are reviewer notes part of the evidence pack rather than just internal working notes?

Answer

The concern behind this question is the belief that evidence should be limited to technical artefacts such as prompts, model identifiers, and outputs. That view is too narrow for responsible GenAI governance. A run becomes organisationally meaningful only when somebody decides whether the generated output is safe, accurate enough, policy-compliant, and fit for use. If that judgement is undocumented, a critical part of governance disappears from the record.

In RAIDT, reviewer notes are evidence because they document how oversight was actually exercised. Suppose a reviewer receives a generated draft policy summary and notices that the system omitted an important legal exception. The reviewer edits the text, marks the decision as accepted with edits, and records a note stating that the omission would otherwise have created compliance risk. That note is not incidental. It explains the governance intervention that changed the risk profile of the run.

RAIDT handles this better than a generic AI governance approach because it ties the review decision to the same run-level record as the prompt version, model configuration, and output trace. Rather than saying only that human review exists in principle, RAIDT shows how it occurred in a specific run, by a specific role, for a specific reason.

Practical example in RAIDT terms

Consider a healthcare trust using a GenAI assistant to draft discharge instructions after a clinician has completed the core medical record. In one run, the model generates instructions that incorrectly simplify a medication timing requirement. A pharmacist reviewer checks the draft before release.

The run-level issue is not only that the output contained a potentially harmful simplification, but also that the organisation must be able to show how the error was detected and addressed. The relevant evidence includes the output itself, the output hash, the reviewer role, the review decision of accepted with edits, the reviewer note stating that dosage timing was clinically ambiguous, and any escalation or incident linkage if the issue suggests a wider pattern.

This directly affects RAIDT pillars. Responsibility is implicated because a named review role applied professional judgement. Auditability is strengthened because the rationale is recorded. Dependability is improved because the error was corrected before release. Traceability is preserved because the decision can be linked to the precise run. In governance-readiness terms, this item shows that the organisation can demonstrate not only generation, but also safe intervention and accountable approval.
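The discharge-instruction example can be written down as a single evidence entry that ties the review to the exact output it concerned. The sketch below is an assumption about one possible shape for that entry: the function name, field names, run identifier, sample text, and timestamp are all invented for illustration, and SHA-256 is used here only as a common choice of output hash.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def build_review_evidence(
    run_id: str,
    generated_output: str,
    decision: str,
    reviewer_role: str,
    rationale_note: str,
    reviewed_at: datetime,
    escalation_link: Optional[str] = None,
) -> dict:
    """Bundle a review decision with a hash of the output that was reviewed."""
    # Hash the model output as generated, so the review is bound to that exact text.
    output_hash = hashlib.sha256(generated_output.encode("utf-8")).hexdigest()
    return {
        "run_id": run_id,
        "output_sha256": output_hash,
        "decision": decision,
        "reviewer_role": reviewer_role,
        "rationale_note": rationale_note,
        "reviewed_at": reviewed_at.isoformat(),
        "escalation_link": escalation_link,
    }


entry = build_review_evidence(
    run_id="run-0042",  # hypothetical run identifier
    generated_output="Take the tablets twice a day.",  # invented sample output
    decision="accepted_with_edits",
    reviewer_role="pharmacist",
    rationale_note="Dosage timing was clinically ambiguous; corrected before release.",
    reviewed_at=datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc),
)
```

Leaving `escalation_link` empty reflects a case where the reviewer judged the error to be isolated; if it suggested a wider pattern, the same entry would carry a reference to the escalation or incident record.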

Detailed link to RAIDT

This item links to RAIDT in four ways.

First, it connects directly to RAIDT's core idea that governance should rest on inspectable evidence rather than broad assurances about responsible use.
Second, it links to the run because the review outcome is part of what happened in that specific configured use of the system, at that specific time, in that specific context.
Third, it enriches the evidence pack and informs the score profile by showing whether meaningful human oversight was documented and how review judgements were made.
Fourth, it supports reviewability, contestability, audit readiness, and organisational learning by making the human reasoning around a run visible and reconstructable.

Review decision and reviewer notes → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness

Link to the five RAIDT pillars

Responsibility

This item is strongly relevant to Responsibility because it records who exercised oversight and how that responsibility was enacted in practice.