S8.10 - Reviewer forms

flowchart LR
    A[Human oversight often asserted but weakly documented] --> B["RAIDT<br/>Run-level evidence framework"]
    B --> C[["Reviewer forms<br/>Structured human judgement for a specific run"]]
    H["Run ID<br/>Reviewer role<br/>Rubric criteria<br/>Evidence pointers<br/>Rationale<br/>Escalation flags"] --> C
    C --> D[Evidence pack]
    C --> E[RAIDT score profile]
    C --> F["Reviewer reconstruction<br/>Disagreement handling"]
    D --> G[Reviewability and audit readiness]
    E --> I[Governance readiness]
    F --> I

Star S8 - Implementation and Operations

Star context: Shows how RAIDT can be implemented as a real governance routine, including the human review instruments that make run-level judgement visible, comparable and operationally accountable.

Academic picture

Definition / background

Reviewer forms are structured records used by human reviewers to document how a specific GenAI run was assessed. In RAIDT, they capture scores, evidence pointers, decisions, disagreements, and rationale in a way that is tied to the run record rather than treated as detached commentary. Their function is to make human judgement inspectable, comparable across runs, and usable within a governance process.

Conceptually, reviewer forms sit between informal review notes and formal audit records. They are more structured than ad hoc comments, because they require defined fields, criteria and decision statements; but they are more operational than a high-level policy document, because they are completed in relation to an actual run, model output, task context and evidence set. This is why they fit naturally inside RAIDT, which treats the run as the unit of governance.

In GenAI governance, organisations often claim that there is human oversight, but the evidence for that oversight is weak. A reviewer may approve or reject an output without leaving a reconstructable explanation of what was reviewed, against which criteria, with what level of confidence, and on the basis of which evidence. Reviewer forms address that gap. They convert human oversight into a recorded governance artefact that can travel with the evidence pack and inform the five-pillar score profile.

Reviewer forms also differ from generic quality-assurance checklists. A checklist may confirm that a step was completed, whereas a RAIDT reviewer form records the reasoning, evidence trail and judgement applied to a specific run. That makes it relevant not only to compliance but also to contestability, post-run review, corrective action, and organisational learning.

Why this concept matters

Reviewer forms solve a recurring operational problem in responsible AI governance: human oversight is frequently asserted but poorly evidenced. Without a structured review instrument, organisations struggle to show who reviewed a run, what they saw, how they judged quality or risk, why they accepted or rejected the output, and what happened when reviewers disagreed. This weakens accountability and makes retrospective review difficult.

The concept also avoids a common confusion between human presence and meaningful human oversight. A person looking at an output is not enough. Governance requires a record of the judgement process, including the criteria used and the basis for the decision. Reviewer forms help organisations move from a principle such as "a human remains in the loop" to an operational reality in which that human intervention is documented and reviewable.

If reviewer forms are missing, several risks appear. Decisions become hard to explain, disputed cases become hard to resolve, scoring becomes inconsistent across reviewers or teams, and the organisation loses the ability to learn systematically from borderline or problematic runs. In practice, this means weaker audit readiness, weaker defensibility in front of supervisors or regulators, and weaker evidence for continuous improvement.

Key idea: Reviewer forms matter because they turn human oversight from an assertion into run-level evidence that can be reviewed, challenged and used for governance.

What this item captures

The identity, role, and authority of the reviewer for a specific run.
The run identifier, task context, and output under review.
The rubric criteria or assessment dimensions used in the review.
Scores, ratings, or qualitative judgements linked to those criteria.
Evidence pointers showing what artefacts or logs informed the judgement.
Acceptance, rejection, revision, or escalation decisions.
Disagreement notes where multiple reviewers interpret the run differently.
The rationale explaining why the reviewer reached that conclusion.
Follow-up actions such as corrective action, monitoring flags, or post-run review triggers.
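
The fields listed above can be sketched as a simple record type. This is a minimal illustration only: RAIDT does not prescribe a schema in this section, so every class name, field name, and the `gaps` check are assumptions introduced here, and the sample values are invented.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    REVISE = "revise"
    ESCALATE = "escalate"


@dataclass
class CriterionJudgement:
    criterion: str                 # rubric criterion or assessment dimension
    judgement: str                 # score, rating, or qualitative judgement
    evidence_pointers: list[str]   # IDs of artefacts or logs that informed it


@dataclass
class ReviewerForm:
    # Hypothetical field names mirroring the list above.
    run_id: str
    task_context: str
    reviewer_id: str
    reviewer_role: str
    reviewer_authority: str
    judgements: list[CriterionJudgement]
    decision: Decision
    rationale: str
    disagreement_notes: list[str] = field(default_factory=list)
    follow_up_actions: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """List what is missing before the form can serve as a reconstructable record."""
        found = []
        if not self.rationale.strip():
            found.append("missing rationale")
        if not self.judgements:
            found.append("no rubric criteria assessed")
        for j in self.judgements:
            if not j.evidence_pointers:
                found.append(f"no evidence pointer for '{j.criterion}'")
        return found


# Invented sample: a form with a decision but no rationale and no evidence pointer.
form = ReviewerForm(
    run_id="run-0001",
    task_context="Draft benefit-eligibility explanation letter",
    reviewer_id="reviewer-17",
    reviewer_role="Caseworker",
    reviewer_authority="May request revision",
    judgements=[CriterionJudgement("completeness", "incomplete", [])],
    decision=Decision.REVISE,
    rationale="",
)
print(form.gaps())
```

A check of this kind is one possible way to stop a form from being filed as a bare decision with no rationale or evidence trail attached.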

Practical example / likely audience question

Audience question

Why use reviewer forms when an expert can simply approve the output directly?

Answer

The concern behind this question is usually that reviewer forms look bureaucratic or duplicative. If a qualified human has already checked the output, it can seem unnecessary to ask that person to complete additional documentation. The direct answer is that approval alone is not enough for governance. A later reviewer, supervisor, auditor, or policy stakeholder needs to know what was reviewed, what standard was applied, what evidence was considered, and why the final decision was justified.

A practical example is a clinician reviewing a GenAI-assisted discharge summary. If the clinician simply clicks "approved", the organisation knows only that a review happened. If the clinician completes a reviewer form, the organisation can see whether the review addressed factual accuracy, omission risk, patient-safety concerns, source-document consistency, and any edits made before release. That produces a record that can be revisited if a problem emerges later.
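
The difference between a bare approval and a completed form can be made concrete with a small coverage check. The sketch below assumes the four review concerns from the discharge-summary example are treated as required rubric criteria; the identifiers, the function name, and the sample notes are all hypothetical.

```python
# Criteria drawn from the discharge-summary example; the identifiers are illustrative.
REQUIRED_CRITERIA = {
    "factual_accuracy",
    "omission_risk",
    "patient_safety",
    "source_document_consistency",
}


def uncovered_criteria(addressed: dict[str, str]) -> list[str]:
    """Return required criteria with no recorded judgement, sorted for stable output."""
    covered = {name for name, note in addressed.items() if note.strip()}
    return sorted(REQUIRED_CRITERIA - covered)


# The clinician only clicked "approved": nothing is recorded per criterion.
bare_approval: dict[str, str] = {}

# Invented notes standing in for a completed reviewer form.
completed_form = {
    "factual_accuracy": "Checked against the chart.",
    "omission_risk": "Follow-up instructions added before release.",
    "patient_safety": "No concerns identified.",
    "source_document_consistency": "Consistent with source notes.",
}

print(uncovered_criteria(bare_approval))   # all four criteria are unaddressed
print(uncovered_criteria(completed_form))  # []
```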

RAIDT handles this better than a generic AI governance approach because it does not treat the review as a detached compliance ritual. The reviewer form is tied to a specific run, integrated into the evidence pack, and capable of influencing the score profile across pillars such as Responsibility, Auditability and Traceability. In other words, RAIDT makes reviewer judgement operational, not merely symbolic.

Practical example in RAIDT terms

Consider a public-service team using a GenAI system to draft benefit-eligibility explanation letters for citizens. One run produces a letter that is fluent and apparently helpful, but the output omits an important explanation of appeal rights. The run-level issue is not just whether the model was generally useful; it is whether this particular run produced an output that could mislead a recipient in a legally and procedurally significant way.

The evidence needed includes the run ID, the prompt and input context, the generated letter, the relevant policy or legal guidance, the reviewer form, and the reviewer's rationale for revision or rejection. The reviewer form records that the output was understandable but incomplete, notes the missing appeal-rights explanation, links that judgement to the relevant guidance, and marks the run for corrective action and template adjustment.
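
Linking a judgement to its evidence only helps if the pointers resolve. The following sketch assumes the evidence pack can be represented as a mapping from artefact ID to description and checks that every pointer cited on the form exists in it; the IDs and the function name are invented for illustration.

```python
def unresolved_pointers(pointers: list[str], evidence_pack: dict[str, str]) -> list[str]:
    """Return pointers cited by the reviewer that are absent from the evidence pack."""
    return [p for p in pointers if p not in evidence_pack]


# Illustrative evidence pack for the benefit-letter run; all IDs are hypothetical.
evidence_pack = {
    "prompt": "Prompt and input context",
    "output": "Generated eligibility letter",
    "guidance": "Relevant policy or legal guidance",
}

# Pointers cited on the reviewer form for the missing appeal-rights explanation.
form_pointers = ["output", "guidance", "template"]

print(unresolved_pointers(form_pointers, evidence_pack))  # ['template']
```

Here the form cites a template artefact that was never added to the pack, which is the kind of gap that would otherwise surface only during a later audit.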

The RAIDT pillars affected are Responsibility, because a documented human decision is required; Auditability, because the review can be reconstructed; Interpretability, because the rationale explains what made the output unacceptable; Dependability, because repeated omissions can be tracked across runs; and Traceability, because the judgement is linked back to the run and the supporting evidence. In governance-readiness terms, the reviewer form turns a potentially disputable review event into a defensible record.

Detailed link to RAIDT

Reviewer forms link to RAIDT in four ways.

First, they operationalise RAIDT's central claim that governance should be based on evidence about a specific run rather than on broad assurance statements about a system in general.
Second, they connect human judgement directly to the run-level evidence, so that oversight is attached to the exact output, context and decision being governed.
Third, they feed the evidence pack and help justify elements of the RAIDT score profile by recording how reviewers interpreted quality, risk, adequacy and acceptability.
Fourth, they strengthen reviewability, contestability, audit readiness and organisational learning by leaving a structured trail of judgement that others can revisit later.

Reviewer forms → Run-level review → Evidence pack → RAIDT score profile → Governance readiness
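
As a rough illustration of the step from reviewer forms to a score profile, the sketch below averages per-pillar scores across the forms for one run and flags pillars where reviewers diverge. The source does not say how RAIDT aggregates scores, so the averaging rule, the 1-5 scale, the spread threshold, and all names are assumptions.

```python
from statistics import mean

PILLARS = ("Responsibility", "Auditability", "Interpretability",
           "Dependability", "Traceability")


def score_profile(forms: list[dict[str, int]]) -> dict[str, float]:
    """Average each pillar's score across the reviewer forms for one run."""
    profile = {}
    for pillar in PILLARS:
        scores = [f[pillar] for f in forms if pillar in f]
        if scores:  # skip pillars that no reviewer scored
            profile[pillar] = mean(scores)
    return profile


def disagreements(forms: list[dict[str, int]], max_spread: int = 1) -> list[str]:
    """Return pillars where reviewers' scores differ by more than max_spread."""
    flagged = []
    for pillar in PILLARS:
        scores = [f[pillar] for f in forms if pillar in f]
        if scores and max(scores) - min(scores) > max_spread:
            flagged.append(pillar)
    return flagged


# Invented scores from two reviewers of the same run (hypothetical 1-5 scale).
reviewer_a = {"Responsibility": 4, "Auditability": 4, "Traceability": 5}
reviewer_b = {"Responsibility": 2, "Auditability": 4, "Traceability": 4}

print(score_profile([reviewer_a, reviewer_b]))
print(disagreements([reviewer_a, reviewer_b]))  # ['Responsibility']
```

Flagged pillars would then be candidates for the disagreement notes and escalation handling that the form already provides for.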

Link to the five RAIDT pillars

Reviewer forms have their strongest effect on Responsibility, Auditability and Traceability, but they also support Interpretability and Dependability when used consistently.

Responsibility

Reviewer forms clarify who exercised judgement, under what authority, and with what decision outcome. They help show that responsibility is not abstractly assigned but operationally enacted for a particular run.