Q136 — Why do reviewer notes and decisions matter?

← RAIDT · Star S4 - Evidence Architecture and Artefacts · primary item: S4.16 · Review decision and reviewer notes

Appears in sources

integrated_82#Q3.10

Answer

Reviewer notes and decisions matter because RAIDT is concerned with reconstructable accountability, not merely with preserving outputs. The papers repeatedly argue that model cards, lifecycle logs, or raw traces are insufficient if they do not show how judgement was distributed in a specific run. Reviewer notes make that judgement visible. They preserve what a reviewer found problematic or acceptable, what was amended, which constraints or uncertainties were recognised, and why an output was approved, rejected, or escalated. In that sense, notes are not administrative decoration; they are evidence-bearing artefacts that make review reasoning inspectable.

Their importance is both analytical and organisational. Analytically, reviewer notes strengthen Responsibility and Auditability because they connect system behaviour to documented human oversight decisions. Organisationally, they support contestability, post-hoc challenge, incident learning, audit sampling, and comparison across runs and configurations. The review paper is especially clear that a run-level evidence object should capture what humans reviewed, what they changed, what they approved, and what was escalated. The foundations paper, in turn, treats review notes as part of the artefacts that make governance observable and support learning after incidents.
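The four elements named above (what was reviewed, changed, approved, and escalated) can be pictured as fields of a single review record. The following is a minimal sketch only: the class name ReviewRecord, its field names, and the missing_evidence check are hypothetical illustrations, not a schema defined in the RAIDT papers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Decision(Enum):
    """Review outcomes named in the answer text."""
    APPROVED = "approved"
    REJECTED = "rejected"
    ESCALATED = "escalated"


@dataclass(frozen=True)
class ReviewRecord:
    """Hypothetical run-level review record (illustrative field names)."""
    run_id: str
    reviewer: str
    decision: Decision
    rationale: str                        # why the output was approved, rejected, or escalated
    reviewed_items: Tuple[str, ...] = ()  # what the human actually looked at
    changes: Tuple[str, ...] = ()         # what the human amended
    uncertainties: Tuple[str, ...] = ()   # constraints or doubts the reviewer recorded


def missing_evidence(record: ReviewRecord) -> List[str]:
    """Return the fields whose absence would make the review hard to reconstruct.

    The rules here are assumptions chosen for illustration.
    """
    gaps = []
    if not record.reviewed_items:
        gaps.append("reviewed_items")
    if not record.rationale.strip():
        gaps.append("rationale")
    # Assumption: an escalation should say which uncertainty triggered it.
    if record.decision is Decision.ESCALATED and not record.uncertainties:
        gaps.append("uncertainties")
    return gaps
```

A record that carries only a decision and no rationale would be flagged by missing_evidence, which mirrors the point that a bare approval is not inspectable review reasoning.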

This is also why reviewer notes matter for the score profile. A run may have rich technical provenance yet still remain weakly governable if nobody can tell why it was accepted or what reservations were recorded. Reviewer notes therefore help move a run-level evidence pack from thin documentation towards evidence that could plausibly sustain higher scores on the anchor scale (1 = missing, 3 = partial, 5 = audit-ready) across the five pillars: Responsibility, Auditability, Interpretability, Dependability, and Traceability.
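As a rough illustration of how such a score profile might be read, the sketch below maps anchor scores to their labels and reports the weakest pillars. The pillar names and anchor labels come from the text above; the function names, the restriction to the three anchor values, and the choice to treat an unscored pillar as 1 are assumptions made for this example.

```python
from typing import Dict, List

PILLARS = (
    "Responsibility",
    "Auditability",
    "Interpretability",
    "Dependability",
    "Traceability",
)
ANCHORS = {1: "missing", 3: "partial", 5: "audit-ready"}


def describe_profile(scores: Dict[str, int]) -> Dict[str, str]:
    """Translate anchor scores into labels for all five pillars."""
    unknown = set(scores) - set(PILLARS)
    if unknown:
        raise ValueError(f"unknown pillars: {sorted(unknown)}")
    profile = {}
    for pillar in PILLARS:
        score = scores.get(pillar, 1)  # assumption: an unscored pillar counts as missing
        if score not in ANCHORS:
            raise ValueError(f"{pillar}: {score} is not an anchor value")
        profile[pillar] = ANCHORS[score]
    return profile


def weakest_pillars(scores: Dict[str, int]) -> List[str]:
    """Return the pillars holding the lowest score in the profile."""
    filled = {pillar: scores.get(pillar, 1) for pillar in PILLARS}
    lowest = min(filled.values())
    return [pillar for pillar in PILLARS if filled[pillar] == lowest]
```

Under these assumptions, a run scored well on Traceability but with no Responsibility evidence would surface Responsibility as its weakest pillar, which is the situation the paragraph above describes.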

Practical example

Consider a public-service eligibility-advice workflow. A GenAI system drafts an explanation of whether a claimant appears eligible under current rules. The reviewer observes that one retrieved rule excerpt is current, but that the claimant’s residency evidence is incomplete and the draft explanation overstates certainty. The reviewer edits the wording, records that the answer is provisional, and escalates the case for manual verification.

Those notes matter later if the claimant challenges the advice. The organisation can show not only the generated text, but also that the reviewer spotted the weakness, constrained reliance on the output, and triggered escalation. If the same pattern appears repeatedly across runs, the notes also support organisational learning by indicating that the retrieval setup or prompt design needs improvement. Without the notes, the organisation would retain only a polished answer and lose the reasoning that made its use governable.
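The cross-run learning described above can be sketched as a simple count of issue tags over reviewer notes. This is a hedged illustration: it assumes each note is a dictionary with hypothetical keys run_id and issue_tags, and that reviewers tag issues consistently, neither of which is specified in the RAIDT papers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def recurring_issues(notes: Iterable[Mapping], min_runs: int = 2) -> Dict[str, int]:
    """Count, per issue tag, the distinct runs in which it was recorded.

    Only tags seen in at least `min_runs` distinct runs are returned.
    """
    runs_by_issue = defaultdict(set)
    for note in notes:
        for tag in note["issue_tags"]:
            # A set ensures a tag repeated within one run is counted once.
            runs_by_issue[tag].add(note["run_id"])
    return {
        tag: len(runs)
        for tag, runs in runs_by_issue.items()
        if len(runs) >= min_runs
    }


# Illustrative notes only; the tags are invented for this example.
example_notes = [
    {"run_id": "run-001", "issue_tags": ["overstated-certainty", "incomplete-evidence"]},
    {"run_id": "run-002", "issue_tags": ["overstated-certainty"]},
    {"run_id": "run-003", "issue_tags": ["outdated-rule-excerpt"]},
]
repeated = recurring_issues(example_notes)
```

A tag that recurs across runs, such as overstated certainty in the drafted explanation, would be a candidate signal that the prompt design or retrieval setup deserves attention.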

Sources in RAIDT papers

08-RAIDT_Foundations_M_V50
13-RAIDT-Evidence-Review_M_v10