Q257 — Post-run review — definition, example, and why it matters in RAIDT

Appears in sources

workshop_dense_100#slide 80

Answer

Post-run review in RAIDT is a governance routine for examining one completed generative AI use after the fact, usually because it was sampled, disputed, high-impact, or operationally unusual. Its formal purpose is not simply to ask whether the output looked good. Instead, it reconstructs the run-level evidence pack for that case and evaluates whether governance readiness is demonstrable for that specific use. This follows RAIDT's central claim that the run is the unit of governance: it is the point at which prompts, retrieval context, tool use, configuration, oversight, and organisational consequence meet. The reviewer therefore inspects both technical traces and organisational review records, then produces a score profile across the five pillars (Responsibility, Auditability, Interpretability, Dependability, Traceability).

It matters because generative AI behaviour is shaped at run time, so model cards, lifecycle dashboards, and periodic audits alone cannot show what happened in one contested case. RAIDT uses post-run review to make governance comparable, contestable, and improvable by scoring against fixed anchors (1 = missing, 3 = partial, 5 = audit-ready). The papers also show why this matters organisationally: review supports audit sampling, post-incident reconstruction, escalation, and continuous improvement, and it exposes whether influence methods, treated as governance interventions, have strengthened or weakened evidence quality. In that sense, post-run review is both a definition and a control mechanism: it converts fragmented logs, provenance fields, and oversight notes into a bounded judgement about whether reliance on that run was well governed and what should change next.
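
A score profile of this kind can be pictured as a small data structure. The following is only a minimal sketch, not a RAIDT-defined schema: the names `Anchor`, `ScoreProfile`, `weakest`, and `below` are hypothetical, and allowing the intermediate scores 2 and 4 between the three named anchors is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Anchor(IntEnum):
    """The three named anchors; intermediate scores (2, 4) are an assumption."""
    MISSING = 1
    PARTIAL = 3
    AUDIT_READY = 5


# The five RAIDT pillars, in the order the text lists them.
PILLARS = (
    "Responsibility",
    "Auditability",
    "Interpretability",
    "Dependability",
    "Traceability",
)


@dataclass(frozen=True)
class ScoreProfile:
    """Hypothetical record of one post-run review: one score per pillar."""
    run_id: str
    scores: dict[str, int]

    def __post_init__(self) -> None:
        # Every pillar must be scored, and no unknown pillar may appear.
        if set(self.scores) != set(PILLARS):
            raise ValueError("profile must score exactly the five pillars")
        # Scores must stay on the 1..5 scale spanned by the anchors.
        for pillar, score in self.scores.items():
            if not Anchor.MISSING <= score <= Anchor.AUDIT_READY:
                raise ValueError(f"{pillar}: score {score} is outside 1..5")

    def weakest(self) -> list[str]:
        """Pillars sharing the lowest score, in pillar order."""
        low = min(self.scores.values())
        return [p for p in PILLARS if self.scores[p] == low]

    def below(self, threshold: int = Anchor.AUDIT_READY) -> list[str]:
        """Pillars scoring strictly below the threshold, in pillar order."""
        return [p for p in PILLARS if self.scores[p] < threshold]


if __name__ == "__main__":
    profile = ScoreProfile(
        run_id="run-001",  # illustrative identifier
        scores={
            "Responsibility": 3,
            "Auditability": 5,
            "Interpretability": 1,
            "Dependability": 3,
            "Traceability": 5,
        },
    )
    print(profile.weakest())
    print(profile.below())
```

Keeping the profile as five separate scores, instead of collapsing it into one number, preserves the comparison the review is meant to support: two runs can be set side by side pillar by pillar.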

Practical example

Consider the healthcare note-summarisation scenario described in the papers. A hospital samples one generated summary for post-run review because the case was clinically high risk. The reviewer finds that the run-level evidence pack contains the prompt template ID, model deployment ID, output hash, and a recorded safety check, so Auditability is relatively strong. However, the summary does not communicate uncertainty clearly and the escalation flag is missing even though pending tests remained unresolved. The resulting score profile shows stronger Auditability than Responsibility and Interpretability. That matters because the review does not stop at description: it leads to prompt revision, explicit uncertainty language, mandatory escalation fields, and closer oversight on similar future runs. In RAIDT terms, the review improves governance not by abstract reassurance, but by changing evidence capture and use conditions for subsequent runs.
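
The evidence check in this example could be sketched as follows. This is an illustrative assumption only: the field names, the sample values, and the mapping from each field to a pillar are hypothetical and are inferred from the scenario, not taken from a RAIDT specification.

```python
# Hypothetical mapping from evidence-pack field to the pillar it supports.
REQUIRED_FIELDS = {
    "prompt_template_id": "Auditability",
    "model_deployment_id": "Auditability",
    "output_hash": "Auditability",
    "safety_check": "Auditability",
    "uncertainty_statement": "Interpretability",
    "escalation_flag": "Responsibility",
}


def missing_evidence(pack: dict) -> dict[str, list[str]]:
    """Group absent or empty evidence fields by the pillar they affect."""
    gaps: dict[str, list[str]] = {}
    for field, pillar in REQUIRED_FIELDS.items():
        # A recorded False still counts as evidence; only None or "" is a gap.
        if pack.get(field) in (None, ""):
            gaps.setdefault(pillar, []).append(field)
    return gaps


if __name__ == "__main__":
    # Illustrative pack for the sampled summary; all values are made up.
    sampled_run = {
        "prompt_template_id": "tmpl-summary-v3",
        "model_deployment_id": "deploy-17",
        "output_hash": "9f2c1a",
        "safety_check": "passed",
        "uncertainty_statement": "",  # uncertainty was not communicated
        # "escalation_flag" was never recorded
    }
    for pillar, fields in missing_evidence(sampled_run).items():
        print(f"{pillar}: missing {', '.join(fields)}")
```

Grouping the gaps by pillar mirrors the outcome described above: the Auditability fields are all present, while the gaps fall under Interpretability and Responsibility, which is where the follow-up changes are aimed.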

Sources in RAIDT papers

08-RAIDT_Foundations_M_V50
18-RAIDT-Technical-Foundation_M_v04