Q085 — What happens in post-run review?
← RAIDT · Star S8 - Implementation and Operations · primary item: S8.06 · Post-run review
Post-run review turns a completed run into a documented governance judgement for assurance, learning, and challenge handling.
Appears in sources
qa_deck_100#slide 87 · Gating, monitoring, review, and corrective action
Answer
In RAIDT, post-run review is the structured examination of a sampled or flagged run after generation, with the run as the unit of governance. The reviewer does not rely on memory or a model-level document; instead, they inspect the run-level evidence pack assembled from the run record. That pack is expected to preserve the concrete artefacts needed for reconstruction and challenge: run ID and time, prompt template and version, model deployment identifier, configuration settings, retrieved context snapshots and hashes where relevant, output record and integrity markers, and any logged checks or oversight decisions. The papers treat this as the practical shift from narrative assurance to evidence-based governance, so the question becomes what happened in this configured use, under which controls, and with what reviewable evidence.
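As a rough illustration of the artefact list above, the sketch below models an evidence pack as a Python dataclass and reports which artefacts a reviewer would find missing. The class name, field names, and the `uses_retrieval` flag are hypothetical choices made for this example; the RAIDT papers list the artefacts but do not prescribe a schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EvidencePack:
    """Hypothetical run-level evidence pack; field names are illustrative only."""
    run_id: Optional[str] = None
    run_time: Optional[str] = None
    prompt_template: Optional[str] = None
    prompt_version: Optional[str] = None
    model_deployment_id: Optional[str] = None
    configuration: Optional[dict] = None
    retrieval_snapshot_hashes: Optional[list] = None  # only "where relevant"
    output_record: Optional[str] = None
    output_integrity_marker: Optional[str] = None
    logged_checks: Optional[list] = None


def missing_artefacts(pack: EvidencePack, uses_retrieval: bool = True) -> list:
    """Return the names of artefacts that were never recorded (still None)."""
    missing = []
    for f in fields(pack):
        # Retrieval evidence is only expected when the run used retrieval.
        if f.name == "retrieval_snapshot_hashes" and not uses_retrieval:
            continue
        if getattr(pack, f.name) is None:
            missing.append(f.name)
    return missing
```

A reviewer-side tool built along these lines could run the completeness check first, so that scoring starts from a known list of gaps rather than from an impression of the run.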
The review then scores the evidence across the five pillars (Responsibility, Auditability, Interpretability, Dependability, Traceability) and records a score profile using anchors 1=missing / 3=partial / 5=audit-ready. Reviewers assess evidence completeness, integrity, explanation quality, provenance, and whether monitoring or repeat-run evidence supports dependable use. Just as importantly, post-run review treats influence methods as governance interventions and makes them inspectable: prompting, retrieval augmentation, PEFT/LoRA, and alignment controls must be logged and judged as governed configuration rather than treated as informal engineering choices. Its output is therefore both evaluative and corrective: low scores can trigger instrumentation fixes, tighter prompting constraints, human-review escalation, stronger provenance capture, or configuration stabilisation and monitoring in later runs.
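The score profile described above can be sketched as a small validated mapping. The pillar names and the 1/3/5 anchors come from the text; the function names, the acceptance of intermediate scores 2 and 4, and the `target` threshold for flagging a pillar are assumptions made for illustration only.

```python
PILLARS = (
    "Responsibility",
    "Auditability",
    "Interpretability",
    "Dependability",
    "Traceability",
)

# Anchor labels from the text; scores 2 and 4 are assumed to be allowed
# as in-between values and carry no label of their own.
ANCHORS = {1: "missing", 3: "partial", 5: "audit-ready"}


def record_profile(scores: dict) -> dict:
    """Validate one score per pillar and return the profile in pillar order."""
    absent = [p for p in PILLARS if p not in scores]
    if absent:
        raise ValueError(f"no score recorded for: {absent}")
    for pillar in PILLARS:
        if scores[pillar] not in range(1, 6):
            raise ValueError(f"{pillar}: score must be an integer from 1 to 5")
    return {pillar: scores[pillar] for pillar in PILLARS}


def pillars_needing_action(profile: dict, target: int = 5) -> list:
    """List pillars scored below the target (hypothetical threshold)."""
    return [p for p in PILLARS if profile[p] < target]
```

Keeping the profile as five separate scores, rather than collapsing it into one total, preserves the link between a weak pillar and the corrective action it should trigger.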
Practical example
In a public-service eligibility workflow, a run is flagged because the generated advice affected a claimant-facing explanation. During post-run review, the reviewer opens the run-level evidence pack and checks whether the exact policy clause used in generation was preserved with its version, whether retrieval snapshot identifiers and hashes exist, which prompt template was active, and whether the output and oversight decision were logged. Suppose the answer cites a rule but the underlying retrieval snapshot was not stored. The run may still appear plausible, yet the review would mark Auditability and Traceability as only partial rather than audit-ready, because the advice cannot be reconstructed robustly. The improvement action is concrete: require immutable policy snapshots, preserve retrieval identifiers by default, and route similar future runs to human review until the evidence gap is closed.
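The reconstruction check in this example can be sketched as follows. The sketch assumes the logged hash is a SHA-256 hex digest of the UTF-8 policy text; the source mentions snapshot hashes but does not name an algorithm. The function names and finding labels are likewise hypothetical.

```python
import hashlib
from typing import Optional


def snapshot_hash(text: str) -> str:
    """SHA-256 hex digest of a snapshot's text (algorithm is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def reconstruction_finding(stored_snapshot: Optional[str],
                           logged_hash: Optional[str]) -> str:
    """Classify whether the cited policy text can be reconstructed."""
    if stored_snapshot is None or logged_hash is None:
        # The case in the example: the answer cites a rule, but the
        # snapshot (or its hash) was never preserved.
        return "not reconstructable"
    if snapshot_hash(stored_snapshot) != logged_hash:
        # A snapshot exists but differs from what the run actually used.
        return "mismatch"
    return "verified"
```

Only a "verified" finding would support an audit-ready judgement on this point. The other two findings correspond to the evidence gap described in the example.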
Sources in RAIDT papers
08-RAIDT_Foundations_M_V50 · 18-RAIDT-Technical-Foundation_M_v04