S8.06 - Post-run review

flowchart LR
    A1[Ad hoc retrospective checks]
    A2[Weak reconstruction of completed runs]
    A3[Policy claims without case-level evidence]
    B[RAIDT - run-level evidence framework]
    C[[Post-run review]]
    D1[Evidence pack strengthened]
    D2[Score profile checked or revised]
    D3[Reviewer reconstruction]
    D4[Corrective action and learning]
    E[Governance move: reviewability, contestability, audit readiness]
    F1[Reviewer forms]
    F2[Monitoring dashboard flags]
    F3[Escalation workflow]
    F4[Public services and enterprise use cases]

    A1 --> B
    A2 --> B
    A3 --> B
    B --> C
    C --> D1
    C --> D2
    C --> D3
    C --> D4
    D1 --> E
    D2 --> E
    D3 --> E
    D4 --> E
    F1 --> C
    F2 --> C
    F3 --> C
    F4 --> C

Star S8 - Implementation and Operations

Star context: Shows how RAIDT can be adopted manually, semi-automatically or through orchestration, and how it becomes part of real governance routines through structured review after real runs have taken place.

Academic picture

Definition / background

Post-run review is the structured examination of a completed generative AI run after the system has produced an output and the run can be reconstructed from available evidence. In RAIDT, it is typically applied to sampled, flagged, high-risk, or contested runs rather than assumed to occur identically for every single run. The purpose is to determine whether the run was appropriately configured, sufficiently evidenced, responsibly used, and suitably documented for governance, learning, and possible challenge.

Conceptually, post-run review sits between simple monitoring and full incident investigation. Monitoring watches patterns over time; incident investigation responds to a known failure or harm; post-run review examines a particular completed run in enough depth to judge evidence quality, contextual appropriateness, user impact, and next actions. It therefore functions as an operational governance checkpoint rather than as a purely technical evaluation step.

This matters in generative AI governance because many important issues only become visible after a run has occurred: weak prompts, missing provenance, insufficient human oversight, overconfident interpretation of an output, or a mismatch between policy expectations and operational reality. Without post-run review, organisations may claim assurance in the abstract while lacking a defensible account of what actually happened in a specific case.

Post-run review belongs naturally inside RAIDT because RAIDT treats the run as the unit of governance. The review draws on run-level evidence, tests the completeness of the evidence pack, and may confirm or challenge the run's five-pillar score profile across Responsibility, Auditability, Interpretability, Dependability, and Traceability. In that sense, post-run review is one of the main ways RAIDT turns evidence into reviewability and reviewability into governance readiness.

Why this concept matters

Post-run review solves a practical governance problem: organisations need a credible way to look back at a completed GenAI use episode and decide whether it was acceptable, well-evidenced, and organisationally defensible. It prevents governance from stopping at high-level principles or pre-deployment intentions.

It also avoids a common confusion. A system may be deployed with policies, model cards, and general guidance, yet still produce runs that cannot later be reconstructed or justified. Post-run review makes the question more concrete: what happened in this run, what evidence supports the account, and what should happen next?

If post-run review is missing, several risks follow. Errors may be noticed but not formally learned from. High-risk uses may proceed without retrospective scrutiny. Weak evidence practices remain hidden. Score profiles may become performative rather than evidential. Most seriously, the organisation may be unable to respond convincingly when a supervisor, auditor, regulator, or affected user asks how a specific output was produced and reviewed.

For organisations using GenAI in real work, post-run review is therefore a bridge from aspiration to operation. It helps move governance from "we intend to be responsible" toward "we can show how this run was reviewed, what the evidence showed, and what we changed as a result".

Key idea: Post-run review matters because it turns a completed GenAI run into a reviewable governance object rather than an unexamined historical event.

What this item enables

Structured retrospective examination of completed runs, especially sampled, flagged, high-risk, or contested cases.
Checking whether the run-level evidence pack is complete enough for reconstruction and justified review.
Testing whether provisional pillar scores remain credible when examined against real evidence.
Identifying user-impact issues, policy deviations, prompt weaknesses, documentation gaps, or unsafe workarounds.
Triggering corrective action, escalation, retraining, workflow redesign, or tighter gating where needed.
Producing organisational learning that can improve future runs rather than treating each problematic run as isolated.

Practical example / likely audience question

Audience question

Is every GenAI run supposed to be reviewed by an expert after it happens?

Answer

The concern behind this question is usually one of feasibility. If governance appears to require expert review of every output, the process looks too expensive, too slow, and too intrusive for real organisational use. That concern is reasonable, but it misstates what post-run review means in RAIDT.

The direct answer is no. RAIDT does not assume that every run receives the same depth of retrospective expert scrutiny. Instead, post-run review is typically risk-based and evidence-based. Some runs are sampled routinely; some are reviewed because monitoring or user feedback flags them; some are reviewed because they involve sensitive domains, unusual prompts, important decisions, or downstream consequences.

A practical example is a public-sector team using GenAI to draft responses to citizen enquiries. Most routine, low-risk drafts may require only lightweight logging and periodic sample review. By contrast, a run that generated a potentially misleading eligibility statement, relied on outdated policy text, or was challenged by a caseworker would merit deeper post-run review. RAIDT handles this better than a generic AI governance approach because it does not rely on vague assurances that "oversight exists". Instead, it specifies what run-level evidence should be available, how the run can be reconstructed, how the pillar profile can be checked, and what governance action follows from the review.
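The risk-based selection described in this answer can be sketched as a small triage function. This is only an illustration, not part of RAIDT itself: the `RunSummary` fields, the `review_depth` function, the three depth labels, and the 5% default sample rate are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSummary:
    """Hypothetical summary of a completed run (field names are assumptions)."""
    run_id: str
    flagged: bool = False           # raised by monitoring or user feedback
    contested: bool = False         # challenged by a user or caseworker
    sensitive_domain: bool = False  # e.g. eligibility statements


def review_depth(run: RunSummary,
                 sample_rate: float = 0.05,
                 rng: Optional[random.Random] = None) -> str:
    """Return 'deep', 'sample', or 'log_only' for a completed run."""
    rng = rng or random.Random()
    # Flagged, contested, or sensitive runs always get deeper review.
    if run.flagged or run.contested or run.sensitive_domain:
        return "deep"
    # Routine runs are sampled at an assumed rate for periodic review.
    if rng.random() < sample_rate:
        return "sample"
    # Everything else keeps lightweight logging only.
    return "log_only"
```

In this sketch a run challenged by a caseworker would be routed to deep review regardless of the sample rate, while routine drafts are reviewed only when sampled. An organisation would set its own criteria and rates.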

Practical example in RAIDT terms

Consider a local authority using a GenAI assistant to draft housing support letters for caseworkers. One completed run is flagged because the draft letter appears to imply that an applicant is ineligible for support when the underlying policy is more nuanced.

The run-level issue is not simply that the output may be wrong. The deeper governance question is whether the run can be reconstructed and judged properly. Reviewers need the prompt, the model or service version, any retrieved policy material, the time and context of use, user edits, the final issued text if applicable, and any reviewer notes or policy checks associated with that run.
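One way to picture the evidence a reviewer needs is as a record with a completeness check. The sketch below is an assumption-laden illustration: the `EvidencePack` class, its field names, and the `missing_evidence` helper are hypothetical and simply mirror the items listed above. RAIDT does not prescribe this schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EvidencePack:
    """Hypothetical run-level evidence record; fields mirror the list above."""
    prompt: Optional[str] = None
    model_version: Optional[str] = None
    retrieved_policy_material: Optional[str] = None
    time_and_context: Optional[str] = None
    user_edits: Optional[str] = None
    final_issued_text: Optional[str] = None  # only "if applicable"
    reviewer_notes: Optional[str] = None


# Assumption: only the issued text is treated as optional in this sketch.
OPTIONAL_FIELDS = {"final_issued_text"}


def missing_evidence(pack: EvidencePack) -> List[str]:
    """List required evidence items that are absent or empty."""
    return [
        f.name
        for f in fields(pack)
        if f.name not in OPTIONAL_FIELDS and not getattr(pack, f.name)
    ]
```

A non-empty result suggests the run cannot yet be fully reconstructed, which is itself a finding for the review. Note that the sketch treats an empty value as missing, so a real schema would need a way to record "no user edits were made" explicitly.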

In RAIDT terms, the evidence pack for that run should allow the reviewer to ask: Was the user relying on approved guidance? Was the output interpretable enough for a caseworker to challenge it? Was the system dependable in this task context? Can the reasoning chain from prompt to final decision be traced? The most affected pillars are likely to be Responsibility, Dependability, and Traceability, with Auditability also central because the organisation must show how the review took place.

Post-run review improves governance readiness here by converting a potentially problematic draft into a documented learning event. The organisation can update guidance, adjust prompts, refine escalation rules, and record why this kind of run will receive closer attention in future.

Detailed link to RAIDT

Post-run review links to RAIDT in four ways.

First, it supports RAIDT's core idea that governance should attach to actual runs rather than only to abstract system descriptions or policy statements.
Second, it depends on the run being reconstructable from run-level evidence, including context, configuration, inputs, outputs, and review metadata.
Third, it tests and strengthens the quality of the evidence pack and may confirm, challenge, or refine the run's five-pillar score profile.
Fourth, it advances reviewability, contestability, audit readiness, and organisational learning by making completed runs open to disciplined retrospective scrutiny.

Post-run review → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness

This chain matters because post-run review is one of the clearest moments where RAIDT becomes operational. A run is no longer only generated; it becomes examinable, discussable, and governable.
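The step in which a review confirms or revises the five-pillar score profile can be sketched as a comparison between a provisional profile and the profile reached after review. The scoring scale is not defined in this section, so the integer scores used here are an assumption, as are the `compare_profiles` function and its output wording.

```python
from typing import Dict

# The five RAIDT pillars named in the text.
PILLARS = (
    "Responsibility",
    "Auditability",
    "Interpretability",
    "Dependability",
    "Traceability",
)


def compare_profiles(provisional: Dict[str, int],
                     reviewed: Dict[str, int]) -> Dict[str, str]:
    """Report, per pillar, whether review confirmed or revised the score.

    Scores are plain integers on an assumed scale; RAIDT's actual
    scoring scheme is not specified in this section.
    """
    outcome = {}
    for pillar in PILLARS:
        before, after = provisional[pillar], reviewed[pillar]
        if before == after:
            outcome[pillar] = "confirmed"
        else:
            outcome[pillar] = f"revised {before} -> {after}"
    return outcome
```

For example, if review finds that retrieved policy material was never logged, the Traceability score might be lowered while the other pillars are confirmed. The revised profile and the reason for the change would then be recorded alongside the run.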

Link to the five RAIDT pillars

Responsibility

Post-run review strengthens Responsibility by checking whether appropriate human judgement, role clarity, and escalation expectations were actually present in the run, not merely described in policy.