Q080 — How do gating, monitoring, post-run review, and corrective action work across implementation modes?

The routines stay the same across modes, but their speed, consistency, and evidential depth depend on how RAIDT is implemented.

Appears in sources

qa_deck_100#slide 82 · Implementation modes and deployment choices

Answer

In RAIDT, gating, monitoring, post-run review, and corrective action form one operating cycle built around the principle that the run is the unit of governance. Each significant run produces a run-level evidence pack and a score profile across the five pillars (Responsibility, Auditability, Interpretability, Dependability, Traceability). A gate is then applied to that evidence object, not to a vague claim about the model. In practice, the organisation sets risk-calibrated thresholds using the anchors 1 = missing, 3 = partial, and 5 = audit-ready. If a run does not meet the required minimum, the run or workflow does not proceed to deployment, publication, or user-facing release, and may instead be routed to human review or blocked pending remediation.
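As a minimal sketch of the gate described above, the following Python compares a run's score profile with per-pillar minimum scores. The function name, the data layout, and the threshold values are illustrative assumptions, not part of RAIDT itself; the source only states that thresholds are risk-calibrated and anchored at 1, 3, and 5.

```python
from dataclasses import dataclass

# The five RAIDT pillars named in the text.
PILLARS = ("responsibility", "auditability", "interpretability",
           "dependability", "traceability")


@dataclass(frozen=True)
class GateResult:
    passed: bool
    failing_pillars: tuple


def apply_gate(scores, thresholds):
    """Compare one run's score profile with per-pillar minimum scores.

    Hypothetical helper: scores and thresholds are dicts keyed by pillar,
    with integer values from 1 (missing) to 5 (audit-ready).
    """
    for pillar in PILLARS:
        if not 1 <= scores[pillar] <= 5:
            raise ValueError(f"score for {pillar} must be between 1 and 5")
    failing = tuple(p for p in PILLARS if scores[p] < thresholds[p])
    return GateResult(passed=not failing, failing_pillars=failing)


# Assumed thresholds for a higher-risk workflow (illustrative values only).
thresholds = {"responsibility": 4, "auditability": 4, "interpretability": 3,
              "dependability": 4, "traceability": 4}
scores = {"responsibility": 5, "auditability": 3, "interpretability": 3,
          "dependability": 4, "traceability": 4}

result = apply_gate(scores, thresholds)
print(result.passed, result.failing_pillars)  # False ('auditability',)
```

A failed gate only reports which pillars fell short. Whether the run is then routed to human review or blocked pending remediation is left to the organisation's policy, which this sketch does not model.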

Across implementation modes, the logic remains stable while the mechanics change. In manual mode, a reviewer scores the run-level evidence pack, records evidence pointers, and decides whether the gate is passed. In partial automation, the system can verify objective fields such as run IDs, prompt versions, retrieval snapshot pointers, output hashes, and recorded checks, then suggest provisional auditability or traceability scores for reviewer confirmation. In higher automation, rule-based completeness checks and repeat-run stability tests support continuous monitoring, but the evidence pack remains the source of truth. Post-run review samples routine, high-risk, or disputed runs, and corrective action follows the weak pillar: low auditability or traceability triggers instrumentation fixes; low responsibility triggers stronger constraints or human review; and low dependability triggers configuration stabilisation and monitoring. In this way, influence methods, treated as governance interventions, are governed operationally rather than rhetorically.
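The rule that corrective action follows the weak pillar can be sketched as a simple lookup. The mapping below copies the three pairings stated above; the text names no action for low interpretability, so the fallback that hands such cases to a reviewer is an assumption, as are the function name and the example values.

```python
# Pairings taken from the text; keys are pillar names in lower case.
CORRECTIVE_ACTIONS = {
    "auditability": "instrumentation fixes",
    "traceability": "instrumentation fixes",
    "responsibility": "stronger constraints or human review",
    "dependability": "configuration stabilisation and monitoring",
}

# Assumption: pillars without a stated action are handed to a reviewer.
FALLBACK_ACTION = "refer to reviewer (no action stated in the text)"


def plan_corrective_actions(scores, thresholds):
    """Return (pillar, action) pairs for every pillar below its minimum."""
    plan = []
    for pillar, minimum in thresholds.items():
        if scores[pillar] < minimum:
            plan.append((pillar, CORRECTIVE_ACTIONS.get(pillar, FALLBACK_ACTION)))
    return plan


# Illustrative score profile and thresholds (hypothetical values).
example_thresholds = {"responsibility": 4, "auditability": 4,
                      "interpretability": 4, "dependability": 4,
                      "traceability": 4}
example_scores = {"responsibility": 4, "auditability": 2,
                  "interpretability": 5, "dependability": 3,
                  "traceability": 4}

for pillar, action in plan_corrective_actions(example_scores, example_thresholds):
    print(f"{pillar}: {action}")
```

Keeping the mapping as data rather than branching logic means the same table can be read by a manual reviewer and executed by an automated pipeline, which fits the point that the logic stays stable while the mechanics change.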

Practical example

Consider a healthcare note-summary workflow in an emergency setting. In an early pilot, a clinician-governance reviewer manually checks whether the run-level evidence pack contains the prompt template ID, model deployment ID, output text, safety check, and an oversight flag before the summary can enter the patient record. In a more mature deployment, the system automatically checks those fields and flags missing retrieval evidence or absent escalation guidance, while the clinician reviews responsibility and interpretability.
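The automated field check in the more mature deployment might look like the sketch below. The field names are hypothetical renderings of the items listed above (prompt template ID, model deployment ID, output text, safety check, oversight flag), and the dictionary layout of the evidence pack is an assumption.

```python
# Hypothetical field names for the items the reviewer checks in the pilot.
REQUIRED_FIELDS = (
    "prompt_template_id",
    "model_deployment_id",
    "output_text",
    "safety_check",
    "oversight_flag",
)


def missing_fields(evidence_pack):
    """Return the required fields that are absent, None, or blank strings.

    A boolean False (for example an oversight flag that is set to False)
    counts as present, because it is still a recorded value.
    """
    missing = []
    for name in REQUIRED_FIELDS:
        value = evidence_pack.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


# Illustrative evidence pack with one field never recorded.
pack = {
    "prompt_template_id": "tmpl-example",
    "model_deployment_id": "deploy-example",
    "output_text": "Summary text goes here.",
    "oversight_flag": False,
}

print(missing_fields(pack))  # ['safety_check']
```

A check like this covers only the objective fields. Responsibility and interpretability stay with the clinician, as described above.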

If repeated runs begin to show unstable summaries or missing uncertainty statements, the workflow is not waved through. The gate blocks release, post-run review reconstructs the affected runs, and corrective action may include tightening the prompt, adding retrieval snapshots from the clinical guideline corpus, or requiring mandatory human sign-off until dependability recovers.
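One way to make "unstable summaries" measurable is to hash the outputs of repeated runs and report how often the most common output occurs. This is only a sketch: the exact-match hashing, the function names, and the sample outputs are assumptions, and free-text summaries would likely need a softer similarity measure than hash equality.

```python
import hashlib
from collections import Counter


def output_hash(text):
    """Return the SHA-256 hex digest of an output, as stored in an evidence pack."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stability_ratio(outputs):
    """Share of repeated runs whose output matches the most common output.

    1.0 means every repeat produced an identical output; lower values
    indicate drift between runs of the same configuration.
    """
    if not outputs:
        raise ValueError("at least one output is required")
    counts = Counter(output_hash(o) for o in outputs)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(outputs)


# Four hypothetical repeat runs of the same input; one differs.
repeat_outputs = [
    "Patient stable. Uncertainty: moderate.",
    "Patient stable. Uncertainty: moderate.",
    "Patient stable.",
    "Patient stable. Uncertainty: moderate.",
]

print(stability_ratio(repeat_outputs))  # 0.75
```

A ratio below an agreed minimum could feed the dependability score that the gate reads, so that the block on release follows from recorded evidence.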

Sources in RAIDT papers

08-RAIDT_Foundations_M_V50
09-RAIDT_Empirical_M_V50.docx