Q084 — How does RAIDT support monitoring after a run?

RAIDT · Star S8 - Implementation and Operations · primary item: S8.05 · Monitoring

Monitoring extends governance beyond release by checking whether evidence quality and risk signals stay stable.

Appears in sources

qa_deck_100#slide 86 · Gating, monitoring, review, and corrective action

Answer

RAIDT supports monitoring after a run by treating the run as the unit of governance and preserving a run-level evidence pack for each material use. That pack records the prompt and template identifiers, active configuration settings, retrieved context where relevant, outputs, checks, and oversight decisions, so reviewers can reconstruct what happened after the event rather than relying on memory or model-level documentation. RAIDT then converts that evidence into a score profile across the five pillars (Responsibility, Auditability, Interpretability, Dependability, Traceability), using the anchors 1 = missing, 3 = partial, and 5 = audit-ready. In practice, post-run monitoring therefore becomes an evidence-based comparison of governance readiness, not a loose impression that a system usually works.
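The evidence pack described above can be sketched as a simple record with a completeness check. This is a minimal illustration, not the RAIDT specification: the field names, the `missing_fields` helper, and the rule that maps completeness onto the 1/3/5 anchors are all assumptions made for the example.

```python
from dataclasses import dataclass, fields
from typing import Optional

# The five RAIDT pillars named in the text.
PILLARS = ("Responsibility", "Auditability", "Interpretability",
           "Dependability", "Traceability")


@dataclass
class EvidencePack:
    """Hypothetical run-level evidence pack; field names are illustrative."""
    run_id: str
    prompt_id: Optional[str] = None
    template_id: Optional[str] = None
    config: Optional[dict] = None
    retrieved_context_hash: Optional[str] = None
    output_hash: Optional[str] = None
    checks: Optional[list] = None
    oversight_decision: Optional[str] = None


def required_fields(retrieval_used: bool = True) -> list:
    """Evidence fields expected for a run; retrieval evidence only where relevant."""
    names = [f.name for f in fields(EvidencePack) if f.name != "run_id"]
    if not retrieval_used:
        names.remove("retrieved_context_hash")
    return names


def missing_fields(pack: EvidencePack, retrieval_used: bool = True) -> list:
    """Return the required fields that are absent or empty in this pack."""
    return [n for n in required_fields(retrieval_used) if not getattr(pack, n)]


def completeness_anchor(pack: EvidencePack, retrieval_used: bool = True) -> int:
    """Assumed heuristic: 5 if nothing is missing, 1 if everything is, else 3."""
    missing = missing_fields(pack, retrieval_used)
    if not missing:
        return 5
    if len(missing) == len(required_fields(retrieval_used)):
        return 1
    return 3


full = EvidencePack(
    run_id="run-001", prompt_id="p-12", template_id="t-3",
    config={"temperature": 0.2}, retrieved_context_hash="abc123",
    output_hash="def456", checks=["safety"], oversight_decision="approved",
)
partial = EvidencePack(
    run_id="run-002", prompt_id="p-12", template_id="t-3",
    config={"temperature": 0.2}, output_hash="def789",
    checks=["safety"], oversight_decision="approved",
)
print(completeness_anchor(full), completeness_anchor(partial), missing_fields(partial))
```

A real scoring rubric would assess each pillar separately and involve reviewer judgement. The point here is only that a stored pack makes gaps in the evidence mechanically detectable after the run.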

The framework is explicitly designed so that scoring outputs feed monitoring and control updates over time. The papers describe run sampling, variance-aware repeat-run testing, monitoring signals for dependability, and review of change-controlled artefacts such as prompt templates, retrieval policies, adapter versions, and alignment settings. This matters because influence methods, when treated as governance interventions, can alter both behaviour and evidentiary quality: a configuration may improve interpretability or traceability while also introducing new logging and versioning duties. RAIDT supports post-run monitoring by making evidence completeness, drift, recurring errors, and configuration changes visible across successive runs, and by linking weak scores to action. Low Auditability or Traceability calls for instrumentation or retention fixes; low Responsibility calls for stronger constraints or more human review; and low Dependability calls for stabilisation, repeat-run testing, and closer monitoring before further reliance.
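The link from weak scores to action can be sketched as a lookup table. The pillar-to-action pairs follow the paragraph above. The `recommend_actions` name and the choice to treat any score below 3 as "low" are assumptions for illustration. Interpretability has no entry because the text does not name a remediation for it.

```python
# Remediation actions per pillar, taken from the paragraph above.
REMEDIATION = {
    "Auditability": ["instrumentation fixes", "retention fixes"],
    "Traceability": ["instrumentation fixes", "retention fixes"],
    "Responsibility": ["stronger constraints", "more human review"],
    "Dependability": ["stabilisation", "repeat-run testing", "closer monitoring"],
}


def recommend_actions(scores: dict, low_threshold: int = 3) -> list:
    """Collect remediation actions for every pillar scoring below the threshold.

    low_threshold is an assumed cut-off, not a value defined by RAIDT.
    Actions are de-duplicated and returned in first-seen order.
    """
    actions = []
    for pillar, score in scores.items():
        if score < low_threshold:
            for action in REMEDIATION.get(pillar, []):
                if action not in actions:
                    actions.append(action)
    return actions


profile = {
    "Responsibility": 5,
    "Auditability": 1,
    "Interpretability": 3,
    "Dependability": 5,
    "Traceability": 1,
}
print(recommend_actions(profile))
```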

Practical example

In a healthcare note summarisation service, each discharge-summary draft can be reviewed after the run using the stored prompt template ID, model deployment ID, decoding settings, retrieval snapshot hash, output hash, recorded safety check, and clinician oversight flag. Suppose a weekly audit sample shows that recent runs score lower on Auditability and Traceability because a retrieval update stopped preserving snapshots, while Dependability also falls because summaries vary more across repeat runs. RAIDT lets the team trace those score changes to a concrete configuration shift rather than treating them as isolated mistakes. The governance response is then specific: restore snapshot logging, reintroduce the structured prompt with uncertainty disclosure, increase clinician review temporarily, and monitor later runs for score recovery. That is post-run monitoring as governed remediation, not merely dashboard watching.
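The weekly audit comparison in this example can be sketched as a difference of mean pillar scores between a baseline sample and a recent sample. The scores below are invented for illustration. The `flag_drops` helper and its `min_drop` threshold are assumptions, not part of RAIDT.

```python
from statistics import mean


def pillar_means(sample: list) -> dict:
    """Average each pillar's score across a sample of scored runs."""
    pillars = sample[0].keys()
    return {p: mean(run[p] for run in sample) for p in pillars}


def flag_drops(baseline: list, recent: list, min_drop: float = 1.0) -> dict:
    """Return pillars whose mean score fell by at least min_drop (assumed threshold)."""
    base = pillar_means(baseline)
    now = pillar_means(recent)
    return {
        p: round(base[p] - now[p], 2)
        for p in base
        if base[p] - now[p] >= min_drop
    }


# Hypothetical audit samples: one score profile per reviewed run.
baseline_week = [
    {"Responsibility": 5, "Auditability": 5, "Interpretability": 3,
     "Dependability": 5, "Traceability": 5},
    {"Responsibility": 5, "Auditability": 5, "Interpretability": 3,
     "Dependability": 5, "Traceability": 5},
]
recent_week = [
    {"Responsibility": 5, "Auditability": 3, "Interpretability": 3,
     "Dependability": 3, "Traceability": 3},
    {"Responsibility": 5, "Auditability": 3, "Interpretability": 3,
     "Dependability": 3, "Traceability": 1},
]

print(flag_drops(baseline_week, recent_week))
```

Running the same comparison on later samples, once snapshot logging is restored, gives a simple check on whether the scores recover.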

Sources in RAIDT papers

08-RAIDT_Foundations_M_V50
18-RAIDT-Technical-Foundation_M_v04