Q272 — How implementation modes change evidence burden

RAIDT keeps the same conceptual core while allowing different levels of organisational maturity and tooling.

Answer

Implementation modes change the evidence burden chiefly by redistributing who captures, checks, and assembles governance evidence, not by changing what RAIDT ultimately requires. In manual mode, the burden sits heavily with reviewers or operators: they must save prompts, outputs, and metadata, record evidence pointers, and complete the scoring sheet themselves. In semi-automated or partial-automation mode, the system carries more of the clerical burden by logging stable identifiers, retrieval snapshot pointers, hashes, and recorded checks, and by proposing completeness-based scores for Auditability and Traceability. Human reviewers then focus more of their effort on confirming the record and judging Responsibility, Interpretability, and Dependability. In higher or fully automated modes, orchestration layers and repositories capture run evidence directly and can add automated repeat-run testing for stability, but the papers stress that these automations are trusted only when grounded in logs and stable identifiers.
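The automated repeat-run testing mentioned above could be sketched roughly as follows. This is an illustration under stated assumptions, not the method of the RAIDT papers: `generate` is a hypothetical stand-in for whatever produces an output, and treating exact hash equality as agreement is an assumed, deliberately strict measure of stability. The point is that the result is a logged record of hashes that a reviewer can inspect, rather than a bare pass or fail.

```python
import hashlib
from collections import Counter
from typing import Callable


def output_hash(text: str) -> str:
    """Return the SHA-256 hex digest of a piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def repeat_run_stability(
    generate: Callable[[str], str], prompt: str, runs: int = 5
) -> dict:
    """Run the same prompt several times and record how often outputs agree.

    `generate` is a hypothetical callable (assumption). Agreement is the
    share of runs whose output hash equals the most common hash (assumption).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    hashes = [output_hash(generate(prompt)) for _ in range(runs)]
    modal_hash, count = Counter(hashes).most_common(1)[0]
    return {
        "prompt_hash": output_hash(prompt),  # pointer to the input, not the input itself
        "run_hashes": hashes,                # one logged hash per repeat run
        "modal_hash": modal_hash,
        "agreement": count / runs,
    }
```

A reviewer would still decide what level of agreement is acceptable for a given risk level; the sketch only makes the repeat runs visible in the log.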

This means that greater automation lowers the handling burden, improves consistency, and can reduce omitted fields, but it does not lower the substantive evidence threshold. The same run-level evidence pack remains necessary if a run is to be reviewable, contestable, and reconstructable. The papers also warn that evidence capture can itself be burdensome and must be proportionate to risk, with access controls, minimisation, and hashing where full storage is inappropriate. RAIDT therefore treats implementation mode as a governance design choice, not as an excuse to weaken scrutiny. The practical implication is that modes change the path to the evidence, not the evidential standard: the score profile is still grounded in the run-level evidence pack, and runs still sort across anchors 1=missing / 3=partial / 5=audit-ready according to completeness and reviewability.
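A minimal sketch of how a completeness-based proposal might map onto the 1 / 3 / 5 anchors is given here. The field list, the thresholds, and the choice to propose the same value for Auditability and Traceability are all assumptions made for illustration; the papers name the anchors but this note does not specify a schema or a scoring rule. Responsibility, Interpretability, and Dependability are left unset because they remain human judgements.

```python
from typing import Dict, Optional

# Hypothetical evidence-pack fields (assumption, not a RAIDT-defined schema).
REQUIRED_FIELDS = ("prompt", "output", "metadata", "evidence_pointers")


def completeness_anchor(record: Dict[str, object]) -> int:
    """Map evidence completeness onto the anchors 1=missing, 3=partial, 5=audit-ready."""
    present = sum(1 for name in REQUIRED_FIELDS if record.get(name))
    if present == 0:
        return 1  # missing: nothing usable was captured
    if present < len(REQUIRED_FIELDS):
        return 3  # partial: some fields are absent or empty
    return 5      # audit-ready: every required field is populated


def propose_profile(record: Dict[str, object]) -> Dict[str, Optional[int]]:
    """Propose scores for the completeness-based dimensions only.

    None marks a dimension that a human reviewer still has to judge.
    """
    proposed = completeness_anchor(record)
    return {
        "Responsibility": None,
        "Auditability": proposed,
        "Interpretability": None,
        "Dependability": None,
        "Traceability": proposed,
    }
```

The same function applies whichever mode produced the record, which is the sense in which the mode changes the path to the evidence rather than the standard it is held to.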

Practical example

In a public-service eligibility workflow, a manual RAIDT implementation might require a caseworker to save the prompt, generated explanation, policy reference, and review notes into a scoring sheet after each run. The evidence burden is high because the human must assemble both the record and the evaluation. In a semi-automated version, the system logs the run ID, policy clause version, retrieval snapshot, output hash, and review flag automatically, while the model drafts an evidence summary for human checking. In a fuller implementation, an orchestration layer stores those fields directly in an evidence repository and runs automated completeness checks before the case reaches review.
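The semi-automated and fuller versions of this example could look something like the sketch below. It uses the fields named in the paragraph (run ID, policy clause version, retrieval snapshot, output hash, review flag), but the class name, function names, and the rule that every field must be populated before review are hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RunEvidence:
    """Hypothetical run-level record; None means the field was not captured."""
    run_id: Optional[str] = None
    policy_clause_version: Optional[str] = None
    retrieval_snapshot: Optional[str] = None  # pointer to the snapshot, not its content
    output_hash: Optional[str] = None         # hash stored in place of the full output
    review_flag: Optional[bool] = None


def capture(run_id: str, clause_version: str, snapshot_pointer: str,
            output_text: str, needs_review: bool) -> RunEvidence:
    """Build the record automatically at run time (the semi-automated mode)."""
    return RunEvidence(
        run_id=run_id,
        policy_clause_version=clause_version,
        retrieval_snapshot=snapshot_pointer,
        output_hash=hashlib.sha256(output_text.encode("utf-8")).hexdigest(),
        review_flag=needs_review,
    )


def missing_fields(evidence: RunEvidence) -> List[str]:
    """List fields that are absent or empty; a False review flag counts as captured."""
    return [f.name for f in fields(evidence)
            if getattr(evidence, f.name) in (None, "")]


def ready_for_review(evidence: RunEvidence) -> bool:
    """Completeness gate applied before the case reaches a reviewer."""
    return not missing_fields(evidence)
```

In manual mode a caseworker would fill the same fields by hand, so `missing_fields` would serve equally well as a checklist for the scoring sheet.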

Across all three modes, the governance question is the same: can the run be reconstructed and justified later? What changes is whether the burden falls mostly on frontline staff, on shared instrumentation, or on a mature governance platform.
