Q135 — Why do output hashes matter?

← RAIDT · Star S4 - Evidence Architecture and Artefacts · primary item: S4.15 · Output hash

Appears in sources

integrated_82#Q3.9

Answer

Output hashes matter in RAIDT because governance readiness is evidential, not declarative. The papers repeatedly argue that organisations cannot rely on model cards, narrative assurance, or plausible explanations alone when a contested outcome must be reviewed. If a run-level record can be edited silently, then later reconstruction is weak, audit sampling is compromised, and accountability claims become difficult to defend. For that reason, the RAIDT framework treats hashes as practical integrity markers within the run-level evidence pack, alongside identifiers, timestamps, retrieval snapshots, configuration provenance, and oversight records.

Their importance is both technical and organisational. Technically, recomputing the hash over the stored output and comparing it with the recorded value makes tampering or unintended alteration detectable. Organisationally, the hash allows the recorded output to travel across review settings as a stable reference point: internal audit, incident investigation, compliance review, dispute resolution, and post-run learning. This is why the Evidence Review paper places output hashes under the design requirement of output integrity and retention, and why the Foundations paper presents the output hash as a representative auditability field. The Technical Foundation paper adds that provenance without integrity markers remains fragmented and insufficiently governance-ready.
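
As an illustration of the technical side, the sketch below computes an output hash when a run is recorded and stores it next to the run identifier and timestamp. It is a minimal sketch under stated assumptions: the RAIDT papers cited here do not prescribe an algorithm or a schema, so SHA-256, UTF-8 encoding, and the names RunRecord, make_record, and output_sha256 are hypothetical choices made for the example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def output_hash(output_text: str) -> str:
    """Return the SHA-256 hex digest of the output text (UTF-8 encoded).

    Assumption: SHA-256 is used for illustration only; the source does not
    name a specific hash algorithm.
    """
    return hashlib.sha256(output_text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RunRecord:
    # Hypothetical field names, not taken from the RAIDT papers.
    run_id: str
    timestamp: str       # ISO 8601, UTC
    output_text: str
    output_sha256: str   # fingerprint computed when the run is recorded


def make_record(run_id: str, output_text: str) -> RunRecord:
    """Build a run-level record with the hash computed at recording time."""
    return RunRecord(
        run_id=run_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
        output_text=output_text,
        output_sha256=output_hash(output_text),
    )
```

Even a one-character change to the output produces a different digest, which is what makes a silent edit detectable on later review.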

In RAIDT terms, output hashes help convert raw traces into a bounded governance object that can support a score profile. They do not prove that an output is correct or lawful. Rather, they help make scoring against the anchors 1=missing / 3=partial / 5=audit-ready more defensible, because reviewers can check whether the evidence itself is stable, attributable, and linked to the exact run under examination.

Practical example

Consider an HR performance-appraisal workflow in which a manager uses GenAI to draft appraisal language before final review. An employee later disputes the wording and alleges that the generated draft was harsher than the version retained in the record. If the organisation kept only a copied text field, reviewers would struggle to know whether the evidence pack still reflects the original run.

With an output hash, the organisation can recompute the fingerprint of the stored artefact and compare it against the fingerprint recorded for that run. If the two match, the output under review is byte-for-byte the one that was hashed when the run was recorded, and therefore the one linked to the prompt template, model settings, and oversight actions. If they do not match, the discrepancy becomes visible immediately. In RAIDT, that matters because the run, as the unit of governance, must remain reconstructable and contestable, not merely plausible from memory or narrative summary.
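
The comparison described in this example can be sketched as follows. This is an illustrative sketch, not a procedure taken from the RAIDT papers: SHA-256 over UTF-8 text, the function name verify_output, and the sample appraisal wording are all assumptions introduced for the example.

```python
import hashlib
import hmac


def verify_output(stored_output: str, recorded_sha256: str) -> bool:
    """Recompute the hash of the stored output and compare it to the record.

    Assumption: the fingerprint was recorded as a SHA-256 hex digest of the
    UTF-8 encoded output text.
    """
    actual = hashlib.sha256(stored_output.encode("utf-8")).hexdigest()
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, recorded_sha256.lower())


# Hypothetical appraisal drafts, invented for illustration.
original_draft = "Meets expectations; should improve on meeting deadlines."
recorded_hash = hashlib.sha256(original_draft.encode("utf-8")).hexdigest()

# A later, silently softened version of the stored text.
edited_draft = "Meets expectations; could improve on meeting deadlines."

print(verify_output(original_draft, recorded_hash))  # True
print(verify_output(edited_draft, recorded_hash))    # False
```

A match only shows that the stored text is identical to what was hashed at recording time. It says nothing about whether the wording was fair, and it is only as trustworthy as the protection applied to the recorded hash itself.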

Sources in RAIDT papers

08-RAIDT_Foundations_M_V50
13-RAIDT-Evidence-Review_M_v10
18-RAIDT-Technical-Foundation_M_v04