Q132 — Why do retrieval query, index identifiers, retrieved document IDs, and hashes matter?
← RAIDT · Star S4 - Evidence Architecture and Artefacts · primary item: S4.10 · Retrieval query and index ID
Appears in sources
integrated_82#Q3.6
Answer
In RAIDT, retrieval query, index identifiers, retrieved document IDs, and hashes matter because together they form a complete provenance chain for a grounded run. Each field answers a different governance question. The retrieval query shows what the system asked the source space to return. The index identifier shows which corpus or retrieval state was searched. Retrieved document IDs show which records actually informed the output. Hashes show whether those artefacts, and the output itself, remained intact. Evidence Review distributes these requirements across configuration provenance, input and source provenance, and output integrity. Foundations operationalises them by requiring query or corpus identifiers, preserved snapshots of retrieved passages, document identifiers, and cryptographic hashes. Technical Foundation explains why this bundle matters: provenance turns governance from memory into evidence only when identifiers, source references, and integrity markers are preserved in a bounded governance object.
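As a minimal sketch of how the four fields could sit together in one bounded record, the Python below builds a provenance object for a single run. The class names, field names, and example values (`RetrievalProvenance`, `run_id`, `policy-index-2024-05`, `POL-17`) are illustrative assumptions, not a schema defined in the RAIDT papers; SHA-256 is assumed as the cryptographic hash.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Iterable, Tuple


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RetrievedDocument:
    doc_id: str           # which record informed the output
    snapshot: str         # passage text exactly as retrieved
    snapshot_sha256: str  # integrity marker for the snapshot


@dataclass(frozen=True)
class RetrievalProvenance:
    run_id: str            # hypothetical run identifier
    retrieval_query: str   # what the system asked the source space to return
    index_id: str          # which corpus or retrieval state was searched
    documents: Tuple[RetrievedDocument, ...]
    output_sha256: str     # integrity marker for the generated output


def build_record(
    run_id: str,
    query: str,
    index_id: str,
    passages: Iterable[Tuple[str, str]],
    output_text: str,
) -> RetrievalProvenance:
    """Hash each retrieved passage and the output, then bundle all fields."""
    docs = tuple(
        RetrievedDocument(doc_id, text, sha256_hex(text))
        for doc_id, text in passages
    )
    return RetrievalProvenance(run_id, query, index_id, docs, sha256_hex(output_text))


# Illustrative values only.
record = build_record(
    run_id="run-0001",
    query="discharge escalation guidance",
    index_id="policy-index-2024-05",
    passages=[("POL-17", "Escalate to the duty registrar within 30 minutes.")],
    output_text="Draft discharge note ...",
)
print(json.dumps(asdict(record), indent=2))
```

Freezing the dataclasses is a design choice in this sketch: once the record is written, later code cannot silently mutate it, which fits the idea of a preserved governance object.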
The significance is organisational, not merely technical. RAIDT treats the run as the unit of governance, so a later reviewer must be able to reconstruct and contest one material use without relying on staff recollection or on the current live system. If any link in the chain is missing, the run-level evidence pack becomes weaker. A reviewer may know that retrieval happened, but still be unable to show what was searched, where it was searched, what was returned, or whether the supporting evidence was altered. That weakens the score profile across the five pillars (Responsibility, Auditability, Interpretability, Dependability, Traceability), particularly Auditability and Traceability. On the anchor scale (1 = missing, 3 = partial, 5 = audit-ready), these fields are what move RAG from an opaque pipeline to one of the influence methods treated as governance interventions, that is, one that can be inspected, compared, and challenged across runs.
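The point about missing links can be illustrated with a small completeness check. The sketch below assumes an evidence pack represented as a plain dictionary with four hypothetical keys, and it maps completeness onto the 1 / 3 / 5 anchors in the simplest possible way (nothing present, something present, everything present). That mapping is an assumption made for illustration; it is not the scoring rule defined in the RAIDT papers.

```python
# Hypothetical key names for the four provenance fields discussed above.
REQUIRED_FIELDS = ("retrieval_query", "index_id", "retrieved_doc_ids", "hashes")


def missing_links(pack: dict) -> list:
    """Return the required fields that are absent or empty in the pack."""
    return [name for name in REQUIRED_FIELDS if not pack.get(name)]


def anchor_for(pack: dict) -> int:
    """Map completeness to an anchor: 1 = missing, 3 = partial, 5 = audit-ready.

    Assumed mapping for illustration only.
    """
    missing = missing_links(pack)
    if not missing:
        return 5
    if len(missing) == len(REQUIRED_FIELDS):
        return 1
    return 3


partial_pack = {
    "retrieval_query": "discharge escalation guidance",
    "index_id": "policy-index-2024-05",
    "retrieved_doc_ids": [],   # retrieval happened, but the IDs were not kept
    "hashes": {},
}
print(missing_links(partial_pack))  # ['retrieved_doc_ids', 'hashes']
print(anchor_for(partial_pack))     # 3
```

A check like this only tests presence. Whether the stored values are the right ones for the run is a separate question that a reviewer still has to answer.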
Practical example
Take a healthcare note summarisation workflow that retrieves hospital policy and local escalation guidance before drafting a discharge note. If the evidence pack stores only the final text, reviewers cannot later test whether the assistant searched the appropriate guidance, whether it relied on the current or superseded policy set, which documents shaped the summary, or whether those documents have been silently altered since the note was produced.
If the pack stores the retrieval query, the index identifier, the retrieved document IDs, and hashes, the hospital can reconstruct the grounding path. A clinical governance reviewer can inspect whether the assistant searched the right corpus, whether the retrieved guidance matched the patient context, and whether the stored artefacts still match the originals. That makes post-incident review possible in a way that citations alone do not.
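The integrity check in this example can be sketched as a comparison between hashes stored at run time and hashes recomputed from the documents as they exist today. The function name `verify_pack`, the document IDs, and the passage text are hypothetical, and SHA-256 is assumed as the hash function.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_pack(stored_hashes: dict, current_documents: dict) -> dict:
    """Compare each stored hash with the hash of the current document text.

    Returns a status per document ID: 'intact', 'altered', or 'missing'.
    """
    status = {}
    for doc_id, stored in stored_hashes.items():
        current = current_documents.get(doc_id)
        if current is None:
            status[doc_id] = "missing"   # document no longer available
        elif sha256_hex(current) == stored:
            status[doc_id] = "intact"    # text unchanged since the run
        else:
            status[doc_id] = "altered"   # text changed since the run
    return status


# Hashes as they would have been recorded when the note was drafted.
stored = {
    "POL-17": sha256_hex("Escalate to the duty registrar within 30 minutes."),
    "ESC-03": sha256_hex("Call the outreach team for NEWS2 of 5 or above."),
}

# The source system today: ESC-03 has been edited since the run.
current = {
    "POL-17": "Escalate to the duty registrar within 30 minutes.",
    "ESC-03": "Call the outreach team for NEWS2 of 7 or above.",
}

print(verify_pack(stored, current))
# {'POL-17': 'intact', 'ESC-03': 'altered'}
```

An 'altered' result does not by itself mean the original note was wrong. It tells the reviewer to consult the preserved snapshot of what was retrieved instead of the live document.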
Sources in RAIDT papers
08-RAIDT_Foundations_M_V50
13-RAIDT-Evidence-Review_M_v10
18-RAIDT-Technical-Foundation_M_v04