Q139 — What is the Interpretability pillar and what evidence supports it?

RAIDT · Star S5 - RAIDT Pillars and Scoring · primary item: S5.03 · Interpretability

Appears in sources

integrated_82#Q3.13

Answer

The Interpretability pillar is one of the five pillars (Responsibility, Auditability, Interpretability, Dependability, Traceability) that RAIDT uses to score the governance readiness of a run. It asks whether a response and its limitations are understandable enough for the intended stakeholders and task. The foundations paper defines interpretability as the extent to which stakeholders can understand, evaluate, and appropriately rely on an output in context. The evidence-review paper reinforces this: documentation becomes credible only when it is tied to a reconstructable use event rather than left at the model level. Accordingly, RAIDT scores interpretability from the run-level evidence pack, not from a generic claim that the system is explainable.

The evidence that supports the score is specific and tied to the run. According to the foundations paper, run-level interpretability evidence includes the prompt and output structures used to elicit explanations, the uncertainty disclosures produced, and any explanation constraints or templates applied. The paper adds that this evidence should take the form of structured prompts and output schemas, explicit uncertainty communication, and links to recorded sources, rather than free-form rationales. The worked scenarios supply domain-specific exemplars: in credit adverse action, reason statements should be linked to documented criteria and kept separate from assumptions; in finance more generally, the scoring paper notes that interpretability anchors may require explicit reason codes and provenance. Within the score profile, the anchors 1 (missing), 3 (partial), and 5 (audit-ready) indicate whether this evidence is absent, incomplete, or strong enough to support reconstruction, review, and justified reliance.
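One way to picture the anchors is as a coverage check over the evidence items listed above. The sketch below is a hypothetical illustration, not RAIDT's scoring procedure: the class `InterpretabilityEvidence`, its field names, the function `interpretability_anchor`, and the rule "no items present = 1, some present = 3, all present = 5" are assumptions made for the example. In a real review, an assessor judges whether the evidence is strong enough for reconstruction, which a simple presence count cannot capture. The sketch also does not model intermediate scores.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical schema for the interpretability part of a run-level evidence pack.
# Field names are illustrative assumptions, not defined by RAIDT.
@dataclass
class InterpretabilityEvidence:
    prompt_template: Optional[str] = None  # structured prompt used to elicit the explanation
    output_schema: Optional[str] = None    # output structure or explanation template applied
    uncertainty_disclosures: List[str] = field(default_factory=list)
    source_links: List[str] = field(default_factory=list)  # links to recorded sources
    reason_codes: List[str] = field(default_factory=list)  # e.g. credit adverse-action reasons


def interpretability_anchor(ev: InterpretabilityEvidence,
                            require_reason_codes: bool = False) -> int:
    """Map evidence coverage onto the 1 / 3 / 5 anchors.

    Assumed heuristic: no items present -> 1 (missing), some -> 3 (partial),
    all -> 5 (audit-ready). A real RAIDT review judges quality, not just presence.
    """
    checks = [
        ev.prompt_template is not None,
        ev.output_schema is not None,
        bool(ev.uncertainty_disclosures),
        bool(ev.source_links),
    ]
    if require_reason_codes:  # e.g. finance runs where reason codes are expected
        checks.append(bool(ev.reason_codes))
    present = sum(checks)  # True counts as 1
    if present == 0:
        return 1
    if present < len(checks):
        return 3
    return 5


pack = InterpretabilityEvidence(prompt_template="appraisal_v2", output_schema="criteria_table")
print(interpretability_anchor(pack))  # 3: two of four items present
```

The `require_reason_codes` switch reflects the scoring paper's point that finance anchors may demand reason codes. The same pack can therefore score lower when it is judged against a stricter domain profile.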

Practical example

In HR, consider a manager using GenAI to draft a performance appraisal that may influence pay or promotion. Suppose the assistant produces a persuasive summary but does not show which appraisal criteria were used, where the evidence came from, or which statements are tentative. That run is weak on Interpretability, however polished the prose. If the appraisal is later disputed, staff will be unable to explain the basis for the wording.

A stronger run would require a structured appraisal template: each judgement linked to a documented competency criterion, supporting notes or retrieved policy text referenced, and uncertainty or missing evidence flagged explicitly. The prompt template, output, reviewer edits, and approvals are retained in the run-level evidence pack. That evidence supports a higher interpretability score because the output can be understood, checked, and challenged in context.
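The structured template described above can be sketched as data plus a check that runs before approval. This is a hypothetical illustration under stated assumptions, not something taken from the RAIDT papers. The names `Judgement`, `review_issues`, and `run_record` are invented for the example, as are the criterion IDs. So is the rule that every judgement needs a known criterion and either cited evidence or an explicit gap flag.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional, Set


# Hypothetical appraisal-template structures; not a RAIDT-defined schema.
@dataclass
class Judgement:
    text: str
    criterion_id: Optional[str]  # documented competency criterion the judgement rests on
    evidence_refs: List[str] = field(default_factory=list)  # notes or retrieved policy text
    tentative: bool = False         # statement flagged as uncertain (recorded, not checked here)
    missing_evidence: bool = False  # evidence gap flagged explicitly


def review_issues(judgements: List[Judgement], criteria: Set[str]) -> List[str]:
    """List problems a reviewer should resolve before the draft is approved."""
    issues = []
    for i, j in enumerate(judgements):
        if j.criterion_id is None:
            issues.append(f"judgement {i}: no linked criterion")
        elif j.criterion_id not in criteria:
            issues.append(f"judgement {i}: unknown criterion {j.criterion_id!r}")
        if not j.evidence_refs and not j.missing_evidence:
            issues.append(f"judgement {i}: no evidence cited and gap not flagged")
    return issues


def run_record(prompt_template_id: str, judgements: List[Judgement],
               reviewer_edits: List[str], approvals: List[str]) -> Dict[str, object]:
    """Bundle what the run-level evidence pack would retain for this run."""
    return {
        "prompt_template": prompt_template_id,
        "output": [asdict(j) for j in judgements],
        "reviewer_edits": reviewer_edits,
        "approvals": approvals,
    }


criteria = {"C1-collaboration", "C2-delivery"}
draft = [
    Judgement("Consistently meets delivery deadlines.", "C2-delivery", ["q3_project_log"]),
    Judgement("Shows strong leadership.", None),
]
print(review_issues(draft, criteria))
```

Here the first judgement passes the check. The second is reported twice: once because it has no linked criterion, and once because it cites no evidence and does not flag the gap. That is the kind of unsupported wording that a later dispute would expose.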

Sources in RAIDT papers

08-RAIDT_Foundations_M_V50
00-RAIDT_Scoring_v1
13-RAIDT-Evidence-Review_M_v10