
Q055 — What does interpretability mean in RAIDT at run level?

RAIDT · Star S5: RAIDT Pillars and Scoring · Primary item: S5.03 (Interpretability)

Interpretability asks whether a reviewer can understand how one output was produced and why reliance is plausible.

Answer

At run level, interpretability in RAIDT concerns whether the output of one configured use can be understood, evaluated, and relied on appropriately by the people using it in that context. The foundations paper defines it more broadly than narrow model interpretability: the issue is not whether the underlying model can be mathematically explained, but whether a stakeholder can see what the answer means, what it is based on, where uncertainty remains, and what limitations apply. This follows RAIDT's use of the run as the unit of governance, so interpretability is judged on the concrete run-level evidence pack for that use, not on generic model documentation.

In practice, interpretability is evidenced through run artefacts such as structured prompts, constrained output schemas, explanation templates, recorded sources, and explicit uncertainty or limitation statements. RAIDT's credit adverse-action example is instructive: a run is more interpretable when its reason statements are linked to documented criteria and clearly separated from assumptions. In scoring terms, interpretability is one of the five pillars that make up the score profile (Responsibility, Auditability, Interpretability, Dependability, Traceability). The question it asks is whether the specific run gives its intended users enough disciplined explanation to use, review, or challenge the output sensibly. The anchors (1 = missing, 3 = partial, 5 = audit-ready) are therefore applied to the evidence supporting that explanation, not to how fluent or convincing the text happens to sound.
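The idea of scoring evidence rather than fluency can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `RunEvidencePack`, `interpretability_anchor`, the artefact names, and the coverage rule (no artefacts = 1, all artefacts = 5, anything in between = 3) are all hypothetical and are not RAIDT's published scoring method. The function deliberately never reads the output text, which mirrors the point that anchors apply to the supporting evidence.

```python
from dataclasses import dataclass, field

# Hypothetical artefact names taken from the list of run artefacts above;
# RAIDT does not prescribe these identifiers.
INTERPRETABILITY_ARTEFACTS = (
    "structured_prompt",
    "output_schema",
    "explanation_template",
    "recorded_sources",
    "uncertainty_statement",
)


@dataclass
class RunEvidencePack:
    """Hypothetical container for the evidence recorded for one run."""
    run_id: str
    artefacts: dict[str, str] = field(default_factory=dict)


def interpretability_anchor(pack: RunEvidencePack) -> int:
    """Map artefact coverage to the 1/3/5 anchors.

    Assumed, illustrative rule: no non-empty artefacts -> 1 (missing),
    every artefact present and non-empty -> 5 (audit-ready),
    anything in between -> 3 (partial). The output text is never inspected.
    """
    present = sum(
        1 for name in INTERPRETABILITY_ARTEFACTS
        if pack.artefacts.get(name, "").strip()
    )
    if present == 0:
        return 1
    if present == len(INTERPRETABILITY_ARTEFACTS):
        return 5
    return 3


fluent_only = RunEvidencePack("run-001", {})
partial = RunEvidencePack(
    "run-002",
    {"structured_prompt": "prompt v3", "recorded_sources": "policy criteria list"},
)
full = RunEvidencePack("run-003", {name: "recorded" for name in INTERPRETABILITY_ARTEFACTS})

print(
    interpretability_anchor(fluent_only),
    interpretability_anchor(partial),
    interpretability_anchor(full),
)  # 1 3 5
```

A real assessment would also judge the quality of each artefact, not just its presence; counting is used here only to keep the sketch short.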

Practical example

In finance, suppose a GenAI assistant drafts an adverse-action explanation after a credit application is refused. A low-interpretability run would produce polished prose such as "application did not meet policy expectations" with no mapping to actual criteria, no separation between verified facts and inferred risk, and no uncertainty statement. Staff may understand the words but still be unable to justify or contest them.

A higher-interpretability run would use a constrained template that lists the documented criteria triggered, distinguishes applicant-provided data from model-generated phrasing, and records any missing evidence or assumptions. Those prompt and template choices are preserved in the run-level evidence pack. A reviewer can then see why the explanation was produced, whether it matches policy, and whether it should be sent, amended, or escalated.
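The constrained template in this example can be sketched as a small Python schema plus a review check. Everything here is hypothetical: the criterion IDs in `DOCUMENTED_CRITERIA`, the fields of `ReasonStatement` and `AdverseActionExplanation`, and the checks in `review_issues` are assumptions chosen to show documented criteria, applicant-provided data, model-generated phrasing, assumptions, and uncertainty kept apart. They are not a format defined by RAIDT.

```python
from dataclasses import dataclass, field

# Hypothetical register of documented policy criteria (illustrative IDs only).
DOCUMENTED_CRITERIA = {
    "DTI_MAX": "Debt-to-income ratio above the documented maximum",
    "MIN_HISTORY": "Credit history shorter than the documented minimum",
}


@dataclass
class ReasonStatement:
    criterion_id: str    # should reference an entry in DOCUMENTED_CRITERIA
    applicant_data: str  # facts taken from the applicant's submission
    model_phrasing: str  # wording generated by the model, kept separate


@dataclass
class AdverseActionExplanation:
    reasons: list[ReasonStatement]
    assumptions: list[str] = field(default_factory=list)
    missing_evidence: list[str] = field(default_factory=list)
    uncertainty: str = ""


def review_issues(expl: AdverseActionExplanation) -> list[str]:
    """List gaps a reviewer would need resolved before relying on the output."""
    issues = []
    if not expl.reasons:
        issues.append("no reason statements mapped to documented criteria")
    for r in expl.reasons:
        if r.criterion_id not in DOCUMENTED_CRITERIA:
            issues.append(f"unknown criterion: {r.criterion_id}")
        if not r.applicant_data.strip():
            issues.append(f"no applicant data cited for {r.criterion_id}")
    if not expl.uncertainty.strip():
        issues.append("no uncertainty or limitation statement")
    return issues


# Low-interpretability run: no criteria mapping and no uncertainty statement.
low = AdverseActionExplanation(reasons=[])

# Higher-interpretability run: criterion, data, phrasing and assumptions kept apart.
high = AdverseActionExplanation(
    reasons=[
        ReasonStatement(
            criterion_id="DTI_MAX",
            applicant_data="Declared monthly income and existing debt payments",
            model_phrasing="Your existing debt payments are high relative to your income.",
        )
    ],
    assumptions=["Declared income is assumed to be current"],
    missing_evidence=["Independent income verification not yet received"],
    uncertainty="Outcome may change once income is independently verified.",
)

print(review_issues(low))
# ['no reason statements mapped to documented criteria', 'no uncertainty or limitation statement']
print(review_issues(high))  # []
```

The first object corresponds to the low-interpretability run described above; the second gives a reviewer concrete fields to check against policy before deciding whether to send, amend, or escalate the explanation.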
