Q056 — Why does RAIDT treat interpretability as a governance question rather than a model feature?

Interpretability changes with the configured run, so governance must ask what a specific use of the system made understandable to a later reviewer.

Appears in sources

qa_deck_100#slide 58 · Interpretability, dependability, and traceability

Answer

RAIDT treats interpretability as a governance question because organisational risk arises from how a generative system is used in a particular run, not from model properties in the abstract. The papers repeatedly argue that model cards, factsheets, and explainability methods are useful but insufficient when a disputed outcome depends on prompt wording, retrieval context, tool choices, alignment settings, or human review steps. In that sense, interpretability is produced socio-technically: it depends on the configured workflow, the task, the intended audience, and the evidence retained for later inspection. This is why RAIDT insists on the run as the unit of governance.

The governance framing also reflects the practical function of interpretability. In RAIDT, interpretability is about enabling appropriate reliance, review, contestation, and improvement. Those are governance tasks, not merely model diagnostics. A response may look clear while still being poorly governed if its reasons are unsupported, its uncertainty is hidden, or its source links are absent. Conversely, interpretability can be improved by design choices such as structured prompting, reason-code templates, or recorded source links; RAIDT therefore treats influence methods as governance interventions whose effects must be captured in the run-level evidence pack. The point is not to certify an intrinsically interpretable model, but to show that a specific run was made understandable enough for the stated task and user, and that this claim can be checked later through the score profile.

Practical example

In a public-service eligibility workflow, the same base model may appear interpretable in one deployment and opaque in another. If caseworkers receive a free-text recommendation without the policy clause version, source snapshot, or uncertainty flag, they cannot judge whether the advice reflects current rules or the model's own speculation. The problem is not just the model; it is the governed configuration and the evidence retained around the run.

If the organisation instead requires a template that cites the exact retrieved rule text, labels any assumptions, records reviewer actions, and stores these artefacts in the run-level evidence pack, interpretability becomes governable. Managers can audit why a caseworker relied on the advice, citizens can contest the reasoning, and improvement action can target the workflow settings that reduced clarity. That is why RAIDT locates interpretability inside governance.
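
The template described above can be sketched as a small record plus a completeness check. This is only an illustrative sketch: the class name `EvidencePack`, its field names, and the function `interpretability_gaps` are assumptions made for this example, not a schema defined in the RAIDT papers. It shows how a free-text recommendation and a templated one differ in what a later reviewer can inspect.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvidencePack:
    """Hypothetical run-level evidence pack; all field names are illustrative."""
    run_id: str
    recommendation: str
    retrieved_rule_text: str = ""            # exact rule text the advice cites
    policy_clause_version: str = ""          # version of the cited policy clause
    source_snapshot: str = ""                # identifier of the retrieved source snapshot
    uncertainty_flag: Optional[bool] = None  # None means uncertainty was never recorded
    assumptions: List[str] = field(default_factory=list)       # labelled assumptions, may be empty
    reviewer_actions: List[str] = field(default_factory=list)  # what the caseworker did


def interpretability_gaps(pack: EvidencePack) -> List[str]:
    """Return the names of artefacts a later reviewer would find missing."""
    gaps = []
    if not pack.retrieved_rule_text:
        gaps.append("retrieved_rule_text")
    if not pack.policy_clause_version:
        gaps.append("policy_clause_version")
    if not pack.source_snapshot:
        gaps.append("source_snapshot")
    if pack.uncertainty_flag is None:
        gaps.append("uncertainty_flag")
    if not pack.reviewer_actions:
        gaps.append("reviewer_actions")
    # An empty assumptions list is not a gap: a run may make no assumptions.
    return gaps


# Same base model, two governed configurations (all values are made up).
free_text = EvidencePack(run_id="run-001", recommendation="Applicant appears eligible.")
templated = EvidencePack(
    run_id="run-002",
    recommendation="Applicant appears eligible.",
    retrieved_rule_text="(exact retrieved rule text goes here)",
    policy_clause_version="v-example",
    source_snapshot="snapshot-example",
    uncertainty_flag=False,
    assumptions=["Income figure taken from the application form."],
    reviewer_actions=["caseworker accepted recommendation"],
)

print(interpretability_gaps(free_text))
print(interpretability_gaps(templated))
```

The first run reports every artefact as missing, while the second reports none, even though the recommendation text is identical. The difference lies in the retained evidence, not in the model.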

Sources in RAIDT papers

08-RAIDT_Foundations_M_V50
13-RAIDT-Evidence-Review_M_v10