Q140 — What is the Dependability pillar and what evidence supports it?

Appears in sources

integrated_82#Q3.14

Answer

The Dependability pillar is RAIDT's run-level measure of behavioural stability and robustness. Within the five pillars (Responsibility, Auditability, Interpretability, Dependability, Traceability), it asks whether a given use of a configured system behaves reliably under expected conditions, across repeated use, and under foreseeable variation and change. The pillar is therefore narrower than a broad claim that a model is "good" and broader than simple accuracy. In RAIDT, a run may appear persuasive yet still score weakly on Dependability if the same configuration produces materially different outputs, degrades under small prompt perturbations, or fails to surface uncertainty when evidence is incomplete.

The papers support this pillar with both conceptual and operational evidence. The Foundations paper specifies repeated-run outputs, measures of dispersion, prompt perturbation tests, and documented monitoring signals as core Dependability evidence. The Evidence Review explains why this is necessary: runtime configurations, monitoring needs, and context change make governance claims fragile unless organisations record configuration provenance and post-run signals. The scoring appendix then turns this into an assessable control: the run-level evidence pack is the scored object, Dependability is one dimension of the score profile, and the anchors (1 = missing, 3 = partial, 5 = audit-ready) are applied conservatively in high-impact settings. Because influence methods, treated as governance interventions, can themselves change stability, RAIDT expects evidence on prompt templates, retrieval snapshots, adapters, and alignment layers, so reviewers can judge not only whether behaviour was stable, but also whether any instability could be traced and governed.
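
The repeated-run and perturbation evidence described above can be illustrated with a minimal sketch. It is not taken from the RAIDT papers: the function names, the choice of text similarity as the dispersion measure, and the 0-to-1 scale are all assumptions made for illustration. A real assessment might prefer a semantic or decision-level comparison.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def dissimilarity(a: str, b: str) -> float:
    """Return 0.0 for identical texts and 1.0 for texts with nothing in common."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def pairwise_dispersion(outputs: list[str]) -> float:
    """Mean pairwise dissimilarity across repeated runs of one configuration.

    Hypothetical dispersion measure: RAIDT asks for a measure of dispersion
    but the source does not prescribe this one.
    """
    if len(outputs) < 2:
        raise ValueError("need at least two repeated-run outputs")
    return mean(dissimilarity(a, b) for a, b in combinations(outputs, 2))


def perturbation_sensitivity(baseline: str, perturbed: list[str]) -> float:
    """Worst-case dissimilarity between the baseline output and the outputs
    produced by slightly reworded prompts (assumed to be collected elsewhere)."""
    if not perturbed:
        raise ValueError("need at least one perturbed-prompt output")
    return max(dissimilarity(baseline, p) for p in perturbed)


if __name__ == "__main__":
    repeated = [
        "The applicant is eligible under section 4.",
        "The applicant is eligible under section 4.",
        "The applicant appears eligible under section 4.",
    ]
    print(pairwise_dispersion(repeated))
    print(perturbation_sensitivity(repeated[0], ["The applicant is not eligible."]))
```

Both figures would be recorded in the run-level evidence pack alongside the raw outputs, so that a reviewer can check the numbers against the text they summarise.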

Practical example

In a public-service advice workflow, a caseworker uses GenAI to draft an eligibility explanation based on policy text. Strong Dependability evidence would include fixed policy-version identifiers, a preserved retrieval snapshot, repeated-run outputs under the same settings, and a record of whether minor prompt changes alter the explanation or the policy interpretation. If small wording changes produce different entitlement advice, the Dependability score should remain low even when Auditability or Traceability is reasonably strong.
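
One way to picture the evidence listed in this example is as a small record that a reviewer could inspect. The sketch below is hypothetical throughout: the class names, field names, and the idea of a reviewer-assigned `decision` label are assumptions for illustration, not a RAIDT schema.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    prompt: str
    output: str
    decision: str  # reviewer-assigned label, e.g. "eligible" (assumed field)


@dataclass
class DependabilityEvidence:
    policy_version_id: str       # fixed policy-version identifier
    retrieval_snapshot_id: str   # preserved retrieval snapshot
    repeated_runs: list[RunRecord] = field(default_factory=list)
    perturbed_runs: list[RunRecord] = field(default_factory=list)

    def stable_under_repetition(self) -> bool:
        """True when every repeated run under the same settings gives the same decision."""
        if len(self.repeated_runs) < 2:
            raise ValueError("need at least two repeated runs")
        return len({r.decision for r in self.repeated_runs}) == 1

    def stable_under_perturbation(self) -> bool:
        """True when minor prompt changes leave the decision unchanged."""
        if not self.repeated_runs or not self.perturbed_runs:
            raise ValueError("need baseline and perturbed runs")
        decisions = {r.decision for r in self.repeated_runs + self.perturbed_runs}
        return len(decisions) == 1


evidence = DependabilityEvidence(
    policy_version_id="policy-v-example",        # placeholder value
    retrieval_snapshot_id="snapshot-example",    # placeholder value
    repeated_runs=[
        RunRecord("Is the applicant eligible?", "Yes, under the policy.", "eligible"),
        RunRecord("Is the applicant eligible?", "Yes, the policy applies.", "eligible"),
    ],
    perturbed_runs=[
        RunRecord("Does the applicant qualify?", "No, the policy excludes them.", "not eligible"),
    ],
)
```

In this illustrative record the repeated runs agree, but the reworded prompt flips the entitlement decision, which is the situation the example describes as warranting a low Dependability score.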

That distinction is useful in practice. The organisation can see from the score profile that the run-level evidence pack is sufficient for reconstruction, yet the behaviour itself is unstable. The improvement action is then targeted: tighten prompting, review retrieval quality, or suspend the workflow from high-impact use until repeat-run stability improves.
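
Reading the score profile in this way can be sketched as a simple check. The thresholds below are hypothetical and are not defined in the RAIDT scoring appendix. The only detail taken from the source is that the anchors run from 1 (missing) through 3 (partial) to 5 (audit-ready).

```python
def reconstructable_but_unstable(profile: dict[str, int],
                                 strong: int = 4,
                                 weak: int = 2) -> bool:
    """Flag a run whose evidence supports reconstruction but whose behaviour is unstable.

    `strong` and `weak` are illustrative cut-offs on the 1-to-5 anchor scale,
    chosen for this sketch only.
    """
    reconstructable = min(profile["Auditability"], profile["Traceability"]) >= strong
    unstable = profile["Dependability"] <= weak
    return reconstructable and unstable


# Illustrative profile for the advice workflow; the scores are invented for the sketch.
profile = {
    "Responsibility": 3,
    "Auditability": 4,
    "Interpretability": 3,
    "Dependability": 2,
    "Traceability": 4,
}

if reconstructable_but_unstable(profile):
    print("Target Dependability: tighten prompting, review retrieval quality, "
          "or pause high-impact use until repeat-run stability improves.")
```

Keeping the pillars as separate entries, rather than collapsing them into a single number, is what allows the weak dimension to be singled out for action.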

Sources in RAIDT papers

08-RAIDT_Foundations_M_V50
00-RAIDT_Scoring_v1
13-RAIDT-Evidence-Review_M_v10