Q057 — How does RAIDT define dependability for a run?

← RAIDT · Star S5 - RAIDT Pillars and Scoring · primary item: S5.04 · Dependability

Dependability concerns whether the configured use is stable and well controlled enough for justified organisational reliance.

Appears in sources

qa_deck_100#slide 59 · Interpretability, dependability, and traceability

Answer

RAIDT defines Dependability for a run as the extent to which a configured GenAI system produces stable, reliable outcomes under expected conditions of use. Crucially, RAIDT treats the run as the unit of governance, so dependability is judged for one concrete system use in context rather than for a model in the abstract. The Foundations paper states that, for generative systems, Dependability includes stability across repeated runs, robustness to minor prompt variation, and resilience to known failure modes such as hallucination, inconsistency, and unsafe completions. This definition is motivated by prompt sensitivity, runtime configuration, and underspecification: benchmark performance or a single fluent output is not enough to justify dependable use.

For RAIDT, Dependability is assessed through the run-level evidence pack rather than by narrative assurance. Relevant evidence includes repeated-run outputs under controlled settings, measures of dispersion, records of prompt perturbation tests, documented monitoring signals, and configuration capture for model version, decoding settings, retrieval state, and any adapters or alignment layers. In the scoring rubric, Dependability sits within the five pillars (Responsibility, Auditability, Interpretability, Dependability, Traceability) and contributes to the run's score profile. Using the shorthand anchors 1=missing / 3=partial / 5=audit-ready, a strong Dependability judgement requires evidence that instability can be observed, explained, and managed, not merely that one output looked plausible.
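One way to picture the configuration-capture part of this evidence is the minimal Python sketch below. It is an illustration only: the RunConfig fields, the build_evidence_record helper, and the record layout are hypothetical names chosen here, not a schema defined by the RAIDT papers. The sketch captures the settings listed above, fingerprints them so reviewers can confirm that repeated runs used identical settings, and hashes each output so exact repeats can be counted.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical configuration capture for one run (field names are illustrative)."""
    model_version: str
    temperature: float
    top_p: float
    retrieval_snapshot_id: str
    adapters: tuple = ()  # any adapters or alignment layers in use


def config_fingerprint(config: RunConfig) -> str:
    """Hash the canonical JSON form so two runs can be checked for identical settings."""
    canonical = json.dumps(asdict(config), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def output_hash(text: str) -> str:
    """SHA-256 of one output; identical outputs produce identical hashes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_evidence_record(config: RunConfig, prompt_version: str, outputs: list) -> dict:
    """Assemble an illustrative evidence record for repeated runs under one configuration."""
    hashes = [output_hash(o) for o in outputs]
    return {
        "config": asdict(config),
        "config_fingerprint": config_fingerprint(config),
        "prompt_version": prompt_version,
        "n_runs": len(outputs),
        "output_hashes": hashes,
        "distinct_outputs": len(set(hashes)),  # crude dispersion signal: exact-match variety
    }
```

Counting distinct hashes only detects exact textual repeats, so it is a coarse signal. A fuller evidence pack would pair it with a dispersion measure suited to the output type, such as agreement on a categorical recommendation.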

Practical example

In a cybersecurity alert triage workflow, an analyst asks a GenAI assistant to summarise the same suspicious-login event across repeated runs. If the system alternates between "benign", "needs review", and "probable compromise" under fixed settings, the run is not dependable even if one answer appears sensible. Under RAIDT, reviewers would examine the run-level evidence pack for the prompt version, model deployment, retrieval snapshot, output hashes, and repeat-run variance results.

A more dependable configuration would keep the retrieval snapshot fixed, record uncertainty explicitly when evidence is incomplete, and show stable recommendations across controlled reruns and minor wording changes. That matters because triage outputs shape investigation effort, escalation speed, and potentially incident containment.
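The contrast in this example can be made concrete with a small repeat-run check. The Python sketch below is a hedged illustration, not a RAIDT-specified metric: the function names, the choice of modal agreement and normalised entropy as dispersion measures, and the 0.9 agreement threshold are all assumptions made here. It takes the triage label from each controlled rerun and reports how concentrated the labels are.

```python
import math
from collections import Counter

# Label set taken from the triage example; the threshold used below is an assumption.
TRIAGE_LABELS = ("benign", "needs review", "probable compromise")


def modal_agreement(labels):
    """Share of runs that returned the most common label (1.0 means fully stable)."""
    if not labels:
        raise ValueError("at least one run is required")
    return Counter(labels).most_common(1)[0][1] / len(labels)


def normalized_entropy(labels, n_categories=len(TRIAGE_LABELS)):
    """Shannon entropy of the label distribution, scaled to the range 0..1."""
    if not labels:
        raise ValueError("at least one run is required")
    counts = Counter(labels)
    if len(counts) == 1 or n_categories < 2:
        return 0.0  # a single observed label carries no dispersion
    total = len(labels)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_categories)


def summarise_stability(labels, min_agreement=0.9):
    """Summarise repeat-run dispersion; min_agreement is an illustrative threshold."""
    agreement = modal_agreement(labels)
    return {
        "n_runs": len(labels),
        "distinct_labels": sorted(set(labels)),
        "modal_agreement": agreement,
        "normalized_entropy": normalized_entropy(labels),
        "meets_threshold": agreement >= min_agreement,
    }
```

For the alternating case described above, summarise_stability(["benign", "needs review", "probable compromise"] * 2) reports a modal agreement of one third and does not meet the threshold, whereas six identical "needs review" labels give an agreement of 1.0. The same check could be applied to labels pooled across minor wording variants of the prompt to probe robustness to prompt variation.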

Sources in RAIDT papers

08-RAIDT_Foundations_M_V50
00-RAIDT_Scoring_v1