S2.10 - GenAI_failure_modes

S2.10 - GenAI failure modes

flowchart LR
    A["Problem landscape:<br/>hallucination, overconfidence,<br/>instability, opacity, drift,<br/>missing provenance, over-reliance"] --> B["RAIDT:<br/>run-level evidence framework"]
    B --> C[["GenAI failure modes:<br/>governance problem context<br/>made operational at run level"]]
    H["Applied settings:<br/>healthcare, public services,<br/>finance, education, enterprise work"] --> C
    C --> D["Run-level evidence pack"]
    C --> E["RAIDT score profile"]
    C --> F["Reviewer reconstruction<br/>and contestability"]
    C --> G["Governance readiness,<br/>organisational learning,<br/>policy alignment"]
    D --> G
    E --> G

Star S2 - Governance Meaning and Problem Context

Star context: Positions GenAI failure modes as the practical problem context that makes governance necessary: RAIDT treats them not as abstract model weaknesses, but as operational risks requiring oversight, control, accountability, reviewability and continuous improvement.

Academic picture

Definition / background

GenAI failure modes are the recurring ways in which a generative AI system can produce unsafe, misleading, unreliable, unreviewable, or poorly governed outcomes in practice. They include obvious output problems such as hallucination and overconfidence, but also governance-relevant failures such as unstable behaviour across similar runs, source opacity, missing provenance, hidden configuration drift, and user over-reliance. The concept therefore extends beyond simple model error. It concerns the conditions under which organisational users cannot adequately justify, reconstruct, challenge, or control what the system has done.

Conceptually, this item sits at the intersection of AI safety, reliability engineering, human factors, information governance, and accountability studies. A failure mode is not identical to a software defect, because the same system may appear acceptable in one context and problematic in another. Nor is it reducible to a single bad answer. In governance terms, a failure mode is a recurrent pattern of breakdown that creates risk for decision quality, assurance, responsibility allocation, and institutional legitimacy.

This matters especially for generative AI because many organisational failures are not fully visible in the output alone. A polished answer may conceal missing sources, outdated retrieved material, prompt changes, undocumented model updates, or a level of confidence that the evidence does not support. For that reason, RAIDT treats failure modes as governance objects rather than merely technical anomalies. The framework asks what happened in a specific run, under what configuration, using which inputs, with what traceability, and with what implications for oversight and review.

Within RAIDT, this item belongs in the problem-context star because it explains why a run-level evidence framework is needed at all. Failure modes motivate the move from principles and assurances to documented evidence packs and structured score profiles. A run-level evidence pack can capture prompts, model versions, tools, retrieved sources, timestamps, reviewer actions, and exceptions. The five-pillar profile then shows how these failure modes affect Responsibility, Auditability, Interpretability, Dependability, and Traceability. In that sense, GenAI failure modes are not peripheral to RAIDT; they are one of the main reasons the framework exists.
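The evidence items listed above could be held in a single record per run. The following is a minimal Python sketch, not RAIDT's own schema: the class name `RunEvidencePack`, its field names, and the choice of which fields may legitimately be empty are all illustrative assumptions.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RunEvidencePack:
    """Hypothetical record of one GenAI run; field names are illustrative."""
    run_id: str
    prompt: str = ""
    model_version: str = ""
    tools: list[str] = field(default_factory=list)
    retrieved_sources: list[str] = field(default_factory=list)
    timestamp: Optional[datetime] = None
    reviewer_actions: list[str] = field(default_factory=list)
    exceptions: list[str] = field(default_factory=list)

    def missing_evidence(self) -> list[str]:
        """Return the names of evidence fields left empty for this run."""
        # Assumption: a run may legitimately use no tools and raise no exceptions.
        may_be_empty = {"tools", "exceptions"}
        return [
            f.name for f in fields(self)
            if f.name not in may_be_empty and not getattr(self, f.name)
        ]


# Example run with placeholder values; sources and reviewer actions were not captured.
pack = RunEvidencePack(
    run_id="run-001",
    prompt="Summarise the attached notes.",
    model_version="example-model-v1",
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(pack.missing_evidence())
```

A check like `missing_evidence()` is one way to make an incomplete pack visible before a reviewer relies on it; which gaps are acceptable would depend on the organisation's own policy.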

Why this concept matters

Without a clear account of GenAI failure modes, governance discussions become vague. Organisations may talk about responsible AI in general terms while missing the concrete ways in which a seemingly useful system can mislead staff, weaken accountability, or create untraceable decisions. Naming the failure modes prevents governance from collapsing into broad principle statements with no operational grip.

This concept also avoids a common confusion between performance evaluation and governance readiness. A model can achieve superficially acceptable results while still being poorly governed because its runs cannot be reconstructed, its sources cannot be checked, its configuration changes are undocumented, or users become over-reliant on plausible outputs. RAIDT matters here because it makes these otherwise hidden risks visible at the level where they actually occur: the run.

For organisations, this means failure modes become a basis for control design, evidence capture, reviewer training, escalation rules, and continuous improvement. Rather than asking only whether the model is good in general, RAIDT asks whether this particular use was governable, reviewable, and contestable in context. That shift is critical for operational governance.

Key idea: GenAI failure modes matter because they turn abstract AI risk into specific run-level governance questions that RAIDT can evidence, review, and improve.

What this item captures

The main recurrent ways generative AI use can fail in organisational contexts, including hallucination, overconfidence, instability, opacity, drift, missing provenance, and over-reliance.
The translation of technical weaknesses into governance risks such as poor oversight, weak accountability, low reviewability, and reduced contestability.
The need to inspect each run in relation to task, context, prompt, model configuration, retrieved material, and human review activity.
The evidence requirements needed to detect, explain, and govern failure, rather than merely notice that an output was unsatisfactory.
The rationale for RAIDT's evidence pack and score profile as mechanisms for operationalising governance around these risks.

Practical example / likely audience question

Audience question

If GenAI failure modes are already widely known, why does RAIDT need to make them so explicit?

Answer

The underlying concern behind this question is that failure modes may seem too obvious or generic to deserve a dedicated place in a governance framework. People often assume that once risks such as hallucination or over-reliance are recognised, standard policy language or basic human review is enough. The difficulty is that recognition alone does not tell an organisation how to inspect, reconstruct, or contest a specific problematic use of a GenAI system.

RAIDT makes failure modes explicit because generic awareness does not create operational control. For example, an organisation may know that hallucination is possible, yet still be unable to determine whether a harmful statement arose from a prompt change, a retrieval problem, a model update, a missing source, or a reviewer who over-trusted fluent output. Those are different governance problems with different interventions. RAIDT handles them better than a generic AI governance approach because it links the risk to run-level evidence rather than leaving it at the level of policy aspiration.

In practice, this means RAIDT does not simply say that GenAI can fail. It asks what kind of failure occurred, in which run, under which configuration, using which evidence, and with what implications for responsibility and remediation. That makes the framework more useful for supervision, audit, and organisational learning.
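To illustrate how run-level evidence helps separate a prompt change from a model update or a retrieval problem, the sketch below compares the recorded configuration of two runs. It is a hedged example only: the dictionary keys, the `fingerprint` and `diff_runs` helpers, and the sample values are hypothetical and are not defined by RAIDT.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Short SHA-256 digest so prompt templates can be compared by value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def diff_runs(baseline: dict, current: dict) -> dict:
    """Return {key: (baseline_value, current_value)} for each element that differs."""
    changed = {}
    for key in sorted(set(baseline) | set(current)):
        before, after = baseline.get(key), current.get(key)
        if before != after:
            changed[key] = (before, after)
    return changed


# Hypothetical configuration captured for an earlier run and a later, problematic run.
baseline = {
    "prompt_template_hash": fingerprint("Summarise the notes. Cite each source."),
    "model_version": "example-model-v1",
    "retrieved_sources": ["note-A", "note-B"],
}
current = {
    "prompt_template_hash": fingerprint("Summarise the notes."),
    "model_version": "example-model-v2",
    "retrieved_sources": ["note-A", "note-B"],
}

changes = diff_runs(baseline, current)
print(sorted(changes))
```

Here the comparison would point a reviewer towards the prompt template and model version rather than retrieval, since the retrieved sources are unchanged. It narrows the candidates; it does not by itself establish the cause.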

Practical example in RAIDT terms

Consider a healthcare trust using a GenAI assistant to draft discharge summaries from clinician notes. In one run, the system generates a medication instruction that sounds authoritative but was not present in the underlying notes. The immediate output problem looks like hallucination, but the run-level governance issue is broader: the retrieval layer pulled an outdated note, the prompt template had recently changed, the model version was updated without prominent notice, and the reviewing clinician assumed the answer reflected source-grounded synthesis.

In RAIDT terms, the evidence needed would include the exact prompt template, task description, model and tool versions, retrieved documents, timestamps, user identity or role, reviewer sign-off, and any exception or escalation notes. Responsibility is affected because approval authority must be clear. Auditability and Traceability are affected because reviewers need to reconstruct what sources were available and what configuration produced the draft. Interpretability is affected because the rationale for the generated text must be understandable enough for checking. Dependability is affected because similar runs should not produce materially inconsistent outputs without explanation.
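The Dependability point, that similar runs should not diverge materially without explanation, can be approximated with a simple consistency check. This is a rough sketch under stated assumptions: word-level similarity from Python's standard `difflib` stands in for a proper clinical comparison, and the `inconsistent_pairs` helper, the 0.8 threshold, and the sample outputs are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def inconsistent_pairs(
    outputs: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Flag pairs of runs whose outputs fall below a word-level similarity threshold."""
    flagged = []
    for (id_a, text_a), (id_b, text_b) in combinations(outputs.items(), 2):
        # Compare word sequences; ratio() is 1.0 for identical sequences.
        ratio = SequenceMatcher(None, text_a.split(), text_b.split()).ratio()
        if ratio < threshold:
            flagged.append((id_a, id_b, round(ratio, 2)))
    return flagged


# Hypothetical outputs from three runs on the same notes.
outputs = {
    "run-1": "Take one tablet daily with food.",
    "run-2": "Take one tablet daily with food.",
    "run-3": "No medication changes were recorded.",
}

flagged = inconsistent_pairs(outputs)
for id_a, id_b, ratio in flagged:
    print(f"{id_a} vs {id_b}: similarity {ratio}")
```

Lexical similarity is a crude proxy: two differently worded summaries can agree clinically, and two near-identical ones can differ in a single critical dose. A flagged pair is a prompt for reviewer attention, not a verdict.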

This example shows how failure modes motivate governance readiness. RAIDT improves the situation not by assuming such failures can be eliminated, but by ensuring that the run can be inspected, questioned, and improved. That is a stronger governance response than a generic rule such as "a human must review the output".

Detailed link to RAIDT

The GenAI failure modes item links to RAIDT in four ways.

First, it clarifies the core governance problem RAIDT is trying to solve: organisational use of generative AI is risky not only because outputs can be wrong, but because failures are often difficult to evidence, explain, and contest.
Second, it justifies RAIDT's decision to treat the run as the unit of governance, since failure modes usually emerge through a particular combination of task, context, prompt, model, toolchain, and human interaction.
Third, it gives practical purpose to the evidence pack and score profile, because these outputs make failure-relevant evidence visible and show which governance pillars are weak in a given run.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning by enabling reviewers to reconstruct what happened and decide what should change.

GenAI failure modes -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

This chain is central to RAIDT. Failure modes identify what can go wrong; run-level evidence records what actually happened; the evidence pack organises that record for inspection; the score profile evaluates governance quality across the five pillars; and governance readiness improves because the organisation can review, challenge, and learn from concrete cases rather than abstract concerns.
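The step from evidence pack to score profile can be sketched as a per-pillar completeness measure. The source does not specify how RAIDT scores are calculated, so everything here is an assumption made for illustration: the `PILLAR_EVIDENCE` mapping, the evidence names, and the use of a simple fraction of evidence present.

```python
# Hypothetical mapping from each pillar to the evidence items that support it.
PILLAR_EVIDENCE = {
    "Responsibility": ["user_role", "reviewer_sign_off"],
    "Auditability": ["prompt_template", "model_version", "timestamp"],
    "Interpretability": ["task_description", "retrieved_documents"],
    "Dependability": ["repeat_run_comparison"],
    "Traceability": ["retrieved_documents", "tool_versions"],
}


def score_profile(evidence: dict) -> dict[str, float]:
    """Return, per pillar, the fraction of mapped evidence items that are present."""
    profile = {}
    for pillar, required in PILLAR_EVIDENCE.items():
        present = sum(1 for name in required if evidence.get(name))
        profile[pillar] = round(present / len(required), 2)
    return profile


# Hypothetical evidence captured for one run; empty or absent items count as missing.
evidence = {
    "user_role": "clinician",
    "reviewer_sign_off": True,
    "prompt_template": "discharge-summary-template",
    "model_version": "example-model-v2",
    "task_description": "Draft discharge summary from clinician notes",
    "retrieved_documents": [],
    "tool_versions": {"retriever": "example-1.0"},
}

profile = score_profile(evidence)
for pillar, score in profile.items():
    print(f"{pillar}: {score}")
```

A profile of this kind shows where evidence is thin for a given run rather than ranking the model itself; any real scoring scheme would need weights and criteria agreed by the organisation.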

Link to the five RAIDT pillars

Responsibility

Failure modes matter for Responsibility because they reveal where role clarity and answerability may collapse. If an output is misleading, overconfident, or unsupported, an organisation needs to know who initiated the run, who approved its use, and who was expected to challenge it.