S2.10 - GenAI failure modes

flowchart LR
    A["Problem landscape:<br/>hallucination, overconfidence,<br/>instability, opacity, drift,<br/>missing provenance, over-reliance"] --> B["RAIDT:<br/>run-level evidence framework"]
    B --> C[["GenAI failure modes:<br/>governance problem context<br/>made operational at run level"]]
    H["Applied settings:<br/>healthcare, public services,<br/>finance, education, enterprise work"] --> C
    C --> D["Run-level evidence pack"]
    C --> E["RAIDT score profile"]
    C --> F["Reviewer reconstruction<br/>and contestability"]
    C --> G["Governance readiness,<br/>organisational learning,<br/>policy alignment"]
    D --> G
    E --> G

Star S2 - Governance Meaning and Problem Context

Star context: Positions GenAI failure modes as the practical problem context that makes governance necessary: RAIDT treats them not as abstract model weaknesses, but as operational risks requiring oversight, control, accountability, reviewability and continuous improvement.


Academic picture
Definition / background

GenAI failure modes are the recurring ways in which a generative AI system can produce unsafe, misleading, unreliable, unreviewable, or poorly governed outcomes in practice. They include obvious output problems such as hallucination and overconfidence, but also governance-relevant failures such as unstable behaviour across similar runs, source opacity, missing provenance, hidden configuration drift, and user over-reliance. The concept therefore extends beyond simple model error. It concerns the conditions under which organisational users cannot adequately justify, reconstruct, challenge, or control what the system has done.

Conceptually, this item sits at the intersection of AI safety, reliability engineering, human factors, information governance, and accountability studies. A failure mode is not identical to a software defect, because the same system may appear acceptable in one context and problematic in another. Nor is it reducible to a single bad answer. In governance terms, a failure mode is a recurrent pattern of breakdown that creates risk for decision quality, assurance, responsibility allocation, and institutional legitimacy.

This matters especially for generative AI because many organisational failures are not fully visible in the output alone. A polished answer may conceal missing sources, outdated retrieved material, prompt changes, undocumented model updates, or a level of confidence that the evidence does not support. For that reason, RAIDT treats failure modes as governance objects rather than merely technical anomalies. The framework asks what happened in a specific run, under what configuration, using which inputs, with what traceability, and with what implications for oversight and review.

Within RAIDT, this item belongs in the problem-context star because it explains why a run-level evidence framework is needed at all. Failure modes motivate the move from principles and assurances to documented evidence packs and structured score profiles. A run-level evidence pack can capture prompts, model versions, tools, retrieved sources, timestamps, reviewer actions, and exceptions. The five-pillar profile then shows how these failure modes affect Responsibility, Auditability, Interpretability, Dependability, and Traceability. In that sense, GenAI failure modes are not peripheral to RAIDT; they are one of the main reasons the framework exists.
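As an illustration only, a run-level evidence pack of the kind described above could be held as a simple structured record. The Python sketch below is an assumption rather than a RAIDT specification: the class name `RunEvidence` and every field name are hypothetical, chosen to mirror the items listed in the paragraph (prompts, model versions, tools, retrieved sources, timestamps, reviewer actions, and exceptions).

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RunEvidence:
    """Hypothetical record for one GenAI run; all field names are illustrative."""
    run_id: str
    prompt: str
    model_version: str
    timestamp: str  # ISO 8601 string
    tools: list[str] = field(default_factory=list)
    retrieved_sources: list[str] = field(default_factory=list)
    reviewer_actions: list[str] = field(default_factory=list)
    exceptions: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the record so it can be stored alongside the output."""
        return json.dumps(asdict(self), sort_keys=True)


# Example values are invented for illustration.
pack = RunEvidence(
    run_id="run-001",
    prompt="Summarise the attached notes.",
    model_version="example-model-v1",
    timestamp="2024-01-01T09:00:00Z",
    retrieved_sources=["note-17"],
    reviewer_actions=["approved"],
)
restored = json.loads(pack.to_json())
```

Serialising to JSON is one possible design choice: it keeps the record readable by reviewers and easy to archive next to the generated output.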

Why this concept matters

Without a clear account of GenAI failure modes, governance discussions become vague. Organisations may talk about responsible AI in general terms while missing the concrete ways in which a seemingly useful system can mislead staff, weaken accountability, or create untraceable decisions. Naming the failure modes prevents governance from collapsing into broad principle statements with no operational grip.

This concept also avoids a common confusion between performance evaluation and governance readiness. A model can achieve superficially acceptable results while still being poorly governed because its runs cannot be reconstructed, its sources cannot be checked, its configuration changes are undocumented, or users become over-reliant on plausible outputs. RAIDT matters here because it makes these otherwise hidden risks visible at the level where they actually occur: the run.

For organisations, this means failure modes become a basis for control design, evidence capture, reviewer training, escalation rules, and continuous improvement. Rather than asking only whether the model is good in general, RAIDT asks whether this particular use was governable, reviewable, and contestable in context. That shift is critical for operational governance.

Key idea: GenAI failure modes matter because they turn abstract AI risk into specific run-level governance questions that RAIDT can evidence, review, and improve.

What this item captures
Practical example / likely audience question

Audience question

If GenAI failure modes are already widely known, why does RAIDT need to make them so explicit?

Answer

The underlying concern behind this question is that failure modes may seem too obvious or generic to deserve a dedicated place in a governance framework. People often assume that once risks such as hallucination or over-reliance are recognised, standard policy language or basic human review is enough. The difficulty is that recognition alone does not tell an organisation how to inspect, reconstruct, or contest a specific problematic use of a GenAI system.

RAIDT makes failure modes explicit because generic awareness does not create operational control. For example, an organisation may know that hallucination is possible, yet still be unable to determine whether a harmful statement arose from a prompt change, a retrieval problem, a model update, a missing source, or a reviewer who over-trusted fluent output. Those are different governance problems requiring different interventions. RAIDT is better placed to separate them than a generic AI governance approach because it links the risk to run-level evidence rather than leaving it at the level of policy aspiration.
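One way to picture how run-level evidence helps separate these causes is to compare the recorded configuration of a problematic run with that of an earlier comparable run. The sketch below is a minimal illustration under stated assumptions: the function `diff_run_configs` and the record keys are hypothetical, and a difference between runs only narrows the list of candidate causes; it does not prove which one produced the harm.

```python
def diff_run_configs(baseline: dict, suspect: dict) -> dict:
    """Return the fields whose recorded values differ between two runs."""
    changed = {}
    for key in sorted(set(baseline) | set(suspect)):
        before, after = baseline.get(key), suspect.get(key)
        if before != after:
            changed[key] = (before, after)
    return changed


# Hypothetical records for two comparable runs.
earlier_run = {
    "prompt_template": "discharge-v3",
    "model_version": "example-model-v1",
    "retrieved_sources": ["note-17", "note-18"],
    "reviewer_checked_sources": True,
}
later_run = {
    "prompt_template": "discharge-v4",
    "model_version": "example-model-v1",
    "retrieved_sources": ["note-17"],
    "reviewer_checked_sources": False,
}

changes = diff_run_configs(earlier_run, later_run)
```

In this invented case the model version is unchanged, so a reviewer would look first at the prompt template, the missing source, and the review step.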

In practice, this means RAIDT does not simply say that GenAI can fail. It asks what kind of failure occurred, in which run, under which configuration, using which evidence, and with what implications for responsibility and remediation. That makes the framework more useful for supervision, audit, and organisational learning.

Practical example in RAIDT terms

Consider a healthcare trust using a GenAI assistant to draft discharge summaries from clinician notes. In one run, the system generates a medication instruction that sounds authoritative but was not present in the underlying notes. The immediate output problem looks like hallucination, but the run-level governance issue is broader: the retrieval layer pulled an outdated note, the prompt template had recently changed, the model version was updated without prominent notice, and the reviewing clinician assumed the answer reflected source-grounded synthesis.

In RAIDT terms, the evidence needed would include the exact prompt template, task description, model and tool versions, retrieved documents, timestamps, user identity or role, reviewer sign-off, and any exception or escalation notes. Responsibility is affected because approval authority must be clear. Auditability and Traceability are affected because reviewers need to reconstruct what sources were available and what configuration produced the draft. Interpretability is affected because the rationale for the generated text must be understandable enough for checking. Dependability is affected because similar runs should not produce materially inconsistent outputs without explanation.
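The link between evidence items and pillars can be sketched as a simple coverage check. The mapping below is an assumption made for illustration, not the RAIDT scoring rubric: the field names, the assignment of fields to pillars in `PILLAR_EVIDENCE`, and the idea of reporting a coverage fraction are all hypothetical.

```python
# Hypothetical mapping from pillar to the evidence fields that support it.
PILLAR_EVIDENCE = {
    "Responsibility": ["user_role", "reviewer_sign_off"],
    "Auditability": ["prompt_template", "model_version", "timestamp", "escalation_notes"],
    "Interpretability": ["task_description", "retrieved_documents"],
    "Dependability": ["model_version", "tool_versions"],
    "Traceability": ["retrieved_documents", "prompt_template", "timestamp"],
}


def is_present(value) -> bool:
    """Treat None and empty strings, lists, or dicts as missing evidence."""
    return value not in (None, "", [], {})


def pillar_coverage(record: dict) -> dict:
    """Fraction of the mapped evidence fields that are present, per pillar."""
    return {
        pillar: sum(is_present(record.get(f)) for f in fields) / len(fields)
        for pillar, fields in PILLAR_EVIDENCE.items()
    }


# Invented record for a single discharge-summary run.
record = {
    "prompt_template": "discharge-v4",
    "task_description": "Draft discharge summary from clinician notes",
    "model_version": "example-model-v2",
    "tool_versions": {},
    "retrieved_documents": ["note-12"],
    "timestamp": "2024-01-01T09:00:00Z",
    "user_role": "clinician",
    "reviewer_sign_off": None,
    "escalation_notes": "",
}
coverage = pillar_coverage(record)
```

A coverage fraction says only whether evidence exists, not whether it is adequate; judging adequacy remains a reviewer task.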

This example shows how failure modes motivate governance readiness. RAIDT improves the situation not by assuming such failures can be eliminated, but by ensuring that the run can be inspected, questioned, and improved. That is a stronger governance response than a generic rule such as "a human must review the output".

Detailed link to RAIDT

The GenAI failure modes item links to RAIDT in four ways.

First, it clarifies the core governance problem RAIDT is trying to solve: organisational use of generative AI is risky not only because outputs can be wrong, but because failures are often difficult to evidence, explain, and contest.
Second, it justifies RAIDT's decision to treat the run as the unit of governance, since failure modes usually emerge through a particular combination of task, context, prompt, model, toolchain, and human interaction.
Third, it gives practical purpose to the evidence pack and score profile, because these outputs make failure-relevant evidence visible and show which governance pillars are weak in a given run.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning by enabling reviewers to reconstruct what happened and decide what should change.

GenAI failure modes → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness

This chain is central to RAIDT. Failure modes identify what can go wrong; run-level evidence records what actually happened; the evidence pack organises that record for inspection; the score profile evaluates governance quality across the five pillars; and governance readiness improves because the organisation can review, challenge, and learn from concrete cases rather than abstract concerns.

Link to the five RAIDT pillars

Responsibility

Failure modes matter for Responsibility because they reveal where role clarity and answerability may collapse. If an output is misleading, overconfident, or unsupported, an organisation needs to know who initiated the run, who approved its use, and who was expected to challenge it.

Example evidence / implication:

Auditability

Failure modes strongly affect Auditability because a problematic output cannot be meaningfully audited without access to the configuration, inputs, retrieved materials, and review history associated with the run.

Example evidence / implication:

Interpretability

Failure modes affect Interpretability when outputs appear persuasive without making their basis understandable. Hallucination and source opacity are especially problematic here because users may not be able to distinguish grounded synthesis from unsupported generation.

Example evidence / implication:

Dependability

Failure modes are central to Dependability because instability, drift, and overconfidence undermine the reliability of organisational use over time. A dependable system should not behave materially differently across comparable runs without traceable explanation.
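A rough way to surface this kind of unexplained difference is to compare the outputs of two comparable runs and check whether their recorded configuration was the same. The sketch below is illustrative only: the function name, record layout, and the 0.8 threshold are assumptions, and character-level similarity from Python's standard `difflib` is a crude proxy for "materially different" that a real deployment would likely replace with a domain-specific comparison.

```python
from difflib import SequenceMatcher


def flag_unexplained_divergence(run_a: dict, run_b: dict, threshold: float = 0.8) -> dict:
    """Flag two runs whose outputs diverge although their recorded config matches."""
    similarity = SequenceMatcher(None, run_a["output"], run_b["output"]).ratio()
    same_config = run_a["config"] == run_b["config"]
    return {
        "similarity": similarity,
        "same_config": same_config,
        # Divergence with identical recorded config has no traceable explanation.
        "flag": same_config and similarity < threshold,
    }


# Invented runs sharing one recorded configuration.
config = {"model_version": "example-model-v1", "prompt_template": "discharge-v3"}
run_1 = {"config": config, "output": "Take one tablet daily."}
run_2 = {"config": dict(config), "output": "Stop all medication immediately."}
run_3 = {"config": dict(config), "output": "Take one tablet daily."}

divergent = flag_unexplained_divergence(run_1, run_2)
consistent = flag_unexplained_divergence(run_1, run_3)
```

A flag here does not mean the system is undependable; it marks a pair of runs that a reviewer should examine because the evidence does not explain the difference.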

Example evidence / implication:

Traceability

Failure modes strongly affect Traceability because governance breaks down when the path from output back to source material, configuration, and review action is incomplete or missing.
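The path from output back to source can be checked mechanically if a run records which sources each generated statement relies on. That is a strong assumption, and the sketch below depends on it: the `source_ids` field, the statement structure, and the function name are hypothetical rather than part of RAIDT.

```python
def untraceable_statements(statements: list, retrieved_ids: list) -> list:
    """Return statements that cite no source, or cite one not retrieved in this run."""
    known = set(retrieved_ids)
    problems = []
    for statement in statements:
        cited = statement.get("source_ids", [])
        if not cited or not set(cited) <= known:
            problems.append(statement["text"])
    return problems


# Invented draft output, broken into statements with recorded source links.
draft = [
    {"text": "Patient admitted on 3 May.", "source_ids": ["note-17"]},
    {"text": "Increase dose to 20 mg.", "source_ids": []},
    {"text": "Follow-up in two weeks.", "source_ids": ["note-99"]},
]
gaps = untraceable_statements(draft, retrieved_ids=["note-17", "note-18"])
```

The check identifies where the trace is broken; it does not establish whether a cited source actually supports the statement.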

Example evidence / implication:

This item touches all five pillars, but it is especially influential for Auditability, Dependability, and Traceability because many GenAI failures become serious governance issues precisely when they cannot be reconstructed or checked.

Why this item is more than a generic concept

In general AI governance, GenAI failure modes may simply mean a list of recognised risks associated with large language models or related systems. In RAIDT, the term has a more operational meaning. It refers to failure patterns that must be evidenced at the level of a specific run so that governance claims can be tested rather than assumed.

That makes the RAIDT meaning more practical. Instead of saying only that hallucination or over-reliance is possible, RAIDT asks what evidence would show whether that risk materialised, how it was reviewed, which controls were present, and what should be improved. The concept is therefore not just descriptive. It is tied directly to evidence design, score interpretation, and governance readiness.

Common misunderstanding

Misunderstanding

If a human is in the loop, GenAI failure modes are no longer a serious governance issue.

Correction

Human involvement reduces some risks but does not remove them. A reviewer can be rushed, may over-trust fluent output, may not have access to the underlying sources, or may be unable to see that the configuration changed between runs. For example, a caseworker using a GenAI draft may approve an inaccurate summary because it sounds coherent and arrives in a familiar format. The governance problem is therefore not solved merely by inserting a human step; it is solved only when the run is sufficiently evidenced and structured for meaningful review.

Boundary and limitation

This item does not provide a complete taxonomy of every possible GenAI failure, nor does it prove that all failures can be detected in advance. It also does not replace domain-specific evaluation, risk assessment, assurance testing, or professional judgment. Some failures remain subtle, emergent, or visible only after downstream use.

RAIDT addresses this limitation by treating failure modes as a practical governance lens rather than a claim of total prediction or control. The value of the item is that it helps organisations notice what kinds of evidence are needed, where governance may be weak, and how runs can be reviewed after the fact. It improves readiness and learning, but it does not guarantee correctness or eliminate uncertainty.

Implementation levels

Manual implementation

A researcher or small team can apply this item manually by defining a checklist of likely failure modes and reviewing each important run against that list. They can record prompts, outputs, sources, reviewer comments, and any concerns about instability, opacity, or over-reliance in a structured evidence note.
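For a small team, the checklist and evidence note might be as simple as the following sketch. The list of failure modes is taken from this note; the function name, the note structure, and the convention that an empty comment means "checked, no concern" are assumptions made for illustration.

```python
FAILURE_MODES = [
    "hallucination",
    "overconfidence",
    "instability",
    "source opacity",
    "missing provenance",
    "configuration drift",
    "over-reliance",
]


def build_evidence_note(run_id: str, findings: dict) -> dict:
    """Build a structured note from reviewer findings.

    `findings` maps a failure mode to a reviewer comment; an empty
    comment means the mode was checked and raised no concern.
    """
    unknown = set(findings) - set(FAILURE_MODES)
    if unknown:
        raise ValueError(f"Unknown failure modes: {sorted(unknown)}")
    return {
        "run_id": run_id,
        "concerns": {mode: note for mode, note in findings.items() if note},
        "not_reviewed": [mode for mode in FAILURE_MODES if mode not in findings],
    }


# Invented reviewer findings for one run.
note = build_evidence_note(
    "run-001",
    {
        "hallucination": "Medication instruction not found in source notes.",
        "overconfidence": "",
        "missing provenance": "No source listed for dosage.",
    },
)
```

Recording which failure modes were not reviewed keeps the difference between "no concern found" and "not checked" visible in the evidence.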

Semi-automated implementation

Semi-automated implementation can use templates, metadata fields, structured review forms, and standardised evidence-pack components. These supports make it easier to flag missing provenance, prompt changes, absent reviewer notes, or inconsistent handling across comparable runs.

Fully automated implementation

At scale, a platform or orchestration layer can automatically log model versions, prompt templates, retrieved documents, timestamps, user roles, run identifiers, and review events. A dashboard or governance pipeline can then detect drift, compare runs, highlight recurring failure patterns, and generate evidence-pack elements and pillar-level scoring signals for governance review.
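The automatic logging described above can be sketched with a decorator that records metadata around each model call. This is a minimal illustration under stated assumptions: `logged_run`, `config_changes`, and `RUN_LOG` are hypothetical names, the in-memory list stands in for a real audit store, and the wrapped functions are placeholders rather than real model calls.

```python
import hashlib
import uuid
from datetime import datetime, timezone

RUN_LOG = []  # in-memory stand-in for a real audit store


def logged_run(model_version: str, prompt_template: str):
    """Decorator that records run metadata each time the wrapped call is made."""
    template_hash = hashlib.sha256(prompt_template.encode("utf-8")).hexdigest()[:12]

    def decorator(generate):
        def wrapper(user_input: str, user_role: str) -> str:
            output = generate(prompt_template.format(input=user_input))
            RUN_LOG.append({
                "run_id": str(uuid.uuid4()),
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "model_version": model_version,
                "template_hash": template_hash,
                "user_role": user_role,
                "output": output,
            })
            return output
        return wrapper
    return decorator


def config_changes(log: list) -> list:
    """Indices of entries whose model version or template differs from the previous entry."""
    keys = ("model_version", "template_hash")
    return [i for i in range(1, len(log))
            if any(log[i][k] != log[i - 1][k] for k in keys)]


@logged_run("example-model-v1", "Summarise: {input}")
def draft_v1(prompt: str) -> str:
    return f"[draft] {prompt}"  # placeholder for a real model call


@logged_run("example-model-v2", "Summarise: {input}")
def draft_v2(prompt: str) -> str:
    return f"[draft] {prompt}"  # placeholder for a real model call


draft_v1("note A", "clinician")
draft_v1("note B", "clinician")
draft_v2("note C", "clinician")
changed_at = config_changes(RUN_LOG)
```

Hashing the prompt template rather than storing only its name is one design option: it lets a silent edit to the template show up as a configuration change even when the template keeps the same label.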

Practical use in the RAIDT project

This item helps position the whole RAIDT project. In Paper 08 Foundations, it supports the argument that governance must move from principle-level discussion to run-level evidence because GenAI failures are often situational, opaque, and difficult to reconstruct without structured records. In Paper 09 Empirical Validation, it provides a concrete basis for evaluating whether RAIDT actually improves visibility, reviewability, and consistency around recognised failure risks. In Paper 10 Policy Pathways, it helps explain why policy language should require evidential governance practices rather than broad statements of safe or responsible use.

It is also useful in sector playbooks because failure modes can be translated into practical review prompts for healthcare, public services, education, law, cybersecurity, supply chain, and enterprise productivity contexts. For the evidence pack and scoring rubric, this item clarifies what kinds of data and judgement criteria matter. For influence methods and governance interventions, it helps show decision-makers why run logging, reviewer guidance, escalation routes, and reconstruction capability are not administrative extras but core governance controls.

In supervision, viva defence, and journal positioning, this note gives a concise way to justify the relevance of RAIDT. It explains that RAIDT is not solving an abstract ethics problem; it is responding to identifiable organisational failure modes that require a more reviewable and operational form of governance.

Key audience questions to prepare for

Q1. Are GenAI failure modes mainly a technical issue or a governance issue?

They are both, but RAIDT focuses on the governance dimension: how failures are evidenced, reviewed, attributed, challenged, and improved in organisational use.

Q2. Why is hallucination not just an accuracy problem?

Because a hallucinated output can also expose weak provenance, poor reviewability, unclear accountability, and inadequate controls over how the run was configured and checked.

Q3. Why does run-level evidence matter for failure modes?

Because many failures cannot be understood from the final output alone. The surrounding run context explains whether the issue arose from prompts, sources, configuration, drift, or human interaction.

Q4. Does RAIDT prevent GenAI failure modes from happening?

Not entirely. RAIDT improves governance readiness by making failures easier to detect, reconstruct, contest, and learn from, but it does not eliminate the underlying possibility of failure.

Q5. Why is over-reliance included alongside technical failures?

Because organisational harm can arise not only from what the model produces, but from how humans interpret, trust, and act on that production within real workflows.

Suggested citation concepts to support this item
Short explanation for presentation

GenAI failure modes are the recurring ways generative AI use can break down in organisational settings, including hallucination, overconfidence, unstable outputs, opaque sourcing, hidden drift, missing provenance, and human over-reliance. In RAIDT, these are not treated as abstract warnings. They are the concrete governance problems that justify collecting run-level evidence. By treating the run as the unit of governance, RAIDT makes it possible to inspect what happened, reconstruct the conditions of use, and assess the quality of oversight across Responsibility, Auditability, Interpretability, Dependability, and Traceability. This matters because a plausible output may still be poorly governed. The contribution of RAIDT is therefore to turn known GenAI risks into reviewable evidence, practical scoring, and stronger organisational readiness.

One-line takeaway

GenAI failure modes are the recurrent ways generative AI use breaks down in practice; RAIDT turns those risks into run-level evidence for governance, review, and improvement.
