S11.02 - Limitations

flowchart LR
    A["Traditional overclaims<br/>Guarantee correctness<br/>Score equals compliance<br/>Evidence removes risk"] --> B["RAIDT<br/>Run-level evidence framework"]
    B --> C[["Limitations<br/>Bounded claims about what RAIDT can and cannot establish"]]
    C --> D[Run-level evidence pack]
    C --> E[RAIDT score profile]
    C --> F[Reviewer reconstruction]
    D --> G[Reviewability and contestability]
    E --> H[Governance readiness]
    F --> I[Organisational learning]
    J[Healthcare drafting] --> C
    K[Finance compliance support] --> C
    L[Public-sector casework] --> C
    M[Prompts outputs timestamps reviewer notes approvals] --> C

Star S11 - Boundaries, Limitations and Future Questions

Star context: Clarifies the disciplined boundary of RAIDT by showing that the framework improves governance readiness around specific GenAI runs, but does not guarantee truth, remove domain risk, certify legality, or replace expert judgement.


Academic picture
Definition / background

In RAIDT, limitations are the explicit constraints on what the framework can validly claim, infer, or assure. They specify that RAIDT cannot guarantee factual correctness, remove domain-specific risk, provide legal certification, or replace expert professional judgement. This is not a weakness in the casual sense; it is a disciplined statement about the scope of a governance framework whose purpose is to improve evidence, reviewability, and accountability around real uses of generative AI.

Conceptually, limitations differ from failures, errors, and boundary conditions. A failure is something that goes wrong in practice. An error is a wrong output, judgement, or process step. A boundary condition defines the context within which a framework can sensibly operate. Limitations, by contrast, describe what RAIDT cannot deliver even when it is working as intended. They therefore protect the framework from overclaiming and protect users, reviewers, and supervisors from misunderstanding the meaning of RAIDT outputs.

This matters in GenAI governance because evidence and scoring can easily be misread as proof of correctness. RAIDT produces a run-level evidence pack and a five-pillar score profile across Responsibility, Auditability, Interpretability, Dependability, and Traceability. Those outputs can show whether a run is better documented, more reviewable, or more governance-ready than another. They do not prove that the generated content is true, fair, lawful, clinically safe, or fit for deployment in all contexts.
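As an illustration only, the separation between a score profile and a correctness claim can be sketched in code. The class name, the 0 to 4 scale, and the wording of the summary are assumptions made for this sketch; the source does not specify a scoring scale or a data format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScoreProfile:
    """Hypothetical five-pillar profile. The 0-4 scale is an assumption."""
    responsibility: int
    auditability: int
    interpretability: int
    dependability: int
    traceability: int

    def __post_init__(self):
        # Reject scores outside the assumed scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 4:
                raise ValueError(f"{f.name} must be between 0 and 4, got {value}")


# Properties that RAIDT outputs do not prove, taken from the paragraph above.
NOT_ESTABLISHED = (
    "true",
    "fair",
    "lawful",
    "clinically safe",
    "fit for deployment in all contexts",
)


def describe(profile: ScoreProfile) -> str:
    """Summarise the profile together with the claims it does not support."""
    scores = ", ".join(f"{f.name}={getattr(profile, f.name)}" for f in fields(profile))
    return (
        f"Governance readiness: {scores}. "
        f"Not established about the content: {', '.join(NOT_ESTABLISHED)}."
    )


summary = describe(ScoreProfile(3, 4, 2, 3, 4))
```

The profile deliberately has no field for correctness, so a summary built from it cannot be read as a statement about the truth of the generated content.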

Limitations therefore belong centrally within RAIDT. They frame how run-level evidence should be interpreted, how evidence packs should be used, and how score profiles should be discussed in supervision, organisational review, and academic argument. Without an explicit limitations item, RAIDT could be misunderstood as a guarantee system rather than as an evidence-based governance framework.

Why this concept matters

This concept matters because governance frameworks fail intellectually and practically when they promise certainty that they cannot deliver. In GenAI contexts, organisations often want a method that reduces ambiguity, especially in high-stakes domains. RAIDT helps by making individual runs more reviewable and contestable, but it must also make clear that better governance evidence is not the same thing as guaranteed correctness.

The concept also avoids a serious confusion between governance readiness and outcome validity. A run may be well documented, properly reviewed, and appropriately scored, yet still contain a substantive mistake that only a domain expert can detect. Conversely, a technically strong output can emerge from a poorly governed run. The limitations item helps supervisors, practitioners, and reviewers keep those dimensions analytically separate.

If this item is missing, organisations may treat a strong RAIDT profile as a proxy for truth, legal sufficiency, or professional adequacy. That creates a false sense of assurance and can lead to misuse of scores, weak escalation practice, and poor communication with decision-makers. By stating its own limitations, RAIDT moves governance from vague confidence to disciplined operational realism.

Key idea: Limitations matter because RAIDT improves the quality of governance evidence, not the certainty of the underlying world that the evidence describes.

What this item explains
Practical example / likely audience question

Audience question

If RAIDT captures rich evidence and gives a five-pillar score, why can it not guarantee that the output is correct or compliant?

Answer

The concern behind this question is that structured evidence and scoring can look like a seal of approval. The direct answer is that RAIDT evaluates governance readiness around a specific run, not the ultimate truth-value or legal validity of the generated content. A well-evidenced run can still produce a wrong answer, and a well-scored process can still require human correction, escalation, or rejection.

For example, a university might use GenAI to draft guidance for a student visa query. RAIDT can capture the prompt, policy sources consulted, tool version, reviewer comments, edits made, and approval decision. It can therefore show whether the institution used the tool in a reviewable and accountable way. What RAIDT cannot do is certify that the guidance is legally correct under immigration law. That judgement still depends on expert review, current policy interpretation, and, where necessary, formal legal advice.

RAIDT handles this better than a generic AI governance approach because it makes the boundary explicit at run level. Rather than implying that governance artefacts automatically settle correctness or compliance questions, RAIDT shows precisely what evidence exists, how the run was governed, and where human expertise must still intervene.

Practical example in RAIDT terms

Consider a healthcare setting in which a clinician uses a GenAI assistant to draft discharge instructions after a hospital visit. The use case is efficient and operationally plausible, but the run-level issue is that the drafted instructions may still contain a clinically misleading statement, an omitted warning, or phrasing inappropriate for the patient's condition.

The evidence needed for RAIDT would include the task purpose, prompt template, relevant de-identified source notes, model or tool version, timestamp, generated draft, clinician edits, final approved discharge text, and notes on whether review or escalation occurred. Responsibility is affected because a named clinician or team must remain accountable for approving the communication. Auditability and Traceability are affected because the run must be reconstructable after the event. Interpretability is affected because reviewers need to understand how the draft arose from the available instructions and inputs. Dependability is affected because repeated safe performance cannot be assumed simply because documentation exists.

The limitations item improves governance readiness here by forcing the organisation to say something precise: RAIDT can show whether the discharge-instruction run was well governed and well evidenced, but it cannot itself certify that the final clinical advice was medically correct. That distinction is exactly what makes the governance claim credible.
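The evidence items named in this example could be held in a simple record. The sketch below is one possible shape, not a prescribed RAIDT schema: the class, field, and function names are hypothetical, and treating every empty field as a gap is an assumption made for simplicity.

```python
from dataclasses import dataclass, asdict


@dataclass
class DischargeRunEvidence:
    # Illustrative field names mirroring the evidence items in the example.
    task_purpose: str = ""
    prompt_template: str = ""
    source_notes: str = ""  # de-identified
    tool_version: str = ""
    timestamp: str = ""
    generated_draft: str = ""
    clinician_edits: str = ""
    final_approved_text: str = ""
    review_notes: str = ""
    accountable_clinician: str = ""


def missing_items(evidence: DischargeRunEvidence) -> list[str]:
    """Return the names of evidence items that are still empty."""
    return [name for name, value in asdict(evidence).items() if not value.strip()]


def bounded_claim(evidence: DischargeRunEvidence) -> str:
    """State what the evidence shows and what it does not certify."""
    gaps = missing_items(evidence)
    status = "complete" if not gaps else "incomplete (missing: " + ", ".join(gaps) + ")"
    return (
        f"Evidence for this run is {status}. "
        "This describes how the run was governed; it does not certify "
        "that the final clinical advice was medically correct."
    )
```

Even when `missing_items` returns an empty list, `bounded_claim` keeps the limitation sentence, which reflects the point that complete evidence is not clinical validation.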

Detailed link to RAIDT

Limitations links to RAIDT in four ways.

First, it protects the core RAIDT idea from being overstated by making clear that the framework is about evidence-based governance of GenAI use, not about producing certainty from uncertainty.

Second, it ties directly to the run because limitations apply to what can be inferred from one governed use event, even when that event is well documented.

Third, it disciplines interpretation of the evidence pack and the RAIDT score profile by clarifying that these outputs support judgement, reconstruction, and comparison rather than guaranteeing substantive validity.

Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning because reviewers can assess both what the evidence shows and what remains outside the framework's evidential reach.

Limitations → bounded run-level claims → evidence pack → RAIDT score profile → governance readiness without overclaiming

Link to the five RAIDT pillars

Responsibility

Limitations strongly affect Responsibility because the item makes clear that human and organisational accountability cannot be delegated to the framework itself. RAIDT can support responsible oversight, but it cannot become the accountable actor.

Example evidence / implication:

Auditability

This item has a strong connection to Auditability because audits depend on understanding what a framework can demonstrate and what it cannot. Explicit limitations prevent auditors from treating a score as a substitute for substantive review.

Example evidence / implication:

Interpretability

Limitations support Interpretability by preventing false expectations about explainability. RAIDT can improve practical interpretability of a run by documenting prompts, context, and reviewer reasoning, but that does not mean the system becomes fully transparent or fully intelligible in all respects.

Example evidence / implication:

Dependability

This item matters for Dependability because dependable governance is not the same as infallible performance. Limitations keep dependability claims realistic and tied to observed process quality rather than absolute output reliability.

Example evidence / implication:

Traceability

Limitations also affect Traceability because traceable evidence can show what happened, but traceability alone does not settle whether what happened was normatively or substantively right. The framework must preserve that distinction.

Example evidence / implication:

Limitations shape all five pillars indirectly, but they are especially important for Responsibility and Auditability because those pillars are easily overstated when governance outputs are mistaken for final guarantees.

Why this item is more than a generic concept

In general AI governance, limitations are often treated as a brief caveat section or a defensive disclaimer. In RAIDT, the concept is more operational. It specifies the bounded meaning of run-level evidence, evidence packs, and score profiles, and therefore governs how the framework should be used in evaluation, assurance, and organisational decision-making.

The RAIDT meaning is more practical because it is tied to what can be reconstructed from a specific run. It does not simply say, in abstract terms, that no framework is perfect. It says precisely that even a well-evidenced run cannot convert governance quality into guaranteed factual, legal, or professional correctness. That makes the concept central to responsible deployment, not peripheral commentary.

Common misunderstanding

Misunderstanding

If a RAIDT run receives a strong score profile, the output is effectively validated and can be treated as correct or compliant.

Correction

A strong score profile means that the run appears well governed against RAIDT's criteria, not that the substantive content is automatically right. For example, a finance team may use GenAI to draft an anti-money-laundering case summary. The run may have excellent traceability, clear reviewer sign-off, and strong auditability. Even so, the summary could still omit an important contextual fact or reflect an incorrect legal interpretation. RAIDT helps the organisation see how the run was governed and whether the process was reviewable; it does not eliminate the need for specialist financial-crime judgement.

Boundary and limitation

This item does not solve the limitations it names; it makes them explicit and manageable. RAIDT does not prove factual truth, legal compliance, fairness, safety, or domain adequacy in a final sense. It also does not replace model evaluation, procurement controls, policy design, staff training, or expert review. Its role is narrower and more defensible: to improve the evidence available for reviewing one concrete GenAI run.

The item may fail if organisations interpret limitation language as merely rhetorical and then continue to use RAIDT scores as decision substitutes. It may also fail if the framework is applied in settings where no meaningful human review, escalation, or contextual judgement is possible. RAIDT handles this by pairing limitation statements with proportional evidence capture, explicit reviewer roles, and a clear separation between governance evidence and substantive domain authority.

Implementation levels

Manual implementation

A researcher or small team can implement this item manually by adding a standard limitation statement to RAIDT notes, evidence packs, and scoring discussions. In practice, this means recording what the framework helped to evidence in a run and what still required external judgement, policy interpretation, or expert validation.

Semi-automated implementation

Semi-automated implementation can embed limitations into templates, scoring rubrics, and reviewer forms. For example, an evidence-pack template might include mandatory fields such as "What this run evidence can support" and "What remains outside RAIDT's evidential claim" so that bounded interpretation becomes routine rather than optional.
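A minimal sketch of such a mandatory-field check follows. The two field labels are taken from the paragraph above; the function name and the use of a plain dictionary for the template entry are assumptions for illustration.

```python
REQUIRED_LIMITATION_FIELDS = (
    "What this run evidence can support",
    "What remains outside RAIDT's evidential claim",
)


def missing_limitation_fields(entry: dict) -> list:
    """Return the mandatory limitation fields that are absent or blank."""
    return [
        field
        for field in REQUIRED_LIMITATION_FIELDS
        if not str(entry.get(field, "")).strip()
    ]


# Example: a reviewer has filled in only the first field.
entry = {
    "What this run evidence can support": "Reviewable record of prompt, edits and sign-off.",
}
unfilled = missing_limitation_fields(entry)
```

A form or template tool could refuse to save the evidence pack while this list is non-empty, which is one way to make bounded interpretation routine.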

Fully automated implementation

At scale, a governance platform, wrapper, or dashboard can automatically display limitation notices alongside run summaries, scores, approvals, and alerts. A mature pipeline could require reviewers to confirm that high-stakes outputs still need human or domain-specific validation before release, thereby preventing score profiles from being operationally misused as automatic approvals.
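One way to picture this release gate is sketched below. The names `RunSummary`, `release_decision`, and `LIMITATION_NOTICE` are hypothetical, and the notice wording is an assumption; a real platform would have its own data model and approval workflow.

```python
from dataclasses import dataclass

LIMITATION_NOTICE = (
    "RAIDT scores describe governance readiness for this run. "
    "They are not an approval of the output's correctness or compliance."
)


@dataclass
class RunSummary:
    run_id: str
    high_stakes: bool
    human_validation_confirmed: bool = False


def release_decision(run: RunSummary) -> tuple[bool, str]:
    """Decide whether a run may be released and attach the limitation notice.

    The decision takes no score as input, so a strong score profile
    cannot act as an automatic approval.
    """
    if run.high_stakes and not run.human_validation_confirmed:
        return False, f"{run.run_id}: blocked pending human or domain validation. {LIMITATION_NOTICE}"
    return True, f"{run.run_id}: cleared for release. {LIMITATION_NOTICE}"


allowed, message = release_decision(RunSummary("run-001", high_stakes=True))
```

The design choice worth noting is the omission of any score parameter: the gate depends only on the stakes of the run and on an explicit reviewer confirmation.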

Practical use in the RAIDT project

Within the RAIDT project, this item is important for Paper 08 Foundations because it helps define the scope conditions of the framework's theoretical claim. It shows that RAIDT is not a universal truth engine but a run-level governance method grounded in evidence, reviewability, and bounded inference.

It is equally important for Paper 09 Empirical Validation because empirical studies must distinguish between improvement in governance readiness and improvement in objective task correctness. That distinction is methodologically important when interpreting pilot results, scoring consistency, reviewer agreement, and sector-specific outcomes.

For Paper 10 Policy Pathways and related sector playbooks, the limitations item helps position RAIDT credibly for policy audiences. It supports careful language around assurance, organisational learning, and oversight without implying formal certification powers that the framework does not have. In viva defence and journal positioning, this item is useful because it demonstrates conceptual maturity: RAIDT has a defined contribution, but also a clearly stated boundary.

Key audience questions to prepare for

Q1. Does admitting limitations make RAIDT look weak?

No. It makes RAIDT more defensible. A framework becomes weaker when its claims outrun its evidence. Explicit limitations show conceptual discipline and improve credibility with supervisors, reviewers, and organisational stakeholders.

Q2. If RAIDT cannot guarantee correctness, what is its main value?

Its value is governance readiness. RAIDT helps organisations reconstruct runs, justify decisions, compare practices, learn from failures, and show evidence of responsible oversight.

Q3. Can a high RAIDT score still coexist with a bad output?

Yes. A run can be well governed yet still produce a flawed result. That is precisely why RAIDT must be combined with domain review, escalation practice, and appropriate human judgement.

Q4. Why is RAIDT needed if broader AI governance frameworks already exist?

Because those frameworks often operate at policy, vendor, or system level. RAIDT adds a missing layer by examining one concrete use event and the evidence surrounding it.

Q5. How should this be explained to organisational decision-makers?

Explain that RAIDT improves the quality of governance evidence around GenAI use. It helps an organisation show what happened, who reviewed it, and how ready the run is for scrutiny, but it does not substitute for expert approval in high-stakes contexts.

Suggested citation concepts to support this item
Short explanation for presentation

Limitations in RAIDT are the explicit boundaries on what the framework can claim. RAIDT can improve governance around a specific GenAI run by capturing evidence, supporting reviewer reconstruction, and producing a five-pillar score profile. What it cannot do is guarantee that an output is factually correct, legally compliant, clinically safe, or professionally adequate in every case. That distinction matters because governance quality and substantive correctness are not the same thing. A well-governed run may still need expert correction, and a high score should never be treated as automatic approval. By stating its limitations clearly, RAIDT becomes more credible for supervision, policy, and organisational adoption. It presents itself as an evidence-based governance framework, not as a magical certainty machine.

One-line takeaway

Limitations is the item that defines what RAIDT cannot guarantee, because RAIDT governs evidence and reviewability at run level rather than replacing domain truth or expert judgement.
