
S10.03 — 20 scenarios per domain

flowchart LR
    A[Problem:
too few examples, selective demos,
weak domain coverage] --> B[RAIDT
run-level evidence framework]
    H[Domain playbooks:
healthcare, finance, law,
education, cyber, supply chain] --> C[[20 scenarios per domain
structured scenario portfolio]]
    B --> C
    C --> D[Run-level evidence packs]
    C --> E[RAIDT score profiles]
    C --> I[Comparison across configurations
and repeated runs]
    D --> F[Reviewer reconstruction
and contestability]
    E --> G[Governance readiness]
    I --> G

Star S10: Empirical Programme, Domains and Sector Playbooks

Star context: This item shows how RAIDT is tested across realistic organisational settings by using a portfolio of domain-specific scenarios that is broad enough to be informative but still manageable. It sits inside the empirical programme because it connects abstract governance claims to repeated, comparable run-level evidence within sector playbooks.

Academic picture

Definition / background

In RAIDT, 20 scenarios per domain refers to the use of a structured set of twenty realistic task situations within each domain playbook so that GenAI use can be examined under varied but comparable conditions. A scenario is not simply a prompt. It is a designed governance test case that specifies a task context, a likely user purpose, relevant source conditions, and the kinds of failure modes or assurance questions that reviewers need to surface.
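The difference between a prompt and a scenario can be pictured as a small structured record. The sketch below is a minimal illustration, assuming a Python representation; the class name `Scenario`, its field names and the example identifier `FIN-07` are hypothetical and are not taken from a published RAIDT schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scenario:
    """Hypothetical record for one governance test case in a domain playbook."""
    scenario_id: str
    domain: str
    task_context: str                      # the organisational task being performed
    user_purpose: str                      # what the user is likely trying to achieve
    source_conditions: tuple[str, ...]     # e.g. complete, partial or conflicting sources
    failure_modes: tuple[str, ...]         # failure modes reviewers need to surface
    assurance_questions: tuple[str, ...] = field(default_factory=tuple)

    def is_governance_test(self) -> bool:
        # A bare prompt states nothing for reviewers to check; a scenario
        # must name at least one failure mode or assurance question.
        return bool(self.failure_modes or self.assurance_questions)


# Illustrative finance scenario (identifier and wording are hypothetical).
fin_07 = Scenario(
    scenario_id="FIN-07",
    domain="finance",
    task_context="Draft a summary of transaction notes for a case file",
    user_purpose="Prepare material for analyst review",
    source_conditions=("partial provenance", "inconsistent transaction notes"),
    failure_modes=("conclusion presented too confidently",),
)
print(fin_07.is_governance_test())  # True
```

Making the record immutable is a design choice for comparability: every run of the same scenario refers to exactly the same test conditions.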

The conceptual importance of this item is that it gives the empirical programme a repeatable unit of domain testing. If RAIDT were applied to only one or two examples per sector, the resulting evidence would be too anecdotal to support strong claims about governance readiness. By contrast, a portfolio of twenty scenarios offers breadth across routine, borderline, and more demanding cases while remaining manageable for practical evaluation.

This item also differs from a generic benchmark. A benchmark often aims to compare model performance against a fixed answer key. RAIDT uses scenarios for governance-oriented examination: whether the run can be reconstructed, whether provenance is visible, whether human oversight is meaningful, whether outputs remain dependable across repetitions, and whether the evidence produced is strong enough to support review and contestation.

Within RAIDT, this matters because each scenario generates runs, each run can produce a run-level evidence pack, and patterns across those runs inform the five-pillar score profile. The twenty-scenario structure therefore belongs inside the empirical programme and sector playbooks: it is the mechanism that turns domain variation into comparable evidence rather than leaving governance claims at the level of theory.
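The step from runs to a score profile can be sketched as an aggregation over run-level judgements. This is a minimal illustration, assuming each run has already been rated on every pillar on a hypothetical 0 to 4 scale; neither the scale nor the aggregation rule is prescribed by RAIDT here. The profile keeps both the mean and the weakest run for each pillar, so that one strong run cannot mask a weak pattern.

```python
from statistics import mean

PILLARS = ("Responsibility", "Auditability", "Interpretability",
           "Dependability", "Traceability")


def score_profile(run_scores: list[dict[str, int]]) -> dict[str, dict[str, float]]:
    """Aggregate per-run pillar ratings (hypothetical 0 to 4 scale) into a profile.

    Each element of run_scores maps every pillar name to that run's rating.
    """
    if not run_scores:
        raise ValueError("a score profile needs at least one run")
    profile = {}
    for pillar in PILLARS:
        values = [run[pillar] for run in run_scores]
        profile[pillar] = {
            "mean": round(mean(values), 2),
            "min": min(values),    # weakest run: the pattern, not the showcase
            "runs": len(values),
        }
    return profile


runs = [
    {"Responsibility": 4, "Auditability": 3, "Interpretability": 3,
     "Dependability": 4, "Traceability": 2},
    {"Responsibility": 3, "Auditability": 3, "Interpretability": 2,
     "Dependability": 1, "Traceability": 3},
]
profile = score_profile(runs)
print(profile["Dependability"])  # {'mean': 2.5, 'min': 1, 'runs': 2}
```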

Why this concept matters

This concept solves a practical methodological problem for GenAI governance. Organisations often want evidence that a framework works across domains, but they either test too narrowly or test too loosely. Too narrow a test gives false confidence; too loose a test makes comparison impossible. Twenty scenarios per domain provides a middle path: enough variety to expose meaningful differences, but enough structure to support systematic review.

It also avoids confusion between domain adaptation and ad hoc improvisation. Without a scenario set, a sector playbook can become a collection of examples chosen for convenience or rhetorical effect. With a defined scenario portfolio, RAIDT can show that the same governance framework is being challenged by a coherent range of tasks within healthcare, finance, law, education, cybersecurity, and other fields.

If this item is missing, the empirical programme risks becoming selective, fragile, and difficult to defend in supervision, peer review, or organisational scrutiny. A small number of examples cannot credibly support claims about repeatability, failure modes, or readiness for deployment. RAIDT therefore uses scenario portfolios to move from principles and isolated demonstrations towards structured operational evidence.

Key idea: Twenty scenarios per domain matter because RAIDT needs a domain-sensitive but comparable way to produce enough run-level evidence for credible governance assessment.

What this item enables

A standardised portfolio of realistic task situations within each domain playbook.
Coverage of routine, ambiguous, edge-case, and risk-sensitive GenAI uses rather than a single showcase example.
Comparable testing across domains, configurations, and repeated runs.
More robust evidence packs because multiple runs can be compared within the same domain logic.
Better-founded RAIDT score profiles because pillar judgements can be based on patterns rather than on isolated impressions.
Stronger reviewer challenge, contestability, and organisational learning about failure modes.
Clearer translation from sector playbooks into empirical validation and policy argument.
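One way to make structured coverage checkable is to tag each scenario with one of the coverage types named in the list above and test the portfolio before any runs are made. The sketch assumes one tag per scenario and a hypothetical minimum of three scenarios per type; the tag strings and the minimum are illustrative, not RAIDT requirements.

```python
from collections import Counter

COVERAGE_TYPES = ("routine", "ambiguous", "edge-case", "risk-sensitive")
PORTFOLIO_SIZE = 20


def check_portfolio(tags: list[str], min_per_type: int = 3) -> list[str]:
    """Return problems with a domain portfolio; an empty list means it passes.

    tags holds one coverage-type label per scenario (an illustrative convention).
    """
    problems = []
    if len(tags) != PORTFOLIO_SIZE:
        problems.append(f"expected {PORTFOLIO_SIZE} scenarios, found {len(tags)}")
    unknown = sorted(set(tags) - set(COVERAGE_TYPES))
    if unknown:
        problems.append(f"unknown coverage types: {unknown}")
    counts = Counter(tags)
    for ctype in COVERAGE_TYPES:
        if counts[ctype] < min_per_type:
            problems.append(f"{ctype}: {counts[ctype]} scenario(s), need {min_per_type}")
    return problems


balanced = ["routine"] * 8 + ["ambiguous"] * 4 + ["edge-case"] * 4 + ["risk-sensitive"] * 4
showcase = ["routine"] * 20   # twenty scenarios, but all of the easy kind

print(check_portfolio(balanced))   # []
print(check_portfolio(showcase))   # three problems, one per missing type
```

The second example shows why the count alone is not the point: a portfolio of twenty routine cases passes the size check but fails on coverage.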

Practical example / likely audience question

Audience question

Why do you need as many as twenty scenarios per domain rather than one or two representative examples?

Answer

The concern behind this question is whether the design is unnecessarily heavy. The direct answer is that one or two examples may illustrate a concept, but they do not provide enough variation to test governance performance across the kinds of tasks, ambiguities, and failure modes that organisations actually face. A domain such as healthcare or finance contains routine tasks, borderline cases, information-quality problems, conflicting source conditions, and different expectations of oversight. A small sample can easily flatter the framework.

In RAIDT, the purpose of the scenario set is not volume for its own sake. The purpose is structured coverage. Twenty scenarios give enough spread to observe whether governance evidence remains reviewable when the task changes, when provenance becomes uncertain, when human judgement is needed, or when the system becomes overconfident. This is especially important because RAIDT is concerned with run-level evidence, not just abstract compliance statements.

A practical example is a finance playbook in which some scenarios involve straightforward drafting from clean source data, while others involve missing documentation, ambiguous risk signals, or pressure to summarise complex evidence quickly. Generic AI governance might note that the organisation has a policy and an approval process. RAIDT goes further by asking whether those controls still produce reconstructable, scoreable evidence across a wider scenario portfolio. That is why the twenty-scenario design is methodologically defensible rather than arbitrary.

Practical example in RAIDT terms

Consider a finance domain playbook for a team using GenAI to assist with suspicious activity report preparation. One scenario may involve a straightforward case with complete source material. Another may introduce partial provenance, inconsistent transaction notes, and a prompt that could tempt the model to present conclusions too confidently. Across a portfolio of twenty scenarios, the organisation can observe how the same workflow behaves under varied but realistic pressure.

The run-level issue is not merely whether the model can generate plausible text. It is whether each run leaves sufficient evidence to show what data were used, how the prompt framed the task, what uncertainty was present, what the model produced, how the analyst edited or rejected the draft, and whether escalation occurred when required. The evidence needed includes prompt text, source-document references, model and configuration details, draft outputs, reviewer annotations, approval records, and reasons for any override or non-use.
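The evidence items listed above can be treated as a per-run checklist. The sketch below assumes an evidence pack is stored as a plain mapping; the key names are hypothetical labels for the items named in the paragraph, and the two conditional rules (a reason for any override or non-use, a record when escalation was required) follow the text. Empty values are treated as missing.

```python
REQUIRED_EVIDENCE = (
    "prompt_text",
    "source_references",
    "model_config",
    "draft_output",
    "reviewer_annotations",
    "approval_record",
)


def missing_evidence(pack: dict) -> list[str]:
    """List the evidence items a run-level pack lacks (hypothetical key names)."""
    missing = [key for key in REQUIRED_EVIDENCE if not pack.get(key)]
    # An override or a decision not to use the output must carry a reason.
    if pack.get("outcome") in ("override", "non-use") and not pack.get("override_reason"):
        missing.append("override_reason")
    # If escalation was required, the pack must show that it happened.
    if pack.get("escalation_required") and not pack.get("escalation_record"):
        missing.append("escalation_record")
    return missing


pack = {
    "prompt_text": "Summarise the attached transaction notes for analyst review.",
    "source_references": ["case-file-12"],                     # hypothetical reference
    "model_config": {"model": "example-model", "temperature": 0.2},
    "draft_output": "Draft summary text",
    "reviewer_annotations": [],
    "approval_record": None,
    "outcome": "non-use",
    "escalation_required": True,
}
print(missing_evidence(pack))
# ['reviewer_annotations', 'approval_record', 'override_reason', 'escalation_record']
```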

This scenario portfolio affects all five RAIDT pillars. Responsibility is engaged because roles and approval thresholds must remain clear across cases. Auditability and Traceability are strengthened when reviewers can reconstruct each scenario run. Interpretability is tested by whether analysts can explain why an output was accepted or challenged. Dependability is tested most strongly, because the point of multiple scenarios is to see whether governance quality holds across changing task conditions. Governance readiness improves when the organisation can show not one successful demonstration, but a body of structured evidence across realistic finance scenarios.
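Dependability in this sense concerns variation across conditions rather than any single run. A minimal sketch, assuming each run record carries a label for its scenario condition and a yes/no judgement of whether its evidence pack was complete: grouping by condition shows where governance quality weakens. The record layout and labels are hypothetical.

```python
from collections import defaultdict


def completeness_by_condition(runs: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of runs with a complete evidence pack, per scenario condition.

    Each run is (condition_label, evidence_complete); labels are illustrative.
    """
    totals: dict[str, int] = defaultdict(int)
    complete: dict[str, int] = defaultdict(int)
    for condition, ok in runs:
        totals[condition] += 1
        complete[condition] += int(ok)
    return {c: complete[c] / totals[c] for c in totals}


def weakest_condition(rates: dict[str, float]) -> str:
    """Condition under which governance evidence holds up least well."""
    return min(rates, key=rates.get)


runs = ([("routine", True)] * 10
        + [("partial provenance", True)] * 3
        + [("partial provenance", False)] * 3)
rates = completeness_by_condition(runs)
print(rates)                      # {'routine': 1.0, 'partial provenance': 0.5}
print(weakest_condition(rates))   # partial provenance
```

A single clean run would report full completeness; only the spread of conditions reveals that evidence quality halves once provenance becomes partial.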

Detailed link to RAIDT

Twenty scenarios per domain links to RAIDT in four ways.

First, it operationalises the RAIDT core idea that GenAI governance should be examined through real use situations rather than through abstract principle statements alone.

Second, it creates a structured portfolio of runs, because each scenario becomes a context in which run-level evidence can be generated, compared, and challenged.

Third, it strengthens the evidence pack and the score profile by ensuring that pillar judgements are informed by patterned evidence across a domain rather than by a single anecdotal case.

Fourth, it supports reviewability, contestability, audit readiness, and organisational learning because reviewers can inspect where controls succeed, where they weaken, and how governance interventions should be refined.

20 scenarios per domain → Structured run portfolio → Evidence packs and RAIDT score profiles → Governance readiness

Link to the five RAIDT pillars

Responsibility

This item supports Responsibility by making domain playbooks explicit about who is expected to act, review, escalate, or approve across different kinds of task situations rather than only in ideal cases.