S9.08 - Assurance

flowchart LR
    A1[Policy claims and AI principles]
    A2[Supplier declarations and model cards]
    A3[Traditional limitation:<br/>confidence without run-level evidence]
    A1 --> B[RAIDT<br/>run-level evidence framework]
    A2 --> B
    A3 --> B

    B --> C[[Assurance<br/>reviewable confidence for a specific run]]
    C --> D[Run-level evidence pack]
    C --> E[RAIDT score profile]
    C --> F[Reviewer reconstruction]
    C --> G[Audit readiness and contestability]
    D --> G
    E --> G
    F --> H[Organisational learning]
    G --> H

    I1[Healthcare]
    I2[Finance]
    I3[Public services]
    I4[Education]
    I5[Templates, logging, review workflow]
    I1 --> C
    I2 --> C
    I3 --> C
    I4 --> C
    I5 --> C

Star S9 - Policy, Standards and Assurance

Star context: Connects RAIDT to policy instruments, standards, assurance, procurement, audit and organisational accountability. In this star, assurance is the mechanism that turns governance claims into reviewable organisational practice rather than leaving them as policy statements.

Academic picture

Definition / background

Assurance is the structured process of making claims about the responsible use of AI reviewable through evidence, monitoring, controls, and documented reasoning. In governance language, assurance asks whether an organisation can justify its claim that a system is being used in an acceptable, accountable, and properly governed manner. In the context of generative AI, this is especially important because outputs are variable, context-sensitive, and shaped by prompts, settings, user behaviour, retrieved information, and downstream decisions.

Conceptually, assurance sits close to audit, compliance, oversight, and risk management, but it is not identical to any of them. Audit is typically a formal review activity; compliance checks alignment against requirements; risk management identifies and treats risks. Assurance is broader and more connective: it is the condition in which stakeholders can have warranted confidence because relevant evidence, controls, and explanations are available for inspection. That distinction matters because organisations often mistake possession of policies for possession of assurance.

Within RAIDT, assurance becomes run-specific. RAIDT treats a run as one configured use of a GenAI system for a specific task, at a specific time, in a specific context. This matters because many governance failures arise not from the existence of a model in general, but from the way a particular use was configured, prompted, reviewed, and acted upon. RAIDT therefore gives assurance an operational unit. Instead of asking only whether the organisation has a responsible AI framework, it asks whether a given run can be justified with evidence.

This is why assurance belongs centrally inside RAIDT. The run-level evidence pack provides the materials needed for assurance, and the five-pillar score profile offers a structured summary of governance quality across Responsibility, Auditability, Interpretability, Dependability, and Traceability. Assurance is therefore not an extra layer added after the fact. It is one of the reasons RAIDT exists: to move from principle-level claims to evidence-backed, reviewable governance of actual organisational GenAI use.
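The five-pillar score profile can be pictured as a small data structure. The sketch below is illustrative only: the pillar names come from the text above, but the 0 to 4 scale, the class name ScoreProfile, the run identifier, and the weakest_pillars helper are assumptions made for this example, not part of RAIDT as described here.

```python
from dataclasses import dataclass

# Pillar names as given in the text.
PILLARS = (
    "Responsibility",
    "Auditability",
    "Interpretability",
    "Dependability",
    "Traceability",
)

# Assumption: scores are integers from 0 to MAX_SCORE. The source does not fix a scale.
MAX_SCORE = 4


@dataclass(frozen=True)
class ScoreProfile:
    """Hypothetical per-run summary of governance quality across the five pillars."""

    run_id: str
    scores: dict  # pillar name -> score

    def __post_init__(self) -> None:
        # Require exactly one score per pillar, so a profile cannot silently omit one.
        if set(self.scores) != set(PILLARS):
            raise ValueError("a profile needs exactly one score for each of the five pillars")
        for pillar, score in self.scores.items():
            if not 0 <= score <= MAX_SCORE:
                raise ValueError(f"{pillar} score {score} is outside 0..{MAX_SCORE}")

    def weakest_pillars(self) -> list:
        """Return the pillar(s) with the lowest score, in canonical pillar order."""
        lowest = min(self.scores.values())
        return [p for p in PILLARS if self.scores[p] == lowest]


profile = ScoreProfile(
    run_id="run-001",  # hypothetical identifier
    scores={
        "Responsibility": 3,
        "Auditability": 1,
        "Interpretability": 2,
        "Dependability": 1,
        "Traceability": 4,
    },
)
print(profile.weakest_pillars())  # ['Auditability', 'Dependability']
```

Keeping the profile as five separate values, rather than collapsing it into one total, reflects the point that it is a structured summary to be read alongside the evidence pack.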

Why this concept matters

Assurance matters because organisations increasingly make strong claims about their GenAI systems while lacking a disciplined way to show what those claims mean in practice. A policy may say that human oversight exists, that outputs are checked, or that sensitive uses are controlled, yet none of that is persuasive if a reviewer cannot inspect what happened in a concrete case. Without assurance, governance remains rhetorical.

The concept also prevents a common confusion between possession of governance artefacts and possession of governance capability. A supplier questionnaire, a model card, or a responsible AI principle statement may all be useful, but they do not by themselves show whether a specific organisational use was appropriately governed. Assurance fills that gap by demanding evidence that a concrete run can be reconstructed, assessed, and challenged.

If assurance is missing, several risks appear. Organisations may overstate compliance, rely on undocumented human review, fail to notice drift in practice, and struggle to respond when a decision is contested or harm is alleged. In such settings, the governance problem is not only technical failure; it is the absence of an evidential basis for accountability.

For GenAI-using organisations, assurance is therefore the bridge between policy intent and operational governance. RAIDT makes that bridge practical by specifying what evidence should exist at run level, how it can be assembled into an evidence pack, and how it can be interpreted through a consistent score profile.

Key idea: Through RAIDT, assurance turns the claim that GenAI is responsibly governed into a claim that can be inspected, challenged, and defended at run level.

What this item enables

It enables organisations to justify governance claims with run-specific evidence rather than broad policy assertions.
It enables reviewers to reconstruct how a GenAI-supported task was configured, executed, checked, and acted upon.
It enables procurement promises and policy commitments to be tested against actual operational practice.
It enables contestability by giving auditors, managers, regulators, or affected stakeholders something concrete to inspect.
It enables longitudinal learning across repeated runs by showing where controls are working and where governance quality is weak.
It enables RAIDT score profiles to function as assurance summaries rather than isolated numerical labels.

Practical example / likely audience question

Audience question

How does RAIDT make assurance stronger than simply having an AI policy, a supplier declaration, or a model card?

Answer

The concern behind this question is that many organisations already possess governance documents, so RAIDT might appear to duplicate existing assurance work. The direct answer is that RAIDT strengthens assurance by shifting attention from generic claims about the system to inspectable evidence about specific organisational runs.

A policy can state that staff must review outputs before use. A supplier declaration can state that a model has been evaluated. A model card can describe intended use and known limitations. All of those are useful, but none of them proves that, in a given organisational case, the prompt was appropriate, the context was understood, the output was checked, the user acted within policy, and the resulting action can later be defended. RAIDT addresses exactly that gap.

For example, an auditor reviewing a contested GenAI-assisted decision needs more than a statement that the organisation follows responsible AI principles. The auditor needs the run evidence: the task definition, prompt or workflow configuration, model version, human review step, exception notes, output record, and any score or rationale attached to that run. RAIDT handles this better than a generic governance approach because it makes assurance operational at the point where real decisions and risks materialise.
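The run evidence an auditor needs can be sketched as a simple record with a completeness check. This is a minimal illustration, not a RAIDT schema: the field names paraphrase the items listed above, while the class name RunEvidence, the missing_items helper, and all example values are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RunEvidence:
    """Hypothetical record of the evidence items named in the text for one run."""

    task_definition: Optional[str] = None
    prompt_or_workflow_config: Optional[str] = None
    model_version: Optional[str] = None
    human_review_step: Optional[str] = None
    # Assumption: "no exceptions" is recorded explicitly rather than left blank,
    # so that an empty field always means the evidence is missing.
    exception_notes: Optional[str] = None
    output_record: Optional[str] = None
    score_or_rationale: Optional[str] = None

    def missing_items(self) -> list:
        """List the evidence items that are absent or empty, in field order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Invented example values, for illustration only.
evidence = RunEvidence(
    task_definition="Draft a case summary for review",
    model_version="example-model-v1",
    output_record="summary-draft-0001.txt",
)
print(evidence.missing_items())
# ['prompt_or_workflow_config', 'human_review_step', 'exception_notes', 'score_or_rationale']
```

A check of this kind does not judge the quality of the evidence. It only shows a reviewer which parts of the run cannot yet be reconstructed.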

Practical example in RAIDT terms

Consider a public-service housing team using a GenAI assistant to draft case summaries and proposed response letters for residents requesting emergency accommodation. The organisational claim is that the tool is used safely, with human review, and in line with public-sector accountability requirements.

The run-level issue is that each case differs. A summary generated for one resident may omit a vulnerability, misstate urgency, or frame the case in a way that influences the caseworker's judgement. Assurance cannot rest on the general statement that the tool is approved for use. It has to rest on whether this run, for this case, under these settings, was conducted with adequate controls.

The evidence needed would include the task description, prompt template, any retrieved case information, applicable policy constraints, the generated draft, reviewer edits, approval checkpoint, and a record of whether the output informed communication or decision support. The most affected RAIDT pillars would be Responsibility, Auditability, and Traceability, with strong implications for Dependability and Interpretability as well.
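Of the items above, "reviewer edits" is the one most easily lost if only the final letter is stored. One possible way to keep it inspectable, sketched here with Python's standard difflib module, is to record the difference between the generated draft and the reviewed version. The function name reviewer_edit_record and the two short texts are invented for illustration; nothing in RAIDT as described here prescribes this format.

```python
import difflib


def reviewer_edit_record(generated: str, reviewed: str) -> list:
    """Return a unified diff showing what the human reviewer changed.

    An empty list means the reviewer made no textual changes to the draft.
    """
    diff = difflib.unified_diff(
        generated.splitlines(),
        reviewed.splitlines(),
        fromfile="generated_draft",
        tofile="reviewed_draft",
        lineterm="",
    )
    return list(diff)


# Invented texts: the generated draft misses a vulnerability that the reviewer adds.
generated_draft = "Resident requests emergency accommodation.\nNo urgency noted."
reviewed_draft = (
    "Resident requests emergency accommodation.\n"
    "Urgent: resident reports a health vulnerability."
)

for line in reviewer_edit_record(generated_draft, reviewed_draft):
    print(line)
```

Stored in the run pack, a record like this lets a supervisor see that review took place and also what it changed.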

In governance-readiness terms, assurance improves because a supervisor or auditor can inspect the run pack and determine whether the organisation merely used GenAI, or used it in a way that can be justified under scrutiny. That is the practical difference between abstract responsible-AI language and RAIDT-style evidence-led oversight.

Detailed link to RAIDT

Assurance links to RAIDT in four ways.

First, it connects directly to RAIDT's core idea that governance should be grounded in evidence about actual organisational use rather than abstract principles alone.
Second, it depends on the run as the unit of analysis, because assurance becomes meaningful only when a reviewer can examine a specific configured use in a specific context.
Third, it is operationalised through the run-level evidence pack and summarised through the RAIDT score profile, which together provide both detail and structured judgement.
Fourth, it supports reviewability, contestability, audit readiness, and organisational learning by making claims about GenAI use open to reconstruction and challenge.

Assurance → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness

In this chain, assurance is the governance purpose, run-level evidence is the raw material, the evidence pack is the review object, the score profile is the structured summary, and governance readiness is the institutional outcome.

Link to the five RAIDT pillars

Responsibility

Assurance supports Responsibility by showing whether duties, approvals, review roles, and decision ownership are actually defined and enacted for a run.