S5.08 - High score

flowchart LR
    A["Governance problem:<br/>good output mistaken for good governance<br/>weak reconstruction<br/>assertion-heavy assurance"] --> B["RAIDT<br/>run-level evidence framework"]
    B --> C[["High score<br/>evidence sufficiency for justified review and use"]]
    C --> D[Strong evidence pack]
    C --> E[Credible five-pillar score profile]
    C --> F[Reviewability and contestability]
    D --> G[Reviewer reconstruction]
    E --> H[Governance readiness]
    F --> H
    I[Healthcare drafting]
    J[Finance reporting]
    K[Education support]
    L[Public service casework]
    M[Enterprise knowledge work]
    I --> C
    J --> C
    K --> C
    L --> C
    M --> C

Star S5 - RAIDT Pillars and Scoring

Star context: Places RAIDT's five pillars into a usable scoring logic so governance readiness can be judged from evidence rather than claimed in the abstract.

Academic picture

Definition / background

In RAIDT, a high score means that the available evidence for a particular run is sufficiently complete, coherent, and reviewable to support reconstruction, scrutiny, and justified use within the stated risk context. It signals that the run has been documented well enough for others to understand what was done, why it was done, what evidence supports it, and how the resulting judgement was reached.

Conceptually, this differs from treating scoring as a crude maturity label or a generic performance measure. A high RAIDT score is not primarily about whether the model produced an impressive answer, nor is it a blanket claim that the system is trustworthy in every setting. Instead, it is an evidence-based judgement that the run can be examined responsibly across the five RAIDT pillars: Responsibility, Auditability, Interpretability, Dependability, and Traceability.

This matters in generative AI governance because organisations often confuse good outputs with good governance. A polished output may still have poor provenance, weak accountability, limited traceability, or no usable audit trail. RAIDT corrects that confusion by tying the meaning of a high score to run-level evidence and to the evidence pack produced around a specific use instance.

Within RAIDT, the concept belongs directly to the scoring layer. The evidence pack provides the material basis for evaluation, the score profile converts that material into a structured governance judgement, and a high score indicates that the evidence is strong enough to support informed oversight. The term therefore sits at the intersection of operational documentation and governance decision-making.
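The relationship between a score profile and a "high score" can be sketched in code. This is a minimal illustration only: the 0-4 scale, the threshold of 3, the rule that every pillar must meet the threshold, and the names ScoreProfile, weakest_pillar, and is_high are all assumptions made for the example, not part of RAIDT as described here.

```python
from dataclasses import dataclass

# The five pillar names come from the text; everything else is hypothetical.
PILLARS = (
    "Responsibility",
    "Auditability",
    "Interpretability",
    "Dependability",
    "Traceability",
)


@dataclass(frozen=True)
class ScoreProfile:
    """Per-pillar scores for one run (assumed scale: 0 = no evidence, 4 = strong)."""

    run_id: str
    scores: dict  # pillar name -> int on the assumed 0-4 scale

    def __post_init__(self):
        # A profile is only meaningful if every pillar has been scored.
        missing = [p for p in PILLARS if p not in self.scores]
        if missing:
            raise ValueError(f"missing pillar scores: {missing}")

    def weakest_pillar(self):
        """Return the lowest-scoring pillar (first in PILLARS order on ties)."""
        return min(PILLARS, key=lambda p: self.scores[p])

    def is_high(self, threshold=3):
        """Assumed rule: a run scores highly only if every pillar meets the threshold."""
        return all(self.scores[p] >= threshold for p in PILLARS)


profile = ScoreProfile(
    run_id="run-001",
    scores={
        "Responsibility": 4,
        "Auditability": 3,
        "Interpretability": 3,
        "Dependability": 4,
        "Traceability": 2,
    },
)
print(profile.is_high())         # False: Traceability is below the threshold
print(profile.weakest_pillar())  # Traceability
```

Requiring every pillar to meet the threshold, rather than averaging, is one way to keep a strong pillar from masking a weak one. Other aggregation rules are equally possible.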

Why this concept matters

A high score matters because organisations need a disciplined way to distinguish between runs that are merely successful-looking and runs that are genuinely governable. Without that distinction, governance becomes vulnerable to optimism bias, selective reporting, and post hoc justification.

The concept also avoids a common practical confusion: people often assume that if a generative AI system appears useful, then its governance is already adequate. RAIDT rejects that assumption. A run should score highly only when the underlying evidence allows an internal reviewer, supervisor, auditor, or external stakeholder to inspect and understand the basis of use.

If this concept is missing, organisations risk making deployment or assurance decisions on thin evidence. That creates problems for reviewability, incident analysis, policy alignment, and organisational learning. By contrast, when a high score is defined rigorously, RAIDT helps move governance from broad principles to operational judgement anchored in evidence.

Key idea: A high score matters because it indicates evidence-based governance readiness for a specific run, not merely confidence in the output or enthusiasm about the system.

What this item measures

The sufficiency of run-level evidence for reconstruction and review.
The degree to which a run can support justified use in its stated context.
The practical quality of the evidence pack behind the score profile.
The credibility of claims made across the five RAIDT pillars.
The extent to which governance judgement is based on documented evidence rather than assertion.
The readiness of the run for scrutiny, contestation, and audit-oriented examination.

Practical example / likely audience question

Audience question

Does a high RAIDT score mean the organisation has proved that the GenAI system is compliant, safe, and acceptable to use?

Answer

The concern behind that question is understandable because high scores are often misread as seals of approval. The direct answer is no. A high RAIDT score does not certify compliance, prove safety in every sense, or eliminate the need for legal, domain, or managerial judgement. What it does show is that the organisation has assembled strong enough run-level evidence to justify its governance position for that specific use case and context.

For example, a team may use a large language model to draft internal policy summaries. If the run has a clear task definition, a recorded prompt and model configuration, versioned inputs, reviewer notes, output checks, escalation rules, and an auditable rationale for acceptance, the run may score highly under RAIDT. That means the run is well governed and well evidenced. It does not mean that every future use of the same model is automatically acceptable, nor that the organisation has satisfied all external regulatory requirements.
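The evidence items listed in this example lend themselves to a simple completeness check. The sketch below is illustrative: the key names, the missing_evidence helper, and the dictionary layout of the evidence pack are hypothetical, and a real assessment would also judge the quality of each item, not only its presence.

```python
# Evidence items named in the policy-summary example, as hypothetical keys.
REQUIRED_EVIDENCE = (
    "task_definition",
    "prompt_and_model_configuration",
    "versioned_inputs",
    "reviewer_notes",
    "output_checks",
    "escalation_rules",
    "acceptance_rationale",
)


def missing_evidence(evidence_pack):
    """Return the required items that are absent or empty in the pack."""
    return [key for key in REQUIRED_EVIDENCE if not evidence_pack.get(key)]


# Illustrative pack with made-up values; reviewer notes were left blank.
pack = {
    "task_definition": "Summarise internal policy documents",
    "prompt_and_model_configuration": {"prompt": "template-v1", "temperature": 0.2},
    "versioned_inputs": ["policy-doc-v3"],
    "reviewer_notes": "",
    "output_checks": ["factual check against source"],
}

print(missing_evidence(pack))
# ['reviewer_notes', 'escalation_rules', 'acceptance_rationale']
```

An empty result would show only that every item exists. Whether the evidence is strong enough for a high score remains a reviewer's judgement.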

RAIDT handles this issue better than a generic AI governance approach because it makes the basis of the claim inspectable. Rather than relying on broad statements such as "we have responsible AI controls", RAIDT asks whether the specific run can actually be reconstructed and defended using evidence. That is a stronger and more operational standard.

Practical example in RAIDT terms

Consider a healthcare administration team using a generative AI tool to draft discharge-summary letters from structured clinician notes. The run-level issue is not only whether the letter sounds coherent, but whether the organisation can show how that draft was produced, what data were used, which prompts and settings applied, who checked the output, and what safeguards governed acceptance.

To achieve a high score, the evidence pack would need to include the task definition, approved workflow, source-note provenance, prompt template, model and version details, reviewer identity, checking criteria, records of corrections, and the final decision rationale. Responsibility is affected because a named reviewer must own acceptance. Auditability is affected because the process must be reconstructable. Interpretability is affected because reviewers need to explain why the draft was accepted or amended. Dependability is affected because the workflow must perform consistently. Traceability is affected because inputs, outputs, decisions, and hand-offs must be linked.
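One way to make the pillar reasoning above inspectable is to record which evidence items each pillar relies on and then report gaps per pillar. The mapping below is an assumption drawn loosely from this example, not a RAIDT-defined allocation, and the names PILLAR_EVIDENCE and pillar_gaps, as well as all sample values, are hypothetical.

```python
# Hypothetical mapping from each pillar to evidence items named in the example.
PILLAR_EVIDENCE = {
    "Responsibility": ["reviewer_identity", "final_decision_rationale"],
    "Auditability": ["task_definition", "approved_workflow",
                     "prompt_template", "model_and_version"],
    "Interpretability": ["checking_criteria", "records_of_corrections",
                         "final_decision_rationale"],
    "Dependability": ["approved_workflow", "checking_criteria"],
    "Traceability": ["source_note_provenance", "model_and_version",
                     "records_of_corrections"],
}


def pillar_gaps(evidence_pack):
    """Map each pillar to its mapped evidence items that are absent or empty."""
    gaps = {}
    for pillar, items in PILLAR_EVIDENCE.items():
        missing = [item for item in items if not evidence_pack.get(item)]
        if missing:
            gaps[pillar] = missing
    return gaps


# Made-up pack: reviewer identity and decision rationale were never recorded.
pack = {
    "task_definition": "Draft discharge-summary letter from structured notes",
    "approved_workflow": "workflow-v2",
    "source_note_provenance": "structured clinician notes, export 14",
    "prompt_template": "discharge-letter-template",
    "model_and_version": "example-model 1.0",
    "checking_criteria": "checklist-A",
    "records_of_corrections": "two phrases corrected by reviewer",
}

gaps = pillar_gaps(pack)
print(gaps)
# {'Responsibility': ['reviewer_identity', 'final_decision_rationale'],
#  'Interpretability': ['final_decision_rationale']}
```

In this sketch a single missing item can weaken more than one pillar, which reflects the point that the pillars draw on overlapping evidence.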

In that context, a high score improves governance readiness by showing that the use of generative AI is not an opaque drafting event but a documented and reviewable clinical-administrative process. The score does not replace clinical judgement, but it demonstrates that the governance apparatus around the run is substantially in place.

Detailed link to RAIDT

A high score links to RAIDT in four ways.

First, it connects directly to RAIDT's core idea that governance should be anchored in evidence about a specific run rather than general claims about systems or policies.
Second, it depends on the run as the unit of assessment, because the score is only meaningful when attached to a defined task, time, configuration, and context.
Third, it translates the contents of the evidence pack into a structured score profile across the five pillars, making governance judgements more transparent and comparable.
Fourth, it supports reviewability, contestability, audit readiness, and organisational learning by showing when evidence is strong enough to support scrutiny.

High score -> Sufficient run-level evidence -> Strong evidence pack -> Credible RAIDT score profile -> Governance readiness
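The second link, the run as the unit of assessment, can be illustrated with a small record type. This is a sketch under stated assumptions: the RunRecord class, its field names, and the is_assessable rule are hypothetical and simply mirror the four attributes named above (task, time, configuration, and context).

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record tying a score to one defined run."""

    task: str
    executed_at: datetime
    configuration: dict
    context: str

    def is_assessable(self):
        """Assumed rule: a run can be scored only if all four attributes are defined."""
        return (
            bool(self.task.strip())
            and self.executed_at.tzinfo is not None  # require an unambiguous time
            and bool(self.configuration)
            and bool(self.context.strip())
        )


# Illustrative values only.
run = RunRecord(
    task="Draft internal policy summary",
    executed_at=datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc),
    configuration={"model": "example-model 1.0", "temperature": 0.2},
    context="Internal use, low-risk drafting",
)
print(run.is_assessable())  # True

undefined_context = RunRecord(
    task="Draft internal policy summary",
    executed_at=datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc),
    configuration={"model": "example-model 1.0"},
    context="",
)
print(undefined_context.is_assessable())  # False: no stated context
```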

Link to the five RAIDT pillars

Responsibility

A high score requires evidence that responsibility for the run was assigned and exercised rather than assumed implicitly.