S5.06 - Scoring anchors

flowchart LR
    A["Traditional scoring problem<br/>Vague labels and impressionistic judgement"] --> B["RAIDT<br/>Run-level evidence framework"]
    B --> C[["S5.06 Scoring anchors<br/>Explicit meaning of 1, 3, and 5"]]
    C --> D[Evidence pack interpretation]
    C --> E[Five-pillar score profile]
    C --> F["Governance move<br/>Evidence over assertion"]
    D --> G[Reviewer reconstruction]
    E --> H[Governance readiness]
    E --> I[Organisational learning]
    J["Healthcare, finance, education,<br/>public services, enterprise productivity"] --> C

Star S5 - RAIDT Pillars and Scoring

Star context: Shows how RAIDT converts qualitative governance judgement into a repeatable evidence-based scoring practice across the five pillars.

Academic picture

Definition / background

Scoring anchors are explicit descriptors attached to points on a scoring scale so that reviewers know what each score means in practice. In RAIDT, they define what counts as weak, partial, and strong run-level governance evidence, typically spelling out the meanings of 1, 3, and 5, while 2 and 4 represent intermediate positions. Their purpose is not to create an illusion of mathematical precision but to make judgement legible, repeatable, and open to challenge.
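The structure described above can be sketched as a small lookup table. This is a minimal illustration, not a RAIDT specification: the names ANCHORS and describe_score and the exact descriptor wording are assumptions introduced here.

```python
# Hypothetical anchor table; the descriptor wording is illustrative only.
ANCHORS = {
    1: "weak: little or no run-level governance evidence",
    3: "partial: some run-level governance evidence, with gaps",
    5: "strong: audit-ready run-level governance evidence",
}


def describe_score(score: int) -> str:
    """Return the anchor text for 1, 3 or 5; treat 2 and 4 as intermediate."""
    if score not in range(1, 6):
        raise ValueError("score must be an integer from 1 to 5")
    if score in ANCHORS:
        return ANCHORS[score]
    # 2 and 4 carry no descriptor of their own in this sketch.
    return f"intermediate: between anchor {score - 1} and anchor {score + 1}"
```

Keeping descriptors only for 1, 3, and 5 mirrors the text: the intermediate scores are read relative to their neighbours rather than defined separately.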

Conceptually, scoring anchors sit between a rubric and a decision. A rubric identifies what should be assessed; an anchor explains the evidential standard needed for a given score. This distinction matters in generative AI governance because many organisations can name desirable principles, yet still struggle to decide whether a specific use of a model is poorly governed, basically governed, or audit-ready. Anchors turn those broad principles into operational review points.

Within RAIDT, scoring anchors belong to the architecture of run-level assessment. RAIDT treats the run as the unit of governance, so the score must refer to evidence from a specific configured use of a GenAI system for a specific task, at a specific time, in a specific context. The anchor therefore does not ask whether the system is good in general; it asks what the available evidence for this run justifies saying. That is why scoring anchors connect directly to the run-level evidence pack and to the five-pillar score profile.

Scoring anchors also differ from benchmarks, performance metrics, or legal thresholds. A benchmark may compare outputs; a metric may quantify behaviour; a legal threshold may define compliance conditions. An anchor, by contrast, explains how evidence quality should be interpreted for governance scoring. In RAIDT, this makes anchors essential for moving from assertion to evidence-based judgement.

Why this concept matters

Scoring anchors solve a central governance problem: numbers can look authoritative even when the judgement behind them is vague. Without anchors, a score profile can become impressionistic, reviewer-dependent, and difficult to defend. One assessor may treat missing documentation as a minor weakness, while another may see it as a major governance failure. The resulting scores are then unstable, hard to compare, and of limited value for organisational learning.

Anchors reduce that ambiguity by making the evidential meaning of a score explicit. They help reviewers explain why a run received a low, basic, or strong score, and they help organisations identify what must improve to move from one level to another. This is especially important for GenAI deployments, where model behaviour, prompts, data context, and oversight arrangements can vary significantly across runs.

If scoring anchors are absent, organisations risk reporting governance maturity without having a defensible basis for the claim. That weakens reviewability, limits contestability, and makes audits harder because the rationale for the score cannot be reconstructed. In RAIDT, anchors are therefore a practical control against arbitrary scoring.

Key idea: Scoring anchors matter because they give RAIDT scores a shared evidential meaning, making governance judgements explainable rather than impressionistic.

What this item measures

The evidential standard required for a score of 1, 3, or 5 within a RAIDT pillar.
The difference between absent evidence, partial evidence, and strong or audit-ready evidence.
The consistency with which reviewers can map run-level documentation to a score.
The minimum governance conditions needed to justify moving from a weak score to a stronger one.
The extent to which a score profile can be defended, challenged, and repeated across reviewers or over time.
The visibility of evidence gaps that remain even when a run appears operationally successful.
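One simple way to look at reviewer consistency is the share of identical scores when two or more reviewers score the same run. The sketch below is an assumption-laden illustration: the function name exact_agreement and the input layout (reviewer name mapped to a pillar-to-score dictionary) are hypothetical, and exact agreement is only one of several possible consistency measures.

```python
from itertools import combinations


def exact_agreement(scores_by_reviewer: dict) -> float:
    """Share of (reviewer pair, pillar) comparisons where the scores are identical."""
    matches = 0
    total = 0
    for a, b in combinations(scores_by_reviewer.values(), 2):
        # Compare only the pillars that both reviewers scored.
        for pillar in a.keys() & b.keys():
            total += 1
            matches += int(a[pillar] == b[pillar])
    if total == 0:
        raise ValueError("need at least two reviewers with a shared pillar")
    return matches / total


# Hypothetical scores for one run from two reviewers.
scores = {
    "reviewer_a": {"Responsibility": 3, "Auditability": 1},
    "reviewer_b": {"Responsibility": 3, "Auditability": 3},
}
agreement = exact_agreement(scores)
```

A low value on a given pillar suggests the anchor wording for that pillar may need tightening.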

Practical example / likely audience question

Audience question

Are scoring anchors just subjective labels attached to numbers?

Answer

The concern behind this question is reasonable: many scoring systems present neat numbers while concealing a subjective judgement underneath. RAIDT addresses that problem by making the judgement criteria visible. A scoring anchor is not merely a label such as low, medium, or high. It is an explicit statement of what kind of evidence must exist before a reviewer can justify assigning a given score.

For example, imagine a GenAI system used to draft internal policy summaries. A reviewer assessing Auditability might assign a score of 1 if there is no preserved prompt, no model version record, and no usable log of what happened during the run. The same reviewer might assign a 3 if the prompt and timestamp are stored but the review trail is incomplete. A 5 would require a strong, reconstructable record that allows a later reviewer to understand what was asked, what was produced, who reviewed it, and what decision followed.
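The Auditability example above can be written as a small decision rule. This is a simplified sketch under stated assumptions: the AuditRecord field names are hypothetical, and the rule returns only the anchored scores 1, 3, and 5, leaving 2 and 4 to reviewer judgement.

```python
from dataclasses import dataclass


@dataclass
class AuditRecord:
    # Hypothetical field names mirroring the evidence named in the example.
    prompt_stored: bool
    timestamp_stored: bool
    model_version_recorded: bool
    run_log_available: bool
    review_trail_complete: bool


def auditability_score(record: AuditRecord) -> int:
    """Map recorded evidence to an anchored score of 1, 3 or 5."""
    core = [
        record.prompt_stored,
        record.timestamp_stored,
        record.model_version_recorded,
        record.run_log_available,
    ]
    # 5: the run can be reconstructed, including who reviewed it and what followed.
    if all(core) and record.review_trail_complete:
        return 5
    # 3: prompt and timestamp are stored, but the record is otherwise incomplete.
    if record.prompt_stored and record.timestamp_stored:
        return 3
    # 1: simplification; anything weaker falls to the lowest anchor here.
    return 1
```

The value of writing the rule down is that a second reviewer can see exactly which missing record pulled the score below 5.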

This is where RAIDT is stronger than generic AI governance language. A generic approach may say that documentation should be adequate. RAIDT asks what evidence is actually present for this run and what score that evidence warrants. The anchor therefore turns a broad expectation into a reviewable judgement.

Practical example in RAIDT terms

Consider a healthcare setting in which a generative AI tool drafts discharge-summary text for clinicians. One run concerns a patient discharged from a respiratory ward on a particular day, using a specified prompt template, model version, and clinical review workflow.

The run-level governance issue is not simply whether the drafting tool is useful. The issue is whether the organisation can justify the governance score for that specific run. Relevant evidence would include the prompt template used, the model and version, the input sources consulted, the generated draft, the clinician's review and amendment record, the named accountable role, and the final approval decision. If these records are missing, the run should not receive a strong score merely because the output looked acceptable.
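The evidence listed above can be treated as a checklist for the run. The following sketch assumes a hypothetical EvidencePack record whose field names are invented for illustration; it reports which records are missing and whether the pack is complete enough to be considered for a strong score.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EvidencePack:
    # Hypothetical fields, one per evidence item named in the text.
    prompt_template: Optional[str] = None
    model_version: Optional[str] = None
    input_sources: Optional[List[str]] = None
    generated_draft: Optional[str] = None
    clinician_review_record: Optional[str] = None
    accountable_role: Optional[str] = None
    approval_decision: Optional[str] = None


def missing_evidence(pack: EvidencePack) -> List[str]:
    """List the fields that are empty or unset (an empty list also counts as missing)."""
    return [f.name for f in fields(pack) if not getattr(pack, f.name)]


def supports_strong_score(pack: EvidencePack) -> bool:
    """A strong score is only considered when no listed record is missing."""
    return not missing_evidence(pack)


# Hypothetical run: the draft exists, but most governance records do not.
pack = EvidencePack(prompt_template="discharge-summary-template", generated_draft="draft text")
gaps = missing_evidence(pack)
```

Here the draft exists and may look acceptable, yet five records are absent, so the pack cannot support a strong score.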

Scoring anchors improve governance readiness by clarifying what each score means. For Responsibility, a score of 1 might mean there is no accountable reviewer on record; a 3 might mean a reviewer is named but escalation rules are unclear; a 5 might mean the reviewer, approval action, and escalation path are all documented. For Auditability and Traceability, similar anchor logic tells the organisation whether it can reconstruct the run later. The result is a score profile grounded in evidence rather than confidence.
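The Responsibility anchors above also show what must change for a run to move up the scale. The sketch below is illustrative only: the function names and the three boolean inputs are assumptions, and the rule again returns just the anchored scores 1, 3, and 5.

```python
from typing import List


def responsibility_score(reviewer_named: bool,
                         approval_recorded: bool,
                         escalation_documented: bool) -> int:
    """Map the Responsibility evidence to an anchored score of 1, 3 or 5."""
    if not reviewer_named:
        return 1  # no accountable reviewer on record
    if approval_recorded and escalation_documented:
        return 5  # reviewer, approval action and escalation path all documented
    return 3      # reviewer named, but the record is incomplete


def gaps_to_strong(reviewer_named: bool,
                   approval_recorded: bool,
                   escalation_documented: bool) -> List[str]:
    """List the evidence still needed before a 5 could be justified."""
    gaps = []
    if not reviewer_named:
        gaps.append("name an accountable reviewer")
    if not approval_recorded:
        gaps.append("record the approval action")
    if not escalation_documented:
        gaps.append("document the escalation path")
    return gaps
```

For a run with a named reviewer and a recorded approval but no escalation path, the score is 3 and the only remaining gap is the escalation documentation.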

Detailed link to RAIDT

Scoring anchors link to RAIDT in four ways.

First, they operationalise RAIDT's core idea that governance claims should be based on evidence rather than principle alone.
Second, they attach judgement to the run, meaning the score reflects a specific use of a GenAI system in a specific context rather than a general opinion about the tool.
Third, they make the evidence pack interpretable by showing how documented artefacts translate into a five-pillar score profile.
Fourth, they support reviewability, contestability, audit readiness, and organisational learning because the rationale for the score can be reconstructed and improved.

Scoring anchors -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

In this sense, scoring anchors are the interpretive bridge between collected evidence and an actionable governance judgement.

Link to the five RAIDT pillars

Responsibility

Scoring anchors clarify what responsible oversight looks like at different strength levels. They help distinguish between nominal accountability and demonstrable accountability.