C0.05 - Score_profile
flowchart LR
A["Traditional scoring limits<br/>single score, generic rating, hidden trade-offs"] --> B["RAIDT<br/>run-level evidence framework"]
H["Practical fields and domains<br/>prompts, settings, review notes,<br/>healthcare, finance, education, public services"] --> C[["Score profile<br/>five-pillar assessment of one run"]]
B --> C
D["Run-level evidence and evidence pack"] --> C
C --> E["Reviewer reconstruction<br/>and contestability"]
C --> F["Governance readiness<br/>and organisational learning"]
C --> G["Targeted improvement across<br/>Responsibility, Auditability,<br/>Interpretability, Dependability, Traceability"]

Star C0 - RAIDT Core, Definition, Values, Claims and Innovation
Star context: Defines the project identity of RAIDT by showing that responsible governance of GenAI in organisational work should be expressed as a structured, run-level profile across five pillars rather than as a single simplified score or a vague assurance claim.
Definition / background
The RAIDT score profile is the structured assessment of one run across the framework's five pillars: Responsibility, Auditability, Interpretability, Dependability, and Traceability. It is designed to show how governance performance is distributed across different dimensions of a concrete GenAI use event, rather than pretending that one composite figure can adequately represent the run as a whole.
Conceptually, the score profile sits downstream of run-level evidence and the evidence pack. A run produces evidence; the evidence pack organises that material for review; the score profile interprets the evidential position of the run across the five pillars. This means the profile is not merely a dashboard output or management summary. It is an evidence-linked judgement structure that makes governance characteristics visible in a disciplined and reviewable way.
This matters because governance in generative AI is rarely one-dimensional. A run can be well documented but poorly explained, dependable in routine use but weakly contestable, or clearly assigned in responsibility while still missing enough trace to support later audit. A profile preserves those distinctions. It therefore differs from generic risk scoring, maturity models, or procurement ratings that reduce varied governance conditions to a single headline assessment.
Within RAIDT, the score profile belongs in the core architecture because it operationalises the framework's claim that governance should move from principles and assertions toward evidence, reviewability, and organisational learning. It provides a practical bridge between evidential capture and governance readiness. Without the profile, evidence remains descriptive; with it, evidence becomes structured for comparative scrutiny, explanation, and intervention.
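The definition above can be made concrete as a small data structure. The following is a minimal sketch only: the class names, the field names, and the 0-4 scale are assumptions introduced for illustration, since this note does not prescribe a scoring scale or a schema.

```python
from dataclasses import dataclass, field

PILLARS = ("Responsibility", "Auditability", "Interpretability",
           "Dependability", "Traceability")


@dataclass
class PillarJudgement:
    score: int                  # assumed 0-4 scale; not fixed by RAIDT in this note
    justification: str          # short reviewer rationale
    evidence_refs: list[str] = field(default_factory=list)  # IDs into the evidence pack


@dataclass
class ScoreProfile:
    run_id: str
    judgements: dict[str, PillarJudgement]

    def __post_init__(self) -> None:
        # A profile is complete only if each of the five pillars is judged.
        missing = [p for p in PILLARS if p not in self.judgements]
        extra = [p for p in self.judgements if p not in PILLARS]
        if missing or extra:
            raise ValueError(f"missing={missing}, unexpected={extra}")

    def as_vector(self) -> list[int]:
        # Scores in fixed pillar order; deliberately no single total.
        return [self.judgements[p].score for p in PILLARS]


# Hypothetical profile for one run, with placeholder rationale and evidence IDs.
profile = ScoreProfile(
    run_id="run-001",
    judgements={p: PillarJudgement(3, "placeholder rationale", ["ev-1"])
                for p in PILLARS},
)
```

Keeping a justification and evidence references beside each score is one way to keep the judgement evidence-linked. The structure offers no method that collapses the five scores into one figure, which mirrors the framework's stated position.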
Why this concept matters
The score profile solves a recurring governance problem: organisations often need a concise assessment of a GenAI run, but oversimplified scoring can hide exactly the trade-offs that matter most. If one number is used, important weaknesses may be masked by strengths elsewhere. The profile avoids that distortion by making the pattern of performance visible.
It also prevents confusion between evidence collection and governance interpretation. An evidence pack can contain rich documentation, but decision-makers still need a way to understand what that evidence implies. The score profile provides that interpretive layer without detaching judgement from the underlying run record. It gives supervisors, auditors, policy designers, and practitioners a structured way to ask where the run is strong, where it is weak, and what should improve next.
If this concept is missing, organisations risk superficial assurance, poorly targeted interventions, and a false sense of control. A run may appear acceptable overall while containing a serious deficiency in one pillar that is critical for the use context. RAIDT uses the score profile to move beyond generic governance rhetoric and towards operational visibility of trade-offs, weaknesses, and readiness.
Key idea: The score profile matters because RAIDT needs a way to express the governance quality of one run across multiple pillars without hiding important evidential trade-offs inside a single number.
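The masking effect described in this section can be shown with simple arithmetic. The sketch below assumes a hypothetical 0-4 scale and an unweighted mean as the composite; both are illustrative choices, and the scores are invented rather than taken from any real run.

```python
from statistics import mean

PILLARS = ("Responsibility", "Auditability", "Interpretability",
           "Dependability", "Traceability")

# Illustrative scores on an assumed 0-4 scale; not data from any real run.
even_run = dict(zip(PILLARS, (3, 3, 3, 3, 3)))
uneven_run = dict(zip(PILLARS, (4, 4, 1, 2, 4)))


def composite(profile: dict[str, int]) -> float:
    """Unweighted mean across pillars: the single number the text warns about."""
    return mean(profile.values())


def weakest_pillars(profile: dict[str, int]) -> list[str]:
    """Pillars sharing the lowest score, in pillar order."""
    lowest = min(profile.values())
    return [p for p, s in profile.items() if s == lowest]


print(composite(even_run), composite(uneven_run))
print(weakest_pillars(uneven_run))
```

Both runs have the same composite, yet only the profile view shows that the second run has a marked Interpretability weakness.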
What this item measures
- The degree to which one run demonstrates Responsibility through accountable roles, checks, and decision ownership.
- The extent to which the run satisfies Auditability, meaning another reviewer can inspect and reconstruct what happened.
- The practical level of Interpretability available for understanding how the output emerged and how it was reviewed.
- The Dependability of the run in terms of stable, usable, and trustworthy performance within its task context.
- The Traceability of the run across inputs, settings, actors, timestamps, outputs, and downstream actions.
- The distribution of governance strengths and weaknesses across the five pillars rather than a flattened overall impression.
- The evidential basis for judging whether the run contributes positively to wider governance readiness.
Practical example / likely audience question
Audience question
Why does RAIDT use a profile instead of giving each run one overall governance score?
Answer
The concern behind this question is usually a desire for simplicity. Managers, reviewers, and supervisors often want a single headline result because it seems easier to compare and communicate. The direct answer, however, is that one overall score can be misleading when governance quality is uneven across dimensions that matter differently in practice.
For example, a university might use GenAI to draft formative feedback for students. One run may score strongly on Traceability because the prompt, source material, timestamps, and reviewer notes are all preserved. It may also score reasonably on Responsibility because an academic is clearly assigned to approve the feedback. Yet the same run may score weakly on Interpretability if the reasoning behind key feedback statements cannot be clearly reconstructed, and weakly on Dependability if similar prompts yield inconsistent outcomes across comparable assignments. A single score could blur those weaknesses and imply a level of assurance that the evidence does not justify.
RAIDT handles this better than a generic AI governance approach because it does not treat governance as a single abstract property. It treats governance as a profile of evidence-backed conditions at the level of the run. That allows a reviewer not only to judge the run, but also to identify what kind of improvement is needed and why.
Practical example in RAIDT terms
Consider a finance setting in which a GenAI assistant is used to draft a quarterly internal risk summary for senior management. The run-level issue is not only whether the draft is useful, but whether its governance condition is visible in a way that supports review before the summary influences management discussion or downstream reporting.
The evidence needed includes the task brief, the prompt or template used, any source reports supplied to the model, the system version and settings, the generated summary, analyst edits, reviewer comments, approval records, and a note of any unsupported or contested statements. Responsibility is affected because the organisation must show who owned review and sign-off. Auditability is affected because another reviewer should be able to reconstruct how the draft was produced. Interpretability is affected because the team must understand how claims in the summary connect to the source material and instructions. Dependability is affected because the drafting process should be sufficiently stable and accurate for repeated reporting cycles. Traceability is affected because the run must be linked to the relevant inputs, timings, actors, and final approved artefact.
The score profile improves governance readiness here because it shows whether the run is uneven across pillars. A run might be well traced and well assigned but still weakly dependable if the model inserts unsupported emphasis, or weakly interpretable if reviewers cannot easily explain why key phrasing appeared. That visibility supports targeted control improvement rather than generic reassurance.
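The evidence items listed for the finance example lend themselves to a simple completeness check before any pillar is scored. This is a rough sketch: the key names and the dictionary layout are hypothetical, chosen only to mirror the items named in the paragraph above.

```python
# Checklist keys mirror the evidence items named in the finance example;
# the key names themselves are hypothetical.
REQUIRED_ARTEFACTS = (
    "task_brief",
    "prompt_or_template",
    "source_reports",
    "system_version_and_settings",
    "generated_summary",
    "analyst_edits",
    "reviewer_comments",
    "approval_records",
    "contested_statements_note",
)


def missing_artefacts(evidence_pack: dict) -> list[str]:
    """Return required artefacts that are absent or empty, in checklist order."""
    return [name for name in REQUIRED_ARTEFACTS if not evidence_pack.get(name)]


# Hypothetical pack for one run; values stand in for file paths or record IDs.
pack = {
    "task_brief": "brief.txt",
    "prompt_or_template": "template.txt",
    "source_reports": ["report-a.pdf", "report-b.pdf"],
    "system_version_and_settings": {"version": "unknown", "temperature": 0.2},
    "generated_summary": "draft.txt",
    "reviewer_comments": "comments.txt",
    "approval_records": "",  # present as a key but empty, so treated as missing
    "contested_statements_note": "none identified",
}

missing = missing_artefacts(pack)
```

A gap found here does not set a pillar score by itself, but it tells the reviewer which judgements would otherwise rest on thin evidence. In this illustration the missing approval record bears on Responsibility.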
Detailed link to RAIDT
The score profile links to RAIDT in four ways.
First, it gives evaluative form to RAIDT's core idea that responsible GenAI governance should be evidenced at the level of the individual run.
Second, it depends on the run and on run-level evidence, because the profile can only be justified if one concrete use event has been sufficiently captured and reviewed.
Third, it sits alongside the evidence pack as one of RAIDT's two practical outputs: the evidence pack organises the proof, and the score profile expresses the governance condition revealed by that proof.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning by making the structure of strengths and weaknesses visible rather than implicit.
Score profile ↔ Run-level evidence ↔ Evidence pack ↔ Governance readiness
Link to the five RAIDT pillars
Responsibility
The score profile measures Responsibility by showing whether accountability, authority, and review duties were clearly allocated in the run and supported by evidence rather than assumption.
Example evidence / implication:
- Named user, reviewer, approver, or accountable role attached to the run.
- Clear indication of whether escalation, sign-off, or human override was required and completed.
Auditability
This item has a particularly strong connection to Auditability because the profile should reflect whether the run can be independently examined and reconstructed after the event.
Example evidence / implication:
- Prompt, inputs, outputs, timestamps, and review notes are preserved in sufficient detail for later scrutiny.
- The scoring judgement itself can be explained and defended against the contents of the evidence pack.
Interpretability
The score profile measures Interpretability by indicating whether reviewers can make sense of how the output emerged and how human judgement was applied around it.
Example evidence / implication:
- Prompt logic, task framing, and review rationale are documented clearly enough to support explanation.
- The run can be discussed in meaningful terms without relying entirely on opaque model behaviour.
Dependability
The score profile measures Dependability by showing whether the run performed in a sufficiently stable, reliable, and practically trustworthy way for its task context.
Example evidence / implication:
- The output met expected quality thresholds and did not require disproportionate correction.
- Similar runs under similar conditions would be expected to produce acceptably consistent results.
Traceability
The score profile measures Traceability by showing whether the run is properly linked to its inputs, settings, actors, timing, outputs, and downstream use.
Example evidence / implication:
- A reviewer can follow the chain from source material to generated draft to reviewed final outcome.
- Artefacts and metadata are connected tightly enough to support audit, challenge, and incident review.
The score profile bears on all five pillars directly because it is the framework's structured expression of how a run stands across them. Its distinct value is that it keeps the pillars separable while still presenting them together.
Why this item is more than a generic concept
In general AI governance, a score profile might mean a dashboard view, a maturity heat map, or a set of broad ratings attached to a system, department, or policy area. In RAIDT, it has a narrower and more operational meaning. It is the evidence-linked, five-pillar assessment of one run.
The RAIDT meaning is more operational because it is tied to run-level evidence, the evidence pack, reviewer reconstruction, and governance readiness. The profile is therefore not just a communication device. It is a structured governance judgement that depends on evidential quality and that can be challenged, refined, and compared across concrete use events.
Common misunderstanding
Misunderstanding
The RAIDT score profile is simply a prettier way of presenting one overall governance score.
Correction
The profile is not a visual wrapper around a single hidden number. Its point is precisely to resist that reduction. For example, a public-service drafting run may be strong on Traceability because every input and output is preserved, yet weak on Interpretability because reviewers cannot clearly explain how the system derived certain phrasing from the source material. Treating those conditions as one overall score would conceal the specific weakness that needs governance attention. In RAIDT, the profile preserves these distinctions so that decisions, remediation, and comparison remain evidence-sensitive.
Boundary and limitation
The score profile does not prove that a run is universally acceptable, lawful, fair, or safe in every sense. It does not replace the evidence pack, and it does not remove the need for broader controls such as procurement review, legal compliance work, model evaluation, staff training, or domain-specific oversight. It is a structured representation of governance condition, not a substitute for all governance work.
The profile also depends on the quality of the evidence beneath it. If the evidence pack is thin, inconsistent, or poorly curated, the profile may project a false sense of precision. There is also a design risk that organisations may treat the profile mechanically and ignore contextual interpretation. RAIDT handles these limitations by keeping the score profile tied to run-level evidence, by emphasising reviewability and contestability, and by treating the profile as a tool for informed judgement rather than automated certainty.
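One possible guard against false precision is to flag any pillar score that cites no evidence, or that cites items absent from the evidence pack. The sketch below assumes a hypothetical profile layout (a dictionary per pillar with an `evidence_refs` list) and invented evidence IDs.

```python
def unsupported_pillars(profile: dict, evidence_ids: set[str]) -> dict[str, str]:
    """Map each weakly supported pillar to a short reason."""
    flagged = {}
    for pillar, entry in profile.items():
        refs = entry.get("evidence_refs", [])
        dangling = [r for r in refs if r not in evidence_ids]
        if not refs:
            flagged[pillar] = "no evidence cited"
        elif dangling:
            flagged[pillar] = "cites items not in the pack: " + ", ".join(dangling)
    return flagged


# Hypothetical evidence pack contents and profile for one run.
evidence_ids = {"ev-prompt", "ev-output", "ev-approval"}
profile = {
    "Responsibility": {"score": 3, "evidence_refs": ["ev-approval"]},
    "Auditability": {"score": 3, "evidence_refs": ["ev-prompt", "ev-output"]},
    "Interpretability": {"score": 2, "evidence_refs": []},
    "Dependability": {"score": 3, "evidence_refs": ["ev-test-log"]},
    "Traceability": {"score": 4, "evidence_refs": ["ev-prompt", "ev-output"]},
}

flags = unsupported_pillars(profile, evidence_ids)
```

A flag is a prompt for contextual review, not a verdict; the check only tests whether a score is linked to evidence, not whether the evidence is adequate.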
Implementation levels
Manual implementation
A researcher or small team can implement the score profile manually by reviewing each significant run against the five RAIDT pillars and recording a short justification for each pillar score. This can be done in a structured template alongside the evidence pack.
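A structured template of this kind could be as simple as a CSV sheet with one row per pillar. The sketch below generates a blank sheet; the column names are assumptions rather than a prescribed RAIDT format.

```python
import csv
import io

PILLARS = ("Responsibility", "Auditability", "Interpretability",
           "Dependability", "Traceability")

# Hypothetical column layout for a manual review sheet.
COLUMNS = ["run_id", "reviewer", "pillar", "score", "justification", "evidence_refs"]


def blank_template(run_id: str, reviewer: str) -> str:
    """Return CSV text with a header and one empty row per pillar."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for pillar in PILLARS:
        # Score, justification, and evidence references are left for the reviewer.
        writer.writerow([run_id, reviewer, pillar, "", "", ""])
    return buffer.getvalue()


sheet = blank_template("run-001", "reviewer-a")
print(sheet)
```

The resulting text can be saved next to the evidence pack and completed by hand in any spreadsheet tool.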
Semi-automated implementation
Semi-automated implementation can use scoring rubrics, metadata prompts, review forms, and dashboard templates to help reviewers enter consistent pillar judgements while still allowing narrative explanation and contextual notes.
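As a sketch of how a review form might enforce consistency, the example below validates a single pillar entry against a rubric. The level descriptors, the 0-4 scale, and the minimum justification length are all hypothetical; a real rubric would be agreed per project or sector.

```python
PILLARS = ("Responsibility", "Auditability", "Interpretability",
           "Dependability", "Traceability")

# Hypothetical level descriptors on an assumed 0-4 scale.
RUBRIC_LEVELS = {
    0: "no supporting evidence",
    1: "minimal evidence, major gaps",
    2: "partial evidence, some gaps",
    3: "substantial evidence, minor gaps",
    4: "complete evidence, independently reconstructable",
}

MIN_JUSTIFICATION_CHARS = 20  # arbitrary threshold chosen for illustration


def validate_entry(pillar: str, score: int, justification: str) -> list[str]:
    """Return a list of problems with one form entry; empty means acceptable."""
    problems = []
    if pillar not in PILLARS:
        problems.append(f"unknown pillar: {pillar}")
    if score not in RUBRIC_LEVELS:
        problems.append(f"score {score} is not a rubric level")
    if len(justification.strip()) < MIN_JUSTIFICATION_CHARS:
        problems.append("justification too short to be reviewable")
    return problems


ok = validate_entry("Traceability", 4, "Prompt, inputs and outputs all preserved.")
bad = validate_entry("Fairness", 7, "ok")
```

The form constrains only the shape of the entry. The choice of level and the narrative explanation remain the reviewer's judgement.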
Fully automated implementation
At scale, a governance platform, wrapper, orchestration layer, or evidence service can assemble run artefacts automatically, pre-populate pillar indicators from captured metadata, route the run for human review, and generate score-profile views across teams, functions, or domains. Even here, the strongest approach is usually automation-assisted judgement rather than fully unexamined scoring.
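The pre-population step might look like the sketch below, which derives presence indicators from captured metadata and always routes the draft to a human. The metadata keys, the mapping from keys to pillars, and the status label are assumptions made for illustration, not features of any specific platform.

```python
from typing import Optional


def prepopulate(metadata: dict) -> dict[str, Optional[bool]]:
    """Derive simple presence indicators; None means 'left to the reviewer'."""

    def has(*keys: str) -> bool:
        return all(bool(metadata.get(k)) for k in keys)

    return {
        "Responsibility": has("user", "approver"),
        "Auditability": has("prompt", "output", "review_notes"),
        "Interpretability": None,  # not inferred from metadata in this sketch
        "Dependability": None,     # not inferred from metadata in this sketch
        "Traceability": has("timestamp", "model_version", "input_ids"),
    }


def draft_for_review(run_id: str, metadata: dict) -> dict:
    """Package indicators as a draft; scoring stays with a human reviewer."""
    return {
        "run_id": run_id,
        "indicators": prepopulate(metadata),
        "status": "pending_human_review",
    }


# Hypothetical captured metadata for one run (no approver recorded).
draft = draft_for_review("run-001", {
    "user": "analyst-1",
    "prompt": "prompt text",
    "output": "draft text",
    "review_notes": "notes",
    "timestamp": "2024-01-01T09:00:00Z",
    "model_version": "unknown",
    "input_ids": ["doc-1"],
})
```

The indicators record only whether metadata is present. They are inputs to the reviewer's pillar judgement, not scores.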
Practical use in the RAIDT project
Within the RAIDT project, the score profile is especially important for Paper 08 Foundations because it clarifies how RAIDT turns evidence into a governance judgement structure without collapsing the framework into generic compliance language. It also matters for Paper 09 Empirical Validation because one empirical question is whether different reviewers can apply the profile consistently and whether the profile surfaces meaningful distinctions across real runs.
For Paper 10 Policy Pathways, the score profile provides a way to communicate governance condition to organisations and policymakers without losing the run-level logic of the framework. It is likely to be central to sector playbooks, scoring rubrics, and evidence-pack workflows because it gives practitioners a concise but non-reductive view of governance quality. It also supports influence methods and governance interventions by identifying which pillar needs attention, rather than merely stating that governance is weak in general.
For supervision meetings, viva defence, and journal positioning, this item helps answer a crucial question: how does RAIDT move from evidence capture to evaluative insight? The answer is that the score profile is the structured, five-pillar expression of what the evidence shows about one run.
Key audience questions to prepare for
Q1. Why is a profile better than a composite score?
Because governance weaknesses are often unevenly distributed. A composite score may hide a serious deficiency in one pillar behind strengths in others, whereas a profile keeps those distinctions visible.
Q2. Does the score profile make RAIDT too subjective?
It introduces judgement, but not uncontrolled judgement. RAIDT constrains scoring through run-level evidence, structured pillars, reviewable justifications, and the possibility of contesting the assessment against the evidence pack.
Q3. Can two reviewers disagree on the same score profile?
Yes, and that is not necessarily a flaw. RAIDT is designed for reviewability and contestability, so disagreement can surface ambiguities in evidence, rubric design, or contextual interpretation that should be improved.
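Disagreement of this kind can be surfaced pillar by pillar rather than as one agreement figure. The sketch below compares two reviewers' profiles on an assumed 0-4 scale and reports pillars whose scores differ by more than a tolerance; the scores and the tolerance are illustrative.

```python
PILLARS = ("Responsibility", "Auditability", "Interpretability",
           "Dependability", "Traceability")


def disagreements(a: dict[str, int], b: dict[str, int],
                  tolerance: int = 1) -> dict[str, int]:
    """Return pillars where the two scores differ by more than `tolerance`."""
    gaps = {p: abs(a[p] - b[p]) for p in PILLARS}
    return {p: gap for p, gap in gaps.items() if gap > tolerance}


# Invented scores from two hypothetical reviewers of the same run.
reviewer_a = dict(zip(PILLARS, (3, 3, 1, 2, 4)))
reviewer_b = dict(zip(PILLARS, (3, 2, 3, 2, 4)))

to_discuss = disagreements(reviewer_a, reviewer_b)
```

A flagged pillar points to where the evidence, the rubric wording, or the contextual interpretation should be revisited.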
Q4. Is the score profile useful outside highly regulated sectors?
Yes. It is particularly valuable in high-accountability settings, but any organisation using GenAI can benefit from understanding whether its runs are well governed across multiple dimensions rather than relying on broad assurance claims.
Q5. What makes the RAIDT score profile distinctive?
Its distinctiveness lies in being run-level, evidence-linked, five-pillar, and governance-oriented. It is not a generic performance metric or enterprise maturity score; it is a structured view of how one concrete GenAI use event stands in governance terms.
Suggested citation concepts to support this item
- multi-criteria governance assessment for generative AI
- AI assurance profiles versus composite scores
- evidence-based scoring in sociotechnical governance
- interpretability, auditability, and traceability in AI oversight
- run-level governance metrics for human-AI workflows
- accountable AI evaluation in organisational settings
- scorecards and profiles in AI risk governance
- operationalising AI governance through evidence packs
- contestable assessment frameworks for AI assurance
- governance readiness indicators for generative AI adoption
Short explanation for presentation
The RAIDT score profile is the framework's structured assessment of one GenAI run across five pillars: Responsibility, Auditability, Interpretability, Dependability, and Traceability. Its purpose is to show the pattern of governance strengths and weaknesses in a way that remains tied to evidence. This matters because a single score can hide important trade-offs. A run may be highly traceable but still difficult to interpret, or responsibly assigned yet operationally unreliable. RAIDT therefore uses a profile rather than a flattened metric. The profile sits on top of run-level evidence and the evidence pack, and it helps supervisors, reviewers, and organisations see what kind of governance condition the run actually demonstrates. In that sense, it is one of the mechanisms that turns RAIDT from a conceptual framework into a practical method for review, comparison, and governance improvement.
One-line takeaway
The score profile is the five-pillar, evidence-linked assessment of one GenAI run; it exists because RAIDT needs governance trade-offs to remain visible at run level.