RAIDT Pillars and Scoring

flowchart LR
    A[Responsible AI concerns] --> B[RAIDT framework]
    I[Organisational GenAI runs] --> C[Run-level evidence]
    B --> C
    C --> D[Star S5: pillars and scoring]
    D --> E[Five-pillar profile]
    D --> F[Evidence-based judgement]
    E --> G[Targeted interventions]
    F --> H[Audit and policy use]
    H --> J[Sector playbooks]

Circle 2 - Operational governance mechanism

Ring: Operational governance star

Function

Defines the five RAIDT pillars (Responsibility, Auditability, Interpretability, Dependability, and Traceability) and the scoring logic used to assess whether a specific GenAI run is governable, reviewable, and suitable for organisational use. This star translates RAIDT from a conceptual governance framework into an operational assessment method grounded in run-level evidence.

Role in the project

This star sits at the centre of RAIDT's operational layer. It explains how broad claims about responsible AI become inspectable judgements about a particular run. In project terms, S5 links conceptual foundations to practical evidence, scoring, and intervention design. It therefore contributes to theory-building in Paper 08, measurement and validation in Paper 09, and policy translation in Paper 10. It also provides a common grammar for sector playbooks, because each playbook needs a way to judge whether a run is merely useful or genuinely governable.

Main message

Responsible AI discourse often begins with values such as fairness, accountability, transparency, safety, and human oversight. These are necessary starting points, but they remain too abstract when an organisation faces a concrete challenge: a specific GenAI output has influenced work, and someone now asks whether that output can be justified, reconstructed, challenged, or corrected. This problem becomes sharper under conditions of managerial uncertainty. Organisational users rarely interact with a model in the abstract; they interact with a run. A run is one configured use of a GenAI system for a defined task, at a defined time, in a defined context. It includes the instruction or prompt, model and tool configuration, retrieved context where used, output, and the human or automated checks around it. RAIDT therefore treats the run, rather than the model alone, as the unit of governance.
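A run record can be sketched as a plain data structure. This is a minimal illustration using Python dataclasses. The class name `RunRecord` and every field name are hypothetical, and RAIDT does not prescribe this schema. The fields mirror the components of a run listed above: task, time, context, prompt, model and tool configuration, retrieved context, output, and checks.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RunRecord:
    """Hypothetical record of one configured GenAI use (a 'run')."""

    run_id: str
    task: str                # the defined task
    started_at: datetime     # the defined time
    context: str             # the defined context, e.g. team or workflow
    prompt: str              # instruction or prompt as actually sent
    model_version: str
    tool_config: dict = field(default_factory=dict)
    retrieved_context: list[str] = field(default_factory=list)  # empty if no retrieval
    output: str = ""
    checks: list[str] = field(default_factory=list)  # human or automated checks


run = RunRecord(
    run_id="run-0001",
    task="Summarise leave policy for a line manager",
    started_at=datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc),
    context="HR advisory",
    prompt="Summarise the attached leave policy in five bullet points.",
    model_version="example-model-v1",
    retrieved_context=["doc:leave-policy-v3"],
    output="(model output text)",
    checks=["reviewer:approved"],
)
```

The model version is only one field among many. Two runs of the same model that differ in prompt, retrieval, or checks therefore remain distinguishable, which is the point of treating the run rather than the model as the unit of governance.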

Star S5 explains how RAIDT evaluates that unit of governance. The five pillars provide an operational grammar for judging whether a run is responsibly governed. Responsibility asks whether the run had a legitimate purpose, an authorised user, appropriate constraints, and a clear duty of care. Auditability asks whether the run can be reconstructed later from evidence rather than memory. Interpretability asks whether the relevant human can understand what the output means, what assumptions it relies on, and what uncertainty remains. Dependability asks whether the configuration behaves with sufficient stability and safety for the intended task. Traceability asks whether the output can be linked back to the prompts, sources, tools, versions, and decision steps that shaped it. Together, the pillars convert responsible-AI aspirations into reviewable evidence questions.

This matters because governance failure in GenAI is often not simply a failure of output quality. An output may appear fluent, persuasive, and even factually plausible while still being poorly governed. For example, a manager may receive a well-written policy summary produced through RAG, yet the organisation may be unable to show which documents were retrieved, which model version was used, whether the prompt template had been approved, or whether a human checked the output before use. In that case, the problem is not only epistemic uncertainty; it is governance opacity. RAIDT responds by asking what evidence must exist if another reviewer needs to inspect the run later. The answer is the run-level evidence pack, and S5 explains how that evidence is interpreted through the five pillars.

The pillars were selected because they answer the main questions that recur across organisational uses of GenAI. Responsibility addresses authorisation, purpose, role, policy constraints, oversight, contestability, and escalation. Auditability addresses logging, reconstructability, retention, and reviewability. Interpretability addresses whether the output is usable in a disciplined way by the intended audience, which is especially important where outputs can sound more certain than the underlying evidence warrants. Dependability addresses repeat-run stability, failure behaviour, threshold rules, drift, and safe handling of uncertainty. Traceability addresses provenance, including prompt versions, retrieved sources, document identifiers, model lineage, tool traces, and downstream use. Other concerns such as privacy, fairness, security, inclusion, or sustainability are not excluded; rather, they are surfaced through one or more of the five pillars or through sector-specific scoring adaptations.
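The pillar-to-evidence relationship can be made concrete with a lookup from evidence-pack fields to the pillars they mainly inform. The lookup can then report which pillars are left without evidence for a given run. The sketch is illustrative only. The field names and their assignments are hypothetical simplifications of the concerns listed above, and a real RAIDT evidence pack may organise them differently.

```python
# Hypothetical mapping: evidence-pack field -> pillars it mainly informs.
FIELD_TO_PILLARS: dict[str, set[str]] = {
    "purpose": {"Responsibility"},
    "user_role": {"Responsibility"},
    "escalation_route": {"Responsibility"},
    "run_id": {"Auditability"},
    "timestamp": {"Auditability"},
    "reviewer_decision": {"Auditability", "Responsibility"},
    "prompt_version": {"Auditability", "Traceability"},
    "model_version": {"Auditability", "Traceability"},
    "uncertainty_statement": {"Interpretability"},
    "repeat_run_results": {"Dependability"},
    "threshold_rule": {"Dependability"},
    "retrieved_source_ids": {"Traceability"},
    "tool_trace": {"Traceability"},
}


def gaps_by_pillar(captured: set[str]) -> dict[str, list[str]]:
    """Return, for each pillar, the mapped fields missing from this run."""
    gaps: dict[str, list[str]] = {}
    for field_name, pillars in FIELD_TO_PILLARS.items():
        if field_name not in captured:
            for pillar in sorted(pillars):
                gaps.setdefault(pillar, []).append(field_name)
    return gaps


# A run that logged everything except the retrieved sources and tool trace.
captured = set(FIELD_TO_PILLARS) - {"retrieved_source_ids", "tool_trace"}
print(gaps_by_pillar(captured))
```

A field gap is an input to the pillar judgement, not the judgement itself. Whether a gap is disqualifying still depends on the task and its risk tier.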

Scoring is needed because evidence capture alone does not create a governance judgement. An evidence pack may be long and technically detailed, yet still leave uncertainty about whether the evidence is weak, partial, or audit-ready. RAIDT therefore uses a five-point anchored scoring approach. A score of 1 indicates missing evidence or a critical failure of the pillar intent. A score of 3 indicates partial evidence that may support lower-risk use but remains insufficient for stronger governance claims. A score of 5 indicates evidence strong enough to support reconstruction, challenge, and justified use for the stated task. Crucially, the score is not a proxy for factual truth, legal compliance, or moral virtue. It is a judgement about governance readiness for a specific run.
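The anchors can be expressed as a small lookup that also insists each score points back to evidence fields. In this sketch, the descriptions for 1, 3, and 5 paraphrase the anchors above. Two rules are assumptions made for illustration: scores of 2 and 4 are read as falling between neighbouring anchors, and any score above 1 must cite at least one evidence reference. `PillarJudgement` and `describe` are hypothetical names.

```python
from dataclasses import dataclass

ANCHORS = {
    1: "Missing evidence or critical failure of the pillar intent",
    3: "Partial evidence: may support lower-risk use, not stronger governance claims",
    5: "Evidence supports reconstruction, challenge, and justified use for the stated task",
}


def describe(score: int) -> str:
    """Return the anchor text for a 1-5 pillar score."""
    if not 1 <= score <= 5:
        raise ValueError("RAIDT pillar scores run from 1 to 5")
    if score in ANCHORS:
        return ANCHORS[score]
    # Assumption: intermediate scores sit between the neighbouring anchors.
    return f"Between anchors {score - 1} and {score + 1}"


@dataclass(frozen=True)
class PillarJudgement:
    pillar: str
    score: int
    evidence_refs: tuple[str, ...]  # evidence-pack fields the score relies on

    def __post_init__(self) -> None:
        describe(self.score)  # raises ValueError if the score is out of range
        # Assumption: only a score of 1 (missing evidence) may cite no fields.
        if self.score > 1 and not self.evidence_refs:
            raise ValueError(f"{self.pillar}: a score above 1 must cite evidence fields")


judgement = PillarJudgement("Traceability", 3, ("retrieved_source_ids",))
print(judgement.score, "-", describe(judgement.score))
```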

The insistence on a visible five-pillar profile is one of RAIDT's most important design choices. A single composite score can help with dashboards or reporting, but it can also hide serious weaknesses. A run may have highly interpretable output because the language is clear and well structured, yet still have weak traceability because the retrieval snapshot was not stored. Another run may be fully auditable and traceable but remain weak on responsibility because no authorised purpose or escalation pathway was recorded. In both cases, an average score would conceal the real governance problem. The profile preserves trade-offs and reveals where targeted intervention is needed.
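The averaging problem can be shown with two hypothetical profiles that mirror the two runs just described. The scores are invented for illustration. Both runs produce the same composite, but their weakest pillars, and therefore the remedies they need, differ.

```python
PILLARS = ("Responsibility", "Auditability", "Interpretability", "Dependability", "Traceability")


def composite(profile: dict[str, int]) -> float:
    """Unweighted mean of the five pillar scores (a secondary summary only)."""
    return sum(profile[p] for p in PILLARS) / len(PILLARS)


def weakest(profile: dict[str, int]) -> list[str]:
    """Pillars sharing the lowest score, i.e. where intervention should start."""
    low = min(profile[p] for p in PILLARS)
    return [p for p in PILLARS if profile[p] == low]


# Clear output, but the retrieval snapshot was not stored.
run_a = {"Responsibility": 4, "Auditability": 4, "Interpretability": 5,
         "Dependability": 4, "Traceability": 1}
# Fully logged and traceable, but no authorised purpose or escalation route.
run_b = {"Responsibility": 1, "Auditability": 5, "Interpretability": 3,
         "Dependability": 4, "Traceability": 5}

for name, profile in (("run_a", run_a), ("run_b", run_b)):
    print(name, f"composite={composite(profile):.1f}", "weakest:", weakest(profile))
```

Both composites come out at 3.6. Only the profile shows that one run needs retrieval capture and the other needs a recorded purpose and escalation route.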

This logic also makes S5 central to empirical validation. Paper 09 does not merely ask whether one technical configuration performs better than another. It asks whether governance interventions shift evidence quality across the RAIDT pillars. RAG may improve Traceability and Auditability when retrieval snapshots, source identifiers, and corpus versions are preserved; it may fail to do so if source capture is weak. PEFT methods such as LoRA may improve Dependability when adapter lineage and deployment controls are recorded. RLHF-type alignment controls may support Responsibility by improving refusal behaviour or safety tone, but they can still leave audit gaps if policy provenance is not logged. Structured prompting may improve Interpretability and Responsibility by clarifying task constraints, but it does not solve provenance or reconstructability on its own. S5 provides the measurement frame for testing these claims.
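A measurement frame of this kind compares pillar profiles across conditions rather than a single output metric. The sketch below assumes that repeated runs have already been scored under a baseline condition and an intervention condition. It reports the mean per-pillar shift between the two. The data are hypothetical. A real study would also need calibrated raters and an appropriate statistical test, which this sketch omits.

```python
from statistics import mean

PILLARS = ("Responsibility", "Auditability", "Interpretability", "Dependability", "Traceability")


def mean_profile(profiles: list[dict[str, int]]) -> dict[str, float]:
    """Average each pillar's score across a set of scored runs."""
    return {p: mean(run[p] for run in profiles) for p in PILLARS}


def pillar_shift(baseline: list[dict[str, int]],
                 intervention: list[dict[str, int]]) -> dict[str, float]:
    """Mean per-pillar change from the baseline to the intervention condition."""
    before, after = mean_profile(baseline), mean_profile(intervention)
    return {p: round(after[p] - before[p], 2) for p in PILLARS}


# Hypothetical scores: plain prompting vs RAG with retrieval snapshots preserved.
baseline = [
    {"Responsibility": 3, "Auditability": 2, "Interpretability": 4, "Dependability": 3, "Traceability": 1},
    {"Responsibility": 3, "Auditability": 3, "Interpretability": 4, "Dependability": 3, "Traceability": 2},
]
with_rag = [
    {"Responsibility": 3, "Auditability": 4, "Interpretability": 4, "Dependability": 3, "Traceability": 4},
    {"Responsibility": 3, "Auditability": 4, "Interpretability": 3, "Dependability": 3, "Traceability": 5},
]
print(pillar_shift(baseline, with_rag))
```

Reporting the shift per pillar shows both gains and losses. In these invented numbers, a Traceability gain coexists with a small Interpretability drop.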

The star also matters for policy translation. Frameworks such as the EU AI Act, ISO/IEC 42001, and the NIST AI RMF describe governance expectations at organisational and system levels, but practitioners still need a practical way to show what happened in a specific use episode. RAIDT's claim is not that it replaces these frameworks. Its claim is that it offers a run-level evidence grammar through which organisational governance, standards alignment, and audit practice can be operationalised. In this sense, S5 is where RAIDT becomes legible to supervisors, auditors, and sector stakeholders: it shows how a project on responsible GenAI governance can move from abstract principles to inspectable evidence, measurable scores, and concrete governance interventions.

The boundaries are equally important. S5 does not claim that a high score proves that an output is correct, safe in every context, or legally compliant. It does not remove the need for domain expertise, human judgement, or wider organisational controls. It does, however, make it much harder for organisations to rely on vague assurances. By forcing the question "what evidence exists for this run, and what does that evidence justify?", the RAIDT pillars and scoring system strengthen conceptual clarity, empirical evaluation, and governance action across the project.

Key questions and answers

Q1. What is a RAIDT pillar?

Answer:
A RAIDT pillar is a governance dimension used to judge whether a specific GenAI run is reviewable and responsibly managed. Each pillar takes a broad responsible-AI principle and converts it into evidence-based questions that can be answered from the run-level evidence pack. The purpose is not to describe ideals in the abstract, but to determine whether a real organisational use of GenAI can be inspected later with sufficient clarity.

Practical example:
A complaint-handling assistant generates a customer response. If the organisation stores the prompt version, model version, retrieved policy documents, output, and reviewer decision, the run can be assessed across several pillars rather than merely accepted because the text sounds persuasive.

Link to RAIDT:
The pillars are the bridge between the run-level evidence pack and the RAIDT score profile. They turn evidence into a governance judgement.

Q2. Why does RAIDT need five pillars rather than one generic trust score?

Answer:
One generic score would flatten distinct governance questions into a single number. RAIDT separates the pillars because a run can be strong in one area and weak in another. Organisational governance requires visibility into these trade-offs. Responsibility, Auditability, Interpretability, Dependability, and Traceability capture different but complementary dimensions of governability.

Practical example:
A legal drafting assistant may produce clear and well-structured output, giving it stronger Interpretability, but if the underlying prompt template and retrieved source set are not retained, Auditability and Traceability remain weak.

Link to RAIDT:
The five-pillar profile preserves the shape of governance strengths and weaknesses, which is more useful for intervention than a single label.

Q3. Why does RAIDT score the run rather than the model alone?

Answer:
A model may be technically impressive yet still be governed badly in practice. Organisational risk arises when a model is combined with prompts, retrieved context, tool calls, user roles, policy limits, and review practices. Scoring the run recognises that governance quality is produced by this whole configuration, not by model capability alone.

Practical example:
The same large language model may be used safely in one department with approved prompts, citation capture, and reviewer checks, but poorly in another department with no logging and no oversight.

Link to RAIDT:
RAIDT treats the run as the unit of governance, so the evidence pack and score profile are both tied to the specific use episode.

Q4. What does Responsibility assess in practice?

Answer:
Responsibility assesses whether the run was justified, authorised, bounded, and subject to suitable oversight. It asks who initiated the run, for what purpose, under what policy constraints, with what escalation route, and with what limits on reliance. This is where duty of care and contestability enter the framework most directly.

Practical example:
In clinical note drafting, a responsible run record would state that the output is a draft for clinician review, identify the clinician role, record applicable policy constraints, and specify what must be escalated.

Link to RAIDT:
Responsibility ties the run to organisational authority, governance interventions, and acceptable use conditions in the evidence pack.

Q5. How is Auditability different from Traceability?

Answer:
Auditability is about reconstructing and checking what happened; Traceability is about following provenance from output back to inputs, sources, and configuration. They overlap, but they are not identical. A run may be auditable in the sense that logs exist, yet still have weak provenance if source snapshots or tool traces are missing.

Practical example:
A finance assistant may log the final prompt and output, which supports Auditability, but if it does not preserve the retrieved regulations or spreadsheet extracts used during the run, Traceability remains incomplete.

Link to RAIDT:
RAIDT separates these pillars so that missing provenance is not hidden inside general logging claims.

Q6. What counts as good Interpretability in RAIDT?

Answer:
Good Interpretability means that the intended user can understand the output, its assumptions, its limits, and the uncertainty attached to it. RAIDT does not require opening the internal mechanics of a foundation model. It requires practical intelligibility at the point of use so that humans can rely appropriately, question claims, and avoid over-trusting fluent text.

Practical example:
A procurement-risk summary is more interpretable when it separates evidence, uncertainty, recommended action, and confidence warnings rather than presenting one smooth paragraph with no structure.

Link to RAIDT:
Interpretability connects evidence capture to disciplined human use. It supports both scoring and governance interventions such as prompt redesign.
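Prompt redesign for Interpretability often amounts to asking for a fixed output structure and then checking that the model honoured it. The sketch below assumes the output has already been parsed into named sections. The section names follow the procurement-risk example above. The function name and the parsing step are hypothetical.

```python
REQUIRED_SECTIONS = ("evidence", "uncertainty", "recommended_action", "confidence_warnings")


def missing_sections(output: dict[str, str]) -> list[str]:
    """Return the required sections that are absent or empty in a parsed output."""
    return [s for s in REQUIRED_SECTIONS if not output.get(s, "").strip()]


summary = {
    "evidence": "Supplier missed two delivery deadlines in Q1 (source: contract log).",
    "uncertainty": "Q2 delivery data not yet available.",
    "recommended_action": "Request a remediation plan before renewal.",
    "confidence_warnings": "",
}
print(missing_sections(summary))
```

An output that fails this check could be flagged before use, and the check result could itself be stored in the evidence pack.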

Q7. Why is Dependability especially important for GenAI under uncertainty?

Answer:
GenAI outputs can vary across repeated runs, prompt phrasing, model updates, or retrieval differences. Dependability addresses whether that variability is acceptable for the task, whether failure modes are monitored, and whether uncertainty is handled safely. It recognises that organisational use requires more than one successful demonstration.

Practical example:
A cyber-triage assistant that gives materially different severity recommendations for similar alerts would raise dependability concerns even if some outputs appear plausible.

Link to RAIDT:
Dependability is evidenced through repeat runs, perturbation checks, thresholds, incident logs, and monitoring within the evidence pack.
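Repeat-run evidence can be summarised with a simple agreement rate. Rerun the same configuration on the same or closely similar inputs, then measure how often the most common answer recurs. This is a minimal sketch. The 0.8 threshold and the severity labels are hypothetical. Free-text outputs would first need a task-specific way of mapping answers to comparable labels.

```python
from collections import Counter


def modal_agreement(labels: list[str]) -> tuple[str, float]:
    """Most frequent label across repeat runs and the share of runs that gave it."""
    if not labels:
        raise ValueError("need at least one repeat run")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def stable_enough(labels: list[str], min_agreement: float = 0.8) -> bool:
    """Hypothetical threshold rule: pass only if agreement meets min_agreement."""
    return modal_agreement(labels)[1] >= min_agreement


# Five repeat runs of a triage assistant on similar alerts.
severities = ["high", "high", "medium", "high", "low"]
print(modal_agreement(severities), stable_enough(severities))
```

Here only three of five runs agree, so the configuration would fail the illustrative threshold even though each individual answer may look plausible.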

Q8. Why must the pillar profile remain visible even if a composite score is reported?

Answer:
A visible profile prevents governance trade-offs from being averaged away. Decision-makers need to know where the real weakness lies, because remediation depends on the weak pillar. A composite score may be useful for summary reporting, but it should never replace the underlying governance pattern.

Practical example:
A run might score highly on Responsibility and Interpretability but poorly on Auditability because the configuration record was incomplete. An average would obscure the operational deficiency.

Link to RAIDT:
Profile-first reporting is central to RAIDT scoring and to governance interventions targeted at specific weaknesses.

Q9. How do interventions such as RAG, PEFT/LoRA, or RLHF-type controls affect the score profile?

Answer:
These interventions can improve governance, but only when their evidence is captured. RAG can improve Traceability and Auditability if retrieval queries, source IDs, and snapshots are stored. PEFT methods such as LoRA can improve Dependability if adapter versions and deployment lineage are recorded. RLHF-type controls can support Responsibility or safer behaviour, but only if policy and alignment provenance are documented.

Practical example:
A policy assistant with RAG may cite sources well, but if the system stores only displayed citations and not the retrieved passages or corpus version, the claimed traceability remains weak.

Link to RAIDT:
Paper 09 uses the pillars to evaluate whether technical interventions genuinely improve governance readiness, not only output behaviour.
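Storing displayed citations differs from storing the retrieval itself. The difference can be sketched as a snapshot that keeps each retrieved passage, a SHA-256 hash of its text, and the corpus version. The record layout and function names are hypothetical. Hashing uses Python's standard `hashlib`. Keeping the passage text lets a later reviewer see what the model was shown. The hashes let that reviewer detect whether a source, or the stored record, has changed since the run.

```python
import hashlib
import json


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def snapshot_retrieval(query: str, passages: list[tuple[str, str]], corpus_version: str) -> dict:
    """Build a hypothetical retrieval snapshot from (source_id, passage_text) pairs."""
    record = {
        "query": query,
        "corpus_version": corpus_version,
        "passages": [
            {"source_id": sid, "text": text, "sha256": sha256_text(text)}
            for sid, text in passages
        ],
    }
    # Hash of the whole snapshot; recomputing it later reveals edits to the record.
    record["record_sha256"] = sha256_text(json.dumps(record, sort_keys=True))
    return record


def record_intact(record: dict) -> bool:
    """Recompute the snapshot hash and compare it with the stored one."""
    body = {k: v for k, v in record.items() if k != "record_sha256"}
    return record["record_sha256"] == sha256_text(json.dumps(body, sort_keys=True))


def source_unchanged(record: dict, source_id: str, current_text: str) -> bool:
    """Check whether a source still matches the passage captured at run time."""
    for passage in record["passages"]:
        if passage["source_id"] == source_id:
            return passage["sha256"] == sha256_text(current_text)
    raise KeyError(source_id)


snap = snapshot_retrieval(
    "Can students defer an exam?",
    [("policy-7.2", "Students may request a deferral within 5 working days.")],
    corpus_version="2024-09",
)
print(record_intact(snap))
print(source_unchanged(snap, "policy-7.2", "Students may request a deferral within 10 working days."))
```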

Q10. How does this star help supervisors understand the RAIDT project?

Answer:
S5 shows where RAIDT becomes methodologically concrete. It demonstrates that the project is not only about ethical language or abstract framework design; it offers a structured way to capture evidence, assign scores, compare configurations, and identify governance interventions. For supervision, this makes the project legible as a coherent research programme spanning theory, empirical testing, and policy translation.

Practical example:
In a supervision meeting, the pillar profile can be used to explain how one empirical case differs from another without drifting into purely technical model comparison.

Link to RAIDT:
This star links Paper 08 foundations, Paper 09 validation, Paper 10 policy pathways, and sector playbooks through one operational scoring logic.

Practical examples
  1. Healthcare discharge summary drafting
    A clinician uses GenAI to draft discharge notes. Responsibility requires a clear statement that the output is a draft, not a diagnosis. Auditability requires run IDs, prompt versions, timestamps, and reviewer notes. Interpretability requires structured wording and uncertainty statements. Dependability requires repeat-run checks on similar cases. Traceability requires links to the clinical source notes and any retrieved guideline excerpts.

  2. Banking customer-explanation assistant
    A bank uses GenAI to explain why a customer application triggered review. A well-governed run must preserve policy constraints, model and prompt versions, decision-support limitations, retrieved policy sources, and reviewer approval. Strong Interpretability matters because the explanation must be understandable to staff and contestable by compliance reviewers.

  3. RAG-enabled policy advice tool in higher education
    A university administrator uses a RAG system to answer questions about student policy. The governance challenge is not only answer quality but whether the retrieved policy documents, corpus version, and timestamped prompt can be preserved. Here, Traceability and Auditability are decisive.

  4. LoRA-adapted internal writing assistant
    An organisation deploys a LoRA-adapted model for internal document drafting. The adapter may improve style consistency, but the run is only strongly governed if adapter lineage, deployment version, prompt template, and human review are recorded. Otherwise, claimed Dependability is difficult to evidence.

Conclusion

Star S5 is the point in RAIDT where the project becomes operationally measurable. Up to this stage, we can talk about responsible AI, uncertainty, and governance principles, but supervisors need to see how those ideas are turned into something inspectable. The five pillars do that work. Responsibility asks whether the run was justified and properly overseen. Auditability asks whether it can be reconstructed later. Interpretability asks whether the relevant human can understand the output and its limits. Dependability asks whether the configuration behaves stably enough for the task. Traceability asks whether the output can be linked back to prompts, sources, tools, and versions. Scoring matters because evidence capture alone is not enough; we need a disciplined way to judge whether the evidence is missing, partial, or audit-ready. The important point is that RAIDT is not scoring truth in the abstract. It is scoring governance readiness for a specific run. That gives the project a clear methodological contribution for Paper 08, measurable variables for Paper 09, and a practical evidence grammar for Paper 10 and sector playbooks.

Slides
Slide 1 — why this star matters

Purpose:
Frame Star S5 as the point where RAIDT becomes measurable and operational.

Key message:
S5 turns responsible-AI principles into a practical method for judging whether a specific GenAI run is governable.

Slide content:

  • RAIDT governs the run, not only the model
  • S5 defines the five pillars and scoring logic
  • The question is evidence, not abstract trust
  • This star links theory to operational practice

Speaker note:
Explain that S5 is important because organisations do not govern AI in the abstract; they govern specific uses. This slide should establish that RAIDT's novelty lies in making a run inspectable through evidence and then assessable through a disciplined score profile.

Visual idea:
A simple flow from responsible-AI values to run-level evidence to five-pillar profile.

Link to RAIDT:
Introduces the core operational mechanism used across the RAIDT project.

Citation support to mention if asked:
Responsible AI operationalisation; Information Systems governance.

Slide 2 — from principles to run-level evidence

Purpose:
Show why broad governance language is insufficient without evidence tied to a specific run.

Key message:
RAIDT answers the question: what evidence must exist if a GenAI output later needs to be reviewed or challenged?

Slide content:

  • Broad values are necessary but not inspectable on their own
  • A run includes prompt, model, tools, context, output, and checks
  • The evidence pack records what happened in that run
  • S5 interprets that pack through the five pillars

Speaker note:
Walk through the idea that the run is the smallest meaningful governance unit for organisational GenAI use. Stress that a persuasive output is not enough; the organisation must be able to show how the output came about and what controls surrounded it.

Visual idea:
Run anatomy diagram showing input, configuration, retrieval, output, and review.

Link to RAIDT:
Defines the evidential basis for both the evidence pack and the score profile.

Citation support to mention if asked:
Governance, audit trail, and provenance literature; AI uncertainty research.

Slide 3 — the five RAIDT pillars

Purpose:
Introduce the pillars as distinct governance dimensions.

Key message:
The pillars capture complementary questions about whether a run was justified, reconstructable, understandable, stable, and traceable.

Slide content:

  • Responsibility: purpose, authority, oversight
  • Auditability: logs, reconstruction, retention
  • Interpretability: meaning, limits, uncertainty
  • Dependability: stability, monitoring, failure handling
  • Traceability: provenance across prompts, sources, tools, versions

Speaker note:
Give one sentence on each pillar and emphasise that they are not arbitrary categories. They were selected because they reflect the main governance questions raised by organisational GenAI use and can be evidenced at run level.

Visual idea:
Five-column pillar graphic or radial wheel centred on the run.

Link to RAIDT:
Provides the organising grammar used throughout RAIDT scoring and evidence capture.

Citation support to mention if asked:
Responsible AI governance categories; auditability and traceability standards concepts.

Slide 4 — what scoring adds

Purpose:
Explain why RAIDT scores pillars instead of collecting evidence without judgement.

Key message:
Scoring turns documentation into a disciplined governance judgement about whether evidence is missing, partial, or audit-ready.

Slide content:

  • Evidence alone does not show adequacy
  • RAIDT uses anchored scores from 1 to 5
  • Scores must point back to evidence fields
  • The score is governance readiness, not truth or legality

Speaker note:
Clarify that the score does not certify correctness. It evaluates whether the run can be reviewed and justified. Briefly note the meaning of low, partial, and strong scores and the importance of calibration.

Visual idea:
Anchored scoring ladder with evidence examples at 1, 3, and 5.

Link to RAIDT:
Defines the measurement logic used in empirical validation and governance decision-making.

Citation support to mention if asked:
Rubric-based evaluation, calibration, and audit-readiness concepts.

Slide 5 — profile over composite

Purpose:
Show why RAIDT preserves the five-pillar profile rather than relying on a single average.

Key message:
A single composite score can hide governance weaknesses that matter operationally.

Slide content:

  • Different pillars fail in different ways
  • Strong clarity can coexist with weak provenance
  • Profile visibility supports targeted intervention
  • Composite scores are secondary, not primary

Speaker note:
Use an example such as a highly readable output with poor traceability. Explain that remediation depends on knowing which pillar is weak. This is why profile-first reporting is a substantive design choice rather than a presentation preference.

Visual idea:
Radar chart or side-by-side profile versus average-number comparison.

Link to RAIDT:
Protects the diagnostic value of scoring and keeps trade-offs visible across runs.

Citation support to mention if asked:
Multi-criteria governance assessment; transparency and accountability literature.

Slide 6 — evidence pack and proof logic

Purpose:
Show what evidence has to be captured for the pillars to be credible.

Key message:
The evidence pack must preserve purpose, configuration, provenance, output, checks, and review decisions in a reconstructable form.

Slide content:

  • Purpose, role, risk tier, and oversight fields
  • Prompt, model, tool, and parameter records
  • Retrieval snapshots, source IDs, and hashes where relevant
  • Output, uncertainty, repeat-run, and reviewer records

Speaker note:
Explain that S5 depends on concrete fields rather than narrative claims. Mention that Auditability and Traceability are especially sensitive to missing identifiers, versions, and retrieval evidence. Stress that evidence quality is what makes scoring credible.

Visual idea:
Evidence chain or table mapping evidence fields to pillars.

Link to RAIDT:
Connects directly to the run-level evidence pack and evidence-based scoring approach.

Citation support to mention if asked:
Audit trail, provenance, logging, and reproducibility literature.

Slide 7 — empirical validation and interventions

Purpose:
Explain how S5 supports Paper 09 and the evaluation of governance interventions.

Key message:
The pillars let RAIDT test whether interventions such as RAG, structured prompting, PEFT/LoRA, or RLHF-type controls improve governance readiness.

Slide content:

  • Repeated runs expose variance and instability
  • RAG may improve traceability if retrieval is preserved
  • PEFT/LoRA may improve dependability if lineage is logged
  • Alignment controls help only when policy provenance is recorded

Speaker note:
Make clear that RAIDT is not comparing technical methods only for performance. It is comparing how they affect evidential governability. This is what makes the framework empirically useful rather than purely descriptive.

Visual idea:
Comparison table of interventions against the five pillars.

Link to RAIDT:
Positions S5 as the measurement frame for empirical validation in Paper 09.

Citation support to mention if asked:
Empirical evaluation of GenAI governance interventions; RAG and alignment documentation work.

Slide 8 — project and policy implications

Purpose:
Close by explaining why this star matters across the wider RAIDT programme.

Key message:
S5 connects conceptual foundations, empirical validation, policy translation, and sector playbooks through one evidence-based governance grammar.

Slide content:

  • Paper 08: conceptual and methodological foundation
  • Paper 09: measurable governance outcomes
  • Paper 10: policy and standards translation
  • Sector playbooks: stable pillars, contextual thresholds

Speaker note:
Finish by positioning S5 as the operational core of RAIDT. It gives supervisors a clean explanation of the contribution: not another abstract responsible-AI framework, but a way to observe, score, compare, and improve GenAI governance at the level where organisational use actually happens.

Visual idea:
Bridge diagram linking foundations, empirical validation, policy pathways, and sector playbooks through S5.

Link to RAIDT:
Shows how the star supports the full RAIDT programme and its practical outputs: the evidence pack and the score profile.

Citation support to mention if asked:
EU AI Act, ISO/IEC 42001, NIST AI RMF, and responsible AI governance literature.
