Policy, Standards and Assurance

```mermaid
flowchart LR
    A[AI policy principles] --> B[Governance gap]
    B --> C[RAIDT run evidence]
    C --> D[Star S9 policy layer]
    D --> E[Evidence packs]
    D --> F[RAIDT score profile]
    E --> G[Audit and assurance]
    F --> G
    G --> H[Policy alignment]
    H --> I[Organisational learning]
```

Circle 3 - Academic, adoption and boundary layer

Ring: Adoption star

Function

Translate RAIDT from a conceptual responsible AI governance framework into something that can be aligned with policy instruments, organisational standards, assurance routines, procurement controls, audit practices, and accountability requirements. This star explains how run-level evidence becomes usable for assurance claims about generative AI in real organisational settings.

Role in the project

This star sits at the boundary between RAIDT's conceptual foundations and its implementation pathway. It is primarily a policy and assurance star, but it also supports methodology, empirical validation, and sector application. In the wider project, it shows that RAIDT is not only a way of thinking about responsible AI, but also a practical governance architecture that can be mapped to standards, tested through evidence, and used to guide interventions. It therefore connects foundational ideas from Paper 08 to empirical validation in Paper 09 and policy pathways in Paper 10.

Main questions answered by this star
Workshop discussion prompts
Items in this star (12)
Main message

Policy, standards and assurance matter in RAIDT because most organisational discussion about responsible AI still happens at the wrong level of abstraction. Institutions often publish high-level principles such as fairness, transparency, accountability or human oversight, but these principles do not by themselves show what happened when a generative AI system was actually used for a task. A policy may say that outputs must be checked, a standard may require documented controls, and an assurance team may ask whether risk has been managed, yet none of these questions can be answered well if the underlying unit of analysis is vague. RAIDT addresses this problem by defining the run as the unit of governance. A run is one configured use of a generative AI system for a specific task, at a specific time, in a specific context. That framing turns abstract governance language into observable evidence.

In this star, policy refers to the internal and external rules that shape acceptable use. Standards refer to structured governance frameworks that organise processes, controls, responsibilities and evidence. Assurance refers to the disciplined practice of judging whether claims about a system or process are sufficiently supported by evidence. These are related but not identical. Policy states expectations. Standards provide organised governance structures. Assurance tests whether the organisation can credibly demonstrate that expectations are being met. RAIDT matters because it gives all three a common operational anchor: the run-level evidence pack and the five-pillar RAIDT score profile.

The governance problem is especially acute for generative AI because outputs are probabilistic, context-dependent and sensitive to prompt design, model configuration, retrieved context, and downstream human interpretation. Conventional software governance assumes a more stable relationship between specification and output. By contrast, a generative system may produce different responses to closely related prompts, may rely on external retrieval pipelines, and may be deployed by non-technical staff across diverse workflows. This creates managerial uncertainty. Leaders may know there is value in using these systems, but they struggle to identify what evidence is sufficient for confident use, what should be reviewed before deployment, and what should be recorded for later challenge or audit. RAIDT offers a practical answer: capture evidence at the moment of use, score the run across five governance pillars, and use that evidence to support interventions.

This is where policy and standards alignment becomes substantive rather than symbolic. A regulation such as the EU AI Act signals that organisations need documented risk management, human oversight, traceability, monitoring and accountability. ISO/IEC 42001 points towards a management-system approach in which roles, controls, monitoring and continual improvement are made systematic. The NIST AI RMF, organised around its Govern, Map, Measure and Manage functions, and its generative AI profile provide structured language for risk governance and management activities. RAIDT does not replace these instruments, nor does it claim one-to-one equivalence with them. Instead, it offers an evidence grammar that helps an organisation translate the instruments' expectations into fields, checks, logs, review practices and governance triggers at the run level. In other words, RAIDT provides a way to show how a policy requirement becomes something observable.
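A crosswalk of this kind could be held as a small lookup table from external expectations to run-level evidence fields and RAIDT pillars. The sketch below is illustrative only. The expectation labels are paraphrases. Every field name and mapping is a hypothetical assumption, not a reading of the legal text of the EU AI Act or of the clauses of ISO/IEC 42001 or the NIST AI RMF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrosswalkEntry:
    instrument: str         # external instrument the expectation is drawn from
    expectation: str        # paraphrased expectation, not a legal quotation
    evidence_fields: tuple  # hypothetical run-level fields that could evidence it
    pillars: tuple          # RAIDT pillars the evidence mainly supports


# Hypothetical, illustrative crosswalk; not an authoritative legal or standards mapping.
CROSSWALK = [
    CrosswalkEntry("EU AI Act", "human oversight",
                   ("reviewer_id", "review_outcome", "escalated"),
                   ("Responsibility", "Dependability")),
    CrosswalkEntry("EU AI Act", "record-keeping and traceability",
                   ("run_id", "model_version", "prompt_template_version", "retrieved_sources"),
                   ("Traceability", "Auditability")),
    CrosswalkEntry("ISO/IEC 42001", "monitoring and continual improvement",
                   ("exception_flags", "incident_links"),
                   ("Auditability", "Dependability")),
    CrosswalkEntry("NIST AI RMF", "Govern: accountability structures",
                   ("initiator_id", "approver_id"),
                   ("Responsibility",)),
]


def fields_for(instrument: str) -> set:
    """Collect every evidence field the crosswalk links to one instrument."""
    return {f for e in CROSSWALK if e.instrument == instrument for f in e.evidence_fields}


def pillars_for(instrument: str) -> set:
    """Collect every RAIDT pillar the crosswalk links to one instrument."""
    return {p for e in CROSSWALK if e.instrument == instrument for p in e.pillars}
```

For example, `fields_for("EU AI Act")` returns the seven fields named in the first two entries. These are the fields a team would need to capture at run level to evidence those two paraphrased expectations.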

The link to the five pillars is direct. Responsibility asks who initiated, approved, reviewed or owned a run and whether accountability is allocated clearly. Auditability asks whether the run can be inspected later by an internal reviewer, auditor or regulator with enough context to understand what occurred. Interpretability asks whether the purpose, prompt logic, retrieved sources, output limitations and review reasoning can be explained in a way that supports informed judgement. Dependability asks whether the run is sufficiently robust for the task, including whether checks were performed, whether the model and tools behaved as expected, and whether escalation rules exist for failure or uncertainty. Traceability asks whether the chain from request to context to output to decision can be reconstructed. These pillars give policy and assurance work a structured evaluative language.
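One way to make the pillars operational is to express each one as a short list of yes/no checks drawn from the questions above. A run then scores the share of checks met for each pillar. The check names and the proportional scoring rule are hypothetical assumptions for illustration, not RAIDT's defined scoring method.

```python
# Hypothetical checks per pillar, paraphrasing the pillar questions in the text.
PILLAR_CHECKS = {
    "Responsibility": ["owner_recorded", "reviewer_recorded"],
    "Auditability": ["evidence_pack_retained", "review_notes_recorded"],
    "Interpretability": ["purpose_stated", "sources_explained", "limitations_noted"],
    "Dependability": ["checks_performed", "escalation_rule_exists"],
    "Traceability": ["prompt_version_logged", "context_logged", "decision_linked"],
}


def score_profile(checks_met: set) -> dict:
    """Return a 0 to 1 score per pillar: the fraction of that pillar's checks met."""
    return {
        pillar: round(sum(c in checks_met for c in checks) / len(checks), 2)
        for pillar, checks in PILLAR_CHECKS.items()
    }


run_checks = {
    "owner_recorded", "reviewer_recorded", "evidence_pack_retained",
    "purpose_stated", "checks_performed", "prompt_version_logged",
    "context_logged", "decision_linked",
}
profile = score_profile(run_checks)
```

Returning a profile rather than a single total keeps weak pillars visible. Here the run is strong on Responsibility and Traceability but weak on Interpretability, and a summed score would hide that difference.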

A practical example clarifies the point. Imagine a public sector team using a large language model with retrieval-augmented generation to draft responses to citizen enquiries. A general policy may require accuracy, privacy protection and human review. RAIDT turns those into concrete evidence fields: which model was used, which knowledge sources were retrieved, which version of the prompt template was applied, who checked the answer, what issues were identified, whether the response was edited, and whether the run met a threshold profile for Dependability and Traceability. If a complaint later arises, the organisation has more than a policy statement. It has a run-level record that supports explanation, contestability and improvement.
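The evidence fields listed in this example can be sketched as a simple record with a threshold test on the two pillars it names. The field names, the 0 to 1 score scale and the threshold values are hypothetical assumptions, not a prescribed RAIDT schema.

```python
from dataclasses import dataclass, field


@dataclass
class EvidencePack:
    run_id: str
    model: str
    prompt_template_version: str
    retrieved_sources: list
    reviewer: str
    issues_identified: list = field(default_factory=list)
    edited_by_reviewer: bool = False
    scores: dict = field(default_factory=dict)  # pillar name -> score in [0, 1]


# Hypothetical minimum scores for this task type.
THRESHOLDS = {"Dependability": 0.7, "Traceability": 0.8}


def meets_threshold(pack: EvidencePack, thresholds: dict = THRESHOLDS) -> bool:
    """True only if every thresholded pillar meets its minimum.

    Missing pillar scores are treated as 0, so an unscored run fails.
    """
    return all(pack.scores.get(p, 0.0) >= t for p, t in thresholds.items())


pack = EvidencePack(
    run_id="run-0001",
    model="example-llm-v1",
    prompt_template_version="citizen-reply-v3",
    retrieved_sources=["housing-policy-2024.pdf"],
    reviewer="officer-17",
    issues_identified=["tone too formal"],
    edited_by_reviewer=True,
    scores={"Dependability": 0.75, "Traceability": 0.6},
)
```

This run fails because its Traceability score falls below the hypothetical minimum. In practice that result would route the draft to escalation rather than reuse.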

The same logic applies to procurement and assurance. When an organisation purchases a generative AI capability, it should not only ask whether a supplier claims compliance or alignment. It should ask whether the system can produce the evidence needed for governance in use. Can configurations be logged? Can prompts and retrieved context be retained appropriately? Can incidents be linked back to individual runs? Can outputs be sampled for audit? Can internal teams apply a consistent scoring logic? RAIDT therefore supports procurement by clarifying the evidential affordances a system must provide. A technically impressive tool that cannot support traceability or review may be unsuitable for sensitive organisational work.
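The procurement questions above can be turned into a capability checklist and applied to each candidate system. The capability identifiers, the vendor data, and the rule that any missing capability counts as a gap are hypothetical. In practice the list would come from the organisation's own requirements.

```python
# Hypothetical capability identifiers paired with the procurement questions in the text.
REQUIRED_CAPABILITIES = {
    "config_logging": "Can configurations be logged?",
    "prompt_context_retention": "Can prompts and retrieved context be retained appropriately?",
    "incident_linkage": "Can incidents be linked back to individual runs?",
    "output_sampling": "Can outputs be sampled for audit?",
    "consistent_scoring": "Can internal teams apply a consistent scoring logic?",
}


def evidential_gaps(vendor_capabilities: set) -> list:
    """List the procurement questions a vendor cannot answer 'yes' to."""
    return [question for cap, question in REQUIRED_CAPABILITIES.items()
            if cap not in vendor_capabilities]


vendors = {
    "Vendor A": {"config_logging", "prompt_context_retention", "incident_linkage",
                 "output_sampling", "consistent_scoring"},
    "Vendor B": {"config_logging", "output_sampling"},
}
for name, caps in vendors.items():
    gaps = evidential_gaps(caps)
    print(name, "no gaps" if not gaps else f"{len(gaps)} gap(s)")
```

Returning the unanswered questions, not just a pass/fail flag, gives the evaluation panel the specific evidential weaknesses to raise with each supplier.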

This star also matters because assurance is not just an end-stage audit activity. In RAIDT, assurance is distributed across the lifecycle of use. Some assurance is ex ante, such as task classification, control design, approved prompt templates, retrieval restrictions, or parameter-efficient fine-tuning (PEFT) and alignment controls defined by the provider or organisation. Some assurance is in-process, such as human review, confidence signalling, exception handling, or escalation where uncertainty is high. Some assurance is ex post, such as incident analysis, post-market monitoring, internal audit review and policy revision. The run-level evidence pack allows these layers to connect. That is especially valuable for empirical work because it creates observable artefacts that can be compared across settings, sectors and governance arrangements.
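The three layers can be illustrated by tagging each control with its lifecycle stage and checking which stages a workflow's recorded controls leave uncovered. The control names follow the examples above, but the catalogue itself and the coverage rule are hypothetical simplifications.

```python
from enum import Enum


class Stage(Enum):
    EX_ANTE = "ex ante"
    IN_PROCESS = "in-process"
    EX_POST = "ex post"


# Hypothetical control catalogue built from the examples in the text.
CONTROL_STAGE = {
    "task_classification": Stage.EX_ANTE,
    "approved_prompt_template": Stage.EX_ANTE,
    "retrieval_restriction": Stage.EX_ANTE,
    "alignment_control": Stage.EX_ANTE,
    "human_review": Stage.IN_PROCESS,
    "confidence_signal": Stage.IN_PROCESS,
    "exception_handling": Stage.IN_PROCESS,
    "escalation": Stage.IN_PROCESS,
    "incident_analysis": Stage.EX_POST,
    "post_market_monitoring": Stage.EX_POST,
    "internal_audit_review": Stage.EX_POST,
    "policy_revision": Stage.EX_POST,
}


def missing_stages(recorded_controls: list) -> set:
    """Return the lifecycle stages with no recorded control; unknown names are ignored."""
    covered = {CONTROL_STAGE[c] for c in recorded_controls if c in CONTROL_STAGE}
    return set(Stage) - covered
```

For example, a workflow that records only an approved prompt template and human review leaves the ex post stage uncovered. That gap is visible only because controls are tied to recorded evidence.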

There are important boundaries. RAIDT does not claim that every legal or standards requirement can be reduced to a run score. It does not eliminate the need for broader organisational governance, including strategy, training, vendor management, data governance, and legal interpretation. It does not guarantee that an output is correct simply because evidence was captured. What it does claim is narrower and more defensible: governance improves when claims about responsible use are tied to structured evidence at the level where generative AI work actually happens. That claim is methodologically useful, policy-relevant and testable.

For the RAIDT project, this star is therefore pivotal. It shows supervisors and stakeholders that the framework is not merely a conceptual model of responsible AI, and not merely a technical recipe for prompt engineering. It is an Information Systems governance contribution that addresses uncertainty by linking situated use, evidence capture, scoring, assurance and organisational intervention. That is what makes RAIDT legible to scholars, managers, auditors and policy audiences at the same time.

Key questions and answers

Q1. What is meant by policy, standards and assurance in RAIDT?

Answer:
In RAIDT, policy refers to the rules and expectations that govern acceptable generative AI use; standards refer to structured governance frameworks that organise responsibilities, controls and review processes; and assurance refers to the disciplined judgement of whether claims that those expectations are being met are supported by evidence. RAIDT connects these layers by treating each run as the evidential unit from which broader governance claims are built.

Practical example:
An organisation may have an internal policy requiring human review of sensitive outputs, a standards-based management process for documenting controls, and an assurance review that checks whether those controls were actually followed.

Link to RAIDT:
RAIDT provides the run-level evidence pack and five-pillar profile that allow those policy and assurance claims to be demonstrated rather than merely stated.

Q2. Why is a general responsible AI policy not sufficient on its own?

Answer:
A general policy often states values or principles, but it rarely specifies what must be captured when a model is used in practice. Without a run-level record, organisations cannot reliably reconstruct what prompt was used, what context was retrieved, what output was produced, or what checks were applied.

Practical example:
A team may claim that all AI-generated reports are reviewed by staff, but if no run records exist, it is difficult to show who reviewed which output and under what conditions.

Link to RAIDT:
RAIDT closes this gap by making review, context, output and control evidence part of the evidence pack for each run.

Q3. What governance problem does this star solve?

Answer:
This star solves the translation problem between abstract governance expectations and operational evidence. Organisations know they need accountability, auditability and oversight, but they often lack a practical method for documenting these qualities at the point of use.

Practical example:
An assurance team asks whether a generative AI assistant used in procurement produced misleading advice. If the organisation only has a policy document, the question remains abstract. If it has run-level evidence, the review becomes concrete.

Link to RAIDT:
RAIDT turns policy expectations into observable fields, checks and scores that support governance interventions.

Q4. How does the run-level evidence pack support assurance?

Answer:
The evidence pack records the configuration, context, output and review history of a specific run. This allows assurance work to examine whether controls were present, whether reviewers acted appropriately, and whether the run met the organisation's threshold for acceptable use.

Practical example:
A department audits a sample of AI-assisted case summaries and reviews the prompt template, retrieved documents, reviewer sign-off, and any logged concerns.

Link to RAIDT:
The evidence pack is the operational foundation for Auditability, Traceability and Responsibility, and it gives assurance reviewers a stable object of inspection.

Q5. How do standards such as ISO/IEC 42001 and the NIST AI RMF fit into this star?

Answer:
These instruments provide structured governance language for roles, controls, risk management, monitoring and continual improvement. ISO/IEC 42001 does so as a certifiable AI management-system standard, and the NIST AI RMF as a voluntary risk management framework. RAIDT does not replace them; instead, it helps operationalise them by showing what evidence could be captured at the level of actual AI use.

Practical example:
A governance team maps a standard's monitoring requirement to RAIDT evidence fields such as run logs, exception flags, human checks and incident links.

Link to RAIDT:
RAIDT functions as an evidence grammar and crosswalk layer that connects external standards to internal run-level practice.
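The mapping in the Q5 practical example can be tested against actual records. For each field linked to the monitoring requirement, the sketch measures the share of runs in which that field was captured. The field names and record layout are hypothetical assumptions.

```python
# Hypothetical fields a governance team has mapped to one monitoring requirement.
MONITORING_FIELDS = ("run_log", "exception_flag", "human_check", "incident_link")


def field_coverage(runs: list, required: tuple = MONITORING_FIELDS) -> dict:
    """Share of runs in which each required field was recorded.

    A recorded False is still evidence, so only None or an empty string counts as missing.
    """
    if not runs:
        return {f: 0.0 for f in required}
    return {
        f: sum(1 for r in runs if r.get(f) not in (None, "")) / len(runs)
        for f in required
    }


runs = [
    {"run_log": "stored", "exception_flag": False, "human_check": "officer-3", "incident_link": None},
    {"run_log": "stored", "exception_flag": True, "human_check": "", "incident_link": "INC-12"},
]
```

A real design would need to separate fields that were not recorded from fields that do not apply. In this sketch, a run with no incident simply lowers `incident_link` coverage.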

Q6. Why is the EU AI Act relevant to this star?

Answer:
The EU AI Act is relevant because it sharpens organisational expectations around documentation, risk management, human oversight, traceability and post-deployment monitoring. Even for organisations or uses that fall outside its formal legal scope, it influences governance thinking and procurement expectations.

Practical example:
A university or supplier may not use RAIDT as a legal checklist, but it may still use RAIDT evidence to show how oversight, documentation and monitoring are being handled for high-stakes tasks.

Link to RAIDT:
RAIDT supports policy alignment by making evidential claims about runs inspectable and comparable, which is useful for policy crosswalks and governance review.

Q7. How does procurement connect to policy, standards and assurance?

Answer:
Procurement is where governance assumptions are often locked in. If a tool cannot export logs, document model versions, capture retrieval context, or support review workflows, later assurance becomes weak regardless of policy ambition.

Practical example:
Two vendors offer similar drafting assistants, but only one supports structured logging, version control, and retention settings needed for audit.

Link to RAIDT:
RAIDT clarifies the evidential and control capabilities that should be considered during procurement because they affect all five pillars in practice.

Q8. How can internal audit use RAIDT?

Answer:
Internal audit can use RAIDT to sample runs, inspect evidence packs, compare scores across teams, and identify whether control failures are isolated or systemic. This moves audit discussion from general concern about AI to specific evidence patterns.

Practical example:
Auditors review a month's worth of runs from a legal drafting workflow and find that Traceability is strong but Interpretability is weak because reviewers are not recording why edits were made.

Link to RAIDT:
The RAIDT profile helps auditors see whether issues are concentrated in Responsibility, Dependability or another pillar, which supports targeted governance interventions.
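The Q8 audit pattern can be sketched with the standard library: sample runs, average pillar scores per team, then label each weak pillar as isolated or systemic. The team data, the 0.6 weakness threshold and the "weak in more than half of teams" rule are hypothetical. The data mirrors the legal drafting example, where Traceability is strong but Interpretability is weak.

```python
import random
from statistics import mean


def sample_runs(runs: list, k: int, seed: int = 0) -> list:
    """Draw a reproducible random sample of up to k runs for audit."""
    rng = random.Random(seed)
    return rng.sample(runs, min(k, len(runs)))


def pillar_means(runs: list) -> dict:
    """Average each pillar's score across the given runs."""
    pillars = runs[0]["scores"].keys()
    return {p: round(mean(r["scores"][p] for r in runs), 2) for p in pillars}


def classify_weaknesses(runs_by_team: dict, threshold: float = 0.6) -> dict:
    """Label each weak pillar 'systemic' if weak in over half the teams, else 'isolated'."""
    weak = {}
    for team, runs in runs_by_team.items():
        for pillar, avg in pillar_means(runs).items():
            if avg < threshold:
                weak.setdefault(pillar, []).append(team)
    n_teams = len(runs_by_team)
    return {p: "systemic" if len(teams) > n_teams / 2 else "isolated"
            for p, teams in weak.items()}


runs_by_team = {
    "legal": [{"scores": {"Traceability": 0.9, "Interpretability": 0.4}},
              {"scores": {"Traceability": 0.8, "Interpretability": 0.5}}],
    "finance": [{"scores": {"Traceability": 0.7, "Interpretability": 0.5}}],
    "hr": [{"scores": {"Traceability": 0.5, "Interpretability": 0.8}}],
}
```

In this sample, Interpretability is weak in two of three teams and comes out as systemic. Traceability is weak only in one team and comes out as isolated. That split points to different interventions.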

Q9. How do incident response and post-market monitoring fit into this star?

Answer:
Generative AI governance cannot end at initial deployment. Organisations need mechanisms to detect, investigate and learn from failures, complaints or near misses. Run-level evidence makes this possible by linking incidents back to specific uses and configurations.

Practical example:
A misleading customer-facing output is reported. Investigators trace it to a run that used an outdated retrieval source and an unapproved prompt variant.

Link to RAIDT:
Incident review and monitoring depend on Traceability and Auditability, and the evidence pack provides the material needed to update controls and scoring thresholds.
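The Q9 investigation amounts to looking up the run behind an incident and comparing its recorded configuration with what is approved. The record layout, the approved-prompt register, the source version register and the finding messages are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    run_id: str
    prompt_variant: str
    retrieval_sources: dict  # source name -> version retrieved in this run


# Hypothetical registers of approved prompt variants and current source versions.
APPROVED_PROMPTS = {"customer-reply-v4"}
CURRENT_SOURCE_VERSIONS = {"returns-policy": "2024-06", "pricing-faq": "2024-05"}


def investigate(run_id: str, runs: dict) -> list:
    """List ways the incident run deviated from the approved configuration."""
    run = runs.get(run_id)
    if run is None:
        # No record at all is itself a Traceability finding.
        return ["run not found: evidence gap"]
    findings = []
    if run.prompt_variant not in APPROVED_PROMPTS:
        findings.append(f"unapproved prompt variant: {run.prompt_variant}")
    for source, version in run.retrieval_sources.items():
        current = CURRENT_SOURCE_VERSIONS.get(source)
        # Sources absent from the register are not assessed in this sketch.
        if current is not None and version != current:
            findings.append(f"outdated retrieval source: {source} {version} (current {current})")
    return findings


runs = {
    "run-0420": RunRecord("run-0420", "customer-reply-v4b",
                          {"returns-policy": "2023-11", "pricing-faq": "2024-05"}),
}
```

Calling `investigate("run-0420", runs)` reports both deviations from the example: the unapproved prompt variant and the outdated returns-policy source.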

Q10. How does this star help supervisors understand the overall RAIDT project?

Answer:
This star demonstrates that RAIDT is a governance framework designed for organisational use, not only a conceptual statement about responsible AI. It shows how theory, evidence and policy relevance are integrated through a common unit of analysis.

Practical example:
In supervision, a reviewer can see how Paper 08 defines the logic of the run, how Paper 09 tests it empirically, and how Paper 10 positions it for policy and standards alignment.

Link to RAIDT:
S9 makes the project legible across scholarly, managerial and assurance audiences by linking evidence packs, scores and interventions to recognised governance needs.

Practical examples
  1. Public sector correspondence drafting: A team uses a GenAI assistant with RAG to draft replies to citizen enquiries. RAIDT captures the prompt template, retrieved policy documents, output, reviewer edits, and a score profile that determines whether the draft can be reused or must be escalated.
  2. Procurement evaluation of a GenAI platform: A university compares suppliers not only on features and cost, but on whether each system can support run logs, evidence export, reviewer workflows, retention controls, and incident traceability.
  3. Internal audit of AI-assisted reporting: An audit team samples runs from a management reporting workflow and finds inconsistent review notes. The issue is not only technical quality but weak Interpretability evidence, which prompts a governance intervention.
  4. Incident response after a problematic output: A clinical administration team discovers that an AI-generated summary omitted an important caveat. RAIDT allows the organisation to inspect the run, identify which source documents were retrieved, see who approved the summary, and revise the control design.
  5. Post-market monitoring in a sector playbook: A sector-specific playbook defines periodic review thresholds for high-risk uses. RAIDT scores across multiple runs reveal where Dependability improves after prompt redesign but Traceability remains weak because source citation practices are inconsistent.
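Example 5 compares scores across runs before and after a change. The sketch below assumes each run carries a date and pillar scores. It splits the runs at the redesign date and reports the change in mean score per pillar. The dates, scores and 0.05 tolerance are hypothetical.

```python
from datetime import date
from statistics import mean


def pillar_change(runs: list, change_date: date) -> dict:
    """Mean score after the change minus mean score before it, per pillar."""
    before = [r for r in runs if r["date"] < change_date]
    after = [r for r in runs if r["date"] >= change_date]
    if not before or not after:
        raise ValueError("need runs on both sides of the change date")
    pillars = runs[0]["scores"].keys()
    return {
        p: round(mean(r["scores"][p] for r in after) - mean(r["scores"][p] for r in before), 2)
        for p in pillars
    }


def summarise(changes: dict, tolerance: float = 0.05) -> dict:
    """Label each pillar as improved, declined or unchanged within a tolerance."""
    labels = {}
    for pillar, delta in changes.items():
        if delta > tolerance:
            labels[pillar] = "improved"
        elif delta < -tolerance:
            labels[pillar] = "declined"
        else:
            labels[pillar] = "unchanged"
    return labels


runs = [
    {"date": date(2025, 1, 10), "scores": {"Dependability": 0.5, "Traceability": 0.4}},
    {"date": date(2025, 1, 20), "scores": {"Dependability": 0.6, "Traceability": 0.5}},
    {"date": date(2025, 2, 10), "scores": {"Dependability": 0.8, "Traceability": 0.45}},
    {"date": date(2025, 2, 20), "scores": {"Dependability": 0.9, "Traceability": 0.45}},
]
changes = pillar_change(runs, date(2025, 2, 1))
```

With this data, Dependability improves after the hypothetical prompt redesign while Traceability stays flat. This is the pattern the playbook example describes.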
Evidence needed / what to capture
Link to RAIDT project
Citation ideas to support this note
Boundaries and limitations
Conclusion

This star explains how RAIDT moves from a good idea about responsible AI into an auditable governance architecture. The core issue is that organisations often have policies and standards language, but they struggle to demonstrate what happened when a generative AI system was actually used. RAIDT addresses that by making the run the unit of governance. For each run, we can capture the prompt, model configuration, retrieved context, output and review checks, and then score the run across Responsibility, Auditability, Interpretability, Dependability and Traceability. That matters because standards and policy instruments usually ask for oversight, documentation, monitoring and accountability, but they do not always specify what evidence should exist at the moment of use. S9 shows that RAIDT can act as the operational bridge. It does not replace the EU AI Act, ISO/IEC 42001 or the NIST AI RMF, but it helps translate them into evidence fields, assurance routines and governance interventions. So, for supervision, this star is important because it shows RAIDT as an Information Systems governance contribution with empirical and policy relevance.

Slides
Slide 1 — why policy, standards and assurance matter

Purpose:
Frame the star and explain why it matters for the RAIDT project.

Key message:
RAIDT needs a policy and assurance layer because responsible AI claims only become credible when they can be tied to evidence from actual runs.

Slide content:

  • Organisations have AI principles but often weak operational proof
  • Generative AI creates uncertainty at the point of use
  • RAIDT treats the run as the unit of governance
  • S9 links RAIDT to accountability, audit and assurance

Speaker note:
Open by saying that this star is the bridge between conceptual responsible AI language and real organisational governance. Emphasise that most institutions can state principles, but fewer can show what happened during a specific use of a generative AI system. S9 matters because it explains how RAIDT turns that abstract governance demand into inspectable evidence.

Visual idea:
Problem-to-solution framing graphic: principles on the left, run-level evidence on the right.

Link to RAIDT:
Introduces why the evidence pack and five-pillar score profile are needed for the project as a whole.

Citation support to mention if asked:
Responsible AI governance literature; assurance and accountability literature.

Slide 2 — the governance gap

Purpose:
Explain the specific organisational problem that S9 addresses.

Key message:
Policies and standards often stay too abstract unless they are translated into observable evidence at the level of use.

Slide content:

  • Policy says what should happen
  • Standards organise controls and responsibilities
  • Assurance asks whether claims are evidenced
  • Without run records, review remains weak

Speaker note:
Clarify the distinction between policy, standards and assurance. Then argue that the gap is not an absence of principles but an absence of operational evidence. This is particularly problematic for generative AI because outputs vary with prompt, model configuration, context retrieval and human review.

Visual idea:
Three-layer hierarchy: policy, standards, assurance, with a missing evidence layer highlighted.

Link to RAIDT:
Shows why RAIDT positions the run-level evidence pack as the missing operational layer.

Citation support to mention if asked:
Information Systems governance; audit and assurance literature; uncertainty in AI-enabled work.

Slide 3 — RAIDT's run-level solution

Purpose:
Explain how RAIDT operationalises governance.

Key message:
RAIDT converts abstract governance expectations into run-level evidence and a structured five-pillar score profile.

Slide content:

  • A run = one task, one time, one context, one configuration
  • Capture prompt, model, tools, retrieved context, output and checks
  • Produce a run-level evidence pack
  • Score the run across five RAIDT pillars

Speaker note:
Talk through the run as the core unit of analysis. Stress that governance improves when evidence is captured at the moment of use rather than reconstructed later. The evidence pack and score profile are the two practical outputs that make RAIDT useful for managers, auditors and researchers.

Visual idea:
Process flow from input and configuration to output, checks, evidence pack and score profile.

Link to RAIDT:
Centres the framework's core logic and its practical outputs.

Citation support to mention if asked:
Run-level evidence logic; prompt engineering and RAG governance; AI assurance approaches.

Slide 4 — mapping to standards and policy instruments

Purpose:
Show how RAIDT relates to recognised governance frameworks.

Key message:
RAIDT does not replace policy instruments and standards; it helps operationalise them through an evidence grammar.

Slide content:

  • EU AI Act highlights documentation, oversight and monitoring
  • ISO/IEC 42001 structures management-system governance
  • NIST AI RMF offers risk governance language
  • RAIDT crosswalks these expectations to run evidence

Speaker note:
Explain that S9 is not making a legal equivalence claim. The contribution is more practical: RAIDT helps organisations identify what they would need to capture if they want to show that broad policy or standards requirements are being met in day-to-day use. This is why the policy crosswalk and evidence grammar items in the star are important.

Visual idea:
Crosswalk table linking external frameworks to RAIDT evidence fields and pillars.

Link to RAIDT:
Connects run-level evidence packs and scoring to Paper 10 and to policy alignment use cases.

Citation support to mention if asked:
EU AI Act commentary; ISO/IEC 42001 materials; NIST AI RMF and GenAI profile.

Slide 5 — assurance, audit and procurement

Purpose:
Show how S9 matters in organisational practice.

Key message:
RAIDT supports assurance not only after deployment but also in procurement, internal audit and routine governance review.

Slide content:

  • Procurement should test evidential capabilities, not only features
  • Internal audit can sample runs and inspect evidence packs
  • Assurance reviews can use pillar scores as governance signals
  • Weak evidence leads to targeted interventions

Speaker note:
Use this slide to show that S9 is practically useful. A tool that cannot log prompts, retrieval context or review decisions will be difficult to govern regardless of policy statements. Likewise, audit becomes more concrete when reviewers can inspect run-level evidence rather than relying on general claims about safe use.

Visual idea:
Three-column comparison: procurement, audit, assurance, all pointing to the same evidence pack.

Link to RAIDT:
Demonstrates how evidence packs and pillar scores support governance interventions across organisational functions.

Citation support to mention if asked:
AI procurement governance; auditability and assurance case literature.

Slide 6 — monitoring, incidents and learning

Purpose:
Explain why S9 extends beyond initial approval.

Key message:
Run-level evidence makes incident response and post-market monitoring possible because failures can be traced back to specific uses.

Slide content:

  • Governance continues after deployment
  • Complaints and near misses must be investigable
  • Incident review depends on traceable run records
  • Monitoring results should feed back into controls

Speaker note:
Stress that assurance is not a one-off event. Generative AI systems interact with changing tasks, users and knowledge sources, so governance must include learning over time. Run-level evidence allows organisations to detect patterns, investigate problematic outputs and update prompt templates, review rules or procurement decisions.

Visual idea:
Feedback loop diagram from run to incident review to control redesign.

Link to RAIDT:
Reinforces the importance of Traceability, Auditability and governance interventions.

Citation support to mention if asked:
Post-deployment monitoring; incident management; sociotechnical learning in AI governance.

Slide 7 — contribution across the RAIDT papers

Purpose:
Position S9 within the thesis structure.

Key message:
S9 shows how RAIDT connects foundations, empirical validation and policy pathways through a common unit of evidence.

Slide content:

  • Paper 08: why the run is the right governance unit
  • Paper 09: how evidence packs and scores can be studied empirically
  • Paper 10: how RAIDT aligns with policy and standards pathways
  • Sector playbooks adapt the same logic to different settings

Speaker note:
Use this slide for supervisors. It shows that S9 is not an isolated note but a structural bridge in the project. The same run-level logic supports theory, fieldwork and policy discussion, which strengthens the coherence of the overall thesis.

Visual idea:
Three-part thesis map with S9 acting as the connecting bridge.

Link to RAIDT:
Makes explicit how this star supports the whole project architecture.

Citation support to mention if asked:
Methodological and policy pathway literature; empirical governance research design.

Slide 8 — limits and supervisory takeaway

Purpose:
End with a balanced statement of value and scope.

Key message:
RAIDT offers a credible governance mechanism for generative AI use, but it complements rather than replaces broader legal, organisational and sector-specific judgement.

Slide content:

  • RAIDT is not a full compliance checklist
  • Scores inform judgement; they do not replace it
  • Broader governance still matters
  • The value lies in evidence-backed accountability at run level

Speaker note:
Close by stating the contribution carefully. S9 does not overclaim. It argues that policy alignment and assurance improve when responsible AI claims are anchored in run-level evidence. That is a defensible, empirically testable and policy-relevant contribution to Information Systems governance.

Visual idea:
Balanced comparison graphic: what RAIDT does and what RAIDT does not do.

Link to RAIDT:
Summarises how the evidence pack, score profile and interventions give the project practical governance relevance.

Citation support to mention if asked:
AI governance limits; assurance and accountability scholarship; sector guidance materials.
