Run-Level Evidence Logic

#raidt/S3

flowchart LR
    A[Operational AI use] --> B[Post hoc opacity]
    B --> C[RAIDT framework]
    C --> D[Run as governance unit]
    J[Prompt model and context] --> D
    D --> E[Run-level evidence]
    E --> F[Evidence pack]
    E --> G[Five-pillar scoring]
    F --> H[Reconstruction and challenge]
    G --> I[Oversight and policy alignment]

Ring: Operational star

Function

Explains why RAIDT treats the run as the primary unit of governance and why each run should generate an evidence object that supports reconstruction, comparison, challenge, and practical oversight. This star defines the logic behind the run-level evidence pack and clarifies how evidence converts a single use of generative AI into a governable organisational event.

Role in the project

This star sits in the operational layer of RAIDT, but it also connects foundations, implementation, empirical validation, and policy translation. It operationalises a central RAIDT claim: governance should not stop at model documentation or high-level policy statements, because the most meaningful risks and decisions arise when a specific model, prompt, tool configuration, retrieved context, and human action come together in an actual run. The note therefore supports the evidence pack, informs the five-pillar score profile, and provides a bridge between conceptual work in Paper 08, empirical work in Paper 09, and policy pathways in Paper 10.

Main questions answered by this star

What does run-level evidence logic mean in RAIDT?
Why does RAIDT need the run, rather than the model or policy alone, as the unit of governance?
What organisational problem is solved by capturing evidence at the point of use?
What counts as an evidence object or proof object in practice?
What evidence is needed to reconstruct, compare, replay, or challenge a run?
How does this logic connect to the run-level evidence pack?
How does run-level evidence support the five RAIDT pillars: Responsibility, Auditability, Interpretability, Dependability, and Traceability?
How does this star help supervisors understand RAIDT as an Information Systems governance framework rather than only a technical monitoring idea?
What kinds of managerial uncertainty does run-level evidence reduce, and which uncertainties remain?
How can this logic support empirical validation, sector playbooks, and policy alignment with instruments such as the EU AI Act, ISO/IEC 42001, and the NIST AI RMF?

Workshop discussion prompts

10-20 min: What is the difference between governing a model in the abstract and governing a specific organisational run at the point of use?
20-40 min: Which minimum evidence fields are necessary if a supervisor, auditor, or affected stakeholder needs to reconstruct and contest a run later?
40-60 min: How should run-level evidence feed RAIDT scoring, governance interventions, and sector-specific playbooks without creating unmanageable documentation burdens?

Items in this star (10)

Main message

Run-level evidence logic sits near the centre of RAIDT because generative AI governance often fails at the exact point where organisational consequences arise. Many organisations have policies, procurement checklists, model cards, or high-level Responsible AI statements, yet they still struggle to answer basic operational questions after an incident or contested decision. What prompt was used? Which model version generated the output? Was retrieval enabled? Which documents were supplied as context? Who reviewed the output before action was taken? If those questions cannot be answered reliably, governance becomes rhetorical rather than operational. RAIDT responds by treating the run as the unit that matters most in practice.

In RAIDT, a run is one configured use of a generative AI system for a specific task, at a specific time, in a specific context. It is not the model in general, and it is not the whole application estate. It is the situated event in which prompt or instruction, model and tool configuration, retrieved context where used, output, and human or automated checks come together. This matters because organisational risk is often produced at this level. A harmless model can be used in a risky way. A well-written policy can be ignored in a hurried workflow. A strong prompt pattern can fail when retrieval injects poor source material. The run therefore becomes the smallest governance unit that still preserves enough context to understand what actually happened.

The core claim of this star is that each meaningful run should generate an evidence object, and that this evidence object functions as a proof object for governance. In plain terms, the evidence object is the recorded basis on which the organisation can later reconstruct, compare, challenge, and, where feasible, replay the run. It is not merely a log dump. It is a structured package showing what was asked, what system configuration was in place, what context was supplied, what output was produced, what checks were performed, and what decisions followed. When such evidence is captured at the point of use, governance becomes actionable. When it is absent, managers are left with uncertainty, fragmented accountability, and weak auditability.

This logic addresses several problems at once. First, it reduces evidential gaps. In many GenAI deployments, traces are partial, inconsistent, or spread across prompts, chat histories, application logs, and human memory. Second, it reduces managerial uncertainty by making it easier to inspect why a questionable output arose and whether it reflects prompt design, retrieval quality, model behaviour, user behaviour, or missing controls. Third, it supports contestability. If an employee, customer, student, clinician, or regulator asks how an AI-assisted output was generated, an organisation needs more than reassurance; it needs an inspectable record. Fourth, it enables comparison across runs. Without common evidence fields, organisations cannot identify recurring failure patterns, benchmark teams, or evaluate whether governance interventions are improving outcomes over time.
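The comparison point can be illustrated with a small sketch. It assumes each run has been recorded as a dictionary with a shared set of field names; the fields, team names, and sample records are hypothetical and exist only to show how common evidence fields make recurring gaps countable.

```python
from collections import Counter

# Hypothetical common evidence fields shared by every run record.
COMMON_FIELDS = ["prompt", "model_version", "retrieved_sources", "reviewer"]

# Illustrative run records; in practice these would come from stored evidence packs.
runs = [
    {"team": "HR", "prompt": "p1", "model_version": "v1",
     "retrieved_sources": ["doc-a"], "reviewer": "alice"},
    {"team": "HR", "prompt": "p2", "model_version": "v1",
     "retrieved_sources": [], "reviewer": None},
    {"team": "Procurement", "prompt": "p3", "model_version": None,
     "retrieved_sources": ["doc-b"], "reviewer": None},
]


def recurring_gaps(records: list[dict]) -> Counter:
    """Count how often each common field is empty or absent across runs.

    Simplification: an empty value is treated as a gap, even though an
    empty source list may be legitimate when retrieval was not used.
    """
    gaps = Counter()
    for record in records:
        for field in COMMON_FIELDS:
            if not record.get(field):
                gaps[field] += 1
    return gaps


gaps = recurring_gaps(runs)
for field, count in gaps.most_common():
    print(f"{field}: missing in {count} of {len(runs)} runs")
```

In this toy data the reviewer field is the most frequent gap, which is the kind of recurring pattern that cannot be seen when each team records runs differently.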

RAIDT therefore links run-level evidence directly to its two practical outputs: the run-level evidence pack and the five-pillar score profile. The evidence pack is the operational container. It assembles the minimum metadata and artefacts required to understand a run as a governable event. The score profile is the evaluative layer. Responsibility depends on whether roles, approvals, and review obligations are evident. Auditability depends on whether an external or internal reviewer can inspect the run later. Interpretability depends partly on whether the prompt logic, retrieved context, model settings, and rationale are sufficiently intelligible. Dependability depends on evidence of checks, consistency, exception handling, and error management. Traceability depends on whether the run can be linked to inputs, outputs, versions, users, and downstream actions. In this sense, evidence is not peripheral to RAIDT; it is the substrate that makes scoring credible.
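The dependence of each pillar on evidence can be sketched in code. The following is a minimal illustration, not RAIDT's scoring rubric: the field names and the pillar-to-field mapping are assumptions made for this example, loosely following the paragraph above.

```python
# Hypothetical mapping from each RAIDT pillar to the evidence fields it relies on.
# Field names are illustrative assumptions, not a published RAIDT schema.
PILLAR_EVIDENCE = {
    "Responsibility": ["user_role", "reviewer", "approval"],
    "Auditability": ["run_id", "timestamp", "output"],
    "Interpretability": ["prompt", "retrieved_context", "model_settings"],
    "Dependability": ["checks", "exceptions"],
    "Traceability": ["run_id", "model_version", "inputs", "downstream_action"],
}


def pillar_evidence_gaps(record: dict) -> dict[str, list[str]]:
    """Return, per pillar, the evidence fields missing from the record.

    An empty list means the pillar can be assessed from this record;
    it says nothing about whether the run would score well.
    """
    gaps = {}
    for pillar, fields in PILLAR_EVIDENCE.items():
        gaps[pillar] = [f for f in fields if record.get(f) in (None, "", [], {})]
    return gaps


# Illustrative run record with only some evidence captured.
record = {
    "run_id": "run-0001",
    "timestamp": "2025-01-15T10:30:00Z",
    "user_role": "HR adviser",
    "prompt": "Summarise the attached performance documents even-handedly.",
    "model_version": "example-model-v1",
    "output": "Draft capability summary ...",
    "reviewer": "line manager",
    "approval": "approved with edits",
}

for pillar, missing in pillar_evidence_gaps(record).items():
    status = "assessable" if not missing else f"missing {missing}"
    print(f"{pillar}: {status}")
```

The sketch separates two questions that are easy to conflate: whether a pillar can be assessed at all from the record, and how the run would score once it is assessed.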

A practical example makes the point clearer. Imagine an HR team using a GenAI assistant to draft a capability summary for promotion review. If the organisation stores only the final output, it cannot later determine whether the model relied on approved performance documents, whether the prompt asked for an even-handed summary, whether retrieval pulled outdated material, or whether a human reviewer corrected bias before the document was used. A run-level evidence pack would capture the prompt, model and version, retrieved documents, generated text, reviewer identity, review outcome, and any intervention taken. That does not eliminate the possibility of unfairness, but it does mean the organisation can investigate, compare similar runs, and improve controls.

The same logic applies to RAG, prompt engineering, PEFT or LoRA customisation, and alignment controls such as RLHF-informed behaviour or organisational system prompts. These design choices affect outputs, but their governance value depends on whether their effects are evidenced at run level. A retrieval pipeline may be well designed in theory, but if a particular run uses stale or irrelevant source documents, the governance question concerns that run. A fine-tuned model may appear stable in testing, but if a live run behaves unexpectedly under deadline pressure or ambiguous prompting, that event must be evidentially visible. RAIDT does not deny the value of model-level assurance; rather, it argues that model-level assurance is insufficient unless connected to the situated use event.

This star is also important methodologically. For Paper 08, run-level evidence logic helps define RAIDT's foundational pathway by specifying the unit of analysis and the governance artefact. For Paper 09, it provides something testable: whether different evidence configurations improve reconstruction, inter-rater scoring, challenge handling, or decision quality in real settings. For Paper 10, it offers a policy pathway by showing how broad Responsible AI obligations can be translated into operational record-keeping, oversight, and assurance practices. The idea is especially relevant to sector playbooks because sectors differ in tolerance for opacity, required retention periods, review intensity, and acceptable evidence burden.

The concept does have limits. Evidence does not guarantee truth, fairness, or compliance by itself. Some runs cannot be replayed exactly because model versions, external tools, or dynamic retrieval sources may change. Excessive evidence capture can also create cost, privacy, security, and usability burdens. RAIDT therefore should not be read as advocating indiscriminate surveillance or total logging of every interaction. The stronger claim is narrower and more defensible: if an organisation wants meaningful governance of generative AI in work settings, it needs a proportionate, structured, run-level evidence logic that makes important uses reconstructable, comparable, challengeable, and governable.
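One way to make the replay limit visible is to fingerprint the conditions a run depended on and check later whether they still hold. The sketch below is an illustration under stated assumptions: the `config_fingerprint` helper, the field names, and the choice of SHA-256 over canonical JSON are decisions made for this example rather than part of RAIDT.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Hash a run configuration deterministically (sorted keys, compact JSON)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Configuration recorded when the run happened (hypothetical values).
recorded = {
    "model": "example-model",
    "model_version": "v1",
    "temperature": 0.2,
    "knowledge_base_version": "2025-01",
}

# Configuration available today; the knowledge base has since been updated.
current = dict(recorded, knowledge_base_version="2025-03")

# Equal fingerprints mean the recorded conditions are unchanged.
exact_replay_possible = config_fingerprint(recorded) == config_fingerprint(current)
changed = sorted(k for k in recorded if recorded[k] != current.get(k))

print("exact replay possible:", exact_replay_possible)
print("changed fields:", changed)
```

A matching fingerprint only shows that the recorded conditions are unchanged; generated output may still vary between runs when sampling is non-deterministic, which is why replay is described here as feasible only in some cases.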

Key questions and answers

Q1. What does run-level evidence logic mean?

Answer:
Run-level evidence logic is the idea that each significant use of a generative AI system should leave a structured evidential record showing what happened, under what conditions, and with what checks. The focus is not on abstract model capability alone but on the concrete event in which a model, prompt, context, tools, and reviewers interact. This makes governance attach to actual organisational behaviour rather than only to policy declarations.

Practical example:
A procurement officer uses a GenAI tool to summarise supplier responses. The organisation records the prompt, the uploaded tender documents, the model version, the summary output, and the human review decision.

Link to RAIDT:
This is the basis of the RAIDT evidence pack and the starting point for scoring the run across all five pillars.

Q2. Why is the run the unit of governance rather than the model alone?

Answer:
The model alone does not explain how it was used in a particular organisational context. Harm, error, or value often arises from the combination of prompt design, retrieved documents, user intent, task pressure, and review practice. The run captures that configuration in a way a model card cannot.

Practical example:
Two teams use the same model, but one team uses approved internal guidance with mandatory review while the other pastes unverified external text and skips checking. The governance position differs because the runs differ.

Link to RAIDT:
RAIDT defines a run as the governable event and uses run evidence to justify Responsibility, Dependability, and Traceability scores.

Q3. What problem does run-level evidence solve?

Answer:
It addresses the problem of post hoc opacity: after the fact, organisations often cannot explain why an output was produced, whether controls were followed, or how to challenge a disputed result. Run evidence reduces that uncertainty and enables credible oversight.

Practical example:
A customer disputes an AI-assisted complaint response. With run evidence, the firm can inspect the source materials, prompt, output, and reviewer decision instead of relying on memory.

Link to RAIDT:
The star supports reconstructability and contestability, which are core to the evidence pack and to Auditability scoring.

Q4. What is an evidence object or proof object in this framework?

Answer:
An evidence object is the structured record of the run. It becomes a proof object when it is sufficiently complete and credible to support reconstruction, comparison, and challenge in governance settings. The distinction matters because not every log entry is evidentially useful.

Practical example:
A timestamp alone is a weak record. A package containing prompt text, system configuration, retrieved files, generated output, approval step, and exception notes is a much stronger proof object.

Link to RAIDT:
The RAIDT evidence pack formalises what counts as usable proof for scoring and intervention decisions.

Q5. How does this connect to the five RAIDT pillars?

Answer:
Each pillar depends on evidence. Responsibility needs role and approval evidence. Auditability needs inspectable records. Interpretability needs intelligible prompt and context information. Dependability needs evidence of checks and consistency. Traceability needs links across inputs, outputs, versions, and downstream use.

Practical example:
If a run record shows who approved the output, which knowledge base was used, and what validation step passed or failed, multiple pillars can be assessed from one evidence pack.

Link to RAIDT:
This star is one of the mechanisms that makes five-pillar scoring operational rather than impressionistic.

Q6. How does run-level evidence help with RAG and prompt engineering?

Answer:
RAG and prompt engineering shape outputs, but their real governance significance appears only when their effects are visible in a specific run. Evidence shows whether the right sources were retrieved, whether the prompt constrained the task appropriately, and whether the output remained within acceptable bounds.

Practical example:
A legal operations assistant retrieves an outdated policy memo during a RAG-supported drafting task. The run record shows that the issue came from retrieval quality rather than from the base model alone.

Link to RAIDT:
Evidence at point of use supports Interpretability, Dependability, and targeted governance interventions such as prompt redesign or source curation.
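The retrieval example can be made concrete with a small check that compares the document versions recorded in a run against the versions currently approved. Everything named here (`APPROVED_VERSIONS`, `stale_sources`, the document identifiers) is hypothetical; the sketch only shows how recorded retrieval context lets a reviewer attribute a problem to source curation.

```python
# Hypothetical register of currently approved document versions.
APPROVED_VERSIONS = {
    "policy-memo": 3,
    "style-guide": 1,
}


def stale_sources(retrieved: list[dict]) -> list[str]:
    """Return IDs of retrieved documents that are unapproved or older than the approved version."""
    stale = []
    for doc in retrieved:
        approved = APPROVED_VERSIONS.get(doc["doc_id"])
        if approved is None or doc["version"] < approved:
            stale.append(doc["doc_id"])
    return stale


# Retrieval context as it might be recorded in one run's evidence pack.
run_retrieval = [
    {"doc_id": "policy-memo", "version": 2},
    {"doc_id": "style-guide", "version": 1},
]

print(stale_sources(run_retrieval))
```

Without the recorded document identifiers and versions, the same output problem could not be distinguished from a base-model or prompt issue.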

Q7. How does this relate to alignment controls such as RLHF, system prompts, or fine-tuning?

Answer:
Alignment controls influence behaviour, but organisations still need evidence of how those controls operated in live use. A controlled model may behave differently across contexts, tasks, or user practices. Run evidence reveals whether alignment assumptions held in practice.

Practical example:
A fine-tuned internal assistant is expected to refuse sensitive requests, yet one team finds a workaround through ambiguous phrasing. The run evidence helps identify whether the issue sits in system prompt design, user permissions, or review failure.

Link to RAIDT:
This supports governance interventions by linking technical control assumptions to observable run outcomes.

Q8. What evidence should be captured as a minimum?

Answer:
A minimum set usually includes run identifier, date and time, task purpose, user or role, model and version, tool configuration, prompt or instruction, retrieved context where relevant, output, review or check status, and final action. Additional fields may be required in higher-risk settings.

Practical example:
For an internal policy-summary task, the evidence pack may include the knowledge base version, retrieval query, summary text, reviewer comments, and whether the summary was sent onward or discarded.

Link to RAIDT:
Minimum metadata is essential for evidence readiness and for consistent scoring across runs and sectors.
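The minimum set named in the answer above can be expressed as a simple completeness check. The field identifiers below are hypothetical labels for the items listed in that answer, and the check is a sketch of evidence readiness, not a RAIDT-defined validator.

```python
# Hypothetical identifiers for the minimum evidence fields listed in the answer.
MINIMUM_FIELDS = (
    "run_id", "timestamp", "task_purpose", "user_or_role", "model_version",
    "tool_configuration", "prompt", "output", "review_status", "final_action",
)


def missing_minimum_fields(record: dict, uses_retrieval: bool = False) -> list[str]:
    """List minimum fields that are absent or empty in a run record.

    Retrieved context is only required when the run used retrieval,
    mirroring the "where relevant" qualifier in the text.
    """
    required = list(MINIMUM_FIELDS)
    if uses_retrieval:
        required.append("retrieved_context")
    return [name for name in required if not record.get(name)]


# Illustrative record for an internal policy-summary task.
record = {
    "run_id": "run-0042",
    "timestamp": "2025-02-03T09:15:00Z",
    "task_purpose": "internal policy summary",
    "user_or_role": "policy officer",
    "model_version": "example-model-v1",
    "tool_configuration": {"retrieval": True},
    "prompt": "Summarise the current travel policy.",
    "output": "Summary text ...",
    "review_status": "reviewed",
}

print(missing_minimum_fields(record, uses_retrieval=True))
```

Higher-risk settings could extend the required list rather than replace it, which keeps the baseline comparable across sectors.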

Q9. Does evidence guarantee good governance?

Answer:
No. Evidence improves visibility and accountability, but it does not automatically make outputs fair, accurate, or lawful. Poor controls can still be documented faithfully. Governance still requires judgement, review, escalation rules, and policy alignment.

Practical example:
A team may record every field correctly but still use an inappropriate prompt template for a sensitive assessment task.

Link to RAIDT:
RAIDT uses evidence as the basis for assessment and intervention, not as proof that the run is acceptable by default.

Q10. How does this star help supervisors understand the overall RAIDT contribution?

Answer:
It shows that RAIDT is not simply another Responsible AI principle set. Its distinctive contribution is to define a practical unit of governance and an operational evidence logic that can support empirical testing, scoring, auditing, and policy translation.

Practical example:
In a supervision meeting, this star can be used to explain why RAIDT focuses on governable events and evidence packs rather than only on abstract ethics principles.

Link to RAIDT:
This star connects the conceptual architecture of RAIDT to its practical outputs, making the project legible across foundations, empirical validation, and policy pathways.

Practical examples

An HR team uses a GenAI assistant to draft promotion-review summaries. Run-level evidence allows later checking of prompts, sources, review steps, and possible bias concerns.
A customer-service unit uses RAG to draft complaint responses from policy documents. Evidence shows whether the assistant relied on the correct policy version and whether a human corrected the response before release.
A procurement team uses GenAI to compare supplier bids. Comparable evidence across runs allows managers to identify inconsistent prompting, uneven source selection, or missing approvals.
A university professional-services team uses an internal chatbot to draft student guidance. Run evidence helps distinguish between model issues, weak source curation, and staff misuse when advice is challenged.
A compliance function pilots a fine-tuned internal model. Run-level records show whether expected refusal behaviours and review controls operate consistently in live organisational work.

Evidence needed / what to capture

Run identifier and timestamp
Business task, purpose, and organisational context
User identity, role, team, or delegated agent status
Model name, provider, version, and relevant configuration settings
Tool use, workflow settings, and whether automation or human-in-the-loop review was enabled
Prompt, system instruction, template, or workflow trigger
Retrieved context, source documents, knowledge base version, and retrieval parameters where RAG is used
Input artefacts supplied by the user or system
Output artefact, including version if edited by a human
Review, approval, escalation, or exception-handling steps
Risk classification, sensitivity level, or policy pathway triggered
Applied scoring fields across the five RAIDT pillars
Governance intervention taken, if any, and downstream action or decision outcome
Retention, access-control, and privacy notes where evidence capture itself creates governance obligations
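The fields listed above could be grouped into a single structured record. The following is a minimal sketch under stated assumptions: the class name `RunEvidencePack`, its attribute names, and the JSON serialisation are illustrative choices, not a RAIDT-specified schema, and only a subset of the listed fields is shown.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class RunEvidencePack:
    """Illustrative container for run-level evidence (hypothetical schema)."""
    run_id: str
    timestamp: str  # ISO 8601 string, kept simple for the sketch
    task_purpose: str
    user_role: str
    model_name: str
    model_version: str
    prompt: str
    output: str
    retrieved_sources: list[str] = field(default_factory=list)
    review_steps: list[str] = field(default_factory=list)
    risk_classification: Optional[str] = None
    pillar_scores: dict[str, int] = field(default_factory=dict)
    downstream_action: Optional[str] = None
    retention_note: Optional[str] = None

    def to_json(self) -> str:
        """Serialise the pack so it can be stored and inspected later."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Example values are invented for illustration.
pack = RunEvidencePack(
    run_id="run-0007",
    timestamp="2025-03-01T14:00:00Z",
    task_purpose="draft complaint response",
    user_role="customer-service agent",
    model_name="example-model",
    model_version="v1",
    prompt="Draft a reply using the current refunds policy.",
    output="Dear customer ...",
    retrieved_sources=["refunds-policy-v4"],
    review_steps=["agent edited tone", "team lead approved"],
    downstream_action="sent to customer",
)

print(pack.to_json())
```

Optional fields default to empty values so that a proportionate baseline can be recorded first and richer fields added only where the risk level calls for them.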

Link to RAIDT project

Paper 08: foundations and methodological pathways
This note helps define RAIDT's core unit of analysis, its proof-object logic, and the methodological reason for focusing on runs rather than abstract model governance alone.
Paper 09: empirical validation
The note specifies observable variables for testing, such as reconstructability, scoring consistency, evidence completeness, challenge handling, and the impact of governance interventions across runs.
Paper 10: policy pathways
The note translates broad governance expectations into practical record-keeping and oversight mechanisms that policymakers and organisational leaders can understand.
Sector playbooks
Different sectors can adapt the same logic by changing the minimum evidence requirements, review intensity, escalation rules, and retention standards according to risk and context.
RAIDT scoring
Scoring is only credible if assessors can inspect evidence from the run. This star explains why evidence quality is a precondition for defensible scores.
RAIDT evidence pack
This note is one of the clearest conceptual justifications for the evidence pack, especially its emphasis on reconstructability, comparability, and challenge.
RAIDT governance interventions
Interventions such as prompt redesign, retrieval curation, reviewer training, threshold changes, or use restrictions should be triggered and assessed on the basis of run evidence.

Citation ideas to support this note

Responsible AI governance literature on accountability, contestability, and auditability
Information Systems governance literature on control, oversight, and organisational accountability mechanisms
AI assurance and algorithmic auditing literature on documentation, evidence, and traceability
Human-AI interaction and prompt engineering literature on situated use and user configuration effects
RAG and knowledge-grounding literature on source quality, retrieval failure, and context dependence
Model documentation and transparency literature, including model cards, datasheets, and assurance case approaches
Standards and policy sources such as the EU AI Act, ISO/IEC 42001, and NIST AI RMF for operational governance expectations
Empirical studies on uncertainty, human review, and failure analysis in organisational AI deployments

Boundaries and limitations

This concept does not claim that capturing evidence automatically makes a run lawful, fair, or accurate.
It does not assume every run can be replayed exactly, especially when model versions, tools, or retrieval sources change over time.
It does not argue for indiscriminate logging of all interactions without regard to privacy, labour, or security implications.
It does not replace model-level testing, policy design, or organisational training; it complements them.
It does not eliminate managerial judgement. Evidence improves decision quality, but interpretation and escalation still matter.
It may impose operational costs, so proportionality remains essential when deciding which runs require richer evidence packs.

Conclusion

This star explains why RAIDT governs generative AI at the level of the run. A run is one specific use of a GenAI system for one task, at one time, in one context. That is the point where prompt, model, retrieved context, output, and review actually come together, so it is also the point where organisational risk and accountability arise. The argument is that governance is weak if an organisation only documents the model or writes broad policies, because those measures do not show what happened in a contested case. Run-level evidence addresses that gap by creating an evidence object, or proof object, that allows reconstruction, comparison, and challenge. In practical terms, that logic produces the RAIDT evidence pack and makes the five-pillar score profile defensible. It also gives the project a clear methodological contribution: Paper 08 can define the governance unit, Paper 09 can test the evidential and scoring logic empirically, and Paper 10 can show how this becomes a policy pathway for organisational use of generative AI.

Suggested slide order for oral presentation

Frame the star and the governance problem.
Explain why the run is the unit of governance.
Define the evidence object and proof-object logic.
Show what belongs in the evidence pack.
Connect evidence to the five RAIDT pillars.
Use practical examples to show organisational relevance.
Link the star to the three papers and sector playbooks.
Close with limits, proportionality, and project significance.

Slides

Slide 1 — why run-level evidence matters

Purpose:
Frame the concept and explain why this star matters within RAIDT.

Key message:
RAIDT becomes operational only when a specific AI run is treated as a governable event supported by evidence.

Slide content:

GenAI governance often fails at point of use
Policies and model documentation are not enough
The run is where context, output, and accountability meet
Evidence turns use into a governable event

Speaker note:
Introduce the problem first: organisations often have broad AI principles but cannot explain what happened in a disputed case. This slide sets up RAIDT's practical move from abstract governance to situated governance. Emphasise that the run is not a technical detail; it is the organisational event where risk, judgement, and responsibility become visible.

Visual idea:
Comparison graphic showing policy and model documentation on one side, and a concrete run with evidence on the other.

Link to RAIDT:
This slide frames the entire RAIDT architecture by positioning run-level evidence as the basis for evidence packs and score profiles.

Citation support to mention if asked:
Responsible AI governance literature on accountability gaps and AI assurance literature on operational evidence.

Slide 2 — why the run is the unit of governance

Purpose:
Explain why RAIDT focuses on the run rather than the model alone.

Key message:
Risk and accountability arise from situated use, not from model capability in the abstract.

Slide content:

A run is one configured use for one task
Prompt, model, tools, context, and checks combine in the run
The same model can produce very different governance risks
Governance must therefore attach to the run

Speaker note:
Clarify the definition of a run and stress that context matters. The same model can be low risk in one workflow and problematic in another because prompts, retrieved sources, time pressure, and review discipline differ. This is the conceptual move that distinguishes RAIDT from frameworks focused mainly on model-level description.

Visual idea:
Layered diagram with prompt, model, retrieval, output, and review feeding into a single run box.

Link to RAIDT:
This is the governing logic behind the run-level evidence pack and one of the foundational claims that Paper 08 needs to establish.

Citation support to mention if asked:
Human-AI interaction research, prompt engineering literature, and Information Systems governance work on situated organisational controls.

Slide 3 — evidence object and proof object

Purpose:
Define what RAIDT means by evidence and why not every log is enough.

Key message:
A usable governance record is a structured proof object, not just fragmented technical logging.

Slide content:

Evidence object records what happened in the run
Proof object supports reconstruction, comparison, and challenge
Fragmented logs do not support credible oversight
Structured evidence is needed at point of use

Speaker note:
Explain that the distinction matters because many systems already produce logs, but those logs are often incomplete or unreadable for governance purposes. RAIDT requires evidence that is meaningful to reviewers, managers, auditors, and potentially affected parties. The point is evidential sufficiency, not raw data volume.

Visual idea:
Before-and-after comparison: scattered logs versus a structured evidence pack.

Link to RAIDT:
This slide explains the logic behind S3.02, S3.03, S3.04, and S3.09 and shows why the evidence pack is central to RAIDT.

Citation support to mention if asked:
AI assurance, audit trail, and documentation literature including transparency and assurance-case approaches.

Slide 4 — what the evidence pack should capture

Purpose:
Show the practical contents of run-level evidence.

Key message:
RAIDT needs a minimum metadata set that makes runs reconstructable and reviewable.

Slide content:

Task, context, actor, and timestamp
Prompt, model, version, and tool settings
Retrieved sources and input artefacts where relevant
Output, review steps, and downstream action

Speaker note:
Talk through the minimum evidence fields as a proportionate baseline. The point is not to capture everything, but to preserve enough information to answer later questions about what was done, what sources were used, and whether required checks occurred. Mention that higher-risk sectors may require richer fields.

Visual idea:
Table or checklist showing minimum metadata fields in the evidence pack.

Link to RAIDT:
This slide directly supports evidence readiness, minimum metadata, reconstructability, and traceability in RAIDT scoring.

Citation support to mention if asked:
Standards and governance sources such as ISO/IEC 42001, NIST AI RMF, and documentation research.

Slide 5 — how evidence supports the five pillars

Purpose:
Connect the evidence logic to RAIDT's scoring model.

Key message:
The five RAIDT pillars can only be scored credibly when the run leaves inspectable evidence.

Slide content:

Responsibility needs role and approval evidence
Auditability needs inspectable records
Interpretability needs understandable prompt and context data
Dependability and Traceability need checks, links, and versions

Speaker note:
Make explicit that evidence is the substrate for scoring. Without evidence, a score risks becoming a subjective impression. With evidence, the organisation can justify why a run was strong or weak on each pillar and can compare results across teams, tools, and sectors.

Visual idea:
Five-column pillar table with evidence types mapped to each pillar.

Link to RAIDT:
This slide ties the note directly to RAIDT's second practical output, the five-pillar score profile.

Citation support to mention if asked:
Accountability, auditability, and assurance literature, plus governance standards on traceability and review controls.

Slide 6 — practical organisational examples

Purpose:
Show how the concept appears in live GenAI governance situations.

Key message:
Run-level evidence is useful across varied organisational tasks because it clarifies what happened and what should change.

Slide content:

HR promotion summary drafting
Customer complaint response with RAG
Procurement comparison of supplier bids
University guidance drafting with internal chatbot

Speaker note:
Use one or two examples in detail and keep the rest as quick illustrations. The point is to show that the concept generalises across organisational settings while remaining sensitive to context, review burden, and sector-specific risk. This also helps workshop participants connect the theory to their own practice.

Visual idea:
Four-panel use-case grid with one-line evidence question under each case.

Link to RAIDT:
This slide supports sector playbook development and shows how run-level evidence enables targeted governance interventions.

Citation support to mention if asked:
Organisational AI adoption studies, sector assurance guidance, and empirical work on human review of AI outputs.

Slide 7 — links to papers, validation, and policy

Purpose:
Explain why this star matters to the wider RAIDT research programme.

Key message:
Run-level evidence logic connects RAIDT's theory, empirical testing, and policy translation.

Slide content:

Paper 08 defines the governance unit and proof-object logic
Paper 09 tests evidence quality and scoring consistency
Paper 10 translates the logic into policy pathways
Sector playbooks adapt evidence requirements by context

Speaker note:
Position this star as a bridge note. It gives the thesis a coherent thread from concept to method to implementation. It also makes RAIDT legible to different audiences: supervisors can see the theory contribution, empirical readers can see what is testable, and policy readers can see how abstract obligations become operational practices.

Visual idea:
Three-stage pathway diagram: foundations, validation, policy, with sector playbooks underneath.

Link to RAIDT:
This slide shows how S3 contributes across the whole RAIDT programme rather than only within one ring.

Citation support to mention if asked:
Methodology literature on operationalisation and policy sources such as the EU AI Act and NIST AI RMF.

Slide 8 — limits and strategic value

Purpose:
End with a balanced account of what the concept can and cannot do.

Key message:
Run-level evidence does not solve governance by itself, but it makes governance testable, challengeable, and improvable.

Slide content:

Evidence does not guarantee fairness or legality
Some runs cannot be replayed exactly
Proportionality matters because evidence capture has costs
RAIDT's value is operational clarity, not total control

Speaker note:
Close by stressing proportionality and realism. This avoids overstating the concept and strengthens the academic argument. RAIDT is not claiming perfect reproducibility or perfect accountability; it is claiming that meaningful governance requires structured run-level evidence if organisations want to inspect, challenge, and improve AI use in work settings.

Visual idea:
Boundary diagram showing what evidence logic enables and what still requires judgement, policy, and organisational design.

Link to RAIDT:
This slide reinforces RAIDT as a practical governance framework with explicit limits, which strengthens both supervisory discussion and policy relevance.

Citation support to mention if asked:
Critical Responsible AI literature on documentation limits, plus governance and assurance work on proportional control design.