S3.04 - Evidence_at_point_of_use

S3.04 - Evidence at point of use

flowchart LR
    A[Retrospective reporting<br/>and memory-based accounts] --> B[RAIDT<br/>run-level evidence framework]
    P[Prompts timestamps<br/>model IDs sources approvals] --> C[[Evidence at point of use]]
    Q[Healthcare finance education<br/>public administration workflows] --> C
    B --> C
    C --> D[Run-level evidence pack]
    C --> E[RAIDT score profile]
    C --> H[Reviewability contestability<br/>audit readiness]
    D --> F[Reviewer reconstruction]
    E --> G[Governance readiness]
    H --> G

Star S3 - Run-Level Evidence Logic

Star context: Positions evidence capture at the moment of action within RAIDT's run-level proof logic, so that each run can later be reconstructed, compared, challenged, and governed on the basis of artefacts rather than retrospective narrative.


Academic picture
Definition / background

Evidence at point of use means that the relevant artefacts, metadata, decisions, and contextual markers of a generative AI run are captured at the time the run occurs, or as close to that moment as the workflow allows. In RAIDT, this is a foundational operational principle because the framework governs a concrete run rather than an abstract system claim. A run is one configured use of a GenAI system for a specific task, at a specific time, in a specific context; if the evidence is not captured at that point, later claims about what happened become harder to verify and easier to contest.

Conceptually, the idea draws on long-standing governance concerns around contemporaneous records, provenance, audit trails, and evidential integrity. What RAIDT adds is a run-level formulation tailored to generative AI work. The issue is not simply whether an organisation has a policy, a model card, or a general assurance statement. The issue is whether a particular use of a GenAI system can be reconstructed from evidence that was generated during the run itself.

This distinguishes evidence at point of use from retrospective reporting. A report written afterwards may still be useful, but in RAIDT it is secondary evidence unless it points back to the primary artefacts generated in the run. Those artefacts may include the task framing, prompt or instruction set, user identity or role, model and configuration details, timestamps, source materials, retrieved context, output versions, approval actions, and any exceptions or interventions. The closer these are captured to the point of use, the stronger the evidential basis for review.
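The artefacts listed above can be pictured as a single structured record per run. The following is a minimal sketch only: the class name `RunEvidence`, every field name, and the sample values are illustrative assumptions, not a schema defined by RAIDT.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class RunEvidence:
    """Evidence for one run. All field names are illustrative assumptions."""
    run_id: str
    task_framing: str
    prompt: str
    user_role: str
    model_id: str
    model_config: dict
    source_refs: tuple        # identifiers of source materials and retrieved context
    output_versions: tuple    # references to each stored output version
    approvals: tuple = ()     # approval actions, if any
    exceptions: tuple = ()    # exceptions or interventions, if any
    # Timestamp is set when the record is created, i.e. at the point of use.
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialise the record so it can be stored alongside the run."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical example values.
record = RunEvidence(
    run_id="run-0001",
    task_framing="Draft a supplier-risk summary",
    prompt="Summarise the risk level from the attached documents.",
    user_role="analyst",
    model_id="example-model-v1",
    model_config={"temperature": 0.2},
    source_refs=("doc-17", "doc-42"),
    output_versions=("outputs/run-0001-v1.txt",),
)
print(record.to_json())
```

Freezing the dataclass is a design choice in this sketch: once a record is created during the run, later code cannot silently reassign its fields.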

Within RAIDT, this matters because the run-level evidence pack and the five-pillar score profile depend on evidence quality, not merely on policy aspiration. Responsibility, Auditability, Interpretability, Dependability, and Traceability can only be scored credibly when there is stable run-level evidence to examine. Evidence at point of use therefore belongs inside RAIDT as a core condition for operational governance, not as a peripheral documentation preference.

Why this concept matters

This concept addresses a persistent governance failure in organisational GenAI use: the tendency to explain high-stakes runs after the event without preserving the materials needed to verify those explanations. When that happens, review becomes dependent on memory, convenience, and selective summary. RAIDT avoids this by insisting that evidence should arise from the run itself.

The concept also prevents confusion between documentation and evidence. Many organisations produce documents about AI use, but those documents do not necessarily prove what happened in a given run. Evidence at point of use narrows that gap by treating prompts, retrieved sources, model settings, timestamps, human interventions, and decision checkpoints as primary artefacts rather than optional administrative notes.

If this discipline is missing, several risks appear at once: weak audit readiness, poor contestability, limited reconstruction after incidents, reduced confidence in scoring, and reduced organisational learning. A governance framework may look mature at policy level while remaining fragile in practice because it cannot evidence how a particular run unfolded.

For organisations using GenAI in professional work, the concept turns governance from principle to operation. It supports reviewable use, proportionate oversight, and defensible decisions about whether a run was acceptable, improvable, or unacceptable.

Key idea: evidence at point of use matters because RAIDT can only govern a run credibly if the evidence is captured while that run is actually happening.

What this item captures
Practical example / likely audience question

Audience question

Why not just write a report afterwards?

Answer

The concern behind this question is understandable: if staff can explain later what they did, it may seem unnecessary to capture detailed evidence during the run itself. The difficulty is that retrospective reporting is vulnerable to omission, simplification, hindsight bias, and the loss of exact identifiers such as model versions and document references. People may remember the general aim of a run while forgetting the exact prompt wording, retrieved context, model version, approval path, or intermediate output that shaped the final result.

The direct answer is that a report written afterwards is useful only when it points back to the artefacts created during the run. In RAIDT terms, the report is not a substitute for run-level evidence; it is an interpretive layer built on that evidence. Without the underlying artefacts, reviewers cannot reliably reconstruct the run, compare it with similar runs, or challenge whether the account is accurate.

A practical example is a team using a GenAI tool to draft a supplier-risk summary. If the team later writes, "the model suggested moderate risk based on the available documents," that statement is too weak on its own. RAIDT would ask what documents were retrieved, which model and configuration were used, what the prompt requested, what output was first produced, what human edits followed, and who approved the final use. Generic AI governance often stops at policy compliance or broad usage guidance. RAIDT handles the issue better because it ties governance to evidence from the actual run, making post-run explanation accountable to recorded artefacts.
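The idea that a post-run report is accountable to recorded artefacts can be sketched as a simple lookup. This is a hedged illustration: the function `unresolved_references`, the reference identifiers, and the file paths are all hypothetical.

```python
def unresolved_references(report_refs, artefact_index):
    """Return report references that do not resolve to a captured artefact."""
    return sorted(ref for ref in set(report_refs) if ref not in artefact_index)


# Hypothetical index of artefacts captured during the run.
artefact_index = {
    "prompt-001": "prompts/run-0001.txt",
    "retrieval-001": "retrieval/run-0001.json",
    "output-v1": "outputs/run-0001-v1.txt",
}

# References cited by a report written after the run.
report_refs = ["prompt-001", "output-v1", "approval-001"]

missing = unresolved_references(report_refs, artefact_index)
print(missing)  # ['approval-001']
```

Here the report cites an approval that was never captured, so that part of the account rests on recollection rather than on run-level evidence.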

Practical example in RAIDT terms

Consider a healthcare setting where a clinician uses an approved GenAI assistant to draft a discharge summary from structured notes and recent observations. The run-level issue is that the summary may influence patient communication and continuity of care, yet the final text alone does not reveal how it was produced.

Evidence at point of use would require capture of the prompt template, model identifier and version, time of use, clinician role, source records accessed, any retrieval or context window used, draft output, subsequent edits, approval or sign-off step, and the final stored document reference. If an anomaly later appears, reviewers can test whether the model introduced unsupported wording, whether the source record was incomplete, or whether a human editor removed an important warning.

The most affected RAIDT pillars are Auditability and Traceability, with strong implications for Responsibility and Dependability. Auditability improves because the run can be reconstructed. Traceability improves because the final summary is linked to identifiable run artefacts. Responsibility improves because human roles and approvals are visible. Dependability improves because recurrent failure modes can be detected across runs. This makes the organisation more governance-ready than a process in which only the finished document survives.
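The Dependability point, that recurrent failure modes become detectable across runs, can be illustrated with a small count over captured exception tags. This is a sketch under assumptions: the record layout, the tag names, and the threshold `min_count` are hypothetical.

```python
from collections import Counter


def recurrent_failure_modes(runs, min_count=2):
    """Return exception tags that appear in at least `min_count` run records."""
    counts = Counter(tag for run in runs for tag in run.get("exceptions", []))
    return {tag: n for tag, n in counts.items() if n >= min_count}


# Hypothetical run records with exception tags captured at point of use.
runs = [
    {"run_id": "r1", "exceptions": ["unsupported_wording"]},
    {"run_id": "r2", "exceptions": []},
    {"run_id": "r3", "exceptions": ["unsupported_wording", "missing_source"]},
]

print(recurrent_failure_modes(runs))  # {'unsupported_wording': 2}
```

Such a count is only possible when each run preserved its own exception events; a set of finished documents alone would not support it.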

Detailed link to RAIDT

Evidence at point of use links to RAIDT in four ways.

First, it operationalises RAIDT's core idea that governance should focus on what happened in an actual run rather than on general claims about a system.
Second, it supplies the contemporaneous artefacts that make run-level evidence credible and reviewable.
Third, it strengthens both the evidence pack and the RAIDT score profile by giving assessors stable material on which to base judgement.
Fourth, it supports reviewability, contestability, audit readiness, and organisational learning because later review is anchored in recorded artefacts rather than recollection.

Evidence at point of use -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

In this sense, the item is not merely about logging. It is about creating the evidential conditions under which RAIDT can function as a practical governance framework.

Link to the five RAIDT pillars

Responsibility

Evidence at point of use clarifies who initiated, configured, reviewed, approved, or overrode a run. It therefore supports accountable human oversight rather than diffuse responsibility.

Example evidence / implication:

Auditability

This item has a particularly strong effect on Auditability because an audit depends on contemporaneous records, stable identifiers, and retraceable artefacts.

Example evidence / implication:

Interpretability

Evidence at point of use improves interpretability by preserving the context needed to understand why an output appeared and how humans responded to it.

Example evidence / implication:

Dependability

Dependability benefits when repeated run evidence reveals whether a workflow behaves consistently, where it fails, and what controls improve it.

Example evidence / implication:

Traceability

This item also strongly affects Traceability because it links outputs, decisions, and downstream actions back to specific run events and artefacts.

Example evidence / implication:

Why this item is more than a generic concept

In general AI governance, evidence at point of use may be treated as a broad recommendation to keep good records during AI deployment. In RAIDT, it has a narrower and more operational meaning: capture the evidence required to reconstruct and evaluate a specific run as the unit of governance. The RAIDT meaning is therefore more practical because it ties evidence capture directly to run-level artefacts, evidence packs, scoring, and governance readiness.

Common misunderstanding

Misunderstanding

If the organisation keeps a final output and a short usage note, that is enough evidence of what happened.

Correction

A final output and a summary note may describe the end state, but they do not reliably show the process that produced it. For example, keeping only a finished GenAI-generated briefing cannot reveal whether the model relied on an outdated source, whether a human corrected a serious hallucination, or whether the final text was approved under the correct workflow. RAIDT corrects this by requiring evidence from the run itself, so that governance focuses on the chain of production and review rather than on the final artefact alone.

Boundary and limitation

Evidence at point of use does not by itself prove that a run was ethically justified, legally compliant, or substantively correct. It can preserve the trace of a poor decision just as effectively as the trace of a good one. It also depends on the surrounding workflow: if the wrong artefacts are captured, if timestamps are unreliable, or if staff bypass the process, the evidential value weakens.

RAIDT handles this limitation by pairing evidence capture with minimum metadata, audit trail logic, reconstructability, comparability, and scoring. In other words, point-of-use evidence is necessary but not sufficient. It enables governance; it does not replace governance judgement.
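One of the weaknesses named above, unreliable timestamps, can be partly checked mechanically. The sketch below assumes timestamps are stored as ISO-format strings under the hypothetical names `started_at` and `finished_at`; it flags obvious problems and does not prove that a clock was correct.

```python
from datetime import datetime


def timestamp_problems(started_at, finished_at):
    """Return a list of basic sanity problems found in two run timestamps."""
    problems = []
    parsed = {}
    for name, value in (("started_at", started_at), ("finished_at", finished_at)):
        try:
            stamp = datetime.fromisoformat(value)
        except (TypeError, ValueError):
            problems.append(f"{name} is not a parseable ISO-format timestamp")
            continue
        if stamp.tzinfo is None:
            # Without an offset, the record cannot be ordered against other systems.
            problems.append(f"{name} has no timezone offset")
            continue
        parsed[name] = stamp
    if len(parsed) == 2 and parsed["finished_at"] < parsed["started_at"]:
        problems.append("finished_at is earlier than started_at")
    return problems


print(timestamp_problems("2024-05-01T10:15:00", "2024-05-01T10:16:30+00:00"))
# ['started_at has no timezone offset']
```

A check like this supports governance judgement but does not replace it: a well-formed timestamp can still record a poor decision.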

Implementation levels

Manual implementation

A researcher or small team can apply this manually by requiring a run sheet or template completed at the moment of use. The template can record the task, prompt, system, model version, source materials, key timestamps, output location, and reviewer decision. Even a disciplined spreadsheet or structured note is better than retrospective free-text reporting.
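A run sheet of this kind can be as simple as one CSV row per run. The sketch below is illustrative: the column names are derived from the list above but are assumptions, and an in-memory `io.StringIO` buffer stands in for a shared file.

```python
import csv
import io
from datetime import datetime, timezone

# Assumed column set for the run sheet.
RUN_SHEET_COLUMNS = [
    "recorded_at", "task", "prompt", "system", "model_version",
    "source_materials", "output_location", "reviewer_decision",
]


def append_run(sheet, **entry):
    """Append one run to the sheet, stamping it at the moment of recording."""
    writer = csv.DictWriter(sheet, fieldnames=RUN_SHEET_COLUMNS)
    if sheet.tell() == 0:
        writer.writeheader()  # first entry: write the column headings
    row = {"recorded_at": datetime.now(timezone.utc).isoformat(), **entry}
    writer.writerow(row)  # raises ValueError for columns not in the template


sheet = io.StringIO()
append_run(
    sheet,
    task="Draft supplier-risk summary",
    prompt="Summarise the risk level from the attached documents.",
    system="Example assistant",
    model_version="example-model-v1",
    source_materials="supplier-docs/2024-q1",
    output_location="outputs/run-0001-v1.txt",
    reviewer_decision="approved with edits",
)
print(sheet.getvalue())
```

Rejecting unknown columns keeps the sheet structured, which is what distinguishes it from retrospective free-text reporting.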

Semi-automated implementation

A semi-automated approach uses forms, metadata fields, prompt libraries, wrapper interfaces, and structured review checkpoints so that part of the evidence is captured automatically while users complete the remainder. This reduces omission risk and makes evidence packs easier to assemble across multiple runs.
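A wrapper interface might look like the following sketch, in which system-side fields are captured automatically and the review field is left for a human to complete. The function `run_with_evidence`, the record keys, and the stand-in `fake_model` are hypothetical; no real model API is assumed.

```python
from datetime import datetime, timezone
import uuid


def run_with_evidence(generate, prompt, *, model_id, user_role, evidence_log):
    """Call `generate(prompt)` and record evidence of the run as it happens."""
    started_at = datetime.now(timezone.utc).isoformat()
    output = generate(prompt)
    record = {
        "run_id": str(uuid.uuid4()),
        "model_id": model_id,
        "user_role": user_role,
        "prompt": prompt,
        "output": output,
        "started_at": started_at,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "review": None,  # completed by a human at the review checkpoint
    }
    evidence_log.append(record)
    return output, record


def fake_model(prompt):
    """Stand-in for a real GenAI call, used only to make the sketch runnable."""
    return f"DRAFT: {prompt}"


log = []
output, record = run_with_evidence(
    fake_model,
    "Summarise supplier risk",
    model_id="example-model-v1",
    user_role="analyst",
    evidence_log=log,
)
record["review"] = "approved with edits"  # the manually completed remainder
```

Because the record is written by the wrapper rather than by the user, the prompt, model identifier, and timing cannot be omitted by oversight.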

Fully automated implementation

At scale, a platform or orchestration layer can capture prompts, model identifiers, retrieval context, tool calls, timestamps, user roles, approvals, output hashes, and exception events automatically, then route these into dashboards, governance pipelines, and RAIDT scoring workflows. In this implementation, evidence at point of use becomes a built-in property of the operating environment rather than an optional user habit.
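Output hashes, mentioned above, can be computed with a standard digest. The sketch below goes one step further and chains each logged event to the previous one so that later edits become detectable; the chaining is an added assumption for illustration, not something the framework text prescribes, and the function names and event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash used before the first entry


def sha256_hex(text):
    """Return the SHA-256 digest of a string as hexadecimal."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def append_event(chain, event):
    """Append an event, binding it to the hash of the previous entry."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    body = json.dumps(event, sort_keys=True)
    chain.append({
        "event": event,
        "prev_hash": prev_hash,
        "entry_hash": sha256_hex(prev_hash + body),
    })


def verify_chain(chain):
    """Recompute every hash; return False if any entry was altered or reordered."""
    prev_hash = GENESIS
    for entry in chain:
        body = json.dumps(entry["event"], sort_keys=True)
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != sha256_hex(prev_hash + body):
            return False
        prev_hash = entry["entry_hash"]
    return True


chain = []
append_event(chain, {"type": "output", "output_hash": sha256_hex("Draft summary v1")})
append_event(chain, {"type": "approval", "role": "clinician"})
print(verify_chain(chain))  # True
```

If any stored event is changed after the fact, `verify_chain` returns `False`, which gives reviewers a mechanical signal that the record no longer matches what was captured during the run.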

Practical use in the RAIDT project

This item is especially useful in the RAIDT project because it bridges conceptual foundations and empirical validation. In Paper 08 Foundations, it helps explain why run-level governance needs contemporaneous evidence rather than generic assurance claims. In Paper 09 Empirical Validation, it supports observable criteria for assessing whether runs are actually reviewable and reconstructable. In Paper 10 Policy Pathways, it shows policymakers and organisational leaders what operational evidence expectations might look like in practice.

It is also central to sector playbooks, evidence pack design, and scoring rubric development. For supervision and viva preparation, the concept provides a clear answer to the question of how RAIDT moves from governance principles to auditable implementation. For journal positioning, it helps distinguish RAIDT from higher-level AI governance models by showing that RAIDT specifies what must be captured at the moment of use.

Key audience questions to prepare for

Q1. Is this just another name for logging?

No. Logging is one possible mechanism, but evidence at point of use is a governance requirement about what must be preserved from the run so that later review is possible. Some relevant evidence may come from logs, some from workflow metadata, some from human approvals, and some from linked artefacts.

Q2. Does RAIDT require every detail of every run to be recorded?

No. RAIDT requires proportionate evidence sufficient for reconstruction, comparison, and challenge. The exact level depends on context, risk, and organisational use case, but the principle is that the evidence must be captured when the run occurs, not improvised later.
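Proportionality can be pictured as a different minimum evidence set for each risk tier. The tiers and field names below are purely hypothetical; RAIDT as described here does not fix them.

```python
# Hypothetical minimum evidence per risk tier; higher tiers extend lower ones.
LOW = {"prompt", "model_id", "captured_at"}
MEDIUM = LOW | {"source_refs", "output_ref"}
HIGH = MEDIUM | {"user_role", "approvals"}
REQUIRED_BY_TIER = {"low": LOW, "medium": MEDIUM, "high": HIGH}


def missing_evidence(record, tier):
    """Return the required fields that are absent or empty for this tier."""
    present = {name for name, value in record.items() if value}
    return sorted(REQUIRED_BY_TIER[tier] - present)


record = {
    "prompt": "Summarise the attached observations.",
    "model_id": "example-model-v1",
    "captured_at": "2024-05-01T10:15:00+00:00",
    "source_refs": ["obs-2024-05-01"],
    "output_ref": "",
}

print(missing_evidence(record, "low"))     # []
print(missing_evidence(record, "medium"))  # ['output_ref']
```

The same record can therefore be sufficient for a low-risk use and insufficient for a higher-risk one, without requiring every detail of every run.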

Q3. Why is retrospective documentation not enough?

Because retrospective documentation often lacks the exact prompt, context, source state, timing, and intervention record that shaped the run. Without those, reviewers may know what participants say happened without being able to test whether that account is accurate.

Q4. How does this improve the RAIDT score profile?

It improves score credibility. Pillar scores become more defensible when based on run artefacts that show who acted, what system state existed, what output was produced, and how the run was reviewed.

Q5. What if point-of-use capture is technically difficult in a legacy workflow?

RAIDT still helps by making the evidential gap visible. An organisation can begin with manual capture, move to semi-automated controls, and use the identified gaps to prioritise workflow redesign rather than pretending that unsupported retrospective reports are sufficient evidence.

Suggested citation concepts to support this item
Short explanation for presentation

Evidence at point of use means capturing the relevant evidence while a GenAI run is actually happening, rather than trying to reconstruct the run later from memory or summary notes. In RAIDT, this matters because the run is the unit of governance. If prompts, model details, source inputs, timestamps, approvals, and outputs are not preserved at the moment of use, then later review becomes speculative. The concept therefore underpins the evidence pack and strengthens the five-pillar score profile, especially for Auditability and Traceability. Its value is practical: it turns AI governance from broad principle into reviewable organisational action. In supervision, viva, or policy discussion, this item helps explain why RAIDT is more operational than generic governance approaches that rely mainly on policies or post hoc reporting.

One-line takeaway

Evidence at point of use is the contemporaneous capture of run artefacts; it matters because RAIDT can only govern a run credibly when the evidence is created during that run.
