S11.07 - Evidence_capture_feasibility

S11.07 - Evidence capture feasibility

flowchart LR
    A[Closed platforms and missing metadata] --> B[RAIDT run-level evidence framework]
    H[Prompts, timestamps, outputs, review notes, wrappers, logs] --> C[[Evidence capture feasibility: can the run be reconstructed?]]
    B --> C
    C --> D[Evidence pack]
    C --> E[RAIDT score profile]
    D --> F[Reviewer reconstruction and contestability]
    E --> G[Governance readiness and organisational learning]
    I[Procurement and implementation choices] --> C

Star S11 - Boundaries, Limitations and Future Questions

Star context: Clarifies a practical boundary of RAIDT by showing that governance quality depends partly on whether a platform, workflow, or organisational setting can actually produce the evidence needed for run-level review.


Academic picture
Definition / background

Evidence capture feasibility is the practical question of whether enough relevant evidence can be recorded for a particular generative AI run to be reconstructed, reviewed, and evaluated. In RAIDT, the issue is not whether an organisation would like better documentation in principle, but whether the technical platform, workflow design, and organisational controls make evidence capture possible at the level of the individual run.

The concept matters because RAIDT treats the run as the unit of governance. A run-level evidence pack and a five-pillar score profile depend on the existence of a usable evidential record. If prompts, settings, source materials, outputs, review actions, or timestamps cannot be captured reliably, then the organisation cannot fully justify its governance claims for that run. In that sense, evidence capture feasibility is a condition of governance visibility.

This concept is different from general logging, transparency, or documentation quality. Logging may exist but still be inadequate for governance purposes if it omits contextual details, human interventions, or output versions. Transparency may be claimed at the vendor or policy level without giving an organisation access to the artefacts needed to reconstruct one concrete run. Evidence capture feasibility therefore sits between infrastructure capability and governance method: it asks whether the environment can support RAIDT's evidential demands.

Within RAIDT, the concept belongs in Boundaries, Limitations and Future Questions because it prevents overclaiming. RAIDT does not solve missing evidence by rhetorical force. If a platform does not expose sufficient metadata, if review steps occur outside the system, or if implementation is weak, RAIDT makes that limitation visible. The framework remains useful precisely because it can show when low Auditability or Traceability scores reflect a real evidence gap rather than a failure of interpretation.

Why this concept matters

Evidence capture feasibility matters because many organisations adopt generative AI tools whose evidential affordances are uneven, opaque, or poorly aligned with governance requirements. A governance framework that ignores this issue risks assuming that evidence can always be produced after the fact. In practice, the gap often surfaces only when a problematic output, contested decision, or review request forces the organisation to discover what was never captured.

The concept also prevents a common confusion between governance design and governance executability. An organisation may have a strong policy, a clear responsible-use statement, and a well-written assurance narrative, yet still be unable to reconstruct a run because its toolchain does not retain prompts, versioned outputs, or reviewer actions. RAIDT uses evidence capture feasibility to separate aspirational governance from operationally supportable governance.

If this concept is missing, organisations may overestimate their audit readiness, underestimate procurement risk, and misinterpret weak evidence as a minor documentation inconvenience rather than a structural limitation. By foregrounding feasibility, RAIDT helps move governance from principle statements to realistic operational judgement.

Key idea: Evidence capture feasibility matters because RAIDT can govern only what an organisation can meaningfully evidence at the level of the individual run.

What this item explains
Practical example / likely audience question

Audience question

What if the platform cannot log everything?

Answer

The concern behind this question is that RAIDT might appear to assume ideal technical visibility. The direct answer is no: RAIDT does not require perfection, but it does require that evidence limitations be made explicit rather than hidden. If a platform cannot log everything, the missing evidence becomes a governance fact about that implementation environment.

For example, an organisation may use a vendor chatbot that stores final outputs but does not retain prompt history, configuration details, or reviewer edits. In that case, RAIDT can still be applied, but the resulting evidence pack will be thinner and the score profile should reflect that limitation, especially in Auditability and Traceability. The issue is not that RAIDT has failed. The issue is that the platform does not support the level of evidence capture needed for stronger governance assurance.

This is where RAIDT is more useful than a generic AI governance approach. A generic approach may stop at recommending better documentation. RAIDT turns the limitation into an assessable governance finding. It shows that the gap may need to be addressed through procurement requirements, wrapper design, workflow redesign, logging infrastructure, or policy constraints on which tools are acceptable for certain classes of work.

Practical example in RAIDT terms

Consider an enterprise productivity setting in which staff use a generative AI assistant to draft contract summaries for internal procurement teams. The use case is attractive because it speeds up first-pass review of supplier terms, but the run-level issue is that the chosen platform only stores the final generated summary and a timestamp. It does not preserve the original prompt, attached contract excerpt, model settings, or the sequence of edits made by the employee before the summary is circulated.

The evidence needed for stronger RAIDT governance would include the task purpose, the source clause text supplied to the model, the prompt or instruction template, the model and version used, the generated draft, any employee edits, review comments from legal staff, and the final decision on whether the summary could be relied upon. Responsibility is affected because accountability for review and sign-off becomes harder to demonstrate. Auditability is affected because a later reviewer cannot reconstruct how the summary emerged. Interpretability is affected because the reasoning context of the output is under-documented. Dependability is affected because recurring output quality problems cannot be analysed properly across runs. Traceability is affected because the chain from source text to generated and approved output is incomplete.

In governance-readiness terms, the organisation's position improves when the tool is wrapped with structured templates and logging controls, or when staff are required to submit source excerpts, prompts, and reviewer notes into an evidence form before relying on the output. RAIDT therefore makes the limitation actionable: it identifies what additional evidence infrastructure is required before the workflow can be treated as strongly governable.
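The gap in this example can be made concrete with a small sketch. It assumes a hypothetical `RunEvidence` record whose fields mirror the artefacts listed above (plus the timestamp the platform does store); the field names, the `missing_artefacts` helper, and the placeholder values are illustrative and are not a RAIDT schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RunEvidence:
    """One run's evidential record. Field names are illustrative assumptions."""
    task_purpose: Optional[str] = None
    source_text: Optional[str] = None
    prompt: Optional[str] = None
    model_version: Optional[str] = None
    generated_draft: Optional[str] = None
    employee_edits: Optional[str] = None
    legal_review_comments: Optional[str] = None
    final_decision: Optional[str] = None
    timestamp: Optional[str] = None


def missing_artefacts(run: RunEvidence) -> List[str]:
    """Return the names of artefacts that were not captured for this run."""
    return [f.name for f in fields(run) if getattr(run, f.name) in (None, "")]


# The platform in the example keeps only the generated summary and a timestamp.
# Both values below are placeholders, not real data.
platform_record = RunEvidence(
    generated_draft="Summary of supplier terms (placeholder)",
    timestamp="2024-01-01T09:00:00Z",
)

gaps = missing_artefacts(platform_record)
print(f"{len(gaps)} of {len(fields(RunEvidence))} artefacts missing: {gaps}")
```

A list like `gaps` is not a score. It is only the raw observation that an assessor could cite when explaining why the evidence pack for this run is thin.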

Detailed link to RAIDT

Evidence capture feasibility links to RAIDT in four ways.

First, it tests RAIDT's core idea that responsible governance should be grounded in evidence from actual runs rather than broad claims about tools or policies.

Second, it determines whether the run can function as a meaningful unit of governance, because a run that cannot be evidenced adequately cannot be reviewed in depth.

Third, it shapes the quality of the evidence pack and the confidence with which a RAIDT score profile can be justified across the five pillars.

Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning by revealing where evidence infrastructure is sufficient and where governance is being constrained by technical or process limitations.

Evidence capture feasibility → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness

Link to the five RAIDT pillars

Responsibility

Evidence capture feasibility supports Responsibility by determining whether the organisation can show who initiated, reviewed, approved, or relied on a run and under what authority.

Example evidence / implication:

Auditability

This item has a very strong effect on Auditability because poor evidence capture directly weakens the ability of another person to reconstruct and inspect the run after the event.

Example evidence / implication:

Interpretability

Evidence capture feasibility affects Interpretability because explanation of an output depends partly on whether the practical conditions of generation were recorded.

Example evidence / implication:

Dependability

This item affects Dependability by influencing whether repeated failures, inconsistencies, or process weaknesses can be detected across comparable runs.

Example evidence / implication:

Traceability

Evidence capture feasibility is especially central to Traceability because it determines whether the organisation can connect the run to its inputs, outputs, actors, timing, and downstream use.

Example evidence / implication:

Evidence capture feasibility affects all five pillars, but it is most immediately decisive for Auditability and Traceability because those pillars deteriorate quickly when run artefacts cannot be retained or reconstructed.

Why this item is more than a generic concept

In general AI governance, evidence capture feasibility might mean whether an organisation has enough documentation or logging to support oversight in a broad sense. In RAIDT, it has a more precise and operational meaning: whether one specific run can yield enough structured evidence to support an evidence pack, justify a five-pillar score profile, and withstand meaningful review.

The RAIDT meaning is more operational because it does not treat missing evidence as a vague implementation problem. It treats it as a measurable governance limitation attached to concrete runs, workflows, platforms, and deployment choices. That makes the concept useful for practice, procurement, and audit readiness rather than for abstract discussion alone.

Common misunderstanding

Misunderstanding

If evidence capture feasibility is low, RAIDT cannot be used.

Correction

Low feasibility does not make RAIDT irrelevant; it makes RAIDT diagnostically important. The framework can still assess the run, but it should show that the governance weakness lies in limited capture capability. For example, if a public-sector chatbot records only a final answer and not the prompt or source retrieval context, RAIDT can still be applied to that constrained record. The resulting assessment should then state clearly that weak Auditability and Traceability arise from missing evidence infrastructure. This is valuable because it turns an invisible limitation into an actionable governance finding.

Boundary and limitation

Evidence capture feasibility does not guarantee that the captured evidence is accurate, sufficient for every purpose, or ethically uncomplicated. A platform may expose many logs while still omitting crucial contextual meaning, and extensive capture may introduce privacy, retention, or proportionality concerns. The concept therefore does not prove governance quality by itself.

It also does not replace wider governance tasks such as model evaluation, legal review, procurement due diligence, workflow design, or staff training. Evidence capture feasibility tells us whether the conditions exist for reconstructable run-level evidence, not whether the underlying model is correct, fair, or safe in general.

Capture may be incomplete in settings where evidence is distributed across multiple tools, where human actions occur off-platform, or where vendor restrictions prevent access to needed artefacts. RAIDT handles this by treating incomplete capture as part of the assessment itself. The framework does not conceal the limitation; it exposes it and shows where implementation or procurement change is needed.

Implementation levels

Manual implementation

A researcher or small team can apply this concept manually by using a structured run sheet that records prompts, source material, outputs, timestamps, reviewer identity, and decision notes outside the platform when the platform itself is limited. This is burdensome but often sufficient for pilot studies, viva examples, or small-scale governance trials.
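As a minimal sketch of such a run sheet, the example below keeps one CSV row per run and refuses rows with blank fields. The column labels, the added `run_id` column, the `append_run` helper, and the sample row are assumptions made for illustration; an in-memory file stands in for a shared spreadsheet.

```python
import csv
import io
from datetime import datetime, timezone

# Columns follow the items named above; labels and run_id are assumptions.
RUN_SHEET_COLUMNS = [
    "run_id", "timestamp", "prompt", "source_material",
    "output", "reviewer", "decision_notes",
]


def append_run(sheet, row: dict) -> None:
    """Append one run to an open CSV run sheet, rejecting rows with blank fields."""
    blank = [c for c in RUN_SHEET_COLUMNS if not str(row.get(c, "")).strip()]
    if blank:
        raise ValueError(f"run sheet row is incomplete, blank fields: {blank}")
    csv.DictWriter(sheet, fieldnames=RUN_SHEET_COLUMNS).writerow(row)


# An in-memory file stands in for a CSV on disk or a shared spreadsheet.
sheet = io.StringIO()
csv.DictWriter(sheet, fieldnames=RUN_SHEET_COLUMNS).writeheader()

# Placeholder values for one pilot run.
append_run(sheet, {
    "run_id": "pilot-001",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "prompt": "Summarise the termination clause.",
    "source_material": "Clause 14.2 excerpt",
    "output": "Either party may terminate with 30 days notice.",
    "reviewer": "reviewer-a",
    "decision_notes": "Accepted after checking against clause text.",
})
print(sheet.getvalue())
```

Rejecting incomplete rows at entry time is one design choice. A team could equally allow blanks and report them later, which keeps the record of what could not be captured.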

Semi-automated implementation

Semi-automated implementation can use templates, wrappers, forms, and workflow checkpoints that capture some metadata automatically while asking users to complete contextual fields manually. For example, a browser-based wrapper around a GenAI tool might save prompt text and output versions while a review form records purpose, reliance level, and approval status.
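The split between automatic and manual capture can be sketched as follows. The example assumes the GenAI tool is reachable as a plain callable taking a prompt and returning text; `fake_tool`, `CapturedRun`, `capture`, `record_edit`, and `submit_review` are hypothetical names and do not correspond to any vendor API.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class CapturedRun:
    # Captured automatically by the wrapper.
    run_id: str
    prompt: str
    started_at: str
    output_versions: List[str] = field(default_factory=list)
    # Completed manually through the review form.
    purpose: Optional[str] = None
    reliance_level: Optional[str] = None
    approval_status: Optional[str] = None


def capture(generate: Callable[[str], str], prompt: str) -> CapturedRun:
    """Call the tool and record the prompt, start time and first output."""
    run = CapturedRun(
        run_id=str(uuid.uuid4()),
        prompt=prompt,
        started_at=datetime.now(timezone.utc).isoformat(),
    )
    run.output_versions.append(generate(prompt))
    return run


def record_edit(run: CapturedRun, edited_text: str) -> None:
    """Keep each human-edited version instead of overwriting the draft."""
    run.output_versions.append(edited_text)


def submit_review(run: CapturedRun, purpose: str,
                  reliance_level: str, approval_status: str) -> None:
    """Store the contextual fields a reviewer enters in the form."""
    run.purpose = purpose
    run.reliance_level = reliance_level
    run.approval_status = approval_status


def fake_tool(prompt: str) -> str:
    """Stand-in for the real GenAI call, which is an assumption here."""
    return f"DRAFT: {prompt}"


run = capture(fake_tool, "Summarise clause 14.2")
record_edit(run, "Clause 14.2 allows termination on 30 days notice.")
submit_review(run, purpose="first-pass supplier review",
              reliance_level="advisory", approval_status="approved")
print(run)
```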

Fully automated implementation

At scale, a platform, orchestration layer, or governance pipeline can automatically capture run identifiers, model details, prompt templates, source references, outputs, review actions, and scoring inputs. A dashboard can then assemble evidence packs, flag low-feasibility workflows, and inform procurement or architecture decisions about which tools are acceptable for high-accountability uses.
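One way a dashboard might flag low-feasibility workflows is sketched below. It assumes run records arrive as dictionaries, that the required fields are those named above, and that a workflow is flagged when fewer than 80% of its runs are complete. The field names, the threshold, and the sample records are all illustrative assumptions, not RAIDT rules.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Field names mirror the artefacts named above; the labels are assumptions.
REQUIRED = ("run_id", "model", "prompt_template",
            "source_refs", "output", "review_action")


def capture_rate_by_workflow(runs: Iterable[dict]) -> Dict[str, float]:
    """Share of runs per workflow in which every required field is present."""
    totals: Dict[str, int] = defaultdict(int)
    complete: Dict[str, int] = defaultdict(int)
    for run in runs:
        workflow = run["workflow"]
        totals[workflow] += 1
        if all(run.get(name) for name in REQUIRED):
            complete[workflow] += 1
    return {wf: complete[wf] / totals[wf] for wf in totals}


def flag_low_feasibility(runs: Iterable[dict], threshold: float = 0.8) -> List[str]:
    """Workflows whose complete-capture rate falls below the assumed threshold."""
    rates = capture_rate_by_workflow(runs)
    return sorted(wf for wf, rate in rates.items() if rate < threshold)


# Placeholder records: one workflow loses prompts and review actions.
runs = [
    {"workflow": "contract-summaries", "run_id": "r1", "model": "m-1",
     "prompt_template": "t1", "source_refs": ["doc-a"], "output": "...",
     "review_action": "approved"},
    {"workflow": "contract-summaries", "run_id": "r2", "model": "m-1",
     "prompt_template": None, "source_refs": ["doc-b"], "output": "...",
     "review_action": None},
    {"workflow": "meeting-notes", "run_id": "r3", "model": "m-1",
     "prompt_template": "t2", "source_refs": ["doc-c"], "output": "...",
     "review_action": "approved"},
]

print(capture_rate_by_workflow(runs))
print(flag_low_feasibility(runs))
```

A flagged workflow is a prompt for investigation, for example into tooling or procurement terms, and not a verdict on the staff who use it.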

Practical use in the RAIDT project

Within the RAIDT project, this item is important for Paper 08 Foundations because it clarifies a central boundary condition: RAIDT depends on evidential access to runs, but it does not assume that such access is universally available. The concept therefore sharpens the framework's realism and guards against overclaiming about what responsible governance can achieve in closed or weakly instrumented environments.

For Paper 09 Empirical Validation, evidence capture feasibility is likely to be a major empirical variable. It can explain differences in score confidence across settings and help distinguish whether weak governance readiness is due to poor practice, poor tooling, or both. This is especially useful when comparing pilots, sector workflows, or implementation models.

For Paper 10 Policy Pathways, the item provides a route from conceptual governance to procurement and implementation guidance. It can inform policy recommendations about minimum evidence-capture requirements, acceptable platform features, documentation obligations, and proportional controls for different risk classes. It is also directly useful in sector playbooks, evidence-pack design, scoring-rubric justification, supervisor explanation, viva defence, and journal positioning because it shows that RAIDT is attentive to infrastructural constraints rather than assuming frictionless accountability.

Key audience questions to prepare for

Q1. Does evidence capture feasibility mean every GenAI tool must record everything?

No. RAIDT does not require maximal logging in every case. It requires sufficient, proportionate evidence for the level of governance scrutiny that the task demands. The key issue is adequacy for reconstruction and review, not indiscriminate data capture.
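Proportionality can be illustrated with a short sketch in which the same captured record is adequate for one level of scrutiny and inadequate for another. The three risk classes and the fields required for each are assumptions chosen for illustration; RAIDT as described here does not prescribe them.

```python
from typing import Dict, Set

# Hypothetical tiers: which artefacts each risk class requires.
REQUIRED_BY_RISK: Dict[str, Set[str]] = {
    "low": {"output", "timestamp"},
    "medium": {"output", "timestamp", "prompt", "reviewer"},
    "high": {"output", "timestamp", "prompt", "reviewer",
             "source_material", "model_version"},
}


def missing_for(captured: Set[str], risk_class: str) -> Set[str]:
    """Artefacts required at this risk class that were not captured."""
    return REQUIRED_BY_RISK[risk_class] - captured


def is_adequate(captured: Set[str], risk_class: str) -> bool:
    """True when every artefact required at this risk class was captured."""
    return not missing_for(captured, risk_class)


captured = {"output", "timestamp", "prompt", "reviewer"}
for risk_class in ("low", "medium", "high"):
    print(risk_class, is_adequate(captured, risk_class),
          sorted(missing_for(captured, risk_class)))
```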

Q2. Is low feasibility mainly a technical issue?

Not only. It can be technical, but it can also arise from procurement choices, workflow design, weak review processes, fragmented tool use, or organisational decisions not to retain key artefacts. RAIDT treats all of these as governance-relevant causes.

Q3. Why not just compensate for missing platform logs with policy statements?

Because policy cannot recreate missing run artefacts after the event. A strong policy may explain intended practice, but it cannot prove what happened in one contested or high-stakes run if the evidence was never captured.

Q4. How does this help with viva or supervisor scrutiny?

It shows that RAIDT is not naively claiming universal observability. You can explain that the framework is designed to reveal when governance is limited by evidence infrastructure and to convert that limitation into a practical recommendation about implementation or procurement.

Q5. What is the governance value of scoring a run when evidence is incomplete?

The value lies in making incompleteness visible and assessable. A low-confidence or lower-scoring assessment can still show where a workflow fails to support auditability, traceability, and accountable review, which is often exactly the governance finding an organisation needs.

Suggested citation concepts to support this item
Short explanation for presentation

Evidence capture feasibility asks a simple but crucial question inside RAIDT: can this GenAI run actually be evidenced well enough to support review? RAIDT treats the run as the unit of governance, so the framework depends on being able to capture prompts, inputs, outputs, settings, review actions, and other context in a reconstructable way. If a platform hides those artefacts or a workflow fails to retain them, RAIDT does not ignore the gap. Instead, it turns that limitation into a governance finding, often reflected in weaker Auditability and Traceability. This makes the concept especially important for procurement, implementation design, and audit readiness. In short, evidence capture feasibility keeps RAIDT realistic: it shows that responsible governance depends not only on having principles, but on having the practical means to produce run-level evidence.

One-line takeaway

Evidence capture feasibility is the practical possibility of recording enough of a GenAI run to support RAIDT's evidence pack, score profile, and governance readiness judgement.

Related items in boundaries, limitations and future questions
Anchored questions

No anchored questions are currently listed in the source item.
