S1.06 - Audit and accountability lineage

flowchart LR
    A[Audit traditions and organisational accountability] --> B[RAIDT - run-level evidence framework]
    A2[Principle-only AI governance is hard to inspect] --> B
    P1[Healthcare drafting] --> C[[Audit and accountability lineage]]
    P2[Finance summarisation] --> C
    P3[Public-service casework] --> C
    P4[Logging and orchestration tools] --> C
    B --> C
    C --> D[Run-level evidence pack]
    C --> E[RAIDT score profile]
    C --> H[Evidence over assertion]
    D --> F[Reviewer reconstruction]
    E --> G[Governance readiness]
    D --> I[Organisational learning]
    H --> G

Star S1 - Origins, Background and History

Star context: Explains why RAIDT emerged from Responsible AI, managerial uncertainty, IS governance, audit traditions, and GenAI operational pressure, showing how audit logic becomes run-level governance evidence rather than a purely rhetorical commitment.


Academic picture
Definition / background

Audit and accountability lineage refers to the intellectual and practical inheritance RAIDT draws from audit, assurance, record-keeping, internal control, and answerability traditions. These traditions share a common premise: if an action, recommendation, or decision may later need to be explained, challenged, or justified, durable evidence should exist to support independent review.

Within generative AI governance, this lineage matters because many organisational uses of GenAI are probabilistic, configurable, and context-sensitive. Broad principles alone cannot govern a model output adequately, because the relevant question is often not simply whether the system was approved for use, but what happened in a particular instance of use. RAIDT responds by treating the run as the unit of governance and by specifying what evidence should exist for that run.

This item is therefore distinct from a generic appeal to accountability. Accountability in a broad policy sense may refer to responsibility, liability, or ethical oversight. Audit lineage is narrower and more operational: it concerns what must be recorded, how review is made possible, and how an external or internal reviewer can reconstruct whether the run was conducted appropriately. RAIDT places this lineage inside the design of the evidence pack and the score profile so that answerability becomes inspectable rather than merely declarative.

It belongs inside RAIDT because RAIDT moves from abstract AI governance principles towards evidence-based governance. The run-level evidence pack embodies audit lineage by documenting configuration, purpose, timing, human roles, outputs, interventions, and control points. The five-pillar score profile then summarises how well that run supports Responsibility, Auditability, Interpretability, Dependability, and Traceability.
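As a minimal sketch of how a five-pillar score profile might be represented in code, the following assumes an integer scale of 0 to 4 per pillar. The scale, the `ScoreProfile` class, and the `weakest_pillars` helper are illustrative assumptions; the text above names the five pillars but does not define a scoring scheme.

```python
from dataclasses import dataclass

# The five pillar names come from the text; everything else is hypothetical.
PILLARS = (
    "Responsibility",
    "Auditability",
    "Interpretability",
    "Dependability",
    "Traceability",
)


@dataclass(frozen=True)
class ScoreProfile:
    """Hypothetical per-run score profile; the 0-4 scale is an assumption."""

    run_id: str
    scores: dict  # pillar name -> integer score

    def __post_init__(self):
        # Every pillar must be scored, and no unknown pillar may appear.
        missing = [p for p in PILLARS if p not in self.scores]
        if missing:
            raise ValueError(f"missing pillar scores: {missing}")
        for pillar, value in self.scores.items():
            if pillar not in PILLARS:
                raise ValueError(f"unknown pillar: {pillar}")
            if not 0 <= value <= 4:
                raise ValueError(f"{pillar} score {value} is outside 0-4")

    def weakest_pillars(self):
        """Return the pillar(s) with the lowest score, in pillar order."""
        lowest = min(self.scores.values())
        return [p for p in PILLARS if self.scores[p] == lowest]


profile = ScoreProfile(
    run_id="run-001",  # illustrative identifier
    scores={
        "Responsibility": 3,
        "Auditability": 4,
        "Interpretability": 2,
        "Dependability": 3,
        "Traceability": 4,
    },
)
print(profile.weakest_pillars())
```

Keeping the profile as one small record per run means a reviewer can read it alongside the evidence pack it summarises, rather than treating the score as a free-standing rating.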

Why this concept matters

This concept matters because organisations increasingly use GenAI in settings where they may later need to explain what was done, by whom, under what conditions, and with what safeguards. Without an audit and accountability lineage, governance language can remain aspirational while operational practice stays opaque.

The concept solves a recurring problem in AI governance: the gap between policy-level commitment and case-level review. Many organisations can state that they use AI responsibly, but far fewer can reconstruct a single consequential use in a way that supports challenge, learning, or audit. RAIDT reduces that gap by connecting accountability to observable run evidence.

It also corrects the misconception that accountability is simply blame allocation after something goes wrong. In RAIDT, accountability is enabled earlier and more constructively: evidence is assembled so that users, reviewers, managers, and auditors can inspect how a run was configured and handled before disputes become crises.

If this lineage is missing, organisations face several risks: weak contestability, poor incident investigation, limited assurance for senior decision-makers, difficulty demonstrating compliance, and shallow organisational learning. The result is governance by assertion rather than governance by evidence.

Key idea: Audit and accountability lineage matters because RAIDT turns the longstanding demand for answerable records into run-level evidence that can actually be reviewed, challenged, and improved.

What this item explains
Practical example / likely audience question

Audience question

Why use audit language?

Answer

The concern behind this question is often that audit language sounds narrow, bureaucratic, or too closely tied to financial compliance. The direct answer is that RAIDT uses audit language because governance becomes credible when independent reviewers can inspect durable evidence rather than rely on assurances from developers, vendors, or users.

A practical example is a GenAI-assisted case summary prepared for a public-service decision. If the summary is later challenged, a generic AI governance approach may only show that the organisation had a policy, conducted some training, and approved the tool category. RAIDT handles the issue better because it asks what evidence exists for that specific run: the model or service used, the prompt structure, the source material, the human reviewer, the edits made, the output retained, and the basis for sign-off. Audit language is therefore not rhetorical decoration; it identifies the minimum conditions for credible review.

In this sense, RAIDT borrows the strongest feature of audit traditions: answerability depends on records that allow another person to reconstruct and evaluate what occurred. That is why audit language is appropriate for GenAI governance when the goal is reviewability, contestability, and organisational readiness.

Practical example in RAIDT terms

Consider a healthcare organisation using a GenAI tool to draft outpatient follow-up letters from clinician notes. The use case seems low-friction, but the run-level issue is significant: a single run may omit a medication instruction, overstate a diagnosis, or introduce wording that was not present in the source notes.

In RAIDT terms, the evidence needed for that run would include the task purpose, model and version, prompt template, date and time, source-note provenance, any patient-data handling constraints, the identity or role of the human reviewer, edits made before approval, and the final issued text. The evidence pack would preserve these details so a reviewer could later understand whether the output was used appropriately.
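The evidence items listed above can be sketched as a single structured record. This is an illustrative assumption about shape only: the `RunEvidence` class, its field names, and the `missing_fields` check are hypothetical and are not a schema defined by RAIDT.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RunEvidence:
    """Hypothetical evidence record for one drafting run (names are assumptions)."""

    task_purpose: Optional[str] = None
    model_and_version: Optional[str] = None
    prompt_template: Optional[str] = None
    run_timestamp: Optional[datetime] = None
    source_note_provenance: Optional[str] = None
    data_handling_constraints: Optional[str] = None
    reviewer_role: Optional[str] = None
    edits_before_approval: Optional[str] = None
    final_issued_text: Optional[str] = None

    def missing_fields(self):
        """List the evidence items that were not captured for this run."""
        return [
            f.name
            for f in fields(self)
            if getattr(self, f.name) in (None, "")
        ]


# Illustrative values only; none of these come from a real system.
letter_run = RunEvidence(
    task_purpose="Draft outpatient follow-up letter",
    model_and_version="example-model v1",
    prompt_template="followup-letter-template",
    run_timestamp=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    reviewer_role="Consultant clinician",
    final_issued_text="Dear patient, (letter text)",
)
print(letter_run.missing_fields())
```

A check of this kind does not judge whether the letter was clinically correct; it only shows a reviewer which parts of the record are absent for that run.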

The most affected pillars would be Auditability and Traceability, with strong implications for Responsibility and Dependability as well. Auditability matters because a clinical governance reviewer may need to inspect the run after a complaint. Traceability matters because the organisation must connect the final letter to the exact configuration and review chain. Responsibility matters because human oversight and sign-off must be clear. Dependability matters because repeated failure patterns across runs may indicate that the workflow is not stable enough for clinical use.

By placing this use case inside RAIDT, the organisation moves from general assurance that it has an AI policy to practical governance readiness for the specific clinical drafting event.

Detailed link to RAIDT

Audit and accountability lineage links to RAIDT in four ways.

First, it connects directly to RAIDT's core idea that GenAI governance should be grounded in reviewable evidence, not only in abstract principles or institutional claims.
Second, it supports RAIDT's focus on the run as the unit that must be reconstructable, because accountability questions usually arise around a specific configured use.
Third, it shapes the design of the evidence pack and the score profile by defining what kinds of records, controls, and review traces need to exist.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning by making each run available for retrospective examination and improvement.

Audit and accountability lineage -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

This chain is important because it shows that the historical logic of audit is not left in the background. RAIDT operationalises it through the artefacts and assessments that make GenAI use inspectable in practice.

Link to the five RAIDT pillars

Responsibility

Audit and accountability lineage strengthens Responsibility by clarifying who initiated, reviewed, approved, and acted on a GenAI run. It makes role allocation visible instead of assumed.

Example evidence / implication:

Auditability

This item has one of its strongest effects on Auditability. It defines the expectation that a run leaves a record that another reviewer can inspect without relying on memory or informal explanation.

Example evidence / implication:

Interpretability

Audit lineage supports Interpretability indirectly by requiring enough contextual documentation for reviewers to understand what the system was asked to do and how the output should be read in context.

Example evidence / implication:

Dependability

Dependability is supported when repeated run evidence reveals patterns of stable or unstable performance. Audit lineage helps organisations assess whether the workflow is reliable enough for operational use.

Example evidence / implication:

Traceability

This item also strongly affects Traceability because accountability depends on being able to connect an output back to the exact model, settings, timing, inputs, and human interventions associated with that run.

Example evidence / implication:

Why this item is more than a generic concept

In general AI governance, audit and accountability lineage may simply mean that organisations should take accountability seriously and maintain some form of oversight. In RAIDT, it means something much more operational: each configured run should leave enough structured evidence for a reviewer to reconstruct what happened and judge whether governance expectations were met.

The RAIDT meaning is more operational because it is tied to run-level evidence, evidence packs, and scoring. It is therefore not satisfied by a policy statement, an annual review, or vendor assurance alone. It is satisfied when a specific use can be inspected in context.

Common misunderstanding

Misunderstanding

Audit and accountability lineage means importing a heavy, punitive compliance bureaucracy into every GenAI interaction.

Correction

The concept does not require every run to be treated like a formal financial audit. It requires governance to be proportionate but reviewable. A low-risk internal drafting run may need lightweight metadata and reviewer confirmation, while a high-impact decision-support run may require fuller evidence capture and stricter sign-off. RAIDT handles this pragmatically by tying expectations to the run context and by using the evidence pack and score profile to scale governance effort appropriately.
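One way to picture proportionate evidence expectations is a lookup from risk tier to required evidence items. The sketch below is an assumption for illustration: the two tier names, the item names, and the `evidence_gaps` function are hypothetical, and a real deployment would define its own tiers and requirements.

```python
# Hypothetical mapping from risk tier to required evidence items.
REQUIRED_EVIDENCE = {
    "low": {"task_purpose", "model_and_version", "reviewer_confirmation"},
    "high": {
        "task_purpose",
        "model_and_version",
        "prompt_template",
        "source_material",
        "reviewer_role",
        "edits_made",
        "retained_output",
        "sign_off_basis",
    },
}


def evidence_gaps(risk_tier, captured):
    """Return the required items that are absent or empty for this tier."""
    if risk_tier not in REQUIRED_EVIDENCE:
        raise ValueError(f"unknown risk tier: {risk_tier!r}")
    present = {key for key, value in captured.items() if value not in (None, "")}
    return sorted(REQUIRED_EVIDENCE[risk_tier] - present)


# The same captured metadata is sufficient for a low-risk run
# but leaves gaps when the run is treated as high-impact.
captured = {
    "task_purpose": "Internal drafting",
    "model_and_version": "example-model v1",
    "reviewer_confirmation": "confirmed",
}
print(evidence_gaps("low", captured))
print(evidence_gaps("high", captured))
```

The design choice here is that the tier changes what is required, not how the check works, so lightweight and fuller capture share one review path.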

Boundary and limitation

This item does not prove that a GenAI output is correct, fair, lawful, or beneficial. It also does not replace substantive evaluation of model quality, domain expertise, or institutional governance. Audit lineage can show what happened and whether evidence exists, but evidence alone does not guarantee good judgement.

It may also fail if evidence capture is incomplete, if logging is selective, if human reviewers treat sign-off as a rubber-stamp exercise, or if the run boundary is defined too loosely. RAIDT handles this limitation by pairing audit lineage with pillar-based assessment, clearer run specification, and a structured evidence pack that can expose where documentation or controls are weak.

Implementation levels

Manual implementation

A researcher or small team can apply this item manually by using a run template, saving prompts and outputs, recording who reviewed the result, and storing short notes on purpose, context, and any corrective action.

Semi-automated implementation

Semi-automated implementation adds structured metadata fields, evidence-pack templates, form-based review steps, and lightweight dashboards that make it easier to capture run details consistently without relying on free-text notes alone.

Fully automated implementation

At scale, a platform, wrapper, orchestration layer, or governance pipeline can capture model identifiers, timestamps, prompt versions, output snapshots, reviewer actions, and workflow events automatically. RAIDT then becomes part of the operating environment rather than an after-the-fact documentation exercise.
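A wrapper of the kind described above might look like the following sketch. It assumes the model call is any Python callable that takes a prompt string and returns text; `capture_run`, `fake_generate`, and the record keys are hypothetical names, and the in-memory list stands in for whatever durable store an organisation actually uses.

```python
import hashlib
import json
from datetime import datetime, timezone


def capture_run(generate, *, model_id, prompt_version, prompt, log):
    """Call `generate` and append a run record to `log` (hypothetical wrapper)."""
    started = datetime.now(timezone.utc)
    output = generate(prompt)
    finished = datetime.now(timezone.utc)
    record = {
        "model_id": model_id,
        "prompt_version": prompt_version,
        "started_at": started.isoformat(),
        "finished_at": finished.isoformat(),
        # A hash lets a reviewer confirm which prompt text was used
        # without the log itself being the only copy of that text.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output_snapshot": output,
        "reviewer_actions": [],  # filled in later by the review step
    }
    # One JSON line per run; sorted keys keep the serialised form stable.
    log.append(json.dumps(record, sort_keys=True))
    return output, record


def fake_generate(prompt):
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return "DRAFT: " + prompt


evidence_log = []
output, record = capture_run(
    fake_generate,
    model_id="example-model",  # illustrative identifier
    prompt_version="v3",
    prompt="Summarise case notes",
    log=evidence_log,
)
print(output)
```

Because capture happens inside the call path, the record exists whether or not the user remembers to document the run, which is the point of moving from manual to automated implementation.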

Practical use in the RAIDT project

This item is useful across the RAIDT project because it explains why RAIDT is presented as an evidence-oriented governance framework rather than as a set of broad AI principles. In Paper 08 Foundations, it helps justify the theoretical move from responsible AI discourse towards run-level accountability. In Paper 09 Empirical Validation, it supports analysis of whether practitioners find evidence packs and score profiles credible as governance artefacts. In Paper 10 Policy Pathways, it helps show how RAIDT can connect organisational practice to assurance, compliance, and review expectations without reducing governance to regulation alone.

It is also useful for sector playbooks and governance interventions because different domains already understand audit in different ways. This item helps translate RAIDT into those settings by showing that the framework respects established assurance traditions while adapting them to GenAI operations. For supervision, viva defence, and journal positioning, the item clarifies that RAIDT is not an abstract ethics layer; it is a practical governance architecture built around reviewable evidence.

Key audience questions to prepare for

Q1. Is RAIDT just repackaging traditional audit for AI?

No. RAIDT borrows the logic of auditability and answerable records, but it applies that logic to the configured run of a probabilistic GenAI system. The contribution is not the word audit by itself; it is the operational translation of audit lineage into run-level evidence, evidence packs, and pillar-based scoring.

Q2. Why is accountability discussed at the run level rather than only at system level?

Because many real governance disputes arise from a specific use in a specific context. System-level documentation is necessary but insufficient when the relevant question is what happened in one consequential instance. The run level is where configuration, human action, and output come together.

Q3. Does this approach increase bureaucracy for ordinary users?

It can if implemented badly. RAIDT is intended to be proportionate. Lower-risk uses can have lightweight capture, while higher-risk runs require stronger evidence and review. The aim is not paperwork for its own sake, but targeted evidence that supports credible governance.

Q4. How does this help with contestability?

Contestability depends on being able to revisit a concrete case. If a person affected by an output wants to challenge it, the organisation needs more than a policy statement. It needs the records that show how the run was configured, reviewed, and acted upon.

Q5. Why not rely on vendor logs or model cards instead?

Vendor artefacts are useful, but they rarely capture the full organisational context of use. RAIDT adds the local context: task purpose, prompt choices, reviewer decisions, workflow events, and organisational controls. That local evidence is what makes accountability meaningful in practice.

Short explanation for presentation

Audit and accountability lineage explains why RAIDT is presented as an evidence-oriented governance framework rather than as a set of broad AI principles. RAIDT inherits from audit traditions the idea that actions become governable when they leave records that others can inspect. For generative AI, that logic has to be applied at the level of the run, because risk, configuration, and human intervention often vary from one use to the next. RAIDT therefore turns accountability into something operational: a run-level evidence pack records what was done, under what conditions, and with what controls, while the score profile summarises governance strength across five pillars. The result is a practical bridge from policy aspiration to reviewability, contestability, and audit readiness.

One-line takeaway

Audit and accountability lineage is RAIDT's inheritance from audit and assurance traditions: it makes each GenAI run governable through reviewable evidence rather than unsupported assertion.

Anchored questions

Audience question: Why use audit language? Answer: because governance becomes credible when independent reviewers can inspect durable evidence.
