Q051 — How does RAIDT evidence responsibility in one run?

RAIDT · Star S5 - RAIDT Pillars and Scoring · primary item: S5.01 · Responsibility

Responsibility becomes credible only when ownership, review, and reliance are preserved inside the run-level evidence pack.

Answer

RAIDT evidences Responsibility in one run by scoring the run-level evidence pack rather than the apparent quality of the output. The scoring appendix is explicit that the run is the scored object: one configured use for a task, at a specific time and context, with its prompt, model and tool configuration, retrieved context where relevant, output, checks and follow-on decisions. For Responsibility, evidence must show not only what the model produced, but also what constraints, checks and oversight governed that production and subsequent use.

In practice, a Responsibility judgement in one run draws on both behavioural evidence and process evidence. Behavioural evidence includes the actual output, any uncertainty statements, and whether limitations were communicated. Process evidence includes role constraints, policy references, safety or compliance checks, policy layer identifiers, reviewer sign-off, edits, exceptions, and escalation records. The evidence-review paper strengthens this point by describing run-level evidence objects as bundles that preserve context-of-use, configuration provenance, outputs and oversight decisions in a reviewable form. Responsibility is therefore evidenced when a reviewer can reconstruct why the run was permitted, what controls were active, who reviewed it, and whether the output was authorised for downstream use. Influence methods used as governance interventions also count, but only if they leave an evidence trail. In short, Responsibility is demonstrated through inspectable artefacts anchored to the run, not inferred from fluent language or post hoc narrative.
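
The reconstruction test described above can be sketched as a small data structure. This is an illustration only: the class name RunEvidencePack, its field names, and the reconstruction_gaps helper are hypothetical and are not taken from the RAIDT scoring appendix. The mapping from each field to a reconstruction question is likewise an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunEvidencePack:
    """Hypothetical run-level evidence pack; all field names are illustrative."""
    # Context and configuration of the run
    task: str
    prompt: str
    model_and_tools: str
    # Behavioural evidence
    output: str
    uncertainty_statements: list[str] = field(default_factory=list)
    # Process evidence
    policy_references: list[str] = field(default_factory=list)
    checks_run: list[str] = field(default_factory=list)
    reviewer: Optional[str] = None
    authorised_for_downstream_use: Optional[bool] = None


def reconstruction_gaps(pack: RunEvidencePack) -> list[str]:
    """Return the reconstruction questions this pack cannot answer."""
    gaps = []
    if not pack.policy_references:
        gaps.append("why the run was permitted")
    if not pack.checks_run:
        gaps.append("what controls were active")
    if pack.reviewer is None:
        gaps.append("who reviewed it")
    # None means the decision was never recorded; False is a recorded refusal.
    if pack.authorised_for_downstream_use is None:
        gaps.append("whether the output was authorised for downstream use")
    return gaps
```

A pack with no gaps does not by itself earn a high Responsibility score; it only means the reviewer has the artefacts needed to make the judgement. Note that a recorded refusal to authorise still counts as evidence, because the decision is inspectable.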

Practical example

Consider the HR scenario from the evidence-review paper: a manager uses a GenAI assistant to draft a performance appraisal. To evidence Responsibility in that single run, the pack should record the appraisal purpose, the manager's role, the prompt template version, any relevant HR policy text retrieved, the generated draft, and the manager's review and approval actions before the appraisal is filed.

If an employee later disputes the appraisal, reviewers should be able to see whether the system was used only for drafting, whether sensitive or prohibited criteria were excluded, whether uncertainty or caveats were surfaced, and whether a human decision-maker actually approved the final text. Without those run-specific artefacts, the organisation cannot credibly evidence Responsibility.
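
As a rough sketch of how the appraisal run might be checked when a dispute arises, the snippet below lists the artefacts named in this example and reports which ones a stored pack lacks. The key names, the sample values, and the missing_artefacts helper are all hypothetical, chosen only to mirror the wording of the scenario.

```python
# Artefacts the HR example says the pack should record (key names are illustrative).
REQUIRED_ARTEFACTS = [
    "appraisal_purpose",
    "manager_role",
    "prompt_template_version",
    "retrieved_hr_policy",
    "generated_draft",
    "review_actions",
    "approved_by",
]


def missing_artefacts(pack: dict) -> list[str]:
    """Return required artefacts that are absent or empty in the pack."""
    return [key for key in REQUIRED_ARTEFACTS if pack.get(key) in (None, "", [])]


# Hypothetical pack in which the manager never recorded a review or approval.
appraisal_pack = {
    "appraisal_purpose": "annual performance appraisal draft",
    "manager_role": "line manager",
    "prompt_template_version": "template-v3",
    "retrieved_hr_policy": ["appraisal policy excerpt"],
    "generated_draft": "draft text",
    "review_actions": [],
    "approved_by": None,
}

print(missing_artefacts(appraisal_pack))
```

In this illustrative case the review and approval records are the missing items, which is exactly the gap that would prevent the organisation from showing that a human decision-maker approved the final text.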
