S5.01 - Responsibility
flowchart LR
A[Background problem:
principles without run-level proof] --> B[RAIDT
run-level evidence framework]
A2[Traditional limitation:
unclear authority, weak boundaries,
informal oversight] --> B
B --> C[[Responsibility
justified, bounded, authorised,
and overseen use]]
H[Practical fields:
healthcare, finance, education,
public services, enterprise workflows] --> C
C --> D[Evidence pack:
purpose, limits, approvals,
escalation, safety controls]
C --> E[Score profile:
Responsibility plus four-pillar view]
C --> I[Governance move:
evidence over assertion]
D --> F[Reviewability and contestability]
E --> G[Governance readiness]
D --> J[Organisational learning and policy alignment]
Star S5 - RAIDT Pillars and Scoring
Star context: Responsibility is one of RAIDT's five governance pillars. It defines whether a specific GenAI run was appropriate, bounded, authorised, and overseen, so that scoring reflects accountable use rather than technical performance alone.
Academic picture
Definition / background
Responsibility in RAIDT asks whether a particular run of a generative AI system was appropriate, bounded, safe, authorised, and overseen in relation to the task being performed. It is therefore concerned with accountable use, not simply with whether the model generated a plausible or useful output. The concept matters because a technically successful answer can still be organisationally irresponsible if it was used outside scope, without authority, without safeguards, or without a route for escalation and review.
Conceptually, Responsibility sits at the intersection of governance, risk ownership, and justified action. In general AI governance discourse, responsibility is often discussed at the level of principles, organisational values, or high-level policy statements. RAIDT narrows the focus to the run as the unit of governance. That shift matters because organisations do not govern generative AI in the abstract; they govern situated uses by named roles, for specific tasks, at specific moments, under concrete constraints.
Responsibility differs from adjacent terms. It is not the same as Auditability, which concerns whether a reviewer can inspect and reconstruct what happened. It is not the same as Traceability, which concerns whether relevant artefacts, decisions, and transformations can be followed across the run. It is also not identical to Dependability, which asks whether the process and outcome are sufficiently stable and reliable for the intended use. Responsibility instead asks whether the run should have occurred in the way it did, under the conditions it did, with the oversight it did.
Within RAIDT, Responsibility belongs inside the five-pillar profile because governance readiness requires more than records. A strong evidence pack with weak responsibility controls would still expose the organisation to misuse, unmanaged risk, and contestable decisions. For that reason, Responsibility connects the evidence pack to organisational accountability: it shows whether the run was normatively and procedurally justified, and whether that justification can be defended to supervisors, auditors, managers, and affected stakeholders.
Why this concept matters
Responsibility solves a recurring governance problem in organisational GenAI use: the gap between having general principles and being able to justify one concrete use. Without this concept, teams often assume that approved access to a tool, a broad acceptable-use policy, or a manager's informal endorsement is enough. RAIDT makes that assumption visible and testable by asking for run-specific evidence that the use was appropriate, bounded, and overseen.
The concept also prevents a common confusion between usefulness and legitimacy. A run may save time and produce a convincing output, but still be irresponsible if it involves sensitive information, bypasses review, exceeds delegated authority, or creates downstream risk for staff, clients, patients, students, or citizens. Responsibility therefore acts as a governance filter on model use, not as a quality judgement on text generation alone.
If Responsibility is missing, organisations drift towards principle-washing: they can claim alignment with responsible AI values while lacking evidence that real uses are governed in a disciplined way. By contrast, RAIDT operationalises responsibility through evidence that can be inspected, challenged, and improved over time. That move is central to RAIDT's wider aim of shifting governance from assertion to documented, reviewable practice.
Key idea: Responsibility matters because it shows whether a specific GenAI run was justified and governable, rather than merely possible or productive.
What this item captures
- Whether the run had a legitimate and clearly stated purpose.
- Whether the task fell within an authorised and appropriate use case.
- Whether limits, exclusions, and safety boundaries were defined before or during use.
- Whether accountable human roles were assigned for initiation, review, escalation, and sign-off.
- Whether the run was subject to policy, procedural, legal, ethical, or sector-specific constraints.
- Whether the use of outputs was controlled so that generated material was not treated as self-authorising.
- Whether there is evidence that oversight decisions were made, rather than assumed.
- Whether organisational accountability can be reconstructed if the run is challenged later.
Practical example / likely audience question
Audience question
If RAIDT already records prompts, outputs, and logs, why does it need a separate Responsibility pillar?
Answer
The concern behind this question is the assumption that documentation alone equals good governance. It does not. Logs can show what happened, but they do not by themselves show whether the run should have happened in that form, under that authority, and within those limits.
The direct answer is that Responsibility addresses the legitimacy of use, whereas logs mainly support reconstruction of use. A run can be perfectly logged and still be irresponsible. For example, a member of staff might use a generative AI system to draft advice for a vulnerable service user without checking whether that use case is permitted, whether the data entered were appropriate, or whether a qualified reviewer was required before the output was acted on.
RAIDT handles this issue better than a generic AI governance approach because it asks for evidence at the level of the specific run. Rather than relying on broad policy language such as "staff should use AI responsibly", RAIDT looks for concrete indicators: a purpose statement, task scope, approval route, safety constraints, escalation rules, reviewer role, and conditions of use for the generated output. That is more operational, more defensible, and more useful in supervision, audit, and post-hoc review.
Practical example in RAIDT terms
Consider a healthcare administration team using a GenAI assistant to draft patient appointment follow-up letters. The use case appears low-risk because it is administrative rather than diagnostic, but a specific run becomes governance-sensitive when staff include details about treatment pathways, missed appointments, or vulnerability indicators.
The run-level issue is not simply whether the output reads well. The real question is whether this use of GenAI was authorised for patient communication, whether the prompt content stayed within approved data boundaries, whether a human reviewer had to approve the final letter, and whether escalation rules existed if the generated draft introduced clinically inappropriate wording.
The required evidence would include the stated purpose of the run, the approved task category, the user's role, any data-handling restrictions, the review requirement before sending, the relevant policy link, and the decision rule for when GenAI drafting must not be used. Responsibility is the primary pillar here, but Auditability, Traceability, and Dependability are also affected because the organisation must reconstruct the run, understand what was generated, and trust that the process is stable enough for repeated administrative use.
In governance-readiness terms, Responsibility improves the organisation's position by showing that GenAI was used under defined authority with explicit boundaries. That is more defensible than claiming after the fact that the tool was merely "helping with drafting".
Detailed link to RAIDT
Responsibility links to RAIDT in four ways.
First, it expresses RAIDT's core idea that governance should focus on the run, not only on the model or policy environment.
Second, it attaches accountability to the specific configured use of GenAI for a task, at a time, in a context.
Third, it feeds the evidence pack and score profile by requiring explicit evidence of purpose, authority, boundaries, oversight, and control.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning because a challenged run can be justified in governance terms rather than defended by informal explanation.
Responsibility → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness
In this chain, Responsibility is the pillar that asks whether the run was appropriately governed before its outputs are treated as usable organisational artefacts.
Link to the five RAIDT pillars
Responsibility
Responsibility is the primary pillar because it establishes whether the run was appropriate, authorised, bounded, and overseen.
Example evidence / implication:
- A recorded purpose statement shows why the run was undertaken and by whom.
- An approval or escalation rule shows that use was subject to accountable judgement rather than convenience.
Auditability
Auditability supports Responsibility by allowing reviewers to inspect whether the claimed controls and approvals can actually be verified after the event.
Example evidence / implication:
- Review notes or sign-off records can confirm that oversight occurred.
- Time-stamped logs can show whether the run followed the expected review path.
Interpretability
Interpretability matters because a responsible run also requires that relevant users and reviewers can understand how the output should be read, limited, and used.
Example evidence / implication:
- Output annotations can explain that the text is a draft and not a final decision.
- Reviewer guidance can clarify where human judgement must override generated suggestions.
Dependability
Dependability affects Responsibility because an authorised process is still weak if it behaves erratically or cannot be relied upon for the intended task.
Example evidence / implication:
- Repeated runs under the same conditions should not produce unsafe variation for critical wording.
- Quality checks can show that the workflow is stable enough for the approved use case.
Traceability
Traceability allows the organisation to follow the artefacts, decisions, and transformations associated with a responsible run.
Example evidence / implication:
- Linked prompt, output, reviewer, and final-use records support downstream accountability.
- Version or workflow traces show how the generated draft became an organisational artefact.
Responsibility is therefore the normative anchor of the five pillars, but it depends on the others to make accountability inspectable in practice.
Why this item is more than a generic concept
In general AI governance, responsibility may mean moral accountability, broad organisational ownership, or compliance with stated principles. In RAIDT, it means something narrower and more operational: whether one specific run was justified, bounded, authorised, and overseen with sufficient evidence to support review.
The RAIDT meaning is more operational because it is tied to run-level evidence. It does not stop at saying that someone in the organisation is responsible for AI use in general. Instead, it asks who was responsible for this run, what conditions applied, what limits were set, what review path existed, and what evidence proves those claims. That shift makes the concept usable in governance design, scoring, supervision discussions, and empirical validation.
Common misunderstanding
Misunderstanding
Responsibility just means that a human remains "in the loop", so if a person saw the output, the run counts as responsible.
Correction
Human presence alone is not enough. A person can be involved in a process without clear authority, without understanding the task boundary, and without meaningful capacity to challenge or stop use. In RAIDT, Responsibility requires more than nominal human involvement: it requires defined roles, bounded use, explicit oversight conditions, and evidence that these controls applied to the run.
For example, if an analyst glances at a generated report summary before sending it onward, that does not by itself show responsible governance. A responsible run would also specify whether the analyst was authorised to approve the content, whether the use case was permitted, what should trigger escalation, and whether the summary could influence a consequential decision.
Boundary and limitation
Responsibility does not prove that a run was ethically perfect, legally unchallengeable, or free from harm. It also does not replace specialist review in domains where clinical, legal, safeguarding, or regulatory judgement is required. A run can score reasonably on Responsibility and still have weaknesses in Interpretability, Dependability, or downstream outcome quality.
The concept may also fail if evidence is superficial or performative. For instance, a template may contain a purpose statement and approval tick-boxes, but if users routinely complete them without reflection, the presence of documentation can create false confidence. RAIDT handles this limitation by treating Responsibility as evidence-based and reviewable rather than self-certified. Its strength therefore depends on the quality, specificity, and inspectability of the supporting evidence.
Implementation levels
Manual implementation
A researcher or small team can apply Responsibility manually by using a structured note or checklist for each run. The note should record the task purpose, user role, authority to use GenAI, boundaries on input data, expected review requirements, and any escalation triggers. This approach is feasible for pilot studies, small-scale evidence packs, and early RAIDT demonstrations.
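The structured note described above could be kept as a small record with a completeness check. This is a minimal sketch only: the class name, field names, and example values are illustrative assumptions, not a RAIDT-defined schema.

```python
from dataclasses import dataclass, fields


@dataclass
class ResponsibilityNote:
    """One note per run. Field names are hypothetical, chosen to mirror the text."""
    task_purpose: str = ""
    user_role: str = ""
    authority_to_use_genai: str = ""
    input_data_boundaries: str = ""
    review_requirements: str = ""
    escalation_triggers: str = ""


def missing_items(note: ResponsibilityNote) -> list[str]:
    """Return the names of note fields left blank or whitespace-only."""
    return [f.name for f in fields(note) if not getattr(note, f.name).strip()]


# Example values are invented for illustration.
note = ResponsibilityNote(
    task_purpose="Draft appointment follow-up letter",
    user_role="Administrator",
    authority_to_use_genai="Approved administrative drafting use case",
    input_data_boundaries="No treatment details or vulnerability indicators",
)
print(missing_items(note))  # ['review_requirements', 'escalation_triggers']
```

A completed note shows only that the fields were filled in; whether the entries are specific and defensible remains a matter for human review.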
Semi-automated implementation
Responsibility can be semi-automated through templates, metadata fields, and structured review workflows. For example, a run form can require completion of purpose, task category, sensitivity level, approved use case, and reviewer assignment before the run is marked complete. This reduces omission while still leaving substantive judgement with human reviewers.
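One way to picture the run form is a record that refuses to be marked complete while any required field is empty. The sketch below assumes hypothetical names (`RunForm`, `REQUIRED_FIELDS`) and checks only that fields are present; it does not judge whether the entries are adequate.

```python
from dataclasses import dataclass, field

# Required fields taken from the paragraph above; identifiers are illustrative.
REQUIRED_FIELDS = (
    "purpose",
    "task_category",
    "sensitivity_level",
    "approved_use_case",
    "reviewer",
)


@dataclass
class RunForm:
    values: dict[str, str] = field(default_factory=dict)
    complete: bool = False

    def outstanding(self) -> list[str]:
        """List required fields that are absent or blank."""
        return [n for n in REQUIRED_FIELDS if not self.values.get(n, "").strip()]

    def mark_complete(self) -> None:
        """Mark the run complete only when no required field is outstanding."""
        gaps = self.outstanding()
        if gaps:
            raise ValueError("missing required fields: " + ", ".join(gaps))
        self.complete = True


form = RunForm({
    "purpose": "Draft appointment follow-up letter",
    "task_category": "administrative drafting",
    "sensitivity_level": "low",
    "approved_use_case": "patient administration letters",
})
try:
    form.mark_complete()
except ValueError as exc:
    print(exc)  # missing required fields: reviewer

form.values["reviewer"] = "team lead"
form.mark_complete()
print(form.complete)  # True
```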
Fully automated implementation
At scale, Responsibility can be embedded in a wrapper, orchestration layer, or governance platform that enforces policy-linked controls. The system can require approved task categories, assign reviewers based on risk level, block prohibited data types, log escalation events, and attach responsibility metadata directly to the evidence pack and scoring pipeline. In that form, Responsibility becomes part of the operational governance infrastructure rather than an after-the-fact narrative.
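As a rough illustration of such a wrapper, the sketch below checks a run against policy tables before calling the model, logs an escalation event when a control fails, and attaches responsibility metadata to the result. Everything named here is an assumption made for the example: the policy tables, the `GovernedRunner` class, the reviewer roles, and the `generate` callable, which stands in for any model call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical policy tables; a real deployment would load these from policy.
APPROVED_CATEGORIES = {"admin_letter", "meeting_summary"}
PROHIBITED_DATA_TYPES = {"diagnosis", "safeguarding_note"}
REVIEWER_BY_RISK = {"low": "team_lead", "high": "governance_lead"}


class RunBlocked(Exception):
    """Raised when a run fails a policy-linked control."""


@dataclass
class GovernedRunner:
    generate: Callable[[str], str]  # stand-in for any model call
    escalation_log: list[dict] = field(default_factory=list)

    def _escalate(self, reason: str, detail: str) -> None:
        # Log the escalation event, then stop the run.
        self.escalation_log.append({"reason": reason, "detail": detail})
        raise RunBlocked(f"{reason}: {detail}")

    def run(self, prompt: str, task_category: str,
            data_types: set[str], risk_level: str) -> dict:
        if task_category not in APPROVED_CATEGORIES:
            self._escalate("unapproved_task_category", task_category)
        blocked = sorted(data_types & PROHIBITED_DATA_TYPES)
        if blocked:
            self._escalate("prohibited_data_type", ", ".join(blocked))
        if risk_level not in REVIEWER_BY_RISK:
            self._escalate("unknown_risk_level", risk_level)
        return {
            "output": self.generate(prompt),
            # Metadata intended to travel with the output into the evidence pack.
            "responsibility": {
                "task_category": task_category,
                "risk_level": risk_level,
                "assigned_reviewer": REVIEWER_BY_RISK[risk_level],
                "data_types": sorted(data_types),
            },
        }


runner = GovernedRunner(generate=lambda prompt: f"DRAFT: {prompt}")
result = runner.run("Follow-up letter", "admin_letter", {"appointment_date"}, "low")
print(result["responsibility"]["assigned_reviewer"])  # team_lead

try:
    runner.run("Follow-up letter", "admin_letter", {"diagnosis"}, "low")
except RunBlocked as exc:
    print(exc)  # prohibited_data_type: diagnosis
print(len(runner.escalation_log))  # 1
```

The design choice in this sketch is that controls run before generation, so a blocked run never produces an output that could be treated as usable.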
Practical use in the RAIDT project
In Paper 08 Foundations, Responsibility helps define the conceptual claim that governance must attach to the run as the actionable unit of analysis. In Paper 09 Empirical Validation, it provides one of the dimensions that can be assessed across real uses to test whether RAIDT produces meaningful discrimination between weakly and strongly governed runs. In Paper 10 Policy Pathways, it is the bridge from framework language to deployable organisational controls, because it translates principle-level commitments into implementable approval, oversight, and escalation structures.
Within the broader RAIDT project, Responsibility is also useful for sector playbooks, evidence-pack design, scoring rubrics, and governance interventions. It gives supervisors and examiners a clear answer to the question, "What does RAIDT add beyond logging?" It also supports journal positioning by showing that the framework contributes an operational account of accountable GenAI use, rather than another abstract statement of responsible AI principles.
Key audience questions to prepare for
Q1. How is Responsibility different from ordinary compliance?
Compliance usually asks whether a rule or policy exists and has been followed in a broad sense. Responsibility in RAIDT asks whether a specific run was justified, bounded, authorised, and overseen with evidence that can be reviewed later.
Q2. Can a run be auditable but still irresponsible?
Yes. A run may be well logged and easy to reconstruct, yet still fall outside authorised use, lack adequate oversight, or involve inappropriate data. Auditability helps inspect the run; Responsibility helps judge whether the run was legitimate.
Q3. Why score Responsibility separately instead of embedding it everywhere?
A separate pillar prevents accountability issues from disappearing inside broader technical or procedural assessments. It ensures that governance legitimacy is visible, discussable, and comparable across runs.
Q4. Does Responsibility require heavy bureaucracy for every GenAI use?
No. The level of evidence should be proportionate to the task and risk. Low-stakes runs may need lightweight purpose and boundary documentation, whereas higher-stakes uses need formal approval, clearer escalation, and stronger review controls.
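Proportionality can be sketched as a lookup from risk tier to the evidence a run is expected to carry. The two tiers and the item identifiers below are assumptions made for illustration; RAIDT itself is not described here as fixing a particular number of tiers.

```python
# Evidence items follow the answer above; tier names and identifiers are hypothetical.
LIGHTWEIGHT = ("purpose", "boundaries")
FORMAL = ("formal_approval", "escalation_rule", "review_control")

REQUIRED_BY_TIER = {
    "low": LIGHTWEIGHT,
    "high": LIGHTWEIGHT + FORMAL,
}


def evidence_gaps(risk_tier: str, provided: set[str]) -> list[str]:
    """Return the evidence items expected for the tier but not yet provided."""
    if risk_tier not in REQUIRED_BY_TIER:
        raise ValueError(f"unknown risk tier: {risk_tier!r}")
    return [item for item in REQUIRED_BY_TIER[risk_tier] if item not in provided]


provided = {"purpose", "boundaries"}
print(evidence_gaps("low", provided))   # []
print(evidence_gaps("high", provided))  # ['formal_approval', 'escalation_rule', 'review_control']
```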
Q5. What is the minimum evidence that supports Responsibility in one run?
At minimum, the organisation should be able to show the run's purpose, who initiated it, why the use case was permitted, what boundaries applied, what review or oversight was expected, and how outputs were controlled before use.
Suggested citation concepts to support this item
- Responsible AI operationalisation in organisations
- Accountability in generative AI governance
- Human oversight and delegated authority in AI use
- Run-level governance for AI systems
- AI assurance and evidence-based governance
- Organisational accountability for automated decision support
- Policy-to-practice gaps in responsible AI
- Socio-technical governance of generative AI workflows
- Documentation, contestability, and audit readiness in AI systems
- Risk-based controls for high-impact AI use cases
Short explanation for presentation
Responsibility in RAIDT asks whether a specific GenAI run was appropriate, authorised, bounded, and overseen. That matters because useful output is not the same as legitimate use. An organisation may have broad AI policies, but RAIDT tests whether one actual run was carried out under clear purpose, role allocation, safety limits, and review conditions. In practice, this means Responsibility is supported by evidence such as purpose statements, approval routes, escalation rules, policy links, and controls on how outputs are used. The value of the pillar is that it moves responsible AI from principle-level claims to inspectable run-level evidence. In a supervision or viva setting, the key point is that RAIDT makes accountability operational: it shows not only what the system produced, but whether the organisation can justify having used it in that way.
One-line takeaway
Responsibility is the RAIDT pillar that tests whether a GenAI run was justified, bounded, authorised, and overseen, because accountable governance must be evidenced at run level.
Related items in RAIDT pillars and scoring
Mentioned in reference-paper summaries (5)
Paper summaries live in Port/93-References/pdf_summaries/. Each file listed below contains the key term at least once.
- _pilot_task.md
- _pilot_task_v2.md
- _pilot_task_v3.md
- _template.md
- REF-001__A.G.-2017.md
Anchored questions
- Q050: What does Responsibility mean in RAIDT?
- Q051: How does RAIDT evidence responsibility in one run?
- Q052: How should Responsibility be scored without hiding trade-offs?
- Q137: What is the Responsibility pillar and what evidence supports it?
- Q234: Responsibility - definition, example, and why it matters in RAIDT