S11.10 — Future extension: agentic AI

flowchart LR
    A[Traditional run governance] --> B[Agentic systems add planning, tool use, memory, and multi-step action]
    B --> C[RAIDT<br/>run-level evidence framework]
    C --> D[[Future extension: agentic AI<br/>linked runs or bounded task episodes]]
    D --> E[Expanded evidence pack]
    D --> F[RAIDT score profile]
    D --> G[Governance move:<br/>evidence over assertion]
    E --> H[Reviewer reconstruction]
    F --> I[Governance readiness]
    E --> J[Organisational learning]
    F --> K[Policy alignment]
    L[Procurement assistant] --> D
    M[Case-management support] --> D
    N[Cybersecurity triage agent] --> D
    O[Human approval checkpoints] --> D

Star S11 - Boundaries, Limitations and Future Questions

Star context: Prevents overclaiming and explains what RAIDT can and cannot solve, while showing how the framework may be extended from single runs towards linked agentic episodes.

Academic picture

Definition / background

Agentic AI refers to generative AI systems that do more than produce a single response to a single prompt. In practice, such systems may decompose goals into sub-tasks, select tools, retrieve information, maintain temporary state, branch across decision steps, and sometimes trigger actions in external systems. The concept sits between conventional interactive GenAI and more autonomous socio-technical workflows, and it is often associated with planning, tool use, orchestration, and bounded autonomy.

Within RAIDT, this matters because the framework treats the run as the unit of governance: one configured use of a GenAI system for a specific task, at a specific time, in a specific context. Agentic AI does not remove the value of that unit, but it raises a design question about how a run should be represented when one task involves multiple linked steps, tool invocations, or approval checkpoints. This is why the item is framed as a future extension rather than as a settled current capability.
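One way to picture the design question is as a data structure. The sketch below is illustrative only and is not part of RAIDT: the names `Run`, `TaskEpisode`, and `parent_run_id` are hypothetical, and it assumes that linked steps can be represented as runs that point back to the step that triggered them.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Run:
    """One configured use of a GenAI system: a task, a time, a context."""
    run_id: str
    task: str
    started_at: datetime
    context: str
    # Hypothetical link to the step that triggered this one (None for the first step).
    parent_run_id: Optional[str] = None


@dataclass
class TaskEpisode:
    """Hypothetical container for a bounded task episode made of linked runs."""
    episode_id: str
    runs: List[Run] = field(default_factory=list)

    def add(self, run: Run) -> None:
        known = {r.run_id for r in self.runs}
        if run.run_id in known:
            raise ValueError(f"duplicate run id: {run.run_id}")
        if run.parent_run_id is not None and run.parent_run_id not in known:
            raise ValueError(f"unknown parent run: {run.parent_run_id}")
        self.runs.append(run)

    def chain(self, run_id: str) -> List[str]:
        """Follow parent links back to the first step; return run ids root-first."""
        by_id = {r.run_id: r for r in self.runs}
        path: List[str] = []
        current = by_id.get(run_id)
        while current is not None:
            path.append(current.run_id)
            parent = current.parent_run_id
            current = by_id.get(parent) if parent is not None else None
        return list(reversed(path))
```

Under these assumptions, each step is still recorded as an ordinary run, and the episode adds only the links needed to reconstruct the order in which steps depended on one another.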

The concept also differs from adjacent terms. It is not identical to workflow automation, because agentic systems may make intermediate selections rather than follow only fixed rules. It is not identical to autonomy in the strongest sense, because many organisational agents remain bounded by permissions, policies, and human approvals. It is not simply multimodality either, because the central issue is not input type but the chaining of decisions and actions over time.

This item belongs inside RAIDT because run-level evidence becomes both harder to collect and more important once systems act through sequences. If an organisation cannot capture tool traces, intermediate decision points, escalation events, and final actions, then claims about responsibility, auditability, dependability, and traceability become weaker. Agentic AI therefore stress-tests the RAIDT model in a useful way: it clarifies what extra evidence would be needed to extend evidence packs and score profiles beyond relatively discrete runs.

Why this concept matters

This concept matters because many organisations are moving from simple prompt-based assistance towards systems that can coordinate tasks, query tools, and produce operational effects. Without a clear governance treatment of agentic behaviour, organisations may either overclaim control over systems they cannot reconstruct, or overreact by treating every agentic workflow as inherently ungovernable. RAIDT offers a more disciplined middle position: governance should follow the evidence available for a specific bounded episode of use.

The concept also prevents a common confusion in AI governance. Broad principles can say that an agent should be safe, supervised, or accountable, but they do not by themselves show what happened in a particular case. RAIDT matters here because it translates concern about agency into concrete evidential questions: what sequence occurred, which tools were used, where human approval was required, what intermediate choices were made, and whether the resulting outputs can be reviewed and contested.

Key idea: Agentic AI matters because it stretches the boundary of the run, so RAIDT must preserve evidence continuity across linked actions rather than abandon run-level governance.

What this item explains

When a seemingly single AI task is better understood as a linked chain of steps, sub-runs, or a bounded task episode.
Why tool calls, planning steps, retrieved artefacts, memory use, and escalation points become part of governance evidence.
How human approval checkpoints and system permissions define control boundaries within agentic workflows.
Why evidence packs may need to expand from single-run documentation towards cross-step reconstruction.
Why RAIDT score profiles for agentic settings must remain grounded in observable evidence rather than assumptions about autonomy.

Practical example / likely audience question

Audience question

Does agentic AI change the unit of governance?

Answer

The concern behind this question is that once a system starts planning and using tools, the notion of a run may appear too small or too simplistic. The direct answer is that agentic AI may require RAIDT to represent a sequence of linked runs or a bounded task episode, but it does not invalidate the evidence-first logic of the framework. The governance problem is still anchored in a specific instance of use; the difference is that the instance may now contain several connected steps rather than one isolated exchange.

A practical example is an internal procurement assistant that receives a request for software, checks policy documents, queries approved vendor lists, drafts a comparison, and prepares an email for managerial approval. A generic AI governance approach might respond with a policy statement that all automated decisions must be supervised. RAIDT handles the issue more rigorously by asking what happened in this specific episode: which model and tools were used, which documents were retrieved, what interim recommendations were generated, whether a human sign-off occurred before any external action, and whether the evidence supports the resulting assessment across the five pillars.
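The question of whether a human sign-off occurred before any external action can be asked of a step log directly. The following is a minimal sketch under stated assumptions: the `Step` record and the step kinds `"human_approval"` and `"external_action"` are hypothetical labels, and the sketch assumes that one approval covers every later action in the same episode, which a real policy may not allow.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Step:
    order: int    # position of the step within the episode
    kind: str     # hypothetical labels, e.g. "retrieval", "human_approval", "external_action"
    detail: str


def unapproved_external_actions(steps: Iterable[Step]) -> List[Step]:
    """Return external actions that occurred before any human approval."""
    approved = False
    flagged: List[Step] = []
    for step in sorted(steps, key=lambda s: s.order):
        if step.kind == "human_approval":
            approved = True
        elif step.kind == "external_action" and not approved:
            flagged.append(step)
    return flagged
```

An empty result does not show that the episode was well governed. It shows only that the recorded steps contain no external action ahead of a recorded approval.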

The important point is that agentic AI changes the evidential granularity, not the need for evidence. RAIDT therefore remains applicable, but the framework would need richer representations of chains, checkpoints, and dependencies to maintain reviewability and audit readiness.

Practical example in RAIDT terms

Consider an enterprise productivity setting in which a procurement agent helps a university department source transcription software. The user provides the task goal, the agent searches internal policy notes, retrieves vendor information, drafts a shortlist, compares licence terms, and prepares a recommendation for a departmental approver.

The run-level issue is that the outcome is no longer just a generated paragraph. The meaningful governance object is the linked episode: the initial task framing, the retrieval steps, the tool calls, the intermediate ranking logic, the approval checkpoint, and the final recommendation. If only the final recommendation is retained, reviewers cannot reconstruct how the shortlist was produced or whether the system exceeded its authority.
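Whether the system exceeded its authority is answerable only if tool calls are retained alongside the permissions that applied. As a hedged illustration, the sketch below compares logged tool calls with a permitted set. The function name, the `"tool"` key, and the tool names in the usage lines are all hypothetical.

```python
from typing import Dict, Iterable, List


def calls_outside_authority(
    tool_calls: Iterable[Dict[str, str]],
    permitted_tools: Iterable[str],
) -> List[Dict[str, str]]:
    """Return logged tool calls whose tool is not in the permitted set."""
    allowed = set(permitted_tools)
    return [call for call in tool_calls if call["tool"] not in allowed]


# Hypothetical log for the procurement episode.
log = [
    {"tool": "policy_search", "argument": "software purchasing policy"},
    {"tool": "vendor_lookup", "argument": "transcription software"},
    {"tool": "send_email", "argument": "external vendor"},
]
exceeded = calls_outside_authority(log, permitted_tools=["policy_search", "vendor_lookup"])
```

If only the final recommendation were kept, `log` would be empty and the check would pass trivially, which is why the retained trace matters more than the check itself.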

The evidence needed would include the task prompt, model and tool configuration, timestamps, retrieved sources, tool-call logs, intermediate outputs, policy constraints applied, human approvals, and the final artefact delivered to the requester. The most affected RAIDT pillars would be Responsibility, Auditability, Dependability, and Traceability, with Interpretability also relevant where planning summaries or decision rationales can be captured. This item improves governance readiness because it identifies the extra evidence required before an agentic workflow can be reviewed with confidence rather than merely trusted.
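The evidence list above can be treated as a simple completeness check. This is a sketch, not a RAIDT specification: the key names are hypothetical renderings of the items named in the text, and an item counts as present only if it is non-empty.

```python
from typing import Any, Dict, List

# Hypothetical keys, one per evidence item named in the text above.
REQUIRED_EVIDENCE = (
    "task_prompt",
    "model_and_tool_configuration",
    "timestamps",
    "retrieved_sources",
    "tool_call_logs",
    "intermediate_outputs",
    "policy_constraints_applied",
    "human_approvals",
    "final_artefact",
)


def missing_evidence(pack: Dict[str, Any]) -> List[str]:
    """List required evidence items that are absent or empty in the pack."""
    return [key for key in REQUIRED_EVIDENCE if not pack.get(key)]


# A pack that retains only the final recommendation.
thin_pack = {"final_artefact": "Recommendation: shortlist of three vendors"}
gaps = missing_evidence(thin_pack)
```

For `thin_pack`, every item except the final artefact is reported as a gap, which mirrors the case where reviewers cannot reconstruct how the shortlist was produced.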

Detailed link to RAIDT

The agentic AI future extension links to RAIDT in four ways.

First, it extends the core RAIDT idea that governance should focus on what happened in a concrete organisational use of GenAI, rather than on abstract system claims.
Second, it sharpens the run-level question by asking whether one run remains sufficient or whether linked runs or bounded task episodes must be represented when agency increases.
Third, it expands what an evidence pack may need to contain, including tool traces, intermediate decisions, approval gates, and cross-step provenance that can still justify a score profile.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning by making multi-step AI behaviour reconstructable rather than opaque.

Future extension: agentic AI → Linked runs or task episodes → Expanded evidence pack → RAIDT score profile → Governance readiness

Link to the five RAIDT pillars

This item most strongly affects Responsibility, Auditability, Dependability, and Traceability, while also increasing the practical importance of Interpretability.

Responsibility

Agentic AI raises the question of who is accountable for delegated action, tool permissions, escalation thresholds, and intervention points. RAIDT makes responsibility more operational by requiring evidence of role assignment and approval structure for a specific episode.
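Evidence of role assignment could be as simple as a record that names an accountable role for each of the points listed above. The sketch below assumes that form. The key names and the role titles in the usage lines are hypothetical, and a blank or missing entry is treated as unassigned.

```python
from typing import Dict, List, Optional

# Hypothetical keys for the four responsibility points named in the text.
RESPONSIBILITY_POINTS = (
    "delegated_action",
    "tool_permissions",
    "escalation_thresholds",
    "intervention_points",
)


def unassigned_points(assignments: Dict[str, Optional[str]]) -> List[str]:
    """Return responsibility points with no named accountable role."""
    return [
        point
        for point in RESPONSIBILITY_POINTS
        if not (assignments.get(point) or "").strip()
    ]


# Hypothetical assignments for one episode.
episode_roles = {
    "delegated_action": "Departmental approver",
    "tool_permissions": "IT service owner",
    "escalation_thresholds": "",
}
open_points = unassigned_points(episode_roles)
```

A record like this is specific to one episode, so it says who was accountable in that case rather than who is accountable in general.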