S11.10 — Future extension: agentic AI
flowchart LR
A[Traditional run governance] --> B[Agentic systems add planning, tool use, memory, and multi-step action]
B --> C[RAIDT<br/>run-level evidence framework]
C --> D[[Future extension: agentic AI<br/>linked runs or bounded task episodes]]
D --> E[Expanded evidence pack]
D --> F[RAIDT score profile]
D --> G[Governance move:<br/>evidence over assertion]
E --> H[Reviewer reconstruction]
F --> I[Governance readiness]
E --> J[Organisational learning]
F --> K[Policy alignment]
L[Procurement assistant] --> D
M[Case-management support] --> D
N[Cybersecurity triage agent] --> D
O[Human approval checkpoints] --> D
Star S11 - Boundaries, Limitations and Future Questions
Star context: Prevents overclaiming and explains what RAIDT can and cannot solve, while showing how the framework may be extended from single runs towards linked agentic episodes.
Academic picture
Definition / background
Agentic AI refers to generative AI systems that do more than produce a single response to a single prompt. In practice, such systems may decompose goals into sub-tasks, select tools, retrieve information, maintain temporary state, branch across decision steps, and sometimes trigger actions in external systems. The concept sits between conventional interactive GenAI and more autonomous socio-technical workflows, and it is often associated with planning, tool use, orchestration, and bounded autonomy.
Within RAIDT, this matters because the framework treats the run as the unit of governance: one configured use of a GenAI system for a specific task, at a specific time, in a specific context. Agentic AI does not remove the value of that unit, but it raises a design question about how a run should be represented when one task involves multiple linked steps, tool invocations, or approval checkpoints. This is why the item is framed as a future extension rather than as a settled current capability.
The concept also differs from adjacent terms. It is not identical to workflow automation, because agentic systems may make intermediate selections rather than follow only fixed rules. It is not identical to autonomy in the strongest sense, because many organisational agents remain bounded by permissions, policies, and human approvals. It is not simply multimodality either, because the central issue is not input type but the chaining of decisions and actions over time.
This item belongs inside RAIDT because run-level evidence becomes harder and more important once systems act through sequences. If an organisation cannot capture tool traces, intermediate decision points, escalation events, and final actions, then claims about responsibility, auditability, dependability, and traceability become weaker. Agentic AI therefore stresses the RAIDT model in a useful way: it clarifies what extra evidence would be needed to extend evidence packs and score profiles beyond relatively discrete runs.
Why this concept matters
This concept matters because many organisations are moving from simple prompt-based assistance towards systems that can coordinate tasks, query tools, and produce operational effects. Without a clear governance treatment of agentic behaviour, organisations may either overclaim control over systems they cannot reconstruct, or overreact by treating every agentic workflow as inherently ungovernable. RAIDT offers a more disciplined middle position: governance should follow the evidence available for a specific bounded episode of use.
The concept also prevents a common confusion in AI governance. Broad principles can say that an agent should be safe, supervised, or accountable, but they do not by themselves show what happened in a particular case. RAIDT matters here because it translates concern about agency into concrete evidential questions: what sequence occurred, which tools were used, where human approval was required, what intermediate choices were made, and whether the resulting outputs can be reviewed and contested.
Key idea: Agentic AI matters because it stretches the boundary of the run, so RAIDT must preserve evidence continuity across linked actions rather than abandon run-level governance.
What this item explains
- When a seemingly single AI task is better understood as a linked chain of steps, sub-runs, or a bounded task episode.
- Why tool calls, planning steps, retrieved artefacts, memory use, and escalation points become part of governance evidence.
- How human approval checkpoints and system permissions define control boundaries within agentic workflows.
- Why evidence packs may need to expand from single-run documentation towards cross-step reconstruction.
- Why RAIDT score profiles for agentic settings must remain grounded in observable evidence rather than assumptions about autonomy.
Practical example / likely audience question
Audience question
Does agentic AI change the unit?
Answer
The concern behind this question is that once a system starts planning and using tools, the notion of a run may appear too small or too simplistic. The direct answer is that agentic AI may require RAIDT to represent a sequence of linked runs or a bounded task episode, but it does not invalidate the evidence-first logic of the framework. The governance problem is still anchored in a specific instance of use; the difference is that the instance may now contain several connected steps rather than one isolated exchange.
A practical example is an internal procurement assistant that receives a request for software, checks policy documents, queries approved vendor lists, drafts a comparison, and prepares an email for managerial approval. A generic AI governance approach might respond with a policy statement that all automated decisions must be supervised. RAIDT handles the issue more rigorously by asking what happened in this specific episode: which model and tools were used, which documents were retrieved, what interim recommendations were generated, whether a human sign-off occurred before any external action, and whether the evidence supports the resulting assessment across the five pillars.
The important point is that agentic AI changes the evidential granularity, not the need for evidence. RAIDT therefore remains applicable, but the framework would need richer representations of chains, checkpoints, and dependencies to maintain reviewability and audit readiness.
Practical example in RAIDT terms
Consider an enterprise productivity setting in which a procurement agent helps a university department source transcription software. The user provides the task goal, the agent searches internal policy notes, retrieves vendor information, drafts a shortlist, compares licence terms, and prepares a recommendation for a departmental approver.
The run-level issue is that the outcome is no longer just a generated paragraph. The meaningful governance object is the linked episode: the initial task framing, the retrieval steps, the tool calls, the intermediate ranking logic, the approval checkpoint, and the final recommendation. If only the final recommendation is retained, reviewers cannot reconstruct how the shortlist was produced or whether the system exceeded its authority.
The evidence needed would include the task prompt, model and tool configuration, timestamps, retrieved sources, tool-call logs, intermediate outputs, policy constraints applied, human approvals, and the final artefact delivered to the requester. The most affected RAIDT pillars would be Responsibility, Auditability, Dependability, and Traceability, with Interpretability also relevant where planning summaries or decision rationales can be captured. This item improves governance readiness because it shows exactly what extra evidence is required before an agentic workflow can be reviewed with confidence rather than merely trusted.
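The evidence list above can be sketched as a simple data structure with a completeness check. This is a minimal illustration, not a RAIDT specification: the class name `EpisodeEvidencePack`, its field names, and the `missing_evidence` helper are all assumptions introduced here to mirror the items named in the paragraph.

```python
from dataclasses import dataclass, field, fields


@dataclass
class EpisodeEvidencePack:
    """Hypothetical container for one bounded agentic episode.

    Field names are illustrative; they mirror the evidence items listed above.
    """
    task_prompt: str = ""
    model_and_tool_config: dict = field(default_factory=dict)
    timestamps: list = field(default_factory=list)
    retrieved_sources: list = field(default_factory=list)
    tool_call_logs: list = field(default_factory=list)
    intermediate_outputs: list = field(default_factory=list)
    policy_constraints: list = field(default_factory=list)
    human_approvals: list = field(default_factory=list)
    final_artefact: str = ""


def missing_evidence(pack: EpisodeEvidencePack) -> list[str]:
    """Return the names of evidence items that are empty in this pack."""
    return [f.name for f in fields(pack) if not getattr(pack, f.name)]


# Example: a pack that retained only the prompt, sources, and final artefact.
pack = EpisodeEvidencePack(
    task_prompt="Source transcription software for the department",
    retrieved_sources=["internal policy note", "approved vendor list"],
    final_artefact="Shortlist and recommendation",
)
gaps = missing_evidence(pack)
```

A reviewer could read `gaps` as the list of evidence items that would need to be captured before the episode can be reconstructed; here it would include the tool-call logs and human approvals.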
Detailed link to RAIDT
This item links to RAIDT in four ways.
First, it extends the core RAIDT idea that governance should focus on what happened in a concrete organisational use of GenAI, rather than on abstract system claims.
Second, it sharpens the run-level question by asking whether one run remains sufficient or whether linked runs or bounded task episodes must be represented when agency increases.
Third, it expands what an evidence pack may need to contain, including tool traces, intermediate decisions, approval gates, and cross-step provenance that can still justify a score profile.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning by making multi-step AI behaviour reconstructable rather than opaque.
Future extension: agentic AI → Linked runs or task episodes → Expanded evidence pack → RAIDT score profile → Governance readiness
Link to the five RAIDT pillars
This item most strongly affects Responsibility, Auditability, Dependability, and Traceability, while also increasing the practical importance of Interpretability.
Responsibility
Agentic AI raises the question of who is accountable for delegated action, tool permissions, escalation thresholds, and intervention points. RAIDT makes responsibility more operational by requiring evidence of role assignment and approval structure for a specific episode.
Example evidence / implication:
- Named owner for the agent configuration, task boundary, and authority to act.
- Recorded approval or override checkpoints before external communication or consequential action.
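One way to make the approval checkpoint concrete is a small gate that lets internal steps proceed but authorises an external action only once an approval has been recorded. The sketch below is illustrative only: `ApprovalGate`, its methods, and the action identifiers are hypothetical names, not part of any RAIDT tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ApprovalGate:
    """Hypothetical checkpoint: external actions need a recorded approval."""
    owner: str                                      # named owner of the agent configuration
    approvals: dict = field(default_factory=dict)   # action_id -> approval record

    def record_approval(self, action_id: str, approver: str) -> None:
        """Store who approved the action and when (UTC)."""
        self.approvals[action_id] = {
            "approver": approver,
            "approved_at": datetime.now(timezone.utc).isoformat(),
        }

    def authorise(self, action_id: str, external: bool) -> bool:
        """Internal steps pass; external actions pass only once approved."""
        return (not external) or (action_id in self.approvals)


gate = ApprovalGate(owner="procurement-team-lead")
blocked = gate.authorise("send-email-001", external=True)   # no approval yet
gate.record_approval("send-email-001", approver="departmental approver")
allowed = gate.authorise("send-email-001", external=True)   # approval on record
```

The stored approval record doubles as Responsibility evidence: it names the approver and the time of sign-off for that specific action.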
Auditability
Auditability becomes more demanding when a task unfolds across multiple steps. Reviewers need enough evidence to reconstruct the sequence of planning, retrieval, tool use, and decision branching rather than only the final output.
Example evidence / implication:
- Timestamped log of prompts, tool calls, intermediate outputs, and system responses.
- Clear record of when the workflow paused, escalated, failed, or was manually corrected.
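The two evidence items above could be captured with an append-only, timestamped event log from which a reviewer can read back both the full sequence and the points where the workflow was interrupted. This is a sketch under assumptions: the event kinds, the `EpisodeLog` class, and the example entries are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative event kinds; a real deployment would define its own vocabulary.
INTERRUPTION_KINDS = {"pause", "escalation", "failure", "manual_correction"}


@dataclass(frozen=True)
class Event:
    seq: int      # position in the episode
    kind: str     # e.g. "prompt", "tool_call", "pause"
    detail: str
    at: str       # ISO 8601 UTC timestamp


class EpisodeLog:
    """Append-only log: events can be added and read, but not edited."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, kind: str, detail: str) -> Event:
        event = Event(len(self._events), kind, detail,
                      datetime.now(timezone.utc).isoformat())
        self._events.append(event)
        return event

    def sequence(self) -> list[str]:
        """Return the ordered event kinds for reconstruction."""
        return [e.kind for e in self._events]

    def interruptions(self) -> list[Event]:
        """Return events where the workflow paused, escalated, failed, or was corrected."""
        return [e for e in self._events if e.kind in INTERRUPTION_KINDS]


log = EpisodeLog()
log.append("prompt", "Request: source transcription software")
log.append("tool_call", "Query approved vendor list")
log.append("intermediate_output", "Draft shortlist of three vendors")
log.append("pause", "Waiting for departmental approval")
log.append("final_output", "Recommendation delivered")
```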
Interpretability
Interpretability in agentic settings is less about exposing inaccessible model internals and more about making the observable decision process understandable. RAIDT benefits from summaries of planning logic, tool-selection criteria, and human intervention notes where these can be captured responsibly.
Example evidence / implication:
- Planning summaries or structured rationale notes attached to key decision points.
- Distinction between retrieved evidence, generated inference, and human edits in the final recommendation.
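The distinction between retrieved evidence, generated inference, and human edits can be kept visible by labelling each segment of the final recommendation with its origin. The following is a minimal sketch; the label vocabulary, the segment texts, and the `provenance_summary` helper are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical label set for where a segment of the recommendation came from.
ALLOWED_ORIGINS = {"retrieved", "generated", "human_edit"}

# Illustrative segments of a final recommendation, each tagged with its origin.
segments = [
    {"text": "Policy requires an approved vendor.", "origin": "retrieved"},
    {"text": "Vendor B appears to be the best fit.", "origin": "generated"},
    {"text": "Confirm licence terms before purchase.", "origin": "human_edit"},
]


def provenance_summary(items: list[dict]) -> Counter:
    """Count segments per origin, rejecting any unlabelled or unknown origin."""
    for item in items:
        if item.get("origin") not in ALLOWED_ORIGINS:
            raise ValueError(f"Unknown origin for segment: {item.get('text')!r}")
    return Counter(item["origin"] for item in items)


summary = provenance_summary(segments)
```

Rejecting unlabelled segments, rather than silently counting them, keeps gaps in provenance visible to the reviewer.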
Dependability
Agentic workflows create more opportunities for cumulative error, brittle chaining, looping behaviour, or action based on stale or low-quality retrieval. Dependability therefore depends on safeguards, stop conditions, retry logic, and bounded execution rules.
Example evidence / implication:
- Evidence of fallback behaviour, failure handling, confidence thresholds, or timeout controls.
- Logs showing whether the system stayed within permitted tools, data sources, and task scope.
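Bounded execution can be illustrated with a loop that enforces a tool allowlist and a step budget, and records why it stopped. This is a sketch only: `run_bounded`, the tool names, and the stop-reason strings are hypothetical, and no tool is actually invoked.

```python
def run_bounded(planned_steps, allowed_tools, max_steps):
    """Walk planned (tool, argument) steps within an allowlist and a step budget.

    Returns (executed, stop_reason). Nothing is really executed here; the
    sketch only records which steps would have been permitted.
    """
    executed = []
    for tool, argument in planned_steps:
        if len(executed) >= max_steps:
            return executed, "step_limit_reached"
        if tool not in allowed_tools:
            return executed, f"tool_not_permitted:{tool}"
        executed.append((tool, argument))
    return executed, "completed"


# Illustrative plan: the last step uses a tool outside the permitted set.
plan = [
    ("search_policy", "software procurement"),
    ("query_vendors", "transcription"),
    ("send_email", "vendor@example.com"),
]
allowed = {"search_policy", "query_vendors"}
executed, reason = run_bounded(plan, allowed, max_steps=5)
```

The returned stop reason is itself Dependability evidence: it shows whether the episode ended normally or was halted by a scope or budget control.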
Traceability
Traceability is central because linked actions must remain connected to the original task, context, configuration, and final outcome. Without this, reviewers cannot tell how an agentic episode unfolded or whether later actions remain attributable to the original request.
Example evidence / implication:
- Persistent identifiers linking the initial task, each sub-step, retrieved artefacts, and the final output.
- Versioned records of model settings, tool connectors, and data sources used during the episode.
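Persistent identifiers can be sketched as parent links between steps, so that any output can be traced back to the originating task. The structure below is an assumption made for illustration: the dictionary layout, the helper names, and the version labels are placeholders rather than a prescribed RAIDT schema.

```python
import uuid


def new_step(episode: dict, kind: str, parent_id=None) -> str:
    """Register a step under the episode and return its persistent identifier."""
    step_id = str(uuid.uuid4())
    episode["steps"][step_id] = {"kind": kind, "parent_id": parent_id}
    return step_id


def trace_to_origin(episode: dict, step_id: str) -> list[str]:
    """Walk parent links from a step back to the initial task."""
    path = []
    while step_id is not None:
        step = episode["steps"][step_id]
        path.append(step["kind"])
        step_id = step["parent_id"]
    return path


episode = {
    "episode_id": str(uuid.uuid4()),
    # Version labels are placeholders, not real product versions.
    "config": {"model": "model-vX", "tool_connectors": {"vendor_list": "vX"}},
    "steps": {},
}
task = new_step(episode, "initial_task")
retrieval = new_step(episode, "retrieval", parent_id=task)
shortlist = new_step(episode, "shortlist", parent_id=retrieval)
final = new_step(episode, "final_output", parent_id=shortlist)
lineage = trace_to_origin(episode, final)
```

If any parent link were missing, the walk would fail with a `KeyError`, which is the traceability gap a reviewer would want surfaced.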
Why this item is more than a generic concept
In general AI governance, agentic AI may simply mean that a system appears more autonomous, uses tools, or completes multi-step workflows. In RAIDT, the term has a more precise and operational meaning: it marks the point at which the run-level evidence model may need to expand to cover linked steps without losing specificity, reviewability, or audit discipline. The RAIDT meaning is therefore more practical because it is tied to what can be evidenced in a particular episode, what can be placed in an evidence pack, and what can legitimately support a five-pillar score profile.
Common misunderstanding
Misunderstanding
If an AI system behaves agentically, then run-level governance is no longer relevant and only platform-level policy matters.
Correction
Platform-level policy still matters, but it is not enough. An organisation also needs episode-level evidence showing how a specific agentic task unfolded. For example, a university may have a policy that AI agents must remain human-supervised, yet a reviewer still needs to see whether a particular procurement episode actually paused for approval before an email was sent or a recommendation was circulated. RAIDT corrects the misunderstanding by keeping governance tied to reconstructable use rather than to policy assertion alone.
Boundary and limitation
This item does not claim that RAIDT already provides a complete governance solution for every form of agentic AI. It does not by itself prove that an agent is safe, lawful, fair, or technically robust. It does not replace access control, security engineering, human oversight design, or sector-specific compliance. It may also be difficult to apply where orchestration layers, tool calls, or intermediate states are not observable.
RAIDT handles this limitation by treating agentic AI as a future extension question rather than an overclaimed present capability. The practical implication is that organisations should not score highly on agentic governance merely because they deploy an agent wrapper. They need evidence that the workflow can be bounded, logged, reviewed, contested, and improved. Where such evidence is missing, RAIDT should surface that limitation rather than hide it.
Implementation levels
Manual implementation
A researcher or small team can apply this item manually by documenting a bounded agentic episode in a structured note or template. This would include the task goal, the sequence of prompts and tools, the retrieved sources, the intermediate outputs, the human intervention points, and the final result. Manual implementation is slower, but it is enough to test whether the governance questions can be answered.
Semi-automated implementation
Semi-automated implementation can use wrappers, templates, or metadata capture around an orchestration workflow. Session identifiers, tool logs, approval events, and version metadata can be stored automatically, while reviewers add short interpretive notes at key checkpoints. This is often the most realistic route for pilot deployments because it preserves oversight without requiring a full governance platform.
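A wrapper of this kind might look like the following decorator, which records metadata for each tool call automatically and leaves a slot for a reviewer's note. It is a sketch under stated assumptions: `capture_metadata`, `SESSION_ID`, the in-memory `CAPTURED` list, and the example tool are all hypothetical, and a real pilot would write to a durable store.

```python
import functools
import uuid
from datetime import datetime, timezone

SESSION_ID = str(uuid.uuid4())   # one identifier per episode (illustrative)
CAPTURED: list[dict] = []        # stand-in for a real metadata store


def capture_metadata(tool_version: str):
    """Decorator that records a metadata entry for each completed tool call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = datetime.now(timezone.utc).isoformat()
            result = func(*args, **kwargs)
            CAPTURED.append({
                "session_id": SESSION_ID,
                "tool": func.__name__,
                "tool_version": tool_version,
                "arguments": {"args": args, "kwargs": kwargs},
                "started_at": started,
                "reviewer_note": None,   # filled in by a human at a checkpoint
            })
            return result
        return wrapper
    return decorator


@capture_metadata(tool_version="0.1-example")
def query_vendor_list(category: str) -> list[str]:
    # Placeholder data; a real connector would query an approved vendor source.
    return [f"{category}-vendor-a", f"{category}-vendor-b"]


vendors = query_vendor_list("transcription")
CAPTURED[-1]["reviewer_note"] = "Checked against the approved register."
```

Note that this sketch records only calls that return normally; a pilot would also need to capture failures, since those are part of the evidence.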
Fully automated implementation
At scale, a governance pipeline could treat an agentic episode as a graph of linked sub-runs with persistent identifiers, immutable logs, policy checks, and automated score-support evidence. An orchestration layer or dashboard could block unauthorised tools, require approval before external action, and generate evidence-pack components directly from telemetry. Fully automated implementation would make agentic governance more reliable, but only if the underlying logging and control architecture is itself dependable.
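One common way to approximate immutable logs is hash chaining, where each entry's hash covers its payload and the previous entry's hash. The sketch below uses that technique as an assumption; it makes tampering detectable rather than impossible, and the function names and payloads are invented for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder hash preceding the first entry


def _digest(payload: dict, prev_hash: str) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> dict:
    """Append a log entry linked to the previous entry by hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"payload": payload, "prev_hash": prev_hash,
             "hash": _digest(payload, prev_hash)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["payload"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


chain: list = []
append_entry(chain, {"step": "retrieval", "tool": "policy_search"})
append_entry(chain, {"step": "approval", "approved": True})
intact = verify_chain(chain)

chain[1]["payload"]["approved"] = False   # simulate after-the-fact tampering
tampering_detected = not verify_chain(chain)
```

This only protects the record if the verification itself runs somewhere the agent cannot modify, which is one instance of the dependency on the underlying control architecture noted above.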
Practical use in the RAIDT project
In Paper 08 Foundations, this item helps position RAIDT as an extensible framework rather than one limited to simple single-turn prompting. It shows that the run-level model is conceptually strong enough to confront more complex AI arrangements without abandoning its evidential core. In Paper 09 Empirical Validation, the item can guide evaluation of whether reviewers can reconstruct a multi-step episode more accurately when tool traces, approvals, and sequence logs are present. In Paper 10 Policy Pathways, it supports concrete recommendations for governance wrappers, approval gates, and logging standards around higher-autonomy organisational AI.
The item also has value across sector playbooks, the evidence pack, and the scoring rubric. It helps explain to supervisors, reviewers, and viva examiners that RAIDT does not confuse fashionable language about "agents" with actual governability. Instead, it turns the discussion into operational questions about evidence capture, score justification, organisational control points, and realistic governance interventions.
Key audience questions to prepare for
Q1. If RAIDT is run-level, how can it handle agents that act over hours or days?
RAIDT can handle this by treating the activity as a bounded episode or a linked chain of sub-runs, provided that identifiers, timestamps, checkpoints, and outputs remain reconstructable. The key is not duration alone, but evidential coherence.
Q2. Does this mean every agent step must be separately scored?
Not necessarily. Some contexts may justify scoring a bounded episode as a whole, while others may require step-level review for particularly consequential actions. RAIDT would need a proportionate design rather than a rigid one-step-one-score rule.
Q3. Is agentic AI mainly a technical challenge or a governance challenge for RAIDT?
It is both, but the RAIDT contribution is primarily governance-oriented. The central question is whether technical traces and checkpoints can be converted into reviewable evidence that supports accountability and contestability.
Q4. What additional evidence becomes essential once tool use and action are introduced?
Tool-call logs, retrieved artefacts, approval records, intermediate decision points, permissions, timestamps, and failure or override events all become much more important. Without them, the final output is too thin to support serious governance claims.
Q5. Why not replace the run with the workflow as the unit of governance?
Because RAIDT gains its strength from specificity. Replacing the run entirely with a broad workflow concept risks losing the contextual precision that makes review possible. A better extension is to preserve run-level discipline while linking runs into bounded episodes when the task demands it.
Suggested citation concepts to support this item
- agentic AI governance
- governance of LLM agents in organisational settings
- audit trails for tool-using AI systems
- provenance and traceability in AI orchestration
- human-in-the-loop oversight for autonomous AI agents
- evaluation of multi-step language model agents
- accountability for AI systems with external tool use
- logging and telemetry for AI agent workflows
- socio-technical control points in AI automation
- bounded autonomy in generative AI governance
Short explanation for presentation
Future extension: agentic AI is important because it tests whether RAIDT can stay useful when GenAI systems move beyond single prompt-response exchanges into linked sequences of planning, retrieval, tool use, and action. The point is not to claim that RAIDT already solves agent governance in full. Rather, the item shows how RAIDT could extend its run-level logic by treating some tasks as linked runs or bounded episodes, provided that the evidence remains reconstructable. For supervision or viva discussion, the key argument is that governance should not become vaguer as AI becomes more capable. It should become more evidence-based. That means richer traces, approval checkpoints, and clearer provenance so that responsibility, auditability, dependability, and traceability remain defensible in practice.
One-line takeaway
Future extension: agentic AI is RAIDT's proposed route for extending run-level governance to linked, tool-using AI episodes, on the basis that governance still depends on concrete evidence from actual organisational use.