S11.08 - Component drift
flowchart LR
A[Changing prompts retrieval indices adapters and policy layers] --> B[RAIDT - run-level evidence framework]
A2[Traditional limitation: weak reconstruction after updates] --> B
B --> C[[Component drift]]
C --> D[Evidence pack with component versions]
C --> E[Score profile interpreted in context]
D --> F[Reviewer reconstruction and contestability]
E --> G[Governance readiness and organisational learning]
H[Healthcare public services enterprise productivity] --> C
I[Prompt ID retrieval snapshot policy-layer version wrapper release] --> C
Star S11 - Boundaries, Limitations and Future Questions
Star context: This item locates a major boundary condition for RAIDT: even well-designed governance can weaken if the technical components behind a run change over time without being recorded, compared, and reviewed.
Academic picture
Definition / background
Component drift refers to the change over time in one or more technical or procedural elements that shape a generative AI run. In practice, these elements can include prompt templates, model versions, retrieval corpora or indices, fine-tuned adapters, policy rules, safety settings, routing logic, external tools, and human review steps. The concept matters because a run is never produced by a model alone; it is produced by a configured system operating in a particular organisational context.
In GenAI governance, component drift is closely related to but distinct from model drift. Model drift usually refers to changes in model behaviour or performance. Component drift is broader: it includes changes in the surrounding stack that can alter outputs, risk exposure, or governance status even when the underlying model remains unchanged. A prompt revision, a refreshed knowledge base, or a modified access policy can all change the meaning of a run-level result.
This belongs inside RAIDT because RAIDT treats the run as the unit of governance and asks whether that run can be reconstructed, reviewed, contested, and compared. If the components that shaped a run are not versioned, then the evidence pack becomes incomplete and the score profile becomes harder to interpret across time. Component drift therefore sits at the boundary between technical maintenance and governance evidence: it is about preserving the integrity of what a run-level claim actually refers to.
Within RAIDT, component drift directly affects run-level evidence, the structure of the evidence pack, and the credibility of the five-pillar score profile. A score is meaningful only if the assessed configuration is known. Without that, organisations risk comparing runs that are not actually comparable and mistaking silent system change for improvement, degradation, or inconsistency in human practice.
Why this concept matters
Component drift matters because governance claims are only as reliable as the stability and traceability of the system being governed. When a team says that a use case is dependable, interpretable, or audit-ready, that claim implicitly assumes a particular system configuration. If the configuration changes and the change is not recorded, then the governance claim can become detached from the current reality of use.
This concept addresses a practical problem that appears in long-running deployments: stakeholders often notice that outputs feel different, but cannot determine whether the cause lies in the model, the prompt, the retrieved evidence base, the safety settings, or the task context. RAIDT reduces that confusion by requiring run-level evidence about the components involved in each run and by supporting comparison across runs over time.
For organisations, the absence of this concept creates familiar risks: failed reconstruction during audit, disputes about why outcomes changed, weak incident analysis, poor comparability of evaluations, and overconfident claims of consistency. By making component drift explicit, RAIDT shifts governance from principle-level assurance to operational reviewability.
Key idea: Component drift matters because governance judgments are only trustworthy when the components that shaped each run can be identified, versioned, and compared over time.
What this item captures
- Changes in prompts, templates, guardrails, model endpoints, adapters, retrieval sources, indices, or workflow logic that alter how a run is produced.
- The gap between a nominally "same" AI service and the actual configured system used at different points in time.
- The evidential requirement to record component versions so that runs can be reconstructed and compared.
- The risk that score profiles become misleading when underlying components change without documentation.
- The organisational need to distinguish intentional updates from silent drift.
- The connection between technical change management and governance readiness.
Practical example / likely audience question
Audience question
Why keep versions of prompts, retrieval indices, and policy layers if the use case and the model name stay the same?
Answer
This question rests on a common misconception: that the model name is the main determinant of system behaviour. In reality, many governance-relevant changes happen outside the base model. A revised prompt can narrow or expand the scope of an answer. A retrieval index refresh can introduce new source material or remove old guidance. A policy layer can block responses that were previously allowed. These are not superficial implementation details; they shape what the run actually was.
The direct answer is that versioning is necessary because otherwise a run cannot be reliably reconstructed months later. Suppose an organisation evaluates a drafting assistant in January and again in June. If the June system uses a refined prompt, an updated retrieval corpus, and stricter moderation rules, then improved or degraded outcomes cannot be interpreted properly unless those differences are visible in the evidence.
RAIDT handles this better than a generic AI governance approach because it does not stop at broad principles such as accountability or transparency. It ties those principles to the run itself. That means the evidence pack can show which components were active, and the score profile can be interpreted in light of that specific configuration rather than being treated as a floating claim about an abstract system.
Practical example in RAIDT terms
Consider a hospital using a GenAI assistant to draft discharge summaries for clinicians. In February, a run is performed using one prompt template, one retrieval index containing local discharge guidance, and one policy layer for redaction checks. By May, the hospital has updated the prompt to improve brevity, refreshed the retrieval index with newer guidance, and inserted an additional safety rule for medication references.
The run-level issue is that a later summary may differ for several reasons, but without component evidence the team cannot say which changes affected the result. The evidence needed includes prompt version IDs, retrieval index or corpus version identifiers, policy-layer version or rule-set references, timestamps, model endpoint metadata, and any wrapper or orchestration version used.
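The evidence fields listed above could be held in a small structured record. The following is a minimal sketch, not a RAIDT-defined schema: the class name `RunComponentRecord`, the helper `missing_fields`, the field names, and every version string and timestamp are hypothetical placeholders chosen for illustration.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class RunComponentRecord:
    """Component evidence for one run (illustrative field names, not a RAIDT schema)."""
    run_id: str
    timestamp: str                 # ISO 8601 timestamp of the run
    model_endpoint: str            # model endpoint metadata
    prompt_version: str            # prompt version ID
    retrieval_index_version: str   # retrieval index or corpus version
    policy_layer_version: str      # policy-layer version or rule-set reference
    wrapper_version: str           # wrapper or orchestration version


def missing_fields(record: RunComponentRecord) -> list[str]:
    """Return the names of fields left blank, so gaps in the evidence are visible."""
    return [name for name, value in asdict(record).items() if not value.strip()]


# Hypothetical February run; all values are placeholders.
february_run = RunComponentRecord(
    run_id="run-0001",
    timestamp="2025-02-10T09:30:00+00:00",
    model_endpoint="example-model-endpoint",
    prompt_version="discharge-prompt-v1",
    retrieval_index_version="discharge-guidance-index-v1",
    policy_layer_version="redaction-rules-v1",
    wrapper_version="orchestrator-1.0.0",
)

# A second run where the policy-layer version was not recorded.
incomplete_run = replace(february_run, run_id="run-0002", policy_layer_version="")

print(missing_fields(february_run))
print(missing_fields(incomplete_run))
```

Making the record immutable (`frozen=True`) is a design choice in this sketch: evidence about a past run should not be silently edited after the fact.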
The most affected RAIDT pillars are Auditability, Dependability, and Traceability, with Responsibility and Interpretability also implicated. Auditability depends on reconstructing the conditions of the run. Dependability depends on knowing whether performance changed because the system changed. Traceability depends on linking an output to the components that produced it. Capturing component drift improves governance readiness because supervisors, auditors, and clinical leads can distinguish real quality changes from undocumented configuration changes.
Detailed link to RAIDT
Component drift links to RAIDT in four ways.
First, it supports RAIDT's core idea that governance should attach to a specific run rather than to general claims about a model or product.
Second, it sharpens the meaning of run-level evidence by showing that the run includes the configured stack around the model, not only the final prompt and output.
Third, it affects the evidence pack and the score profile because both become more defensible when component versions and changes are explicitly recorded.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning by making it possible to explain why outcomes differ across time.
Component drift -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness
In that chain, component drift is the reason version-aware evidence is needed; the evidence pack is where that information is assembled; the score profile is where interpretation depends on knowing whether the assessed run is comparable to earlier or later runs; and governance readiness is improved when those links are reviewable.
Link to the five RAIDT pillars
Responsibility
Component drift affects Responsibility because organisational actors remain accountable for the configured system they choose to deploy, update, and maintain. If changes are made without oversight or documentation, accountability becomes blurred.
Example evidence / implication:
- Change logs showing who approved a prompt revision or retrieval update.
- Records linking component changes to governance review or risk acceptance.
Auditability
This item has a strong effect on Auditability. Auditors cannot meaningfully reconstruct or assess a run if the operative components are unknown or only partially documented.
Example evidence / implication:
- Version identifiers for prompts, model endpoints, adapters, and policy rules stored alongside each run.
- Comparison records showing what changed between evaluated runs.
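A comparison record of this kind could be produced by diffing the component versions stored with two runs. The sketch below assumes each run's components are available as a simple name-to-version mapping; the function name `diff_components` and all component names and version strings are hypothetical.

```python
from typing import Optional

ComponentSet = dict[str, str]
Change = tuple[Optional[str], Optional[str]]


def diff_components(earlier: ComponentSet, later: ComponentSet) -> dict[str, Change]:
    """Map each changed component to (earlier version, later version).

    A component present in only one run appears with None on the other side.
    """
    changes: dict[str, Change] = {}
    for name in sorted(set(earlier) | set(later)):
        before = earlier.get(name)
        after = later.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Hypothetical component sets for two evaluated runs of the same use case.
february = {
    "model_endpoint": "example-model-endpoint",
    "prompt": "discharge-prompt-v1",
    "retrieval_index": "discharge-guidance-index-v1",
    "policy_layer": "redaction-rules-v1",
}
may = {
    "model_endpoint": "example-model-endpoint",
    "prompt": "discharge-prompt-v2",
    "retrieval_index": "discharge-guidance-index-v2",
    "policy_layer": "redaction-rules-v2",
}

changes = diff_components(february, may)
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

In this example the model endpoint is unchanged, so it does not appear in the comparison record, while the prompt, retrieval index, and policy layer do. The diff shows what changed, not whether the change affected output quality.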
Interpretability
Component drift affects Interpretability because explanations of output behaviour are weaker when the governing components have shifted. Interpretation requires knowing which system configuration generated the output under review.
Example evidence / implication:
- Notes explaining how prompt structure or retrieval policy changed expected system behaviour.
- Reviewer annotations connecting output differences to documented configuration changes.
Dependability
This item has a strong effect on Dependability because a change in observed reliability cannot be attributed correctly when system changes are undocumented. Dependability claims require stable or at least well-characterised operating conditions.
Example evidence / implication:
- Benchmark or quality-review results tied to specific component versions.
- Alerts when evaluated components have changed enough that prior dependability claims should be revisited.
Traceability
Component drift has a very strong effect on Traceability. The purpose of traceability is not just to store outputs, but to connect each output to the configuration, context, and workflow that produced it.
Example evidence / implication:
- Run records linking outputs to prompt version, retrieval snapshot, policy layer, and wrapper release.
- Evidence-pack fields that allow a reviewer to trace a disputed output back to its exact component set.
Component drift touches all five pillars, but it is especially central to Auditability, Dependability, and Traceability.
Why this item is more than a generic concept
In general AI governance, component drift may be discussed as a lifecycle or maintenance issue: systems change, so organisations should monitor them. In RAIDT, the meaning is more operational and more exact. The question is not simply whether change occurred, but whether each run remains evidentially intelligible after change occurs.
That RAIDT meaning is stronger because it ties drift to concrete run-level evidence. A generic governance framework may call for documentation of updates. RAIDT asks whether a reviewer can look at a specific run, identify the component state that shaped it, compare it with other runs, and judge how that affects evidence-pack completeness and score-profile interpretation.
Common misunderstanding
Misunderstanding
Component drift only matters when the underlying model is replaced or fine-tuned.
Correction
That is too narrow. A system can drift in governance-relevant ways even when the same base model remains in use. For example, a customer-service assistant may keep the same model but switch to a new prompt template and a revised retrieval index containing updated policies. If output style, confidence, or risk behaviour changes, those changes still matter for governance. RAIDT treats such drift as operationally significant because the run has changed in substance, even if the model label has not.
Boundary and limitation
Component drift does not by itself prove that a system has become worse, safer, or non-compliant. It identifies and evidences change; it does not automatically supply a causal evaluation of the consequences of that change. A well-documented drift event may have negligible practical impact, while a poorly documented minor change may produce major downstream effects.
It also does not replace substantive testing, domain review, or outcome monitoring. Version records alone cannot show whether a new retrieval corpus improved quality or introduced hidden bias. For this reason, RAIDT treats component drift as a necessary but not sufficient part of governance. It works best when combined with evaluation results, reviewer judgement, incident review, and clear criteria for when a component change triggers re-scoring or re-approval.
Implementation levels
Manual implementation
A researcher or small team can manage component drift manually by recording prompt versions, model identifiers, retrieval snapshots, and policy settings in a structured run log or evidence-pack template. Even a disciplined spreadsheet or markdown template can substantially improve later reconstruction.
Semi-automated implementation
Semi-automated implementation can capture component metadata through templates, wrappers, form-based run submission, or lightweight logging scripts. These methods reduce omissions by populating standard fields for prompt IDs, retrieval versions, tool configuration, timestamps, and reviewer notes.
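One way to picture a lightweight wrapper is a decorator that appends a standard log entry each time a drafting function is called. This is a sketch under stated assumptions: `stamp_components`, `RUN_LOG`, and `draft_summary` are hypothetical names, the in-memory list stands in for whatever run log a team actually uses, and the function body is a placeholder rather than a real model call.

```python
import functools
from datetime import datetime, timezone

# In-memory stand-in for a structured run log (assumption for illustration).
RUN_LOG: list[dict] = []


def stamp_components(components: dict[str, str]):
    """Decorator that records the given component versions with every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            output = func(*args, **kwargs)
            RUN_LOG.append({
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "components": dict(components),  # copy so later edits do not alter the log
                "output": output,
                "reviewer_notes": "",            # filled in later by a human reviewer
            })
            return output
        return wrapper
    return decorator


@stamp_components({
    "prompt_id": "discharge-prompt-v1",
    "retrieval_version": "discharge-guidance-index-v1",
    "tool_config": "default",
})
def draft_summary(text: str) -> str:
    # Placeholder for a real generation call.
    return f"DRAFT: {text}"


draft_summary("patient stable")
print(RUN_LOG[-1]["components"])
```

The standard fields are populated automatically, while the reviewer notes remain a manual step, which is what makes this semi-automated rather than fully automated.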
Fully automated implementation
At scale, a platform or orchestration layer can automatically stamp every run with component fingerprints, wrapper versions, policy-layer hashes, retrieval index versions, and change-detection events. A governance dashboard can then flag when drift has occurred, indicate whether existing score profiles remain comparable, and trigger review workflows when thresholds are crossed.
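A component fingerprint can be sketched as a hash over a canonical serialisation of the component versions, so that any change to any component yields a different value. The choice of SHA-256 over sorted-key JSON is an assumption made for this illustration, not something RAIDT prescribes, and the names `component_fingerprint` and `drift_detected` are hypothetical.

```python
import hashlib
import json


def component_fingerprint(components: dict[str, str]) -> str:
    """Return a SHA-256 hex digest over a canonical JSON form of the components.

    Sorting keys makes the fingerprint independent of dictionary ordering.
    """
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drift_detected(baseline: dict[str, str], current: dict[str, str]) -> bool:
    """True when the current component set differs from the assessed baseline."""
    return component_fingerprint(baseline) != component_fingerprint(current)


# Hypothetical component sets; version strings are placeholders.
assessed = {"prompt": "v1", "retrieval_index": "idx-1", "policy_layer": "rules-1"}
deployed = {"prompt": "v1", "retrieval_index": "idx-2", "policy_layer": "rules-1"}

if drift_detected(assessed, deployed):
    print("Component drift detected: prior score profile may need review")
```

A fingerprint only signals that something changed. Identifying which component changed, and whether the change is material, still requires the underlying version records.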
Practical use in the RAIDT project
In the RAIDT project, this item is useful in several places. In Paper 08 Foundations, it helps define why the run must include the configured socio-technical stack rather than only the prompt-output pair. In Paper 09 Empirical Validation, it supports methodological rigour by explaining why repeated runs or longitudinal comparisons must account for configuration change. In Paper 10 Policy Pathways, it gives policymakers a concrete route from abstract lifecycle governance to evidence-based oversight.
It also supports sector playbooks because real deployments in healthcare, public services, education, and enterprise settings all face rolling updates to prompts, retrieval resources, and control layers. For the evidence pack, it motivates explicit component-version fields. For the scoring rubric, it helps justify rules about when a prior score is still interpretable and when drift requires re-review. In supervision, viva defence, and journal positioning, it shows that RAIDT is attentive not only to responsible design but also to the temporal instability of deployed GenAI systems.
Key audience questions to prepare for
Q1. Is component drift just another name for software updates?
Not exactly. Software updates are one source of change, but component drift is the governance-relevant accumulation of changes across the configured AI stack that alter how a run is produced or interpreted. The concept matters because those changes affect evidence, comparability, and reviewability.
Q2. Why is this especially important for RAIDT rather than for any evaluation framework?
Because RAIDT evaluates governance at the run level. If the run is the unit of evidence, then undocumented change in run components directly weakens the integrity of the framework's evidence pack and the meaning of its score profile.
Q3. Does every component change require a full reassessment?
No. RAIDT can support proportionate review. The key requirement is to record the change and determine whether it is material to the use case, risk profile, or interpretation of prior evidence. Minor changes may need annotation; major changes may require re-scoring or renewed review.
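Proportionate review could be expressed as a simple triage rule over the set of changed components. The materiality table below is purely an assumption for illustration: which components count as major is a decision for each use case and risk profile, and neither the table, the action labels, nor the name `review_action` comes from RAIDT itself.

```python
# Hypothetical materiality table; each organisation would define its own.
MATERIALITY = {
    "model_endpoint": "major",
    "prompt": "major",
    "retrieval_index": "major",
    "policy_layer": "major",
    "wrapper": "minor",
}


def review_action(changed_components: set[str],
                  materiality: dict[str, str] = MATERIALITY) -> str:
    """Suggest a review action for a set of changed components.

    Components missing from the table are treated as major, on the cautious
    assumption that an unclassified change should not be waved through.
    """
    if not changed_components:
        return "no action"
    levels = {materiality.get(name, "major") for name in changed_components}
    return "re-score" if "major" in levels else "annotate"


print(review_action({"wrapper"}))
print(review_action({"wrapper", "retrieval_index"}))
```

The rule only routes a change to annotation or re-scoring. The judgement about whether a change is actually material to prior evidence remains with the reviewer.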
Q4. What is the main organisational risk if component drift is ignored?
The main risk is false confidence. An organisation may believe it is governing one system while actually operating a materially changed one, making audit trails weaker and governance claims less defensible.
Q5. How does this help with contestability?
It allows a challenged output to be examined in relation to the exact configuration that produced it. Without component-level version evidence, a user or reviewer may be unable to contest the decision-making conditions in any meaningful way.
Suggested citation concepts to support this item
- model drift versus system drift in generative AI governance
- configuration management for machine learning and large language model systems
- prompt versioning and reproducibility in LLM applications
- retrieval-augmented generation governance and corpus version control
- audit trails for AI system updates in organisational settings
- MLOps change management and evidence capture for AI deployment
- sociotechnical traceability in AI assurance and accountability
- reproducibility and longitudinal evaluation of foundation-model applications
Short explanation for presentation
Component drift means that the practical system behind a GenAI run can change over time even when the use case appears unchanged. In RAIDT, that matters because governance attaches to the run, not to a vague idea of the tool. If prompts, retrieval indices, adapters, or policy layers shift without clear version records, then later reviewers cannot reliably reconstruct what produced a given output or compare results across time. This weakens auditability, traceability, and the interpretation of score profiles. RAIDT therefore treats component drift as a governance issue, not just a maintenance issue. The point is not to prevent all change, but to make change evidentially visible so that organisations can review it, contest it, and decide when reassessment is needed.
One-line takeaway
Component drift is the change in the configured elements of a GenAI run over time, and in RAIDT it matters because those changes must be evidenced if run-level governance claims are to remain reviewable.
Related items in boundaries, limitations and future questions
Anchored questions
No anchored questions were present in the original note.