S1.09 - Runtime configuration

flowchart LR
    A[Model-centric governance<br/>misses configured behaviour] --> B[RAIDT<br/>run-level evidence framework]
    B --> C[[Runtime configuration<br/>prompts, settings, retrieval, tools, controls]]
    C --> D[Evidence pack]
    C --> E[RAIDT score profile]
    C --> F[Reviewer reconstruction]
    D --> G[Audit readiness]
    E --> G
    F --> H[Contestability and organisational learning]
    I[Healthcare] --> C
    J[Finance] --> C
    K[Public services] --> C
    L[Cybersecurity] --> C
    M[Enterprise productivity] --> C

Star S1 - Origins, Background and History

Star context: Explains why RAIDT emerges from the recognition that GenAI behaviour is shaped not only by the model itself, but by prompts, retrieval settings, tool permissions, orchestration choices and review controls that vary from run to run. This item therefore links the historical background of the project to RAIDT's central move from principle-level AI governance to operational, run-level governance.


Academic picture
Definition / background

Runtime configuration is the set of operational parameters, components and control choices active when a generative AI system is used for a particular task. It includes not only obvious settings such as model choice or temperature, but also the system prompt, user prompt framing, retrieval source selection, tool availability, adapters or fine-tuned layers, guardrails, escalation rules, access permissions, workflow routing and human review steps. In other words, it is the practical assembly through which a run is produced.
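To make the definition concrete, the components listed above could be held in a single structured record. The following is a minimal sketch, not a RAIDT-prescribed schema: the class name, field names and example values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RuntimeConfiguration:
    """Hypothetical record of what was active for one run (illustrative fields only)."""
    model_id: str
    temperature: float
    system_prompt: str
    user_prompt_framing: str
    retrieval_sources: tuple[str, ...] = ()
    enabled_tools: tuple[str, ...] = ()
    adapters: tuple[str, ...] = ()
    guardrails: tuple[str, ...] = ()
    escalation_rules: tuple[str, ...] = ()
    access_permissions: tuple[str, ...] = ()
    workflow_route: str = "default"
    human_review_steps: tuple[str, ...] = ()


# Example values are invented for illustration.
config = RuntimeConfiguration(
    model_id="example-model-v1",
    temperature=0.2,
    system_prompt="You draft clear, patient-friendly text.",
    user_prompt_framing="summarisation",
    retrieval_sources=("internal-guidance",),
    human_review_steps=("clinician sign-off",),
)

# A plain dictionary form is convenient for logging or storing with the run.
record = asdict(config)
```

Freezing the dataclass is a deliberate choice in this sketch: once a run has been produced, the record of how it was configured should not be editable in place.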

Conceptually, this matters because GenAI behaviour is not governed solely by stable software logic in the way that the behaviour of many traditional information systems is. The same underlying model can generate very different outputs depending on how it is configured at runtime. A governance approach that evaluates only the model therefore misses a major determinant of performance, risk and accountability.

Within RAIDT, runtime configuration belongs at the centre of the framework because RAIDT treats the run, rather than the abstract model, as the unit of governance. A run-level evidence pack is incomplete if it cannot show how the system was configured when the output was produced. Similarly, the RAIDT score profile cannot be interpreted properly without knowing whether the run involved high randomness, external retrieval, broad tool permissions, weak oversight or strong review controls.

This item also differs from broader terms such as deployment architecture or system design. Deployment architecture describes the general environment in which the system operates; runtime configuration focuses on the specific settings and components active in a concrete instance of use. That distinction is essential for reviewability. RAIDT needs to know not merely how a system is usually deployed, but how this run was actually configured when a decision, recommendation or draft was generated.

Why this concept matters

Runtime configuration matters because it prevents a misleading model-centric view of GenAI governance. Without it, organisations may believe they have governed a system simply by approving a vendor, naming a model or publishing high-level principles. In practice, however, risk often enters through the configured use of that model: the prompt may overstate authority, retrieval may pull from outdated documents, a tool may be allowed to act without adequate checks, or a safety layer may be disabled for speed.

For organisations using GenAI in operational settings, this concept helps distinguish between abstract capability and situated behaviour. It provides a disciplined way to ask what was enabled, constrained, retrieved, instructed, reviewed and logged in the specific run under scrutiny. That is what allows governance to move from general assurance statements to concrete evidence.

If runtime configuration is missing from governance, failures become difficult to explain and improvements become difficult to target. Teams may know that an output was problematic but remain unable to determine whether the cause lay in prompting, retrieval quality, tool use, human oversight or model settings. RAIDT addresses this by making configuration an evidential object rather than a hidden implementation detail.

Key idea: Runtime configuration matters because GenAI governance becomes operational only when the actual settings and controls of a specific run are visible, reviewable and contestable.

What this item controls
Practical example / likely audience question

Audience question

If we have already approved the model, why do we need to document every runtime configuration?

Answer

The concern behind this question is usually administrative burden: people assume that once the model has passed procurement, safety testing or policy approval, the governance problem has largely been solved. RAIDT rejects that assumption because the same approved model can behave very differently under different runtime conditions.

A direct answer is that model approval does not tell you how the model was actually used in the run being reviewed. A low-temperature summarisation prompt operating on a fixed internal knowledge base is materially different from a high-variability agentic workflow that retrieves from live web sources and calls external tools. Treating both cases as though they were governed simply because they share a model name would be analytically weak and operationally risky.

Consider a practical example. An enterprise team may use the same foundation model for two tasks: drafting internal meeting notes and generating customer-facing policy guidance. The first run may involve a fixed template, no external retrieval and light post-editing. The second may involve dynamic retrieval, a compliance prompt layer and mandatory human sign-off. RAIDT handles this better than generic AI governance because it documents the actual run configuration rather than relying on generic statements that "the model is approved". That makes the resulting evidence more useful for audit, challenge and redesign.

Practical example in RAIDT terms

A healthcare trust uses a GenAI assistant to draft discharge instructions for clinicians. In one run, the assistant is configured with a prompt that asks for clear patient-friendly language, retrieval from the trust's internal medicines guidance, temperature set low for consistency, and a mandatory clinician review before anything is shown to the patient.

The run-level issue appears when a later incident reveals that the retrieval connector was pointed to an outdated guideline index and the prompt template had been modified to make the assistant sound more decisive. The underlying model had not changed, but the runtime configuration had changed in a way that increased the risk of overconfident and outdated advice.
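A reviewer investigating an incident like this one could compare the approved configuration with the configuration recorded for the run. The sketch below assumes both are available as flat dictionaries; the function name, keys and version labels are hypothetical.

```python
def diff_configurations(baseline: dict, observed: dict) -> dict:
    """Return {setting: (baseline_value, observed_value)} for every setting that differs."""
    changed = {}
    for key in sorted(set(baseline) | set(observed)):
        before = baseline.get(key, "<absent>")
        after = observed.get(key, "<absent>")
        if before != after:
            changed[key] = (before, after)
    return changed


# Illustrative values only: same model, but a changed prompt template and index.
approved = {
    "model_id": "example-model-v1",
    "prompt_template_version": "discharge-v3",
    "retrieval_index": "medicines-guidance-current",
    "temperature": 0.2,
}
incident_run = {
    "model_id": "example-model-v1",
    "prompt_template_version": "discharge-v4-decisive",
    "retrieval_index": "medicines-guidance-outdated",
    "temperature": 0.2,
}

drift = diff_configurations(approved, incident_run)
for setting, (before, after) in drift.items():
    print(f"{setting}: {before} -> {after}")
```

In this example the model identifier does not appear in the result, which mirrors the point of the scenario: the drift lies in the configuration, not in the model.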

The evidence needed in RAIDT terms would include the model and version identifier, system prompt version, prompt template hash, retrieval index version, document timestamp, temperature and related generation settings, enabled tool list, reviewer identity and sign-off record, and the output shown to the clinician. The most affected RAIDT pillars would be Dependability, Auditability and Traceability, with clear Responsibility implications for who approved the configuration and who reviewed the run. Capturing runtime configuration in this way improves governance readiness because the organisation can reconstruct the event, contest the adequacy of controls and implement targeted remediation rather than issuing vague assurances.
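The evidence items listed above could be assembled into one record per run. This is a sketch under stated assumptions: the function name, dictionary keys and sample values are hypothetical, and SHA-256 is used here simply as one common way to produce a prompt template hash.

```python
import hashlib
from datetime import datetime, timezone


def sha256_hex(text: str) -> str:
    """Hash text so that any later change to it is detectable."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_evidence_record(run_id, prompt_template, config, reviewer, output_text):
    """Assemble a run-level evidence record (illustrative structure only)."""
    return {
        "run_id": run_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "model_id": config["model_id"],
        "system_prompt_version": config["system_prompt_version"],
        "prompt_template_hash": sha256_hex(prompt_template),
        "retrieval_index_version": config["retrieval_index_version"],
        "document_timestamp": config["document_timestamp"],
        "generation_settings": {"temperature": config["temperature"]},
        "enabled_tools": sorted(config["enabled_tools"]),
        "reviewer": reviewer,
        "output_shown": output_text,
    }


# All values below are invented for illustration.
template = "Write discharge instructions in clear, patient-friendly language."
evidence = build_evidence_record(
    run_id="run-0001",
    prompt_template=template,
    config={
        "model_id": "example-model-v1",
        "system_prompt_version": "sys-v2",
        "retrieval_index_version": "idx-7",
        "document_timestamp": "2024-01-15T09:00:00+00:00",
        "temperature": 0.2,
        "enabled_tools": [],
    },
    reviewer={"id": "clinician-042", "signed_off": True},
    output_text="Take one tablet twice daily with food.",
)
```

Storing a hash rather than only a version label means a silent edit to the prompt template, such as the change in tone described in the incident, would show up as a different hash even if the version label had not been updated.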

Detailed link to RAIDT

Runtime configuration links to RAIDT in four ways.

First, it operationalises RAIDT's core claim that the governable unit is not the abstract model but the specific configured run in context.
Second, it defines a major part of what must be captured as run-level evidence in order to explain why a run produced the output that it did.
Third, it populates the evidence pack and shapes the RAIDT score profile, because different configurations change the level of assurance required across the five pillars.
Fourth, it supports reviewability, contestability, audit readiness and organisational learning by letting reviewers reconstruct and compare runs rather than rely on general vendor or policy claims.

Runtime configuration -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

Link to the five RAIDT pillars

Responsibility

Runtime configuration supports Responsibility because it makes visible who selected, approved or altered the settings that shaped the run. Without this, accountability can be displaced onto an abstract "AI system" rather than assigned to identifiable design and oversight choices.

Example evidence / implication:

Auditability

This item strongly affects Auditability because auditors need to know how the run was assembled if they are to test whether controls were present and followed. Configuration data makes the difference between a reconstructable event and an opaque one.

Example evidence / implication:

Interpretability

Runtime configuration improves Interpretability by helping reviewers explain why the system behaved as it did. It does not make a large model internally transparent, but it does make the run more intelligible at the socio-technical level.

Example evidence / implication:

Dependability

This item strongly affects Dependability because stable and safe organisational use depends on knowing which settings produce acceptable performance and which introduce fragility. Configuration drift is often a practical source of failure.

Example evidence / implication:

Traceability

Runtime configuration is central to Traceability because it provides a chain from output back to the settings, sources and controls active at the time of generation. Without this, it is hard to establish provenance for the run.

Example evidence / implication:

This item affects all five pillars, but it is especially foundational for Auditability, Dependability and Traceability.

Why this item is more than a generic concept

In general AI governance, runtime configuration may be treated as a technical deployment detail, an MLOps concern or an implementation note for engineers. In RAIDT, it becomes a formal governance object. The RAIDT meaning is more operational because configuration is tied to a specific run, documented as evidence, reviewed in relation to the five pillars and used to support contestability, audit readiness and continuous improvement. That shift is important: RAIDT does not ask only whether a system exists, but how it was actually configured when it acted in an organisational context.

Common misunderstanding

Misunderstanding

Runtime configuration is just a technical tuning detail; if the model is trustworthy, the configuration does not need separate governance attention.

Correction

This is incorrect because configuration choices often determine whether a model is used conservatively or riskily in practice. A trustworthy model can still be embedded in an unsafe run if the prompt encourages unwarranted certainty, if external retrieval is not controlled, or if tools are allowed to execute actions without sufficient review. For example, a finance assistant using an approved model may still become problematic if live market retrieval and persuasive output formatting are enabled without adequate disclosure or human checks. RAIDT therefore treats runtime configuration as governable evidence, not as background noise.

Boundary and limitation

Runtime configuration does not by itself prove that a run was correct, fair, lawful or contextually appropriate. It helps explain how the run was set up, but it does not replace output evaluation, domain expertise, human judgement or policy interpretation. A well-documented configuration can still produce a poor result, and a poorly documented configuration does not automatically indicate harmful intent.

There are also practical limits. Some vendor platforms expose only partial configuration information; some orchestration layers change dynamically; and some important influences, such as user intent or organisational pressure, are not reducible to technical settings. RAIDT handles these limitations by combining configuration evidence with output review, contextual notes, oversight records, incident analysis and explicit acknowledgement of uncertainty where visibility is incomplete.

Implementation levels

Manual implementation

A researcher or small team can apply this manually by recording the model used, prompt text, key settings, retrieval source, tool access and human review steps for each important run in a structured note or template.

Semi-automated implementation

Metadata capture, prompt registries, template fields, reviewer checklists and structured logging can reduce manual burden while still allowing analysts to validate whether the recorded configuration matches the real workflow.

Fully automated implementation

At scale, a wrapper, orchestration layer, governance dashboard or logging pipeline can automatically capture configuration state at run time, bind it to a run ID, store prompt and tool versions, record retrieval provenance, trigger policy checks and feed the evidence directly into RAIDT evidence packs and score calculations.
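A very small version of such a wrapper might look like the following. It is a sketch only: `governed_run`, `EVIDENCE_LOG` and `fake_generate` are hypothetical names, the in-memory list stands in for a real evidence store, and policy checks and retrieval provenance are left out for brevity.

```python
import copy
import uuid
from datetime import datetime, timezone

EVIDENCE_LOG = []  # stand-in for a real evidence store


def governed_run(generate, config: dict, prompt: str) -> dict:
    """Call the generator and bind a snapshot of the configuration to a run ID."""
    run_id = str(uuid.uuid4())
    snapshot = copy.deepcopy(config)  # freeze the state as it was at run time
    output = generate(prompt, config)
    entry = {
        "run_id": run_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "configuration": snapshot,
        "prompt": prompt,
        "output": output,
    }
    EVIDENCE_LOG.append(entry)
    return entry


def fake_generate(prompt: str, config: dict) -> str:
    """Placeholder for a real model call, so the sketch runs without any service."""
    return f"[{config['model_id']}] draft for: {prompt}"


config = {"model_id": "example-model-v1", "temperature": 0.2, "enabled_tools": []}
entry = governed_run(fake_generate, config, "Summarise the meeting notes")

# A later change to the live configuration does not alter the stored snapshot.
config["temperature"] = 0.9
print(entry["configuration"]["temperature"])
```

The deep copy is the important design choice here: the evidence should reflect the configuration at the moment of the run, not whatever the live settings happen to be when someone later inspects the log.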

Practical use in the RAIDT project

In Paper 08 Foundations, runtime configuration helps justify why RAIDT governs the run rather than the model alone. It gives theoretical substance to the claim that governance-relevant behaviour emerges from socio-technical assembly at the point of use.

In Paper 09 Empirical Validation, this item supports comparison across real or simulated runs by showing how different configurations change auditability, consistency, reconstructability and reviewer confidence. It is therefore useful when testing whether RAIDT can discriminate between superficially similar uses that carry different governance risk.

In Paper 10 Policy Pathways and sector playbooks, runtime configuration becomes a practical policy lever. Organisations can specify which configurations are permitted for which task types, what evidence must be retained, which settings trigger enhanced review and how configuration changes should be governed over time. For viva defence, supervision meetings and journal positioning, this item is useful because it makes RAIDT visibly more operational than principle-only governance models.
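The policy-lever idea can be illustrated with a simple check of a configuration against per-task rules. Everything in this sketch is an assumption: the task types, thresholds, tool names and the rule that external retrieval or tool use triggers enhanced review are invented for illustration and are not taken from the RAIDT papers.

```python
# Hypothetical policy: which configurations are permitted for which task types.
POLICY = {
    "internal_notes": {
        "max_temperature": 0.7,
        "allow_external_retrieval": False,
        "allowed_tools": set(),
    },
    "customer_guidance": {
        "max_temperature": 0.3,
        "allow_external_retrieval": True,
        "allowed_tools": {"policy_search"},
    },
}


def check_configuration(task_type: str, config: dict) -> dict:
    """Report policy violations and whether the run should receive enhanced review."""
    rules = POLICY[task_type]
    violations = []
    if config["temperature"] > rules["max_temperature"]:
        violations.append("temperature above permitted maximum")
    if config["external_retrieval"] and not rules["allow_external_retrieval"]:
        violations.append("external retrieval not permitted")
    extra_tools = set(config["enabled_tools"]) - rules["allowed_tools"]
    if extra_tools:
        violations.append(f"tools not permitted: {sorted(extra_tools)}")
    # Assumed trigger: external retrieval or any tool use calls for enhanced review.
    enhanced = config["external_retrieval"] or bool(config["enabled_tools"])
    return {
        "permitted": not violations,
        "violations": violations,
        "enhanced_review": enhanced,
    }


notes_check = check_configuration(
    "internal_notes",
    {"temperature": 0.2, "external_retrieval": True, "enabled_tools": []},
)
guidance_check = check_configuration(
    "customer_guidance",
    {"temperature": 0.2, "external_retrieval": True, "enabled_tools": ["policy_search"]},
)
```

In this example the same settings are rejected for internal notes but permitted, with enhanced review, for customer guidance, which is the kind of task-specific distinction a sector playbook could encode.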

Key audience questions to prepare for

Q1. How granular should runtime configuration evidence be?

It should be granular enough to reconstruct materially relevant aspects of the run without collapsing into meaningless exhaustiveness. The test is whether a reviewer could understand what shaped the output and whether a challenge could be investigated responsibly.

Q2. Does documenting runtime configuration create too much administrative burden?

It can if done badly, which is why RAIDT is compatible with tiered evidence capture. Low-risk runs may need lightweight metadata, while high-impact or contested runs justify deeper configuration records.
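Tiered capture could be as simple as choosing which fields to retain according to a risk tier. The tier names and field lists in this sketch are assumptions for illustration, not a RAIDT specification.

```python
# Hypothetical tiers: lightweight metadata for low-risk runs, fuller records otherwise.
EVIDENCE_TIERS = {
    "low": ["model_id", "prompt_template_version"],
    "high": [
        "model_id",
        "prompt_template_version",
        "temperature",
        "retrieval_index",
        "enabled_tools",
        "reviewer_id",
    ],
}


def select_evidence(config: dict, tier: str) -> dict:
    """Keep only the fields required for the given tier; missing fields become None."""
    return {name: config.get(name) for name in EVIDENCE_TIERS[tier]}


# Example values are invented for illustration.
full_config = {
    "model_id": "example-model-v1",
    "prompt_template_version": "notes-v1",
    "temperature": 0.4,
    "retrieval_index": None,
    "enabled_tools": [],
    "reviewer_id": "editor-007",
}

light = select_evidence(full_config, "low")
deep = select_evidence(full_config, "high")
```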

Q3. What if a vendor does not expose full configuration details?

Then the limitation itself becomes governance-relevant evidence. RAIDT can still record what is known, identify visibility gaps and reflect that reduced transparency in the score profile and assurance claims.
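One way to record what is known and make the gaps explicit is sketched here. The expected field list is an assumption, and the coverage figure is an illustrative ratio only; it is not a RAIDT score formula.

```python
# Hypothetical list of configuration fields a reviewer would want to see.
EXPECTED_FIELDS = ["model_id", "temperature", "system_prompt_version", "safety_layer"]


def visibility_report(known: dict) -> dict:
    """Separate exposed configuration fields from those the vendor does not reveal."""
    gaps = [name for name in EXPECTED_FIELDS if known.get(name) is None]
    exposed = {name: known[name] for name in EXPECTED_FIELDS if name not in gaps}
    return {
        "known": exposed,
        "gaps": gaps,
        "coverage": len(exposed) / len(EXPECTED_FIELDS),
    }


# Example: the vendor exposes everything except its safety-layer settings.
report = visibility_report(
    {"model_id": "example-model-v1", "temperature": 0.2, "system_prompt_version": "sys-v2"}
)
print(report["gaps"], report["coverage"])
```

Recording the gaps as data, rather than leaving fields silently blank, is what allows the reduced transparency to be carried into later assurance claims.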

Q4. Is runtime configuration just another name for prompt logging?

No. Prompt logging is only one component. Runtime configuration also includes model selection, generation settings, retrieval scope, tool access, safety layers, workflow routing and human review arrangements.

Q5. Why is runtime configuration especially important for GenAI compared with traditional systems?

Because GenAI outputs are highly sensitive to contextual settings and orchestration choices. Traditional systems often rely on more stable, pre-specified logic, whereas GenAI behaviour is more contingent on runtime assembly.

Suggested citation concepts to support this item
Short explanation for presentation

Runtime configuration refers to the concrete prompts, settings, retrieval sources, tools, guardrails and review steps active in a particular GenAI run. It matters in RAIDT because the same model can behave very differently depending on how it is configured at the point of use. That means governance cannot stop at model approval or broad policy statements. RAIDT treats configuration as run-level evidence: something that must be captured if a reviewer is to understand, challenge or audit what happened. In practice, this makes RAIDT more operational than principle-led governance because it connects behaviour to evidence packs, score profiles and reconstructable organisational decisions. The concept is especially important in high-impact settings where configuration drift, hidden tool use or weak retrieval controls can materially change risk.

One-line takeaway

Runtime configuration is the set of prompts, settings, tools and controls active in a specific GenAI run; it is governable because RAIDT ties those run-level choices to evidence, scoring and governance readiness.

Related items in origins, background and history
Anchored questions