
S10.04 - 6 configurations

flowchart LR
    A["Background problem:<br/>organisations compare outputs,<br/>not governance effects of configuration"] --> B["RAIDT:<br/>run-level evidence framework"]
    B --> C[["6 configurations:<br/>comparative intervention set"]]
    C1[Baseline prompting] --> C
    C2[Structured prompting] --> C
    C3[RAG] --> C
    C4[PEFT / LoRA] --> C
    C5[RLHF-type controls] --> C
    C6[Stacked influence] --> C
    H[Healthcare] --> C
    I[Finance] --> C
    J[Public services] --> C
    K[Cybersecurity] --> C
    L[Supply chain] --> C
    C --> D[Evidence pack]
    C --> E[RAIDT score profile]
    C --> M["Governance move:<br/>evidence over assertion,<br/>reviewability, contestability,<br/>audit readiness"]
    D --> F[Reviewer reconstruction]
    E --> G[Governance readiness]
    F --> N[Organisational learning]
    G --> N

Star S10 - Empirical Programme, Domains and Sector Playbooks

Star context: This item places RAIDT's empirical programme inside a comparative design space by showing how the framework is tested across six distinct configuration conditions rather than asserted on the basis of a single model setup.


Academic picture
Definition / background

In this item, the six configurations are the main comparative conditions through which RAIDT examines how a generative AI run is shaped by different forms of influence and control. They are baseline prompting, structured prompting, retrieval-augmented generation (RAG), parameter-efficient fine-tuning (PEFT) such as low-rank adaptation (LoRA), RLHF-type controls (drawing on reinforcement learning from human feedback), and stacked influence, where several control layers are combined. The concept comes from experimental comparison: if the task remains broadly stable while the configuration changes, the analyst can examine what difference the configuration makes.

Within RAIDT, a configuration is not merely a technical setting. It is a governance-relevant arrangement of prompts, retrieval sources, model adaptations, feedback constraints, and control layers that affects how a run behaves and how that behaviour can later be evidenced. This matters because RAIDT treats the run as the unit of governance. If the run is configured differently, it is not simply the same run with a cosmetic variation; it is a different evidential condition with different review and assurance implications.

This item therefore belongs centrally inside RAIDT's empirical programme. The framework is not intended to rest on abstract principles alone. It is intended to show, through structured comparison, how governance readiness changes when different influence methods are applied. The six configurations make that comparison visible and operational. They help explain why a run-level evidence pack and a five-pillar score profile are more meaningful when they are produced across comparable configuration conditions rather than in isolation.

The idea also helps distinguish RAIDT from generic capability benchmarking. A benchmark may tell an organisation whether a model performs well on a task. S10.04 asks a different question: under which configuration is that performance most responsible, auditable, interpretable, dependable, and traceable? That shift is what makes the item important for governance rather than only optimisation.
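One way to make the idea of a configuration as a governance-relevant arrangement concrete is to treat it as a structured record rather than a loose setting. The Python sketch below is illustrative only: the `Influence` enum, the `RunConfiguration` fields, and the hashing scheme are hypothetical names chosen for this example, not a published RAIDT schema. It shows the point in miniature: two runs that use the same model but differ in configuration receive different configuration fingerprints, so they are recorded as different evidential conditions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional, Tuple


class Influence(str, Enum):
    """The six comparative conditions named in this item (labels are illustrative)."""
    BASELINE = "baseline_prompting"
    STRUCTURED = "structured_prompting"
    RAG = "rag"
    PEFT_LORA = "peft_lora"
    RLHF_CONTROLS = "rlhf_type_controls"
    STACKED = "stacked_influence"


@dataclass(frozen=True)
class RunConfiguration:
    """Hypothetical record of the layers that shape a run; None means 'layer not used'."""
    influence: Influence
    model_id: str
    prompt_template_version: Optional[str] = None
    retrieval_corpus_version: Optional[str] = None
    adapter_id: Optional[str] = None
    feedback_control_state: Optional[str] = None
    stacked_layers: Tuple[str, ...] = ()

    def fingerprint(self) -> str:
        """Hash every field, so any change to the configuration yields a new identifier."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    baseline = RunConfiguration(Influence.BASELINE, model_id="model-x")
    structured_rag = RunConfiguration(
        Influence.STACKED,
        model_id="model-x",
        prompt_template_version="discharge-template-v3",
        retrieval_corpus_version="guidance-snapshot-01",
        stacked_layers=("structured_prompting", "rag"),
    )
    # Same model, different configuration: a different evidential condition.
    print(baseline.fingerprint(), structured_rag.fingerprint())
```

Hashing every field, rather than only the influence label, means that a silent change such as a new prompt template version also produces a new identifier, which is what keeps runs comparable at run level.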

Why this concept matters

Many organisations evaluate generative AI systems as if governance were separate from configuration, when in practice the configuration often determines what can be checked, justified, reproduced, and challenged later. If configuration differences are ignored, quality improvements may be mistaken for governance improvements, and governance failures may be wrongly attributed to the model alone rather than to the way the run was assembled.

The six configurations address that problem by turning influence methods into explicit comparative conditions. This keeps model capability, deployment design, and assurance quality separate instead of conflating them. It also reduces the risk of making broad governance claims from a single setup that happens to perform well in one domain but is poorly evidenced, weakly traceable, or difficult to reconstruct under review.

For organisations using GenAI in real work, this matters because procurement, policy, and internal oversight often ask whether controls are proportionate and effective. RAIDT can answer that question more convincingly when it can show how governance readiness changes across baseline, structured, retrieval-based, tuned, feedback-shaped, and stacked configurations. That is a move from principle to operational governance.

Key idea: The six configurations matter because they make configuration itself visible as an empirical governance variable rather than leaving it hidden behind output quality claims.

What this item enables
Practical example / likely audience question

Audience question

Why compare configurations?

Answer

The concern behind this question is usually that configuration sounds like an engineering detail, while governance is assumed to sit in policy, oversight committees, or post hoc review. RAIDT rejects that separation. The direct answer is that configuration determines how governable the run is, not just how fluent or accurate the output appears.

Consider a case where the same organisational task is run under two conditions. In the first, a user enters a free-form prompt into a general model. In the second, the task is carried out with a structured prompt, controlled retrieval sources, and a documented review step. Even if both outputs appear acceptable, the second configuration gives a stronger basis for explanation, reconstruction, challenge, and accountability. That means the organisation has changed governance readiness, not merely surface quality.

RAIDT handles this better than a generic AI governance approach because it does not stop at saying that controls should exist. It compares how different controls behave at run level and records the evidence of those differences. That makes the answer empirically defensible: the organisation can show why a chosen configuration is preferable, under what conditions, and with what trade-offs.

Practical example in RAIDT terms

A hospital operations team uses a generative AI system to draft discharge-planning summaries for clinicians. In a baseline prompting configuration, the model receives a short free-text instruction and produces a plausible summary, but the source basis for its recommendations is unclear. In a structured prompting plus RAG configuration, the same task is run with a discharge-summary template, retrieval from approved hospital guidance, and a reviewer sign-off step.

The run-level issue is not only whether the text reads well. It is whether the organisation can show what instructions were used, what knowledge sources informed the output, whether local guidance was current, and how a reviewer checked the result before use. The evidence needed includes the prompt template version, model identifier, retrieval corpus version and timestamps, access logs for the guidance set, reviewer notes, and the rationale for the resulting RAIDT pillar scores.

In this case, Responsibility is improved because role allocation and sign-off are clearer. Auditability and Traceability improve because the retrieved sources and prompt structure can be reconstructed. Interpretability improves because the output follows a known template and source basis. Dependability improves if repeated runs show stable behaviour under the structured configuration. Governance readiness is therefore increased not because the model became magically safe, but because the configuration made the run more governable and reviewable.
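The evidence items listed in this example can double as a completeness check on each run. The sketch below assumes, hypothetically, that a run's evidence is stored as a flat dictionary whose keys mirror those items; the key names are illustrative, and the required list would differ by configuration (a baseline run, for instance, has no retrieval corpus to record).

```python
from typing import Dict, List, Sequence

# Evidence items named in the discharge-planning example (key names are illustrative).
STRUCTURED_RAG_EVIDENCE = (
    "prompt_template_version",
    "model_id",
    "retrieval_corpus_version",
    "retrieval_timestamps",
    "guidance_access_log",
    "reviewer_notes",
    "pillar_score_rationale",
)


def missing_evidence(record: Dict[str, object],
                     required: Sequence[str] = STRUCTURED_RAG_EVIDENCE) -> List[str]:
    """Return required evidence keys that are absent, None, or empty in a run record."""
    return [key for key in required if not record.get(key)]


if __name__ == "__main__":
    run_record = {
        "prompt_template_version": "discharge-template-v3",
        "model_id": "model-x",
        "retrieval_corpus_version": "guidance-snapshot-01",
        "reviewer_notes": "",  # sign-off recorded without notes counts as missing
    }
    print(missing_evidence(run_record))
    # ['retrieval_timestamps', 'guidance_access_log', 'reviewer_notes', 'pillar_score_rationale']
```

Treating empty values as missing is a deliberate choice here: a reviewer sign-off with no notes gives little basis for later reconstruction.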

Detailed link to RAIDT

This item links to RAIDT in four ways.

First, it turns RAIDT from a static governance claim into an empirical comparison framework for different intervention conditions.
Second, it fits RAIDT's core unit of analysis because each configuration creates a distinct run condition that can be documented and reviewed at run level.
Third, it feeds directly into the evidence pack and the five-pillar score profile by showing how governance readiness shifts when retrieval, tuning, feedback, or stacked controls are introduced.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning because reviewers can reconstruct why one configuration was selected over another and what evidence supports that choice.

6 configurations → Run variants → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness
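Read as a data flow, this chain groups runs by configuration into evidence packs and then summarises each pack as a pillar score profile. The sketch below assumes a hypothetical `Run` record with reviewer-assigned numeric scores per pillar and uses a plain mean per pillar; that aggregation rule is an assumption made for illustration, not RAIDT's scoring rubric.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List

PILLARS = ("responsibility", "auditability", "interpretability",
           "dependability", "traceability")


@dataclass
class Run:
    configuration: str                 # e.g. "baseline_prompting" or "rag"
    scenario: str                      # task scenario the run belongs to
    scores: Dict[str, float]           # reviewer score per pillar (scale assumed)
    evidence: Dict[str, object] = field(default_factory=dict)


def build_evidence_packs(runs: List[Run]) -> Dict[str, List[Run]]:
    """Group run-level evidence into one evidence pack per configuration."""
    packs: Dict[str, List[Run]] = defaultdict(list)
    for run in runs:
        packs[run.configuration].append(run)
    return dict(packs)


def score_profile(pack: List[Run]) -> Dict[str, float]:
    """Summarise a pack as the mean score for each of the five pillars."""
    return {pillar: mean(run.scores[pillar] for run in pack) for pillar in PILLARS}
```

Keeping the individual runs inside each pack, rather than storing only the averages, preserves what a reviewer needs for reconstruction; the profile summarises the pack but does not replace it.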

Link to the five RAIDT pillars

Responsibility

The six configurations affect who is accountable for shaping the run and for approving its outputs. As configuration becomes more structured, responsibility can be allocated more clearly across prompt design, retrieval governance, model adaptation, review, and deployment ownership.

Example evidence / implication:

Auditability

This item strongly affects Auditability because each configuration changes what can be inspected after the event. Baseline prompting often leaves thin evidence, whereas structured prompting, RAG, and controlled adaptations can leave richer records for later review.

Example evidence / implication:

Interpretability

Interpretability is shaped by how understandable the run is to a human reviewer. A structured configuration can make the output easier to interpret by constraining form, surfacing sources, and linking behaviour to known control layers.

Example evidence / implication:

Dependability

Dependability concerns whether the run behaves consistently enough for organisational use. Comparing six configurations helps identify which setups produce more stable and robust outcomes across repeated runs and realistic scenarios.

Example evidence / implication:
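One simple, admittedly crude, way to evidence stability is to repeat the same task under one configuration and measure how often the outputs agree. The sketch below assumes exact agreement after whitespace and case normalisation; free-text outputs would usually need a task-specific or semantic equivalence check instead, so `agreement_rate` should be read as a hypothetical placeholder metric.

```python
from collections import Counter
from typing import Iterable


def normalise(text: str) -> str:
    """Collapse whitespace and case so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def agreement_rate(outputs: Iterable[str]) -> float:
    """Share of repeated runs whose normalised output equals the most common one."""
    normalised = [normalise(o) for o in outputs]
    if not normalised:
        raise ValueError("need at least one run")
    _, modal_count = Counter(normalised).most_common(1)[0]
    return modal_count / len(normalised)


if __name__ == "__main__":
    repeats = ["Discharge on day 3.", "discharge on  day 3.", "Discharge on day 4."]
    print(agreement_rate(repeats))
```

Computed per configuration over the same repeated scenarios, a rate like this gives Dependability a concrete, reviewable number rather than an impression of consistency.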

Traceability

Traceability is one of the clearest benefits of explicit configuration comparison. The more fully a configuration is specified, the easier it is to trace an output back to prompt logic, retrieved material, adaptation state, and reviewer intervention.

Example evidence / implication:
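For a RAG configuration, tracing an output back to retrieved material requires recording what was actually retrieved at run time, not just which corpus was connected. The sketch below is a minimal, hypothetical approach: it stores each retrieved document's identifier with a SHA-256 hash of the text returned, so a reviewer can later detect whether a source has since changed or disappeared. The `RetrievalRecord` schema is an assumption for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


def content_hash(text: str) -> str:
    """SHA-256 digest of the exact text returned by retrieval."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RetrievalRecord:
    corpus_version: str
    retrieved: Tuple[Tuple[str, str], ...]  # (document_id, content_hash) pairs


def record_retrieval(corpus_version: str, passages: Dict[str, str]) -> RetrievalRecord:
    """Capture document IDs and content hashes at the moment of the run."""
    pairs = sorted((doc_id, content_hash(text)) for doc_id, text in passages.items())
    return RetrievalRecord(corpus_version=corpus_version, retrieved=tuple(pairs))


def changed_since_run(record: RetrievalRecord, current_corpus: Dict[str, str]) -> List[str]:
    """Return IDs of retrieved documents that are now missing or whose text differs."""
    changed = []
    for doc_id, digest in record.retrieved:
        current = current_corpus.get(doc_id)
        if current is None or content_hash(current) != digest:
            changed.append(doc_id)
    return changed


if __name__ == "__main__":
    at_run_time = {"guide-017": "Review medication before discharge."}
    record = record_retrieval("guidance-snapshot-01", at_run_time)
    later = {"guide-017": "Review medication and follow-up before discharge."}
    print(changed_since_run(record, later))  # ['guide-017']
```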

Why this item is more than a generic concept

In generic AI governance, configuration may mean an engineering choice about performance, cost, latency, or user experience. In RAIDT, configuration is more than a deployment parameter. It is a governable intervention condition whose effect on responsibility, auditability, interpretability, dependability, and traceability must be evidenced at run level.

The RAIDT meaning is more operational because the six configurations are tied to run-level evidence rather than abstract statements about best practice. Instead of saying that retrieval or feedback mechanisms are helpful in principle, RAIDT asks what difference they make in a specific run, what evidence that difference leaves behind, and whether the result improves governance readiness in a defensible way.

Common misunderstanding

Misunderstanding

If a configuration has more layers of control, it is automatically the most governable option.

Correction

More control layers do not automatically produce better governance. A stacked configuration can improve reviewability, but it can also introduce new opacity, new maintenance burdens, or weak provenance if the added layers are not themselves documented. For example, a RAG setup without proper source versioning may improve factual grounding while simultaneously weakening Traceability because reviewers cannot later confirm exactly what was retrieved at the time of the run.

The correct RAIDT interpretation is comparative and evidential: a configuration is preferable only if the evidence shows that it improves governance readiness for the task and context in question.
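This comparative reading can be expressed as a decision rule over two pillar score profiles. The sketch below adopts one possible rule, chosen here as an assumption rather than taken from RAIDT: a candidate configuration counts as preferable to a reference only if it regresses on no pillar (beyond a tolerance) and improves on at least one. Under that rule, a stacked configuration that gains Auditability but loses Traceability is not preferred, and the regression is reported rather than averaged away.

```python
from typing import Dict, List

PILLARS = ("responsibility", "auditability", "interpretability",
           "dependability", "traceability")


def pillar_deltas(candidate: Dict[str, float], reference: Dict[str, float]) -> Dict[str, float]:
    """Per-pillar change from the reference configuration to the candidate."""
    return {p: candidate[p] - reference[p] for p in PILLARS}


def regressions(candidate: Dict[str, float], reference: Dict[str, float],
                tolerance: float = 0.0) -> List[str]:
    """Pillars on which the candidate is worse than the reference by more than the tolerance."""
    return [p for p, d in pillar_deltas(candidate, reference).items() if d < -tolerance]


def is_preferable(candidate: Dict[str, float], reference: Dict[str, float],
                  tolerance: float = 0.0) -> bool:
    """No pillar regresses and at least one pillar improves beyond the tolerance."""
    deltas = pillar_deltas(candidate, reference)
    return not regressions(candidate, reference, tolerance) and any(
        d > tolerance for d in deltas.values())


if __name__ == "__main__":
    # Hypothetical profiles for illustration only, not measured results.
    structured = {p: 3.0 for p in PILLARS}
    stacked = dict(structured, auditability=4.0, traceability=2.0)
    print(is_preferable(stacked, structured), regressions(stacked, structured))
    # False ['traceability']
```

A weighted or domain-specific rule could replace this one; the point of the sketch is that the preference is computed from evidence per pillar, in context, rather than assumed from the number of control layers.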

Boundary and limitation

This item does not prove that one configuration is universally best across all domains, tasks, or risk levels. The six configurations are a structured comparison set, not an exhaustive map of every possible intervention. They also do not replace scenario design, repeated runs, human review, or domain-specific judgement.

A configuration comparison may fail if the scenarios are too narrow, if evidence is inconsistently captured, or if different configurations are compared without controlling for the task context. It may also overstate readiness if better output quality is mistaken for better governance. RAIDT handles these limitations by combining configuration comparison with repeated runs, domain playbooks, run-level evidence packs, and pillar-based scoring rather than relying on a single metric or single demonstration.
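Controlling for task context can be approximated by comparing configurations only on scenarios that both were run on. The sketch below assumes, hypothetically, that each run is reduced to a `(configuration, scenario, score)` tuple with a single numeric score; it averages repeated runs within each scenario, takes the mean paired difference over shared scenarios, and refuses to compare when no scenario is shared.

```python
from typing import Dict, List, Tuple

Run = Tuple[str, str, float]  # (configuration, scenario, score): a hypothetical shape


def scenario_means(runs: List[Run], configuration: str) -> Dict[str, float]:
    """Average repeated runs of one configuration within each scenario."""
    grouped: Dict[str, List[float]] = {}
    for config, scenario, score in runs:
        if config == configuration:
            grouped.setdefault(scenario, []).append(score)
    return {scenario: sum(s) / len(s) for scenario, s in grouped.items()}


def paired_difference(runs: List[Run], a: str, b: str) -> float:
    """Mean of (a - b) over scenarios that both configurations were run on."""
    means_a, means_b = scenario_means(runs, a), scenario_means(runs, b)
    shared = sorted(means_a.keys() & means_b.keys())
    if not shared:
        raise ValueError(f"{a} and {b} share no scenarios; task context is not controlled")
    return sum(means_a[s] - means_b[s] for s in shared) / len(shared)
```

An unpaired comparison would also count scenarios that only one configuration faced, which can make a configuration look better simply because it was tested on easier tasks.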

Implementation levels

Manual implementation

A researcher or small team manually defines the six configurations, runs the same task under each one, records the prompts and outputs, notes any retrieval sources or tuning layers, and scores each run against the five RAIDT pillars. This is labour-intensive but useful for early-stage conceptual work, supervision discussions, and proof-of-concept studies.

Semi-automated implementation

Templates, metadata forms, and structured review sheets are used to capture configuration details consistently. Prompt libraries, versioned source lists, run logs, and scoring rubrics reduce inconsistency and make cross-configuration comparison easier without requiring a full platform.

Fully automated implementation

A governance-aware orchestration layer records configuration metadata automatically for every run, including prompt version, retrieval index version, adapter or fine-tuning identifier, feedback-control state, reviewer actions, and resulting RAIDT scores. Dashboards can then compare the six configurations across domains, highlight trade-offs, and support audit preparation at scale.
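The recording step of such an orchestration layer can be sketched as a wrapper around the generation call. In the sketch below, `generate` stands in for whatever model or pipeline call a deployment uses, and `GovernanceLog`, its fields, and the JSON Lines export are hypothetical choices for illustration; a production system would persist records to durable, access-controlled storage rather than an in-memory list.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RunMetadata:
    run_id: str
    timestamp: float
    prompt_version: str
    retrieval_index_version: Optional[str]
    adapter_id: Optional[str]
    feedback_control_state: Optional[str]
    output: str
    reviewer_action: Optional[str] = None
    raidt_scores: Optional[Dict[str, float]] = None


class GovernanceLog:
    """In-memory stand-in for an append-only run log."""

    def __init__(self) -> None:
        self.records: List[RunMetadata] = []

    def run(self, generate: Callable[[str], str], prompt: str, *, prompt_version: str,
            retrieval_index_version: Optional[str] = None,
            adapter_id: Optional[str] = None,
            feedback_control_state: Optional[str] = None) -> RunMetadata:
        """Call the model and record this run's configuration metadata automatically."""
        output = generate(prompt)
        record = RunMetadata(
            run_id=str(uuid.uuid4()),
            timestamp=time.time(),
            prompt_version=prompt_version,
            retrieval_index_version=retrieval_index_version,
            adapter_id=adapter_id,
            feedback_control_state=feedback_control_state,
            output=output,
        )
        self.records.append(record)
        return record

    def attach_review(self, run_id: str, action: str, scores: Dict[str, float]) -> None:
        """Add the reviewer's action and RAIDT pillar scores to an existing run record."""
        for record in self.records:
            if record.run_id == run_id:
                record.reviewer_action = action
                record.raidt_scores = dict(scores)
                return
        raise KeyError(run_id)

    def export_jsonl(self) -> str:
        """One JSON object per run, e.g. as input to dashboards or audit preparation."""
        return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in self.records)


if __name__ == "__main__":
    log = GovernanceLog()
    rec = log.run(lambda p: f"summary for: {p}", "example task",
                  prompt_version="discharge-template-v3",
                  retrieval_index_version="guidance-idx-7")
    log.attach_review(rec.run_id, "approved", {"traceability": 4.0})
    print(log.export_jsonl())
```

Because metadata is captured inside the same call that produces the output, no run can reach a reviewer without its configuration record, which is the property that makes cross-configuration dashboards trustworthy.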

Practical use in the RAIDT project

Within the RAIDT project, S10.04 is useful for positioning the framework as an empirical programme rather than a purely normative proposal. It helps explain how Paper 08 Foundations can frame configuration as a governance variable, how Paper 09 Empirical Validation can compare run conditions systematically, and how Paper 10 Policy Pathways can translate those comparisons into organisational assurance guidance.

It is also central to sector playbooks because different domains may justify different preferred configurations. Healthcare, finance, law, public services, cybersecurity, and supply chain contexts do not require identical control stacks. The six configurations offer a common comparison language while still allowing domain-sensitive recommendations.

For the evidence pack and scoring rubric, this item clarifies why influence methods belong inside governance analysis. For viva defence and supervisor explanation, it provides a strong answer to the question of how RAIDT moves from conceptual principles to testable, reviewable interventions.

Key audience questions to prepare for

Q1. Why does RAIDT compare six configurations rather than recommend a single setup?

Because RAIDT is testing how governance readiness changes under different intervention conditions. A single recommended setup would hide the comparative evidence needed to justify why one configuration is preferable for a given domain or task.

Q2. Are the six configurations meant to form a maturity ladder?

Not in a simplistic sense. The later configurations add more layers of influence, but a more complex configuration is not automatically better. The relevant question is whether each added layer improves run-level evidence and governance readiness in context.

Q3. Why is baseline prompting still included if it is weakly governed?

It provides an essential comparison condition. Without a baseline, it is harder to show what additional governance value is created by structure, retrieval, tuning, feedback, or stacked controls.

Q4. Can the same organisation use different configurations for different tasks?

Yes, and that is one reason the item matters. RAIDT supports proportionate governance, so low-risk and high-risk tasks may justify different configurations provided the choice is evidenced and reviewable.

Q5. Does configuration comparison replace human oversight?

No. It helps specify where human oversight is needed, what reviewers should inspect, and which control layers support that oversight. Human judgement remains necessary, especially in high-stakes or domain-sensitive settings.

Suggested citation concepts to support this item
Short explanation for presentation

This item explains why RAIDT compares six different configurations rather than judging generative AI from a single setup. The key point is that configuration is not only a technical choice about performance; it is also a governance choice that changes what evidence exists, what reviewers can reconstruct, and how confidently an organisation can defend the run. By comparing baseline prompting, structured prompting, RAG, PEFT/LoRA adaptation, RLHF-type controls, and stacked influence, RAIDT can show how governance readiness shifts across real intervention conditions. That makes the framework stronger for supervision, empirical validation, and policy discussion, because it ties assurance claims to run-level evidence, evidence packs, and the five-pillar score profile rather than to broad principles alone.

One-line takeaway

The six configurations are the comparative intervention set through which RAIDT shows that governance readiness depends on how a run is configured, evidenced, and reviewed.

Related items in empirical programme, domains and sector playbooks
Anchored questions

Audience question: Why compare configurations? Answer: to show that influence methods change governance readiness, not only output quality.
