Q271 — How interventions change governance outcomes

The project does not treat influence methods as performance tricks alone. It treats them as governance interventions because they change what evidence exists and how runs score.

Appears in sources

workshop_dense_100#slide 94

Answer

Interventions change governance outcomes in RAIDT because each one redistributes control and evidence across the pipeline. Prompting alters instruction, role framing, output structure, and uncertainty language, so it often changes Responsibility and Interpretability first. LoRA changes governance outcomes by localising behavioural change into adapters that can be hashed, versioned, reviewed, and rolled back; this tends to improve Dependability and Auditability. RAG changes outcomes by attaching claims to retrieved sources and retrieval policies, making provenance inspectable and thereby raising Traceability and Interpretability. RLHF and DPO change outcomes through preference-shaped behaviour, often improving safety signalling and tone, but only reliably improving governance when reward provenance, reviewer governance, and preference logs are explicit.
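The claim that LoRA adapters can be hashed, versioned, and checked before rollback can be illustrated with a minimal sketch. The file name, the record fields, and the helper names below are hypothetical and are not taken from the RAIDT papers; the adapter bytes are a stand-in for real weights.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def adapter_record(path: Path, version: str) -> dict:
    """Build a hypothetical manifest entry for one adapter file."""
    return {
        "adapter_file": path.name,
        "version": version,
        "sha256": file_sha256(path),
    }


with tempfile.TemporaryDirectory() as tmp:
    adapter = Path(tmp) / "adapter_v1.bin"  # illustrative file name
    adapter.write_bytes(b"stand-in for adapter weights")
    record = adapter_record(adapter, version="v1")

    # Simulate an unreviewed change to the adapter on disk.
    adapter.write_bytes(b"stand-in for retrained adapter weights")
    changed = file_sha256(adapter) != record["sha256"]

print(record["version"], "changed since review:", changed)
```

Comparing the stored digest with the current one is what lets a reviewer confirm that the adapter in use is the adapter that was approved.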

The papers therefore imply that governance outcomes are not fixed properties of the base model. They are effects of intervention design plus documentation. Two systems built on the same model can have markedly different score profiles because one stores prompt versions, retrieval IDs, and output hashes while the other does not. This is why RAIDT treats the run as the unit of governance. The relevant object of assessment is not the abstract method name but the concrete run-level evidence pack that the method produces in practice.
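A run-level evidence pack of the kind described above might be sketched as follows. The class name, field names, and run identifiers are assumptions made for illustration; the source only names prompt versions, retrieval IDs, and output hashes as examples of stored artefacts.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunEvidencePack:
    """Hypothetical run-level evidence record (field names are illustrative)."""
    run_id: str
    prompt_version: Optional[str] = None
    retrieval_ids: List[str] = field(default_factory=list)
    output_sha256: Optional[str] = None


def sha256_text(text: str) -> str:
    """Hash the model output so later copies can be checked against it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def missing_artefacts(pack: RunEvidencePack) -> List[str]:
    """List the artefacts this run failed to record."""
    missing = []
    if not pack.prompt_version:
        missing.append("prompt_version")
    if not pack.retrieval_ids:
        missing.append("retrieval_ids")
    if not pack.output_sha256:
        missing.append("output_sha256")
    return missing


output = "Severity: high. Lateral movement suspected."  # made-up output text

# Same base model, same output, different documentation practice.
documented = RunEvidencePack(
    run_id="run-001",
    prompt_version="v3",
    retrieval_ids=["doc-17", "doc-42"],
    output_sha256=sha256_text(output),
)
undocumented = RunEvidencePack(run_id="run-002")

print(missing_artefacts(documented))
print(missing_artefacts(undocumented))
```

The two runs differ only in what they recorded, which is the sense in which the evidence pack, not the method name, is the object of assessment.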

At branch level, the strongest pattern is compositional. Prompt-only variants remain fragile in high-risk settings. LoRA strengthens stability but does not by itself ground claims. RAG supplies provenance but adds operational overhead. RLHF can improve Responsibility but can also reduce Auditability when reward channels are opaque. Stacked configurations therefore shift governance outcomes most substantially, because they combine behavioural steering with richer artefact production. Supervisors should read those changes through the five pillars (Responsibility, Auditability, Interpretability, Dependability, Traceability) and the anchors 1=missing / 3=partial / 5=audit-ready, rather than through performance metrics alone.
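Reading a change through the five pillars can be sketched as a per-pillar comparison of two score profiles. The scores below are invented purely for illustration and are not results from the RAIDT papers; allowing the intermediate values 2 and 4 between the named anchors is also an assumption.

```python
PILLARS = (
    "Responsibility",
    "Auditability",
    "Interpretability",
    "Dependability",
    "Traceability",
)
ANCHORS = {1: "missing", 3: "partial", 5: "audit-ready"}


def validate_profile(profile: dict) -> dict:
    """Check that a profile scores every pillar with an integer from 1 to 5."""
    for pillar in PILLARS:
        score = profile.get(pillar)
        if not isinstance(score, int) or not 1 <= score <= 5:
            raise ValueError(f"{pillar}: expected an integer 1-5, got {score!r}")
    return profile


def profile_delta(before: dict, after: dict) -> dict:
    """Per-pillar change between two validated score profiles."""
    validate_profile(before)
    validate_profile(after)
    return {p: after[p] - before[p] for p in PILLARS}


# Invented example scores, not measured values.
prompt_only = {"Responsibility": 3, "Auditability": 1, "Interpretability": 3,
               "Dependability": 1, "Traceability": 1}
stacked = {"Responsibility": 3, "Auditability": 3, "Interpretability": 3,
           "Dependability": 3, "Traceability": 5}

delta = profile_delta(prompt_only, stacked)
for pillar in PILLARS:
    label = ANCHORS.get(stacked[pillar], "between anchors")
    print(f"{pillar}: {delta[pillar]:+d} (now {label})")
```

Keeping the profile as five separate scores, rather than one aggregate, preserves which pillar an intervention actually moved.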

Practical example

In a cybersecurity setting, a security operations centre (SOC) may compare incident narratives produced by the same base model under four cumulative configurations. A prompt-only version gives a neat summary but weak forensics. Adding LoRA stabilises the house style for severity labels. Adding RAG links the narrative to concrete flow features, time windows, and prior incident references. Adding an RLHF-style overlay may improve cautionary language and escalation phrasing, but only if reviewer and preference logs are retained.

Those interventions change governance outcomes because the analyst, auditor, and incident lead can do different things with the result. With a full run-level evidence pack, they can replay the run, inspect cited evidence, understand why the narrative was accepted, and trace which intervention caused a failure. Without that pack, the narrative may still look good, but it is much harder to govern.
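The link between retained artefacts and what the analyst, auditor, and incident lead can do might be sketched as a lookup from governance actions to the artefacts each one needs. The action names, artefact names, and the mapping itself are assumptions for illustration, not a specification from the RAIDT papers.

```python
# Hypothetical mapping: governance action -> artefacts it depends on.
REQUIRED = {
    "replay_run": {"prompt_version", "adapter_sha256", "retrieval_ids"},
    "inspect_cited_evidence": {"retrieval_ids"},
    "explain_acceptance": {"reviewer_log"},
    "trace_failing_intervention": {
        "prompt_version", "adapter_sha256", "retrieval_ids", "preference_log",
    },
}


def enabled_actions(artefacts: set) -> list:
    """Return the actions whose required artefacts are all present."""
    return sorted(
        action for action, needed in REQUIRED.items() if needed <= artefacts
    )


full_pack = {
    "prompt_version", "adapter_sha256", "retrieval_ids",
    "reviewer_log", "preference_log",
}
prompt_only_pack = {"prompt_version"}

print(enabled_actions(full_pack))
print(enabled_actions(prompt_only_pack))
```

Under these assumptions the prompt-only run enables none of the four actions, even though its narrative may read just as well.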

Sources in RAIDT papers

05-RAIDT_LoRA_V2
06-RAIDT_RAG_V1
07-RAIDT_RLHF_V1