Q134 - Why_do_alignment_policy_versions_matter

Q134 — Why do alignment policy versions matter?

← RAIDT · Star S4 - Evidence Architecture and Artefacts · primary item: S4.14 · Alignment policy ID

Appears in sources
Answer

Alignment policy versions matter because RAIDT is designed for systems whose behaviour changes through governed configuration over time, not only through changes to the underlying model. The papers repeatedly stress that generative AI is dynamic: prompts change, corpora drift, tools evolve, adapters are updated, and safety settings are adjusted. Within that logic, an alignment layer is a governed artefact with its own change history. Recording only that an alignment control existed is therefore insufficient; reviewers need to know which version was active in the contested run. Otherwise, reconstruction becomes approximate rather than evidential.

Versioning matters for three linked reasons. First, it supports post-hoc reconstruction. RAIDT requires organisations to show what happened in one configured use, including which policies were active. Second, it supports comparison across runs. If two runs use the same task, prompt, and base model but different alignment policy versions, differences in refusal behaviour, caution, or permissible output structure may be attributable to the policy layer rather than to random variance. Third, it supports governance improvement and change control. The score profile is meant to be comparable across runs, and the anchors 1=missing / 3=partial / 5=audit-ready only become defensible when key governed artefacts are versioned. In short, versioning prevents alignment from becoming an invisible background assumption. It makes policy change auditable, links behaviour to governance intervention, and supports RAIDT?s claim that run as the unit of governance is the correct level for organisational review.

Practical example

Consider a public-service eligibility assistant that summarises entitlement rules for caseworkers. In April, the organisation deploys alignment policy APS-2.3, which encourages cautious wording and frequent escalation to human review. In June, it switches to APS-2.4, which is less conservative and produces more direct answers. A claimant later challenges inconsistent advice.

If the run-level evidence pack records only the model version and prompt template, the organisation cannot explain whether the difference arose from the model, the retrieval corpus, or the safety layer. If it records the Alignment policy ID with version, reviewers can see that a governance intervention changed. That supports auditability, traceability, and a more defensible score profile for the disputed runs.

Sources in RAIDT papers
Powered by Forestry.md