Q229 — Alignment policy ID — definition, example, and why it matters in RAIDT

← RAIDT · Star S4 - Evidence Architecture and Artefacts · primary item: S4.14 · Alignment policy ID

D. Evidence Architecture | Ordered by mind-map priority: inner circles first, then operational detail.

Answer

In RAIDT terms, an Alignment policy ID is the run-record identifier for the active safety, preference, or RLHF/DPO-style layer that shaped the response in a particular configured use. The exact field label is an implementation choice, but the underlying requirement is clear in the papers: alignment controls should be logged as an active policy layer, and traceability should preserve which versions and policies were active. This places the field squarely inside the run-level evidence pack rather than in static model documentation, because RAIDT assumes that behaviour materialises at run time and must be evidenced where it occurs.

A suitable example would be a value such as align-clinical-safe-v3.1, linked to a documented refusal template set and change record. In the run-level evidence pack, that identifier sits alongside the prompt version, model/provider/version, retrieval snapshot identifiers, output hash, and reviewer decision. Its contribution is governance rather than mere engineering convenience: it helps reviewers reconstruct why a response was more conservative, why certain requests were refused, or why uncertainty language appeared. It also makes runs comparable, because each output can be checked against the policy version that was active when it was produced, and it strengthens the evidence behind the score profile across the five pillars (Responsibility, Auditability, Interpretability, Dependability, Traceability). Because RAIDT couples run evidence with anchored scoring, a stable Alignment policy ID helps move assessment away from narrative assurance and towards the anchors 1 = missing, 3 = partial, 5 = audit-ready. In that sense, the field matters because it records alignment, one of the influence methods, as a governance intervention that materially shapes organisational risk.
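As a rough illustration, the run record described above could be sketched in Python as follows. The field names, the policy_change_record link, the example values other than align-clinical-safe-v3.1, and the rule for what counts as "partial" are all assumptions made for this sketch; the RAIDT papers leave the exact labels to the implementer.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    """One entry in a run-level evidence pack (field names are illustrative)."""
    prompt_version: str
    model_version: str
    retrieval_snapshot_ids: tuple
    output_hash: str
    reviewer_decision: str
    alignment_policy_id: Optional[str] = None
    policy_change_record: Optional[str] = None  # hypothetical link to a change record


def anchor_score(record: RunRecord) -> int:
    """Map the alignment-policy evidence to the 1 / 3 / 5 anchors.

    Assumed rule for this sketch: 1 if the ID is missing, 3 if the ID is
    present but not linked to a change record, 5 if both are present.
    """
    if not record.alignment_policy_id:
        return 1
    if not record.policy_change_record:
        return 3
    return 5


# Hypothetical output text, hashed so the record can be matched to the response.
output_text = "Please confirm with a clinician before acting on this summary."
record = RunRecord(
    prompt_version="prompt-v12",             # hypothetical value
    model_version="example-provider/model",  # hypothetical value
    retrieval_snapshot_ids=("snap-0042",),   # hypothetical value
    output_hash=hashlib.sha256(output_text.encode("utf-8")).hexdigest(),
    reviewer_decision="approved",
    alignment_policy_id="align-clinical-safe-v3.1",
    policy_change_record="CR-0187",          # hypothetical value
)
print(anchor_score(record))  # prints 5
```

Keeping the record immutable (frozen=True) is one way to reflect that an evidence entry should not be edited after the run; an organisation might equally enforce that at the storage layer.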

Practical example

In cybersecurity alert triage, an organisation may use GenAI to draft the first-pass explanation of whether an alert looks benign, suspicious, or urgent. Suppose the same base model is used in both the security operations centre and a later incident review. The outputs differ: one is cautious and asks for analyst confirmation, while the other offers a firmer recommendation.

If the run-level evidence pack for each run includes an Alignment policy ID, such as soc-safe-triage-v5, reviewers can determine whether the difference arose from the active safety layer rather than from prompt drift alone. That matters in incident review, because the organisation can then assess whether the policy version improved responsible escalation behaviour or introduced over-confidence that reduced Dependability and Responsibility in a high-stakes workflow.
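The comparison a reviewer would make can be sketched as a field-by-field diff of two run records. This is a minimal sketch under stated assumptions: the dictionary keys, the prompt and model values, and the second policy version (soc-safe-triage-v4) are hypothetical and do not come from the RAIDT papers.

```python
def differing_fields(run_a: dict, run_b: dict) -> list:
    """Return the sorted names of fields whose values differ between two runs."""
    keys = set(run_a) | set(run_b)
    return sorted(k for k in keys if run_a.get(k) != run_b.get(k))


# Hypothetical record from the security operations centre run.
soc_run = {
    "model_version": "example-model-2024-06",
    "prompt_version": "triage-prompt-v7",
    "alignment_policy_id": "soc-safe-triage-v5",
}

# Hypothetical record from the later incident review run.
review_run = {
    "model_version": "example-model-2024-06",
    "prompt_version": "triage-prompt-v7",
    "alignment_policy_id": "soc-safe-triage-v4",  # assumed earlier policy version
}

changed = differing_fields(soc_run, review_run)
print(changed)  # prints ['alignment_policy_id']
```

Here the prompt and model versions match, so the only recorded difference is the alignment policy, which points the review at the safety layer rather than at prompt drift. Had prompt_version also appeared in the result, the records alone would not separate the two causes.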
