S4.06 — Prompt ID and version
flowchart LR
A[Informal prompt use, prompt drift, weak reconstruction] --> B[RAIDT run-level evidence framework]
H[Prompt registry, version label, prompt hash, approval note, task label, model pairing] --> C[[Prompt ID and version: exact governed prompt state used in a run]]
B --> C
C --> D[Evidence pack]
C --> E[RAIDT score profile]
D --> F[Reviewer reconstruction and contestability]
E --> G[Governance readiness and organisational learning]
← Star S4 - Evidence Architecture and Artefacts
Star context: Specifies the concrete fields and artefacts that make a run record inspectable. Within this star, prompt ID and version identify which controlled instruction artefact governed a run, so a reviewer can connect one output to one approved prompt state rather than to a vague account of prompting practice.
Academic picture
Definition / background
Prompt ID and version identify the exact prompt artefact used in a run. The prompt ID is the stable identifier for a prompt template, instruction set, or controlled prompt object within a registry. The version specifies which released state of that prompt was active when the run occurred. Together, they answer a simple but crucial governance question: which approved instruction artefact shaped this output?
This concept originates in ordinary disciplines of version control, document control, and configuration management, but it has a distinct role in generative AI governance. Prompts are not merely user text; in many organisational settings they are operational instructions that influence output quality, safety, consistency, and compliance. When prompts are revised, even small changes in wording, role framing, constraints, examples, or formatting logic can materially alter downstream outputs.
Within RAIDT, prompt ID and version are not interchangeable with prompt text, prompt registry, or prompt hash. The registry records what prompt artefacts exist and how they are governed. The ID and version record which one was used for this run. The prompt hash can later support integrity checking of the prompt content itself. The distinction matters because RAIDT is concerned with run-level evidence: a reviewer must be able to connect a specific output to a specific prompt state, not merely to a general prompt family.
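The distinction between registry, ID and version, and hash can be sketched as a small data model. This is a minimal illustration under stated assumptions: the names PromptRef, REGISTRY, and prompt_hash are hypothetical, the registry texts are placeholders, and SHA-256 is one reasonable hash choice rather than something RAIDT prescribes.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptRef:
    """Run-level reference: which governed prompt state a run used."""
    prompt_id: str  # stable identifier of the prompt artefact
    version: str    # released state that was active for the run


# Registry: which prompt artefacts exist (placeholder texts, not real prompts).
REGISTRY = {
    PromptRef("admissions_triage_prompt", "2.4"): "Placeholder text for release 2.4",
    PromptRef("admissions_triage_prompt", "3.0"): "Placeholder text for release 3.0",
}


def prompt_hash(text: str) -> str:
    """Content fingerprint for later integrity checking (SHA-256 assumed)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The run record stores the reference and the hash, not just a prompt family.
used = PromptRef("admissions_triage_prompt", "3.0")
recorded_hash = prompt_hash(REGISTRY[used])
```

In this sketch the registry answers "what exists", the PromptRef answers "what was used in this run", and the hash answers "is the stored content still the content that was used".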
This item therefore belongs inside Evidence Architecture and Artefacts because it is one of the fields that make a run record inspectable. It supports the assembly of a run-level evidence pack and strengthens the defensibility of the RAIDT five-pillar score profile. Without prompt ID and version, governance claims about prompting are harder to verify, compare, or contest.
Why this concept matters
Prompt ID and version solve the problem of prompt ambiguity. In many real deployments, teams say that they used "the standard prompt", "the approved prompt", or "the latest prompt", but such descriptions are too vague for rigorous review. If several similar prompt variants exist, or if a prompt has evolved over time, reviewers cannot know which instructions actually shaped the run unless the record names the prompt artefact and its version.
The concept also prevents confusion between prompt design and prompt governance. A well-written prompt is not automatically a controlled prompt. Governance requires change history, release discipline, and the ability to link one run to one prompt state. This is especially important when organisations compare output quality across time, investigate incidents, or need to explain why two apparently similar runs produced different results.
If prompt ID and version are missing, several risks follow: prompt drift becomes hard to detect, audit reconstruction weakens, reviewers may compare the wrong prompt to the wrong output, and organisational learning about prompt improvements becomes anecdotal rather than evidential. RAIDT uses this field to move prompting from tacit practice towards accountable operational governance.
Key idea: Prompt ID and version matter because RAIDT can only review a run properly if the exact governed prompt state used in that run is identifiable.
What this item captures
- The stable identifier of the prompt artefact or prompt template used in the run.
- The specific released version of that prompt artefact at the time of execution.
- The controlled prompt state that shaped the run, even if later prompt revisions occurred.
- The link between prompt governance and run-level reconstruction.
- The basis for comparing outputs produced under different prompt versions.
- The evidential reference needed to connect the run record to registry entries, approval notes, change logs, and related hashes.
- The prompt-level change-control context that supports scoring, review, and organisational learning.
Practical example / likely audience question
Audience question
If the prompt text is stored somewhere, why does every RAIDT run still need a prompt ID and version?
Answer
The concern behind this question is that storing the full prompt text may appear sufficient. However, text alone does not provide dependable governance context. A reviewer still needs to know whether that text was the approved version, whether it had been superseded, whether multiple variants existed for different workflows, and whether the run can be linked back to a governed change history.
Put directly, prompt ID and version turn prompt content into a controlled artefact reference. For example, a university may keep many admissions-support prompts in shared documents. If one applicant receives an inconsistent or overly assertive draft response, investigators need more than a copied prompt paragraph. They need to know whether the run used admissions_triage_prompt v2.4 or admissions_triage_prompt v3.0, whether v3.0 had already introduced stricter escalation wording, and whether staff followed the approved release.
RAIDT handles this better than a generic AI governance approach because it links the prompt reference directly to the run as the unit of governance. Rather than asking only whether prompts are documented somewhere, RAIDT asks whether the exact prompt artefact used in the reviewed event is identifiable, reconstructable, and contestable.
Practical example in RAIDT terms
Consider a public-service setting in which a local authority uses GenAI to draft first-pass housing-support letters for citizens. The use case is administratively useful, but the run-level issue is whether the draft used the correct approved prompt for vulnerable applicants, including the required safeguarding and escalation language.
The evidence needed for one disputed run includes the task label, timestamp, operator role, prompt ID, prompt version, prompt hash, model/provider/version, retrieval inputs if any, generated draft, human edits, and reviewer decision. If the authority cannot show whether the run used housing_support_letter_prompt v1.6 or an older v1.3, it cannot reliably explain why the draft omitted a safeguarding paragraph that newer prompt versions were designed to include.
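The evidence fields listed above could be held in a structured run record, sketched below. The class name RunRecord, the field names, and the helper missing_prompt_evidence are hypothetical, and the example values are illustrative only; RAIDT does not prescribe this schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    """Evidence fields for one run (hypothetical field names)."""
    task_label: str
    timestamp: str                  # ISO 8601 string assumed
    operator_role: str
    prompt_id: Optional[str]
    prompt_version: Optional[str]
    prompt_hash: Optional[str]
    model_identifier: str           # model/provider/version
    generated_draft: str
    reviewer_decision: Optional[str] = None
    human_edits: Optional[str] = None
    retrieval_inputs: list = field(default_factory=list)  # empty if none


# Fields without which the prompt state cannot be reconstructed.
PROMPT_FIELDS = ("prompt_id", "prompt_version", "prompt_hash")


def missing_prompt_evidence(record: RunRecord) -> list:
    """Return the prompt-related fields that are absent or empty."""
    return [name for name in PROMPT_FIELDS if not getattr(record, name)]


# Illustrative disputed run: the prompt family is known, the version is not.
disputed = RunRecord(
    task_label="housing_support_letter",
    timestamp="2024-01-01T00:00:00Z",
    operator_role="caseworker",
    prompt_id="housing_support_letter_prompt",
    prompt_version=None,
    prompt_hash=None,
    model_identifier="example-provider/example-model/1",
    generated_draft="Draft letter text",
)
gaps = missing_prompt_evidence(disputed)
```

A record with gaps like this one can name the prompt family but cannot say whether v1.6 or v1.3 governed the run, which is the situation described above.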
In RAIDT terms, Responsibility is affected because prompt release and use should be governed by named roles. Auditability is affected because the reviewer must reconstruct the exact instruction state. Interpretability is affected because the explanation of the output depends partly on which instructions were active. Dependability is affected because prompt version changes may improve or degrade consistency. Traceability is strongly affected because the prompt reference is part of the chain linking configuration to outcome. Recording prompt ID and version therefore improves governance readiness by making prompt change visible at the level of one real run.
Detailed link to RAIDT
Prompt ID and version link to RAIDT in four ways.
First, they operationalise the RAIDT core idea that governance should attach to what happened in one real GenAI use event, not only to broad statements that prompts are "managed" or "approved".
Second, they strengthen the run as the unit of governance by recording which governed instruction artefact shaped that run.
Third, they improve the evidence pack and the RAIDT score profile because reviewers can connect outputs, review notes, and performance judgements to a precise prompt state rather than to a vague prompting practice.
Fourth, they support reviewability, contestability, audit readiness, and organisational learning by making prompt changes visible across time and across comparable runs.
Prompt ID and version -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness
Link to the five RAIDT pillars
Responsibility
Prompt ID and version support Responsibility by clarifying which prompt artefact was authorised for use and which governance roles were responsible for approving or maintaining it.
Example evidence / implication:
- A run record shows that only approved prompt versions may be used for a regulated workflow.
- Reviewer notes can distinguish operator misuse from a defect in the approved prompt version itself.
Auditability
This item has a particularly strong effect on Auditability because reviewers cannot reliably reconstruct prompt-dependent behaviour if the prompt state is unidentified.
Example evidence / implication:
- An auditor can trace a contested output back to prompt_id=claims_drafting and version=4.2.
- Change-control records can be checked against the run to determine whether the correct prompt release was in force.
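Checking a run against change-control records can be sketched as a lookup of the version that was in force at the run's timestamp. The release log, dates, and function names below are hypothetical and only illustrate the comparison; a real change-control system may also record withdrawals or emergency rollbacks.

```python
from datetime import datetime

# Hypothetical change-control log for claims_drafting: (release time, version).
RELEASES = [
    (datetime(2024, 1, 10), "4.0"),
    (datetime(2024, 3, 5), "4.1"),
    (datetime(2024, 6, 20), "4.2"),
]


def version_in_force(releases, run_time):
    """Return the latest version released at or before run_time, or None."""
    current = None
    for released_at, version in sorted(releases):
        if released_at <= run_time:
            current = version
    return current


def release_matches(releases, run_time, recorded_version):
    """True if the version recorded on the run was the one in force."""
    return version_in_force(releases, run_time) == recorded_version
```

A mismatch does not by itself show who erred; it tells the auditor that the recorded version and the release in force diverged and that the divergence needs explaining.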
Interpretability
Prompt ID and version support Interpretability by making the instruction context of the output intelligible. They do not open the model internals, but they help explain what task framing and constraints the system received.
Example evidence / implication:
- Reviewers can compare how version 2.1 and version 2.3 framed the same classification task.
- Explanatory notes can link an output pattern to a known prompt revision rather than to speculation.
Dependability
This item supports Dependability because stable version references allow organisations to test whether prompt updates improve consistency, reduce failure, or introduce regressions.
Example evidence / implication:
- Quality assurance teams can compare output reliability before and after a prompt revision.
- Incident reviews can identify whether a run failure coincided with an untested prompt change.
Traceability
Prompt ID and version are central to Traceability because they connect the run to a specific controlled instruction artefact within the wider evidence chain.
Example evidence / implication:
- The run record links the prompt registry entry to the prompt hash and then to the produced output.
- Cross-run analysis can identify which outputs were generated under the same prompt version and which were not.
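Cross-run analysis of this kind can be sketched as a grouping of run records by prompt state. The dictionary keys and run identifiers below are hypothetical and stand in for whatever schema an organisation actually uses.

```python
from collections import defaultdict


def group_by_prompt_state(runs):
    """Map (prompt_id, prompt_version) to the run_ids generated under it."""
    groups = defaultdict(list)
    for run in runs:
        key = (run["prompt_id"], run["prompt_version"])
        groups[key].append(run["run_id"])
    return dict(groups)


# Illustrative run records only.
runs = [
    {"run_id": "r1", "prompt_id": "claims_drafting", "prompt_version": "4.1"},
    {"run_id": "r2", "prompt_id": "claims_drafting", "prompt_version": "4.2"},
    {"run_id": "r3", "prompt_id": "claims_drafting", "prompt_version": "4.2"},
]
groups = group_by_prompt_state(runs)
```

Grouping on the pair rather than on the ID alone is the point of the sketch: runs r1 and r2 share a prompt family but not a prompt state.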
This item affects all five pillars, but it is especially strong for Auditability and Traceability because those pillars weaken immediately when the governing prompt state cannot be identified.
Why this item is more than a generic concept
In general AI governance, prompt versioning may be treated as a useful engineering practice or as part of informal prompt management. In RAIDT, prompt ID and version have a stricter meaning: they are recorded run-level evidence fields that connect one output to one governed prompt artefact state.
The RAIDT meaning is more operational because it is tied to evidence packs, score profiles, incident review, and governance readiness. The question is not merely whether a team keeps prompt versions somewhere. The question is whether a reviewer examining one run can identify the exact prompt state that influenced that run and assess its governance implications.
Common misunderstanding
Misunderstanding
A prompt version is only useful for prompt engineers and does not matter once the output has been generated.
Correction
That is too narrow. Prompt version is not only a design convenience; it is governance evidence. Suppose a compliance team challenges a generated draft that appears more permissive than expected. Without the prompt version, the organisation may assume that the operator acted carelessly, when in fact the approved prompt was recently changed and the new wording removed an important caution. Recording the prompt ID and version allows reviewers to distinguish user behaviour, prompt design, and system behaviour more accurately.
Boundary and limitation
Prompt ID and version do not by themselves prove that a run is reproducible, safe, lawful, or high quality. A run may still differ because of model updates, decoding parameters, retrieval context, tool use, or changes in source materials. This field therefore supports reconstruction, but it does not guarantee exact replay.
The item also does not replace neighbouring controls. A prompt registry is still needed to govern the catalogue of prompts. A prompt hash may still be needed to verify content integrity. Model/provider/version identifiers remain necessary because the same prompt version can behave differently across models or provider updates.
The concept works only if the organisation actually maintains prompt control discipline. If prompt IDs are inconsistently assigned, versions are overwritten, or staff can use untracked prompt variants without consequence, the field becomes nominal rather than evidential. RAIDT handles this limitation by treating prompt ID and version as one component in a wider evidence architecture rather than as a stand-alone proof of good governance.
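One way to keep the field evidential rather than nominal is to classify each run's prompt reference against the registry. The sketch below assumes a hypothetical set of approved releases and a wider set of known releases that includes superseded versions; the finding labels are illustrative, not RAIDT terminology.

```python
# Hypothetical registry state: currently approved and all known releases.
APPROVED = {("housing_support_letter_prompt", "1.6")}
KNOWN = APPROVED | {("housing_support_letter_prompt", "1.3")}


def prompt_finding(prompt_id, version):
    """Classify the prompt reference on a run; None means no finding."""
    if not prompt_id or not version:
        return "missing prompt reference"
    if (prompt_id, version) not in KNOWN:
        return "untracked prompt variant"
    if (prompt_id, version) not in APPROVED:
        return "superseded version used"
    return None
```

Each non-empty result is a governance finding in its own right, whatever the quality of the output that the run produced.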
Implementation levels
Manual implementation
A researcher or small team can implement this manually by keeping a simple prompt register and requiring each significant run record to include the prompt ID and version used. Version numbers can be maintained in a spreadsheet or controlled note, with short change notes explaining what changed and why.
Semi-automated implementation
Semi-automated implementation can use structured templates, prompt libraries, and metadata fields in a form or wrapper. When an operator selects a prompt from an approved set, the system can auto-fill the prompt ID and current version while still allowing a reviewer to record contextual notes about why that prompt was chosen.
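The auto-fill step might look like the following sketch, in which the operator chooses only the prompt ID and the wrapper supplies the current version. CURRENT_VERSION, RunMetadata, and select_prompt are hypothetical names, and the approved set shown is illustrative.

```python
from dataclasses import dataclass

# Hypothetical approved set: prompt_id mapped to its current released version.
CURRENT_VERSION = {"admissions_triage_prompt": "3.0"}


@dataclass
class RunMetadata:
    prompt_id: str
    prompt_version: str
    reviewer_note: str = ""  # free-text context, filled in by a person


def select_prompt(prompt_id, reviewer_note=""):
    """Auto-fill ID and current version for a prompt from the approved set."""
    if prompt_id not in CURRENT_VERSION:
        raise KeyError(f"{prompt_id!r} is not in the approved prompt set")
    return RunMetadata(prompt_id, CURRENT_VERSION[prompt_id], reviewer_note)


meta = select_prompt("admissions_triage_prompt", "standard triage case")
```

Rejecting unknown IDs at selection time means that an untracked prompt cannot silently acquire a plausible-looking version label.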
Fully automated implementation
At scale, an orchestration layer, prompt management platform, or governance pipeline can issue immutable prompt identifiers, enforce release workflows, attach version metadata to every run, link prompt hashes and approval records, and surface prompt-level analytics in a dashboard. This allows evidence packs and RAIDT scoring workflows to consume prompt references automatically rather than relying on manual recollection.
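The immutability requirement can be illustrated with an append-only registry that refuses to overwrite a released version. This is a sketch only: PromptRegistry and its methods are hypothetical, and a production platform would add persistence, access control, and approval workflows that are out of scope here.

```python
from types import MappingProxyType


class PromptRegistry:
    """Append-only registry: a released (prompt_id, version) is never overwritten."""

    def __init__(self):
        self._releases = {}

    def release(self, prompt_id, version, text, approved_by):
        """Record a new release; re-releasing an existing version is an error."""
        key = (prompt_id, version)
        if key in self._releases:
            raise ValueError(f"{prompt_id} v{version} is already released")
        # Read-only view so callers cannot edit a released entry in place.
        self._releases[key] = MappingProxyType(
            {"text": text, "approved_by": approved_by}
        )

    def get(self, prompt_id, version):
        """Return the read-only release entry for one prompt state."""
        return self._releases[(prompt_id, version)]
```

Under this design a changed prompt must be released as a new version, so a version label recorded on an old run keeps pointing at the text that was actually in force.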
Practical use in the RAIDT project
Within the RAIDT project, this item is useful in Paper 08 Foundations because it shows that prompt control is not merely a usability issue but an evidential requirement at run level. It also matters in Paper 09 Empirical Validation, where prompt-dependent variation must be interpreted against a recorded prompt version rather than treated as unexplained noise.
For Paper 10 Policy Pathways, prompt ID and version help translate broad policy demands for accountability into a concrete metadata requirement that organisations can implement. The item also supports sector playbooks because different sectors may govern different prompt families, but all still need a way to identify which prompt state governed a given run.
In the evidence pack and scoring rubric, this field helps justify pillar-level assessments concerning reconstruction, review, and change control. In supervision, viva defence, and journal positioning, it is useful because it shows that RAIDT does not speak about prompting in abstract terms. It asks for inspectable evidence of which prompt artefact was actually used.
Key audience questions to prepare for
Q1. Why record prompt ID and version if the full prompt text is already archived?
Because archived text does not by itself show release status, approval lineage, or which governed prompt state was active for the run. RAIDT needs the controlled reference, not only the raw wording.
Q2. Is prompt versioning only necessary for high-risk deployments?
The need is strongest where consequences are serious, but the logic is broader. Any organisation that wants to compare runs, investigate failures, or learn from prompt changes benefits from identifying prompt versions consistently.
Q3. How do prompt ID and version differ from the prompt hash?
The ID and version provide a human-readable governance reference. The hash supports integrity checking of the prompt content. In RAIDT they are complementary, not interchangeable.
Q4. What happens if staff bypass the approved prompt library?
That is itself a governance finding. A missing or invalid prompt ID/version reveals control failure, weakens the evidence pack, and should affect pillar scoring, especially for Auditability and Traceability.
Q5. Does this field guarantee that a run can be reproduced exactly?
No. It improves reconstruction and explanation, but exact reproduction may still be limited by model updates, stochastic settings, retrieval changes, and external tool behaviour.
Suggested citation concepts to support this item
- prompt versioning in generative AI governance
- configuration management for AI prompts
- prompt engineering change control in organisational workflows
- AI audit trails for prompt-based systems
- provenance and traceability of prompts in large language model applications
- documentation standards for prompt libraries and prompt registries
- reproducibility challenges in prompt-based AI systems
- governance of prompt drift in enterprise GenAI deployment
- sociotechnical accountability for prompt-mediated AI outputs
- operational metadata for AI assurance and reviewability
Short explanation for presentation
Prompt ID and version are the fields that tell RAIDT exactly which controlled prompt artefact was used in a particular run. That matters because prompts change over time, and even small prompt revisions can alter output quality, tone, escalation behaviour, or compliance performance. If a run record says only that "the standard prompt" was used, reviewers cannot know which instruction state actually shaped the output. RAIDT therefore treats prompt ID and version as run-level evidence, not as optional engineering detail. They help organisations reconstruct disputed runs, compare behaviour across prompt updates, connect outputs to approved prompt releases, and justify score-profile judgements on auditability and traceability. In short, this item turns prompting from informal practice into inspectable governance evidence.
One-line takeaway
Prompt ID and version are the controlled reference to the exact prompt state used in a run, which RAIDT requires because it governs generative AI through inspectable run-level evidence.
Related items in evidence architecture and artefacts
- S4.01 · run_id
- S4.02 · Timestamp
- S4.03 · User role / operator role
- S4.04 · Task and domain label
- S4.05 · Prompt registry
- S4.07 · Prompt hash
- S4.08 · Model/provider/version identifier
- S4.09 · Decoding parameters
- S4.10 · Retrieval query and index ID
- S4.11 · Retrieved document IDs and hashes
- S4.12 · Tool-chain trace
- S4.13 · Adapter ID / PEFT lineage
- S4.14 · Alignment policy ID
- S4.15 · Output hash
- S4.16 · Review decision and reviewer notes
- … and 1 more