S4.05 - Prompt registry
flowchart LR
A1[Prompts treated as disposable text]
A2[Unclear ownership and approval]
A3[Hidden wording changes]
A4[Weak reviewer reconstruction]
B[RAIDT - run-level evidence framework]
C[[Prompt registry - governed prompt artefact record]]
D[Governance move - evidence over assertion]
E[Run-level evidence pack]
F[RAIDT score profile]
G[Reviewer reconstruction]
H[Organisational learning]
I[Policy-aligned governance readiness]
J1[Prompt ID and version]
J2[Prompt hash]
J3[Owner and approver]
J4[Change rationale]
J5[Task/domain label]
J6[Linked run record]
A1 --> B
A2 --> B
A3 --> B
A4 --> B
B --> C
C --> D
C --> E
C --> F
C --> G
C --> H
E --> I
F --> I
G --> I
H --> I
J1 --> C
J2 --> C
J3 --> C
J4 --> C
J5 --> C
J6 --> C
Star S4 - Evidence Architecture and Artefacts
Star context: Specifies the concrete fields and artefacts that make a run record inspectable. In RAIDT, the prompt registry is the governance layer that makes prompt use reviewable rather than informal, hidden, or anecdotal.
Academic picture
Definition / background
A prompt registry is the governed record through which an organisation defines, stores, maintains, and controls prompt templates used in generative AI work. In practical terms, it holds the canonical version of a prompt, who owns it, what status it has, how it changed over time, and why those changes were made. In RAIDT, this matters because prompt wording is not incidental. It can alter task framing, output style, acceptable evidence, risk exposure, and the degree to which a run can later be interpreted or challenged.
Conceptually, the prompt registry sits between informal prompt practice and formal run documentation. A saved prompt text on its own is useful, but it is not yet governance. Governance requires identity, version control, change rationale, accountable ownership, and the ability to connect a specific run back to the exact prompt artefact that shaped it. That is why RAIDT distinguishes the broader prompt registry from narrower items such as prompt ID and version, prompt hash, and run-level logging.
The item belongs inside RAIDT because RAIDT treats the run as the unit of governance. If a run is to be reviewable, the prompt used in that run must be reconstructable as a controlled artefact rather than remembered approximately. The prompt registry therefore supports the run-level evidence pack by linking prompt design decisions to actual system use, and it supports the five-pillar score profile by making those decisions assessable rather than assumed.
A prompt registry is also different from a generic prompt library. A library may support reuse and convenience. A registry supports evidence, accountability, and control. In RAIDT terms, it helps move prompt management from ad hoc craft practice to inspectable organisational governance.
Why this concept matters
Without a prompt registry, organisations often know that a model was used but cannot reliably show which prompt template governed the run, whether the wording had been changed recently, or whether the person using it departed from approved practice. This creates avoidable ambiguity in responsibility assignment, output interpretation, audit review, and post hoc learning.
The concept matters because prompts are a material part of system behaviour. If wording changes the scope of a task, the role assigned to the model, the instructions about evidence use, or the threshold for escalation, then prompt management is a governance issue rather than a convenience issue. A registry reduces the risk that meaningful behavioural change is hidden inside undocumented wording variation.
For organisations using GenAI, the registry helps separate three questions that are often confused: what prompt design was approved, what prompt version was actually used in a given run, and whether that prompt was suitable for the context. RAIDT needs all three distinctions because it aims to support reviewability, contestability, and continuous improvement at the level of concrete use.
Key idea: A prompt registry matters because RAIDT can only govern prompt-driven behaviour if prompts are treated as controlled artefacts linked to specific runs.
What this item controls
- The canonical prompt templates that an organisation recognises as legitimate governance artefacts.
- The identity and status of each prompt, including owner, approval state, and intended task domain.
- Version history, including what changed, when it changed, and why the change was made.
- Connections between prompt records and related evidence such as prompt IDs, hashes, task labels, tool use, and model settings.
- The boundary between approved prompt design and informal prompt improvisation.
- The conditions for reviewer reconstruction when a run needs to be examined, challenged, or compared.
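The controlled items listed above can be pictured as fields on a registry entry. The following is a minimal sketch only: the class names, field names, status values, and sample data are hypothetical illustrations, not a schema prescribed by RAIDT.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class PromptVersion:
    """One governed version of a prompt template (hypothetical schema)."""
    prompt_id: str
    version: int
    text: str
    owner: str
    approver: str
    status: str  # assumed values: "approved", "experimental", "deprecated"
    task_domain: str
    change_rationale: str
    revised_on: date

    @property
    def sha256(self) -> str:
        # Hash of the exact UTF-8 bytes of the template text.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()


@dataclass
class PromptRegistry:
    """In-memory registry keyed by (prompt_id, version)."""
    entries: dict = field(default_factory=dict)

    def register(self, entry: PromptVersion) -> None:
        key = (entry.prompt_id, entry.version)
        if key in self.entries:
            # Existing versions are never overwritten; changes need a new version.
            raise ValueError(f"{key} already registered; issue a new version instead")
        self.entries[key] = entry

    def history(self, prompt_id: str) -> list:
        """All versions of one prompt, oldest first."""
        versions = [e for (pid, _), e in self.entries.items() if pid == prompt_id]
        return sorted(versions, key=lambda e: e.version)


# Illustrative data only.
registry = PromptRegistry()
registry.register(PromptVersion(
    prompt_id="discharge-summary", version=1,
    text="Explain the discharge summary in plain English.",
    owner="clinical-comms", approver="governance-lead", status="approved",
    task_domain="healthcare/summarise",
    change_rationale="Initial version.", revised_on=date(2024, 1, 10),
))
```

Refusing to overwrite an existing version is one way to keep the version history intact; a real implementation might enforce this at the storage layer instead.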
Practical example / likely audience question
Audience question
Why is a prompt registry needed if RAIDT already stores the prompt text used in the run?
Answer
The concern behind this question is usually that saving the final text string appears sufficient for traceability. It is sufficient for partial reconstruction, but not for governance. A single saved text does not show whether that text came from an approved template, whether it was superseded, whether it had been modified by an operator, or why its wording had changed since earlier runs.
The direct answer is that a registry adds institutional meaning to prompt text. It turns a prompt from a raw input into a governed artefact with provenance, accountability, and change history. For example, a clinical summarisation team may save the exact prompt used in a run, but without a registry they may still be unable to show whether the wording that instructed the model to "prioritise likely diagnosis" was newly introduced, experimentally altered, or formally approved. That gap matters because reviewers need to know whether the behaviour arose from the model, the prompt design, or a departure from procedure.
RAIDT handles this better than generic AI governance because it ties the registry to run-level evidence rather than treating prompt management as a separate policy document. In RAIDT, the question is not merely whether prompts are documented somewhere. The question is whether a specific run can be connected to the exact governed prompt artefact that shaped that run and can therefore be reviewed on evidence.
Practical example in RAIDT terms
Consider a healthcare provider using a GenAI assistant to draft discharge-summary explanations for patients in plain English. One run produces a clinically acceptable output; another run, using the same model and task label, produces language that overstates certainty and omits a medication warning. The run-level issue is whether the difference arose from data, operator behaviour, decoding settings, or prompt wording.
A prompt registry allows reviewers to discover that the second run used a newly edited prompt template that instructed the model to "keep the message concise and confident" and that this change had been introduced to improve readability. The evidence needed includes the prompt ID, version, hash, owner, change rationale, approval status, timestamp of the revision, and the run record showing which version was invoked.
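As a rough illustration of how a reviewer might surface the wording change once both versions are held in the registry, the sketch below diffs two template versions with Python's standard `difflib`. The version 1 wording and the file labels are assumptions made for the example; only the "concise and confident" instruction comes from the scenario above.

```python
import difflib

# Hypothetical wording of two versions of the same template.
v1 = (
    "Explain the discharge summary in plain English.\n"
    "Include all medication warnings.\n"
)
v2 = (
    "Explain the discharge summary in plain English.\n"
    "Keep the message concise and confident.\n"
)

# Line-level diff; lineterm="" avoids doubled newlines when joining.
diff = list(difflib.unified_diff(
    v1.splitlines(), v2.splitlines(),
    fromfile="discharge-summary v1", tofile="discharge-summary v2",
    lineterm="",
))
print("\n".join(diff))
```

The diff shows what changed; the registry's change rationale and approval status are still needed to explain why it changed and who authorised it.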
The RAIDT pillars most affected are Responsibility, Auditability, Interpretability, and Traceability, with Dependability also relevant because prompt drift can destabilise output quality. Governance readiness improves because the organisation can explain what changed, who authorised it, how it affected behaviour, and what corrective action is justified.
Detailed link to RAIDT
The prompt registry links to RAIDT in four ways.
First, it supports RAIDT's core idea that governance should attach to a concrete run rather than to abstract claims about safe or responsible AI use.
Second, it links the run to the governed prompt artefact that shaped the run's behaviour and therefore strengthens run-level evidence.
Third, it improves both the evidence pack and the score profile because reviewers can inspect prompt provenance, change control, and approval status rather than relying on unsupported assertions.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning by allowing prompt changes to be examined as explicit governance decisions.
Prompt registry -> Prompt identity and change control -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness
Link to the five RAIDT pillars
Responsibility
The prompt registry clarifies who is accountable for prompt design, revision, approval, and deployment. It prevents prompt wording from becoming an unowned source of behavioural change.
Example evidence / implication:
- Named prompt owner, approver, or responsible team attached to the template record.
- Recorded change rationale showing why a risky or consequential wording alteration was introduced.
Auditability
This item strongly affects Auditability because it creates a structured trail that reviewers can inspect. A run can be audited more credibly when the prompt used is not only visible but institutionally identifiable.
Example evidence / implication:
- Version history showing when the prompt changed and which runs used which version.
- Approval status and review notes indicating whether the prompt was operational, experimental, or deprecated.
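One simple way to produce the "which runs used which version" view is to group run records by prompt version. This is a sketch under assumed field names (`run_id`, `prompt_id`, `prompt_version`); the run records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical run records; field names are illustrative.
runs = [
    {"run_id": "r-001", "prompt_id": "discharge-summary", "prompt_version": 1},
    {"run_id": "r-002", "prompt_id": "discharge-summary", "prompt_version": 2},
    {"run_id": "r-003", "prompt_id": "discharge-summary", "prompt_version": 2},
]


def runs_by_version(run_records, prompt_id):
    """Map each version of one prompt to the run IDs that invoked it."""
    grouped = defaultdict(list)
    for run in run_records:
        if run["prompt_id"] == prompt_id:
            grouped[run["prompt_version"]].append(run["run_id"])
    return dict(grouped)


usage = runs_by_version(runs, "discharge-summary")
print(usage)
```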
Interpretability
The prompt registry improves interpretability by helping reviewers understand why the system responded in a particular way. Prompt instructions shape framing, style, constraints, and evidential posture.
Example evidence / implication:
- Prompt text linked to task framing such as summarise, classify, advise, or draft.
- Change notes explaining why an instruction about tone, certainty, or escalation was added or removed.
Dependability
The registry supports Dependability indirectly but meaningfully. Controlled prompt change reduces unexplained behavioural drift and supports more stable operational performance across repeated runs.
Example evidence / implication:
- Records showing that prompt changes were tested before adoption in routine use.
- Evidence that deprecated prompts are not still circulating in local copies or informal workflows.
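A check for deprecated prompts still in circulation could look roughly like the following. It assumes the registry records a deprecation date per prompt version and that each run record carries a timestamp; both the structure and the sample data are hypothetical.

```python
from datetime import datetime

# Hypothetical deprecation dates, keyed by (prompt_id, version).
deprecations = {("fraud-triage", 1): datetime(2024, 3, 1)}

# Hypothetical run records.
runs = [
    {"run_id": "r-101", "prompt_id": "fraud-triage", "prompt_version": 1,
     "timestamp": datetime(2024, 2, 20)},
    {"run_id": "r-102", "prompt_id": "fraud-triage", "prompt_version": 1,
     "timestamp": datetime(2024, 3, 5)},
    {"run_id": "r-103", "prompt_id": "fraud-triage", "prompt_version": 2,
     "timestamp": datetime(2024, 3, 6)},
]


def runs_using_deprecated(run_records, deprecated_since):
    """Return IDs of runs that invoked a prompt version after its deprecation."""
    flagged = []
    for run in run_records:
        cutoff = deprecated_since.get((run["prompt_id"], run["prompt_version"]))
        if cutoff is not None and run["timestamp"] >= cutoff:
            flagged.append(run["run_id"])
    return flagged


flagged = runs_using_deprecated(runs, deprecations)
print(flagged)
```

A check like this only sees runs that were logged with a prompt version; local copies used outside logged workflows would need other controls.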
Traceability
This item strongly affects Traceability because it connects prompt artefacts to specific runs, timestamps, and related evidence objects. It allows a reviewer to trace from output back to prompt provenance.
Example evidence / implication:
- Prompt ID, version, and hash linked from the run record to the registry entry.
- Cross-reference from the registry to related items such as task label, tool-chain trace, and output hash.
The pillars most strongly influenced here are Auditability and Traceability, with Responsibility close behind.
Why this item is more than a generic concept
In general AI governance, a prompt registry may simply mean a managed repository of reusable prompts. In RAIDT, it means a run-linked governance artefact that supports evidence, reconstruction, and scoring. The RAIDT meaning is more operational because it is tied to the exact conditions under which a system run occurred and to the evidence pack used for review.
That difference matters. A generic registry helps teams organise assets. A RAIDT prompt registry helps organisations justify decisions, interpret behaviour, challenge outcomes, and learn from changes in prompt design over time.
Common misunderstanding
Misunderstanding
A prompt registry is just an internal folder where teams keep useful prompts.
Correction
A folder may support storage, but it does not by itself provide governance. A registry requires controlled identity, version history, ownership, status, and change rationale. For example, if two analysts use slightly different copies of what they both call the "fraud triage prompt", a shared folder may not reveal that difference clearly enough for audit or review. A registry does, because it distinguishes the approved prompt artefact from local variants and links each run to the exact governed version used.
Boundary and limitation
A prompt registry does not prove that a prompt is good, fair, lawful, or effective. It does not replace task evaluation, model evaluation, human oversight, or domain-specific safeguards. It also does not eliminate the possibility that an operator modifies prompt text outside approved channels or that downstream tool or retrieval changes alter outcomes in ways the registry alone cannot explain.
Its value depends on disciplined implementation. If prompts are not actually linked to runs, if informal copies proliferate, or if version records are incomplete, the registry becomes a symbolic control rather than a working governance mechanism. RAIDT handles this limitation by placing the registry inside a wider evidence architecture that also includes prompt IDs, hashes, run IDs, timestamps, model identifiers, tool traces, retrieved documents, output hashes, and review notes.
Implementation levels
Manual implementation
A researcher or small team can maintain a spreadsheet or structured note containing prompt name, prompt text, owner, date created, current version, status, and reason for each revision. Each run record then cites the relevant prompt entry manually.
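For a team that prefers a plain file to a spreadsheet application, the same manual record can be kept as CSV. The sketch below uses an in-memory buffer in place of a file on disk; the column names mirror the fields listed above, and the rows are invented examples.

```python
import csv
import io

FIELDS = ["prompt_name", "version", "prompt_text", "owner",
          "date_created", "status", "revision_reason"]

# Illustrative rows only.
rows = [
    {"prompt_name": "fraud-triage", "version": 1,
     "prompt_text": "Triage this transaction for fraud risk.",
     "owner": "fraud-ops", "date_created": "2024-01-15",
     "status": "deprecated", "revision_reason": "Initial version."},
    {"prompt_name": "fraud-triage", "version": 2,
     "prompt_text": "Triage this transaction for fraud risk and state your confidence.",
     "owner": "fraud-ops", "date_created": "2024-03-01",
     "status": "approved", "revision_reason": "Added confidence statement."},
]

buffer = io.StringIO(newline="")  # stands in for a .csv file on disk
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)

# Read the sheet back and find the current version of one prompt.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
current = max(
    (r for r in loaded if r["prompt_name"] == "fraud-triage"),
    key=lambda r: int(r["version"]),
)
print(current["version"], current["status"])
```

Values read back from CSV are strings, so the version is converted with `int` before comparison.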
Semi-automated implementation
A semi-automated approach uses templates, metadata forms, version-controlled markdown or JSON records, and lightweight review workflows. Prompt IDs, hashes, and status fields can be generated automatically while approval and rationale remain human-entered.
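The split between generated and human-entered fields might be sketched as follows. The function name, the `name@vN` identifier format, and the rule that derives status from the presence of an approver are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Optional


def build_record(name: str, version: int, text: str, *,
                 owner: str, rationale: str,
                 approver: Optional[str] = None) -> dict:
    """Build a JSON-serialisable registry record (hypothetical layout)."""
    if not rationale.strip():
        # Rationale stays human-entered, so an empty one is rejected.
        raise ValueError("a change rationale is required")
    return {
        # Generated fields
        "prompt_id": f"{name}@v{version}",
        "prompt_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "status": "approved" if approver else "pending_review",
        # Human-entered fields
        "text": text,
        "owner": owner,
        "approver": approver,
        "change_rationale": rationale,
    }


record = build_record(
    "fraud-triage", 2, "Triage this transaction for fraud risk.",
    owner="fraud-ops", approver="risk-lead",
    rationale="Clarified the task wording.",
)

# Sorted keys give a stable layout that diffs cleanly under version control.
serialised = json.dumps(record, indent=2, sort_keys=True)
print(serialised)
```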
Fully automated implementation
At scale, the registry is implemented through a platform, orchestration layer, or governance pipeline that stores prompt artefacts centrally, issues versioned identifiers, calculates hashes, enforces status controls, links prompt versions to runs automatically, and exposes registry data to dashboards, evidence-pack generation, and scoring workflows.
Practical use in the RAIDT project
Within the RAIDT project, this item is useful in several connected ways. In Paper 08 Foundations, it helps explain why prompts must be treated as inspectable artefacts rather than ephemeral text. In Paper 09 Empirical Validation, it supports analysis of whether prompt control improves reviewer confidence, reconstruction quality, and cross-run comparison. In Paper 10 Policy Pathways, it provides a concrete governance mechanism that organisations can adopt without waiting for a fully standardised external regime.
It also helps with sector playbooks because prompt sensitivity varies by domain. In healthcare, public services, law, and cybersecurity, prompt wording can alter risk posture in ways that matter for escalation, documentation, and defensibility. For supervision meetings, viva defence, and journal positioning, the prompt registry gives a clear example of RAIDT's broader argument: governance becomes stronger when evidence is attached to actual runs and to the artefacts that shape them.
Key audience questions to prepare for
Q1. Is a prompt registry mainly a technical convenience or a governance control?
It is a governance control. Its technical form may look like a repository, but its real purpose is to make prompt-driven behaviour attributable, reviewable, and comparable across runs.
Q2. Why not rely on standard software version control alone?
Standard version control is useful, but it may not capture operational status, intended task domain, approval logic, or direct links from prompt versions to run-level evidence. RAIDT needs those governance connections, not only code history.
Q3. Does every small prompt edit need to be tracked?
Not every edit carries equal governance significance, but any edit that can materially influence output behaviour, risk posture, or reviewer interpretation should be tracked as a governed change.
Q4. How does this differ from prompt ID and prompt hash?
The registry is the broader governance structure. Prompt ID and version identify a prompt instance, and the prompt hash helps verify exact content integrity. The registry is the place where those elements are organised, contextualised, and controlled.
Q5. What happens if staff improvise prompts outside the registry?
That does not make the registry irrelevant; it reveals a control gap. RAIDT makes such departures visible by comparing the governed artefact with the actual run evidence and by recording exceptions for review.
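Comparing the governed artefact with the run evidence can be as simple as hashing the prompt text captured in the run and looking it up among registered hashes. The sketch below assumes the run record stores the exact prompt text; the registry contents, field names, and exception wording are illustrative.

```python
import hashlib


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical index from registered prompt hash to (prompt_id, version).
registered = {
    sha256("Triage this transaction for fraud risk."): ("fraud-triage", 3),
}


def reconcile(run: dict) -> dict:
    """Match a run's prompt text to the registry, or record an exception."""
    match = registered.get(sha256(run["prompt_text"]))
    if match is None:
        return {"run_id": run["run_id"], "prompt": None,
                "exception": "prompt text not in registry"}
    return {"run_id": run["run_id"], "prompt": match, "exception": None}


governed = reconcile({"run_id": "r-201",
                      "prompt_text": "Triage this transaction for fraud risk."})
improvised = reconcile({"run_id": "r-202",
                        "prompt_text": "Triage this transaction. Be lenient."})
print(governed)
print(improvised)
```

Exact hashing treats any edit, including whitespace, as a departure; whether that is too strict is a policy choice for the organisation.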
Suggested citation concepts to support this item
- prompt management as governance artefact in generative AI
- prompt version control and auditability
- provenance and traceability of prompts in large language model systems
- configuration management for AI system behaviour
- human-AI interaction and prompt sensitivity in organisational settings
- documentation practices for foundation model deployment
- model behaviour variation induced by prompt wording
- AI audit trails and evidence-based governance
- socio-technical accountability for prompt engineering
- operational controls for enterprise generative AI use
Short explanation for presentation
A prompt registry is the structured record that treats prompts as governed artefacts rather than disposable text. In RAIDT, that matters because a prompt can materially alter how a model frames a task, what kind of answer it produces, and how reviewers later interpret the run. Saving prompt text alone is not enough. We also need prompt identity, version history, ownership, approval status, and change rationale so that a specific run can be connected to a specific governed prompt artefact. This strengthens the run-level evidence pack, supports scoring across the RAIDT pillars, and improves audit readiness. In short, the prompt registry helps RAIDT show not just that a model was used, but which governed instructions shaped that use and whether those instructions were properly controlled.
One-line takeaway
The prompt registry is the governed record of prompt artefacts: RAIDT can only assess prompt-shaped behaviour credibly when prompts are linked to run-level evidence.
Related items in evidence architecture and artefacts
- S4.01 - run_id
- S4.02 - Timestamp
- S4.03 - User role / operator role
- S4.04 - Task and domain label
- S4.06 - Prompt ID and version
- S4.07 - Prompt hash
- S4.08 - Model/provider/version identifier
- S4.09 - Decoding parameters
- S4.10 - Retrieval query and index ID
- S4.11 - Retrieved document IDs and hashes
- S4.12 - Tool-chain trace
- S4.13 - Adapter ID / PEFT lineage
- S4.14 - Alignment policy ID
- S4.15 - Output hash
- S4.16 - Review decision and reviewer notes
- ... and 1 more