S6.03 - Prompting

flowchart LR
    A[Traditional view: prompting as craft] --> C[RAIDT - run-level evidence framework]
    B[Problem: hidden instructions weaken reconstruction] --> C
    C --> D[[Prompting - governed instruction layer]]
    D --> E[Prompt version and rationale]
    D --> F[Evidence pack]
    D --> G[RAIDT score profile]
    D --> H[Reviewability and contestability]
    F --> I[Reviewer reconstruction]
    G --> J[Governance readiness]
    H --> K[Organisational learning]
    L[Healthcare summarisation] --> D
    M[Public-service triage] --> D
    N[Legal drafting] --> D
    O[Enterprise productivity] --> D
    P[Compliance support] --> D

Star S6 - Influence Methods as Governance Interventions

Star context: Prompting sits in Star S6 as the most immediate influence method at the instruction layer. It helps shape model behaviour within a run, while RAIDT treats that influence as something to be logged, reviewed, compared with alternatives such as RAG or PEFT/LoRA, and governed as evidence rather than presented as the project core.


Academic picture
Definition / background

Prompting is the practice of shaping a generative model's behaviour through instructions, constraints, examples, formatting cues, role specifications, and other textual or structured inputs supplied at inference time. In everyday AI use, prompting is often discussed as a technique for obtaining clearer, safer, or more useful outputs. In RAIDT, that basic idea is retained, but it is reframed as a governance-relevant intervention at the level of the run.

This distinction matters. RAIDT treats a run as one configured use of a generative AI system for a specific task, at a specific time, in a specific context. The prompt is therefore not a peripheral convenience. It is one of the most direct ways in which human intent is translated into model behaviour during that run. A change in wording, structure, role framing, or output constraints can materially alter what the system produces, how dependable that production is, and how easily the result can later be reviewed.

Prompting is conceptually different from neighbouring items in Star S6. It is not the same as RAG, which changes the evidential context available to the model; it is not the same as PEFT or LoRA, which adapt the model itself by training a small set of additional parameters; and it is not the same as RLHF- or DPO-type controls, which shape behaviour through training or preference optimisation. Prompting operates at the instruction layer. Precisely because it is lightweight, fast, and widely used, it requires governance discipline rather than casual treatment.

Within RAIDT, prompting belongs inside the evidence pack because the prompt helps explain why an output appeared, whether the run was responsibly configured, and whether another reviewer could reconstruct the decision pathway. It also matters for the five-pillar score profile. Prompt quality and prompt traceability can influence Responsibility through intent and safeguards, Auditability through documentation, Interpretability through explicit task framing, Dependability through consistency of outputs, and Traceability through version control and run reconstruction.

Why this concept matters

Prompting matters because many organisational GenAI failures are not caused by model choice alone. They also arise from weak instructions, ambiguous task framing, missing constraints, or undocumented changes to prompt templates. If those elements are invisible, governance discussions stay at the level of principle and cannot explain what was actually done in a particular run.

This concept helps avoid two common confusions. First, it avoids treating prompting as a purely creative optimisation exercise with no accountability implications. Second, it avoids overstating prompting as if it were sufficient on its own to guarantee quality, safety, or compliance. RAIDT places prompting in the correct middle ground: important enough to govern, but not sufficient to stand alone.

If prompting is missing from the governance picture, organisations cannot reliably compare runs, understand why performance shifts occurred, or justify why a system was configured in a particular way. That weakens reviewability, makes contestation harder, and reduces the value of any audit trail. By contrast, when prompting is logged and assessed as a governed artefact, prompt choices can be scrutinised alongside outputs, reviewer decisions, and broader operational context.

Key idea: Prompting matters in RAIDT because instruction choices shape run behaviour and therefore must become evidence, not hidden craft knowledge.

What this item controls
Practical example / likely audience question

Audience question

Why not make prompt engineering the project if prompting so strongly affects model outputs?

Answer

The concern behind this question is understandable: prompt wording often has visible effects on output quality, so it can appear to be the main lever that matters in practice. The problem is that this view confuses one intervention with the governance framework needed to evaluate interventions. RAIDT is not a method for writing better prompts in isolation. It is a framework for producing run-level evidence about how a generative AI system was configured, how it behaved, and how that behaviour should be assessed.

A practical example makes the distinction clear. Suppose a team improves an output by changing a prompt from a vague instruction to a structured template with explicit constraints. That is useful, but on its own it still leaves unanswered governance questions: who changed the prompt, when, why, against what baseline, with what effect on the five pillars, and with what residual risks? RAIDT answers those questions because it evaluates prompting as one governed intervention among several.

This is where RAIDT is stronger than generic AI governance language. A generic approach may say that prompts should be documented or that prompt engineering is important. RAIDT goes further by tying prompts to a specific run, preserving prompt versions in the evidence pack, linking them to scores and reviewer observations, and making them available for later reconstruction, contestation, and organisational learning.

Practical example in RAIDT terms

In healthcare, consider a hospital team using a generative AI system to draft discharge-summary explanations for patients. A run-level issue emerges when one prompt asks the model to "make the explanation simple and brief" while another prompt requires the model to "preserve medication changes, follow-up actions, and explicit warning signs". The first version may produce a smoother summary but omit clinically important detail. In RAIDT terms, the evidence needed includes the exact prompt text, prompt template version, system instruction, inserted patient-context fields, model/version, time of use, reviewer comments, and any corrections made after review. The most affected pillars are Responsibility, Interpretability, Dependability, and Traceability. Prompting improves governance readiness here because the organisation can show not only the final output, but the governed instruction choice that shaped it and the reason that choice was accepted, revised, or rejected.

Detailed link to RAIDT

Prompting links to RAIDT in four ways.

First, it links to RAIDT's core idea because RAIDT governs configured uses of generative AI rather than abstract model claims. Prompting is one of the clearest configuration choices within a run.

Second, it links directly to the run and run-level evidence because prompt wording, prompt structure, and prompt version are part of what must be recorded if a reviewer is to understand how the run was set up.

Third, it links to the evidence pack and score profile because prompt design affects what evidence is available, how easy the run is to interpret, and how consistently the system behaves across repeated or comparable tasks.

Fourth, it links to reviewability, contestability, audit readiness, and organisational learning because governed prompts allow teams to reconstruct decisions, compare alternatives, explain changes, and improve future practice on the basis of evidence rather than memory.

Prompting -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

Link to the five RAIDT pillars

Prompting has especially strong direct effects on Auditability, Interpretability, and Traceability, but it also materially influences Responsibility and Dependability.

Responsibility

Prompting contributes to Responsibility because it expresses the immediate human intention behind a run. It shows what the organisation asked the model to do, what boundaries were imposed, and whether the instructions reflected responsible use for the task and context.

Example evidence / implication:

Auditability

Prompting is central to Auditability because auditors cannot meaningfully review a run if they do not know what the model was instructed to do. Prompt records make the run inspectable rather than opaque.

Example evidence / implication:

Interpretability

Prompting supports Interpretability by making the requested task framing explicit. A reviewer can better understand why an output took a particular form when the governing instruction layer is visible.

Example evidence / implication:

Dependability

Prompting affects Dependability because instruction quality can improve or reduce consistency across comparable runs. Poorly specified prompts often create unstable behaviour even when the model remains the same.

Example evidence / implication:

Traceability

Prompting is integral to Traceability because the prompt is part of the causal path from user intent to system output. Without prompt records, a run cannot be fully reconstructed.

Example evidence / implication:

Why this item is more than a generic concept

In general AI governance, prompting may be treated as a usability skill, a prompt-engineering tactic, or a practical means of improving output quality. In RAIDT, prompting means a governed intervention at the run level whose wording, structure, provenance, and revision history can be examined as evidence.

The RAIDT meaning is more operational because it does not stop at saying prompts matter. It asks how prompts are logged, how prompt variants are compared, how prompt changes influence the evidence pack, and how prompt governance affects the five-pillar profile. That shift from advice to evidence is what makes the concept fit the RAIDT project.

Common misunderstanding

Misunderstanding

If a team writes down a prompt somewhere, prompting has already been governed sufficiently.

Correction

A prompt written in isolation is not yet governed in a RAIDT sense. Governance requires the prompt to be tied to a specific run, version, model context, inserted variables, reviewer process, and outcome. For example, a compliance team may keep a generic prompt template in a document, but if a live run silently adds different policy excerpts or changes the system instruction, the actual intervention is no longer the same as the one recorded. RAIDT corrects this by treating the effective prompt configuration as evidence, not merely the nominal template text.
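The gap between the nominal template and the effective configuration can be made checkable. The following is a minimal sketch, assuming the effective configuration is represented by three inputs (system instruction, template text, and inserted variables) and fingerprinted with SHA-256. The function name `config_fingerprint` and the example policy text are hypothetical illustrations, not part of RAIDT itself.

```python
import hashlib
import json


def config_fingerprint(system_instruction: str, template: str, variables: dict) -> str:
    """Hash the effective configuration, not just the template text."""
    payload = json.dumps(
        {"system": system_instruction, "template": template, "variables": variables},
        sort_keys=True,        # stable key order so equal configurations hash equally
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical compliance template; the wording is illustrative only.
TEMPLATE = "Assess the request against this policy: {policy_excerpt}"
SYSTEM = "Answer only from the supplied policy excerpt."

# Configuration as recorded in the governance document.
recorded = config_fingerprint(SYSTEM, TEMPLATE, {"policy_excerpt": "Policy excerpt A"})

# Configuration actually used in the live run: same template, different excerpt.
live = config_fingerprint(SYSTEM, TEMPLATE, {"policy_excerpt": "Policy excerpt B"})

# True when the live run no longer matches what was recorded.
drifted = recorded != live
```

Because the template text is identical in both calls, comparing templates alone would report no change. Hashing the inserted variables and system instruction as well is what exposes the silent difference.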

Boundary and limitation

Prompting does not prove that an output is true, safe, fair, lawful, or fit for purpose. A well-written prompt can still produce a weak result if the model is unsuitable, the task is underspecified, the context is poor, or the surrounding workflow is flawed. Prompting also does not replace other influence methods such as retrieval, model adaptation, evaluation, or human review.

Prompting may lose effectiveness when prompts are unstable across users, when system-level instructions are hidden, when context windows truncate important content, or when model updates change behaviour without visible prompt edits. RAIDT handles these limitations by placing prompting inside a larger run-level evidence framework. The prompt is therefore assessed alongside outputs, model metadata, reviewer observations, complementary interventions, and the five-pillar score profile rather than treated as a self-sufficient control.

Implementation levels

Manual implementation

A researcher or small team can implement prompt governance manually by storing the exact prompt text used in each run, naming prompt versions consistently, recording the task context, and noting why a prompt was chosen or changed. Even a simple evidence table or structured note can improve reviewability if prompt artefacts are captured reliably.
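A simple evidence table of this kind could be kept as a CSV file. The sketch below is one possible shape, assuming five columns chosen to mirror the items named above; the column names, the `append_run` helper, and the example rows are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column set for a manual evidence table.
COLUMNS = ["run_id", "prompt_version", "prompt_text", "task_context", "rationale"]


def append_run(table: Path, row: dict) -> None:
    """Append one run to the evidence table, writing the header on first use."""
    is_new = not table.exists()
    with table.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)  # raises ValueError if the row has unknown columns


# Demonstration in a temporary folder so nothing is left on disk.
with tempfile.TemporaryDirectory() as folder:
    table = Path(folder) / "prompt_evidence.csv"
    append_run(table, {
        "run_id": "run-001",
        "prompt_version": "discharge-v1",
        "prompt_text": "Make the explanation simple and brief.",
        "task_context": "Discharge-summary explanation",
        "rationale": "Initial baseline",
    })
    append_run(table, {
        "run_id": "run-002",
        "prompt_version": "discharge-v2",
        "prompt_text": "Preserve medication changes, follow-up actions, and explicit warning signs.",
        "task_context": "Discharge-summary explanation",
        "rationale": "Reviewer found omitted clinical detail in v1",
    })
    with table.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
```

The `rationale` column is the part most easily skipped in manual practice, yet it is what lets a later reviewer see why a prompt was changed rather than only that it changed.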

Semi-automated implementation

Semi-automated implementation adds templates, metadata fields, and structured review steps. Prompt forms, reusable prompt libraries, prompt IDs, and run logs can support more consistent recording while still allowing human reviewers to approve prompt changes and comment on observed effects.
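One way to picture a reusable prompt library with human approval is sketched below. It assumes that prompts are rendered only from the latest reviewer-approved version; the `PromptLibrary` class, its method names, and the example template are hypothetical and are not drawn from any particular tool.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict, List, Optional, Tuple


@dataclass
class PromptEntry:
    prompt_id: str
    version: int
    template: str
    approved_by: Optional[str] = None  # stays None until a reviewer signs off


class PromptLibrary:
    """In-memory prompt library where only approved versions can be used."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[PromptEntry]] = {}

    def propose(self, prompt_id: str, template: str) -> PromptEntry:
        versions = self._entries.setdefault(prompt_id, [])
        entry = PromptEntry(prompt_id, len(versions) + 1, template)
        versions.append(entry)
        return entry

    def approve(self, prompt_id: str, version: int, reviewer: str) -> None:
        self._entries[prompt_id][version - 1].approved_by = reviewer

    def render(self, prompt_id: str, **variables: str) -> Tuple[str, PromptEntry]:
        approved = [e for e in self._entries.get(prompt_id, []) if e.approved_by]
        if not approved:
            raise LookupError(f"No approved version of {prompt_id!r}")
        entry = approved[-1]  # latest approved version
        return Template(entry.template).substitute(**variables), entry


library = PromptLibrary()
library.propose("summary", "Summarise $document in plain language.")
library.approve("summary", 1, reviewer="reviewer-a")
library.propose("summary", "Summarise $document in one sentence.")  # not yet approved

text, used = library.render("summary", document="the leave policy")
```

Here `render` returns the entry alongside the text so that the prompt ID and version can be written into the run log. The unapproved second version is stored but never used, which keeps the human approval step meaningful.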

Fully automated implementation

At scale, a platform or orchestration layer can automatically capture system prompts, user prompts, injected variables, retrieval snippets, model/version metadata, timestamps, reviewer actions, and prompt lineage. Dashboards or governance pipelines can then compare prompt variants across runs, link prompts to evidence packs, and feed results into RAIDT scoring and organisational reporting.
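Automatic capture can be sketched as a thin wrapper around the model call. The example below assumes the model is reachable as a plain callable taking a system prompt and a user prompt; `capture_run`, `RUN_LOG`, the record keys, and the `stub_model` stand-in are all hypothetical, and a real platform would write to durable storage rather than a list.

```python
import hashlib
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional

RUN_LOG: List[dict] = []  # stand-in for a durable evidence store


def capture_run(
    generate: Callable[[str, str], str],
    *,
    system_prompt: str,
    user_template: str,
    variables: Dict[str, str],
    retrieval_snippets: List[str],
    model_version: str,
    parent_prompt_hash: Optional[str] = None,
) -> str:
    """Call the model and record the prompt configuration for the run."""
    user_prompt = user_template.format(**variables)
    # Short hash of the prompt pair, used as a lineage identifier.
    prompt_hash = hashlib.sha256(
        (system_prompt + "\n" + user_template).encode("utf-8")
    ).hexdigest()[:12]
    output = generate(system_prompt, user_prompt)
    RUN_LOG.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "system_prompt": system_prompt,
        "user_prompt": user_prompt,
        "variables": dict(variables),
        "retrieval_snippets": list(retrieval_snippets),
        "prompt_hash": prompt_hash,
        "parent_prompt_hash": parent_prompt_hash,  # links a variant to its predecessor
        "output": output,
        "reviewer_actions": [],  # filled in later by the review step
    })
    return output


def stub_model(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for a real model client; it only echoes the request."""
    return f"[stub] {user_prompt}"


result = capture_run(
    stub_model,
    system_prompt="Use only the supplied context.",
    user_template="Route this enquiry: {enquiry}",
    variables={"enquiry": "housing repair request"},
    retrieval_snippets=["Example guidance excerpt"],
    model_version="example-model-1",
)
```

Capturing inside the wrapper, rather than asking callers to log separately, means the record reflects what was actually sent to the model. Passing the previous `prompt_hash` as `parent_prompt_hash` is one simple way to express prompt lineage between variants.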

Practical use in the RAIDT project

Within the RAIDT project, prompting helps clarify that influence methods are governed components rather than the central theory of governance. In Paper 08 Foundations, it can be positioned as a direct instruction-layer intervention that must be evidenced at run level. In Paper 09 Empirical Validation, prompt variants can be compared to show how changes in instructions alter evidence quality, reviewer confidence, and five-pillar scores. In Paper 10 Policy Pathways, prompting can be translated into practical control language for organisations that need auditable procedures rather than generic advice. The same item is also useful in sector playbooks, scoring-rubric design, evidence-pack templates, supervisor discussions, viva defence, and journal positioning because it gives a precise answer to the question of how human instruction becomes governable evidence in real GenAI use.

Key audience questions to prepare for

Q1. Is prompting just another name for telling the model what to do?

At a basic level, yes, but RAIDT adds the governance layer. Prompting is not only the act of instructing the model; it is the documented instruction configuration that shaped a specific run and can therefore be reviewed as evidence.

Q2. Why is prompting not enough on its own?

Because output quality and governance quality depend on more than wording. Retrieval context, model choice, adaptation methods, reviewer checks, workflow controls, and organisational purpose all matter. RAIDT treats prompting as one influence mechanism inside that broader evidence framework.

Q3. What should be captured as evidence for prompting?

At minimum, the effective prompt text, system instruction, prompt version, key variables inserted into the template, time of use, model/version, and reviewer comments should be captured. Without these elements, later reconstruction is weak.
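The minimum set listed above could be represented as a single record with a completeness check. This is a minimal sketch under the assumption that one record is created per run; the `PromptEvidence` class, its field names, and the example values are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class PromptEvidence:
    """Minimum prompt evidence for one run (field names are illustrative)."""
    effective_prompt: str      # prompt text after variables were inserted
    system_instruction: str
    prompt_version: str
    model_version: str
    used_at: str               # ISO 8601 time of use
    inserted_variables: Dict[str, str] = field(default_factory=dict)
    reviewer_comments: List[str] = field(default_factory=list)


# inserted_variables is not checked: an empty dict is legitimate
# when the template has no variables.
REQUIRED = (
    "effective_prompt",
    "system_instruction",
    "prompt_version",
    "model_version",
    "used_at",
    "reviewer_comments",
)


def missing_fields(record: PromptEvidence) -> List[str]:
    """Return the required fields that are still empty."""
    data = asdict(record)
    return [name for name in REQUIRED if not data[name]]


record = PromptEvidence(
    effective_prompt="Explain the discharge summary. Preserve medication changes.",
    system_instruction="Write for a patient audience.",
    prompt_version="discharge-v2",
    model_version="example-model-1",
    used_at="2025-01-15T09:30:00+00:00",
    inserted_variables={"ward": "example ward"},
)

gaps = missing_fields(record)  # the run has not been reviewed yet
```

Reporting the gaps, rather than rejecting the record, keeps an incomplete run visible in the evidence pack while making clear that reconstruction would currently be weak.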

Q4. How does prompting relate to the RAIDT score profile?

Prompting affects the score profile because it changes how visible intent is, how reproducible the run is, how interpretable the output structure becomes, and how dependable comparable runs are. It therefore has direct implications for several pillars, especially Auditability, Interpretability, and Traceability.

Q5. What is the strongest supervisory argument for including prompting in RAIDT?

The strongest argument is that prompting is one of the most immediate and variable configuration choices in real organisational GenAI use. If RAIDT ignored it, the framework would miss a major part of what actually shapes run behaviour and would weaken its claim to produce reviewable run-level evidence.

Suggested citation concepts to support this item
Short explanation for presentation

Prompting is the instruction layer through which people shape what a generative AI system does in a particular run. In RAIDT, that matters because the run is the unit of governance. A prompt is therefore not just a clever piece of text for getting better answers; it is part of the evidence needed to explain why an output was produced, whether the configuration was responsible, and whether another reviewer could reconstruct the run. This is why RAIDT treats prompting as a governance intervention rather than as the whole project. Prompting can improve quality, but it can also introduce ambiguity, inconsistency, or hidden risk if it is not logged, versioned, and reviewed. RAIDT makes prompting operational by tying it to evidence packs, five-pillar scoring, and audit readiness.

One-line takeaway

Prompting is the governed instruction layer of a run: RAIDT turns prompt choices into reviewable evidence for scoring, reconstruction, and governance readiness.

Mentioned in reference-paper summaries (5)

Paper summaries live in Port/93-References/pdf_summaries/. Each file listed below contains the key term at least once.
