S6.03 - Prompting

flowchart LR
    A[Traditional view: prompting as craft] --> C[RAIDT - run-level evidence framework]
    B[Problem: hidden instructions weaken reconstruction] --> C
    C --> D[[Prompting - governed instruction layer]]
    D --> E[Prompt version and rationale]
    D --> F[Evidence pack]
    D --> G[RAIDT score profile]
    D --> H[Reviewability and contestability]
    F --> I[Reviewer reconstruction]
    G --> J[Governance readiness]
    H --> K[Organisational learning]
    L[Healthcare summarisation] --> D
    M[Public-service triage] --> D
    N[Legal drafting] --> D
    O[Enterprise productivity] --> D
    P[Compliance support] --> D

Star S6 - Influence Methods as Governance Interventions

Star context: Prompting sits in Star S6 as the most immediate influence method at the instruction layer. It helps shape model behaviour within a run, while RAIDT treats that influence as something to be logged, reviewed, compared with alternatives such as RAG or PEFT/LoRA, and governed as evidence rather than presented as the project core.

Academic picture

Definition / background

Prompting is the practice of shaping a generative model's behaviour through instructions, constraints, examples, formatting cues, role specifications, and other textual or structured inputs supplied at inference time. In everyday AI use, prompting is often discussed as a technique for obtaining clearer, safer, or more useful outputs. In RAIDT, that basic idea is retained, but it is reframed as a governance-relevant intervention at the level of the run.

This distinction matters. RAIDT treats a run as one configured use of a generative AI system for a specific task, at a specific time, in a specific context. The prompt is therefore not a peripheral convenience. It is one of the most direct ways in which human intent is translated into model behaviour during that run. A change in wording, structure, role framing, or output constraints can materially alter what the system produces, how dependable that production is, and how easily the result can later be reviewed.

Prompting is conceptually different from neighbouring items in Star S6. It is not the same as RAG, which changes the evidential context available to the model; it is not the same as PEFT or LoRA, which adapt the model's weights; and it is not the same as RLHF- or DPO-type controls, which shape behaviour through training or preference optimisation. Prompting operates at the instruction layer. Precisely because it is lightweight, fast, and widely used, it requires governance discipline rather than casual treatment.

Within RAIDT, prompting belongs inside the evidence pack because the prompt helps explain why an output appeared, whether the run was responsibly configured, and whether another reviewer could reconstruct the decision pathway. It also matters for the five-pillar score profile. Prompt quality and prompt traceability can influence Responsibility through intent and safeguards, Auditability through documentation, Interpretability through explicit task framing, Dependability through consistency of outputs, and Traceability through version control and run reconstruction.

Why this concept matters

Prompting matters because many organisational GenAI failures are not caused only by model choice. They also arise from weak instructions, ambiguous task framing, missing constraints, or undocumented changes to prompt templates. If those elements are invisible, governance discussions stay at the level of principle and cannot explain what was actually done in a particular run.

This concept helps avoid two common confusions. First, it avoids treating prompting as a purely creative optimisation exercise with no accountability implications. Second, it avoids overstating prompting as if it were sufficient on its own to guarantee quality, safety, or compliance. RAIDT places prompting in the correct middle ground: important enough to govern, but never a substitute for the wider evidence framework.

If prompting is missing from the governance picture, organisations cannot reliably compare runs, understand why performance shifts occurred, or justify why a system was configured in a particular way. That weakens reviewability, makes contestation harder, and reduces the value of any audit trail. By contrast, when prompting is logged and assessed as a governed artefact, prompt choices can be scrutinised alongside outputs, reviewer decisions, and broader operational context.

Key idea: Prompting matters in RAIDT because instruction choices shape run behaviour and therefore must become evidence, not hidden craft knowledge.

What this item controls

The task framing given to the model, including scope, purpose, and intended output.
The behavioural constraints applied within a run, such as tone, exclusions, safety boundaries, and required formatting.
The role or perspective the model is asked to adopt, including domain-specific framing and audience adaptation.
The degree of structure imposed on outputs, for example templates, fields, JSON schemas, checklists, or decision rationales.
The insertion of variables, retrieved context, or examples into reusable prompt templates.
The comparability of runs when prompt variants are tested, revised, approved, or challenged.
The visibility of human intent in the evidence pack and the ease with which a reviewer can reconstruct the run.
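The template-insertion and visibility points above can be made concrete with a minimal sketch, assuming prompts are kept as versioned text templates. Rendering with explicit variables and fingerprinting the exact rendered text gives a reviewer something concrete to compare against later. `PromptTemplate`, its fields and the example template text are hypothetical illustrations, not a RAIDT-defined API.

```python
import hashlib
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class PromptTemplate:
    """Hypothetical versioned prompt template (illustrative, not a RAIDT API)."""
    template_id: str
    version: str
    text: str

    def render(self, **variables: str) -> dict:
        # substitute() raises KeyError if a placeholder has no value,
        # so a run cannot silently proceed with an unfilled field.
        rendered = Template(self.text).substitute(**variables)
        # A SHA-256 fingerprint of the exact rendered text lets a reviewer
        # later confirm which instruction the model actually received.
        digest = hashlib.sha256(rendered.encode("utf-8")).hexdigest()
        return {
            "template_id": self.template_id,
            "template_version": self.version,
            "variables": dict(variables),
            "rendered_prompt": rendered,
            "prompt_sha256": digest,
        }


# Illustrative template; the wording is an assumption for the example only.
case_summary_v2 = PromptTemplate(
    template_id="case-summary",
    version="2.0",
    text=(
        "Summarise the following case notes for $audience. "
        "Return the fields: summary, actions, warnings. "
        "Case notes: $context"
    ),
)

record = case_summary_v2.render(audience="a patient", context="(redacted notes)")
print(record["template_version"], record["prompt_sha256"][:12])
```

`Template.substitute` is used rather than `safe_substitute` so that a missing variable fails loudly instead of leaving an unfilled placeholder in the instruction the model receives.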

Practical example / likely audience question

Audience question

Why not make prompt engineering the project if prompting so strongly affects model outputs?

Answer

The concern behind this question is understandable: prompt wording often has visible effects on output quality, so it can appear to be the main lever that matters in practice. The problem is that this view confuses one intervention with the governance framework needed to evaluate interventions. RAIDT is not a method for writing better prompts in isolation. It is a framework for producing run-level evidence about how a generative AI system was configured, how it behaved, and how that behaviour should be assessed.

A practical example makes the distinction clear. Suppose a team improves an output by changing a prompt from a vague instruction to a structured template with explicit constraints. That is useful, but on its own it still leaves unanswered governance questions: who changed the prompt, when, why, against what baseline, with what effect on the five pillars, and with what residual risks? RAIDT answers those questions because it evaluates prompting as one governed intervention among several.
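The governance questions in this example (who changed the prompt, when, why, against what baseline, with what residual risks) can be captured as a structured change record. The sketch below assumes prompt versions are stored as plain text; `PromptChange` and its fields are hypothetical, and pillar scoring is deliberately left out because this section does not specify how RAIDT scores are computed.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PromptChange:
    """Hypothetical record of one governed prompt revision."""
    prompt_id: str
    baseline_version: str
    new_version: str
    baseline_text: str
    new_text: str
    changed_by: str
    rationale: str
    residual_risks: list[str] = field(default_factory=list)
    # Timestamp is taken at record creation, in UTC.
    changed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def diff(self) -> list[str]:
        # A line-level unified diff shows a reviewer exactly what changed
        # between the baseline and the revised instruction.
        return list(
            difflib.unified_diff(
                self.baseline_text.splitlines(),
                self.new_text.splitlines(),
                fromfile=f"{self.prompt_id}@{self.baseline_version}",
                tofile=f"{self.prompt_id}@{self.new_version}",
                lineterm="",
            )
        )


# Illustrative values only.
change = PromptChange(
    prompt_id="case-summary",
    baseline_version="1.0",
    new_version="2.0",
    baseline_text="Summarise the notes.",
    new_text=(
        "Summarise the notes.\n"
        "Return fields: summary, actions, warnings.\n"
        "Do not omit warnings."
    ),
    changed_by="reviewer-a",
    rationale="The vague baseline produced summaries that dropped warnings.",
    residual_risks=["The model may still paraphrase warnings inaccurately."],
)

for line in change.diff():
    print(line)
```

Keeping the rationale and residual risks next to the diff means the record answers "why" as well as "what", which is the gap the example above identifies.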

This is where RAIDT is stronger than generic AI governance language. A generic approach may say that prompts should be documented or that prompt engineering is important. RAIDT goes further by tying prompts to a specific run, preserving prompt versions in the evidence pack, linking them to scores and reviewer observations, and making them available for later reconstruction, contestation, and organisational learning.

Practical example in RAIDT terms

In healthcare, consider a hospital team using a generative AI system to draft discharge-summary explanations for patients. A run-level issue emerges when one prompt asks the model to "make the explanation simple and brief" while another prompt requires the model to "preserve medication changes, follow-up actions, and explicit warning signs". The first version may produce a smoother summary but omit clinically important detail. In RAIDT terms, the evidence needed includes the exact prompt text, prompt template version, system instruction, inserted patient-context fields, model/version, time of use, reviewer comments, and any corrections made after review. The most affected pillars are Responsibility, Interpretability, Dependability, and Traceability. Prompting improves governance readiness here because the organisation can show not only the final output, but the governed instruction choice that shaped it and the reason that choice was accepted, revised, or rejected.
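The evidence listed for this discharge-summary run can be sketched as a single run record plus a completeness check that flags missing items before the record enters the evidence pack. Field names mirror the items named above, but the schema, the `REQUIRED_EVIDENCE` tuple and the `missing_evidence` helper are hypothetical; this section does not prescribe a storage format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunEvidence:
    """Hypothetical run-level evidence record for one discharge-summary run."""
    run_id: str
    prompt_text: str = ""
    template_version: str = ""
    system_instruction: str = ""
    context_fields: list[str] = field(default_factory=list)  # field names, not patient data
    model_version: str = ""
    used_at: str = ""  # ISO 8601 timestamp of use
    reviewer_comments: list[str] = field(default_factory=list)
    post_review_corrections: list[str] = field(default_factory=list)


# Items assumed necessary at time of use for a reviewer to reconstruct the run.
REQUIRED_EVIDENCE = (
    "prompt_text",
    "template_version",
    "system_instruction",
    "context_fields",
    "model_version",
    "used_at",
)


def missing_evidence(run: RunEvidence) -> list[str]:
    # Empty strings and empty lists both count as missing.
    record = asdict(run)
    return [name for name in REQUIRED_EVIDENCE if not record[name]]


# Illustrative values; the model identifier is a placeholder, not a real model.
run = RunEvidence(
    run_id="run-0042",
    prompt_text="Preserve medication changes, follow-up actions and explicit warning signs.",
    template_version="discharge-summary@2.0",
    context_fields=["medication_changes", "follow_up", "warning_signs"],
    model_version="example-model-v1",
    used_at="2024-05-01T09:30:00Z",
)

print(missing_evidence(run))  # -> ['system_instruction']
print(json.dumps(asdict(run), indent=2))
```

Reviewer comments and post-review corrections are excluded from the required tuple in this sketch because they are normally added after the run rather than at the time of use.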

Detailed link to RAIDT

Prompting links to RAIDT in four ways.

First, it links to RAIDT's core idea because RAIDT governs configured uses of generative AI rather than abstract model claims. Prompting is one of the clearest configuration choices within a run.

Second, it links directly to the run and run-level evidence because prompt wording, prompt structure, and prompt version are part of what must be recorded if a reviewer is to understand how the run was set up.

Third, it links to the evidence pack and score profile because prompt design affects what evidence is available, how easy the run is to interpret, and how consistently the system behaves across repeated or comparable tasks.

Fourth, it links to reviewability, contestability, audit readiness, and organisational learning because governed prompts allow teams to reconstruct decisions, compare alternatives, explain changes, and improve future practice on the basis of evidence rather than memory.

Prompting -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness
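One way to make the reconstruction step in this chain checkable, assuming the evidence pack stores the template text, the inserted variables and a SHA-256 fingerprint of the prompt actually sent, is to re-render the prompt at review time and compare fingerprints. The function names `fingerprint` and `verify_reconstruction` and the record layout are hypothetical.

```python
import hashlib
from string import Template


def fingerprint(text: str) -> str:
    # SHA-256 over the UTF-8 bytes of the exact prompt text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_reconstruction(
    template_text: str, variables: dict[str, str], logged_sha256: str
) -> bool:
    """Re-render the stored template and check it matches what was logged.

    A mismatch means the stored template or variables do not reproduce the
    prompt that was actually used, so the run cannot be reconstructed from
    the evidence pack alone.
    """
    rendered = Template(template_text).substitute(variables)
    return fingerprint(rendered) == logged_sha256


# Illustrative evidence-pack contents.
template_text = "Summarise the notes for $audience. Do not omit warnings."
variables = {"audience": "a patient"}
logged = fingerprint("Summarise the notes for a patient. Do not omit warnings.")

print(verify_reconstruction(template_text, variables, logged))  # True
# An undocumented edit to the stored template breaks reconstruction:
edited = template_text.replace("Do not omit", "Omit")
print(verify_reconstruction(edited, variables, logged))  # False
```

A failed check does not say which element drifted; it only signals that the logged prompt and the stored template no longer agree, which is the point at which a reviewer would consult the change history.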

Link to the five RAIDT pillars

Prompting has especially strong direct effects on Auditability, Interpretability, and Traceability, but it also materially influences Responsibility and Dependability.

Responsibility

Prompting contributes to Responsibility because it expresses the immediate human intention behind a run. It shows what the organisation asked the model to do, what boundaries were imposed, and whether the instructions reflected responsible use for the task and context.