S6.02 - Baseline prompting
flowchart LR
A1[Minimal prompt use] --> B[RAIDT - run-level evidence framework]
A2[No comparison point] --> B
A3[Claims of improvement without evidence] --> B
B --> C[[Baseline prompting - reference condition]]
C --> D[Run-level evidence pack]
C --> E[Five-pillar score profile]
C --> F[Reviewer reconstruction]
C --> G[Organisational learning]
C --> H[Governance move: evidence over assertion]
H --> I[Audit readiness and contestability]
J[Healthcare drafting] --> C
K[Finance compliance] --> C
L[Public services] --> C
M[Education support] --> C
N[Enterprise productivity] --> C
Star S6 - Influence Methods as Governance Interventions
Star context: Positions baseline prompting as the reference condition against which stronger influence methods can be assessed, so that RAIDT can show how governance readiness changes when additional controls are introduced.
Academic picture
Definition / background
Baseline prompting is the minimally enhanced prompt condition used as the reference point for evaluating a generative AI run before more explicit governance interventions are added. In practice, it is the prompt configuration that captures what the system would do under ordinary or default instruction conditions, without the extra scaffolding introduced by structured prompting, role specification, retrieval augmentation, or fine-tuned intervention layers.
Conceptually, baseline prompting inherits the logic of a control condition from experimental design. The point is not that the baseline is perfect, neutral, or context-free. The point is that it provides a stable comparison state from which a researcher, auditor, or governance reviewer can judge whether later interventions produce meaningful improvements in output quality, interpretability, safety, dependability, or traceability.
In generative AI governance, this matters because organisations frequently describe control measures without showing what those measures changed. A baseline prompt allows the evaluator to distinguish between performance that was already achievable with a simple instruction and performance that genuinely depends on a stronger governance mechanism. This reduces overclaiming and improves analytic discipline.
Within RAIDT, baseline prompting belongs inside the framework because RAIDT treats the run as the unit of governance. A run-level evidence pack is more defensible when it can show the unenhanced prompt condition alongside the governed condition, explain the differences in evidence, and connect those differences to the five-pillar score profile. Baseline prompting therefore supports comparison, attribution, and reviewability rather than acting as an end-state governance solution.
Baseline prompting also differs from related terms. It is not identical to zero-shot prompting: zero-shot describes the absence of worked examples in the prompt, whereas baseline describes the prompt's comparative role, so a baseline may be zero-shot or may include a brief example or local formatting request. It is not the same as prompting in general, because its role is comparative rather than simply operational. In RAIDT terms, baseline prompting is the reference condition that helps establish whether additional influence methods are improving governance readiness or merely adding procedural complexity.
Why this concept matters
Baseline prompting matters because governance claims are weak when they lack a comparator. If an organisation says that structured prompting, RAG, or another intervention improves responsible use, the obvious follow-up question is: compared with what? A baseline prompt answers that question by establishing the simplest documented run condition against which changes can be assessed.
This avoids a common confusion in AI governance, namely the assumption that any visible control automatically creates better governance. Some controls improve performance, some improve documentation, and some do both, but without a baseline the organisation cannot demonstrate which effect has occurred. Baseline prompting therefore prevents unsupported assertions about improvement.
If baseline prompting is missing, two risks appear. First, governance teams may mistake ordinary model capability for the effect of a specific intervention. Second, they may underestimate the residual risks that remain even after a control is added, because they do not know what the uncontrolled or minimally controlled condition looked like. In both cases, the result is weaker audit readiness and weaker organisational learning.
For organisations using generative AI in real work, baseline prompting helps translate abstract governance principles into operational comparison. It supports evidence-led review, helps supervisors and examiners understand what changed between runs, and gives a practical basis for improving governance interventions over time.
Key idea: Baseline prompting matters because it gives RAIDT a documented comparison point from which governance improvement can be evidenced rather than merely claimed.
What this item enables
- A documented reference condition for comparing governed and less-governed runs.
- More credible claims about the value of structured prompting, RAG, PEFT/LoRA, or other influence methods.
- Clearer attribution of changes in output quality, reliability, and traceability.
- Stronger run-level evidence packs because the pre-intervention condition is visible.
- Better scoring discipline across the five RAIDT pillars by reducing unsupported assumptions.
- Organisational learning about where prompting alone is sufficient and where stronger controls are necessary.
Practical example / likely audience question
Audience question
Why include baseline?
Answer
The concern behind this question is usually that a baseline appears too simple to be useful. The direct answer is that baseline prompting is useful precisely because it is simple: it reveals what the model can do before governance scaffolding is added, and therefore shows what extra intervention is genuinely contributing.
For example, suppose a team introduces a structured prompt template for drafting compliance summaries. If they only evaluate the templated version, they may conclude that the template produces trustworthy outputs. However, without a baseline they cannot tell whether the trustworthiness came from the template itself, from the underlying model, or from reviewer correction after the fact. A documented baseline run makes this comparison visible.
RAIDT handles this better than a generic AI governance approach because it ties the comparison to a specific run, a specific context, and a specific evidence pack. The question is not only whether a prompt seems better in general, but whether the evidence at run level shows a measurable governance improvement that can be reviewed, contested, and repeated.
Practical example in RAIDT terms
Consider a healthcare administration use case in which a generative AI tool drafts patient discharge letters for clinician review. A baseline-prompting run uses a simple instruction such as: draft a discharge letter from the supplied notes. The run-level issue is that the output may be fluent but inconsistent in structure, may omit explanation of uncertainty, and may not make source dependence visible.
In RAIDT terms, the evidence needed would include the exact baseline prompt, the model and version used, the task context, the input notes available to the model, the output produced, reviewer comments, and a comparison against a more controlled run such as a structured prompt or provenance-first RAG configuration. The pillars most affected are Dependability, Interpretability, and Traceability, with Responsibility and Auditability also engaged once reviewers must justify deployment decisions.
Baseline prompting improves governance readiness here because it shows the organisation what the model does under minimal instruction before stronger safeguards are introduced. That evidence helps determine whether later controls genuinely improve consistency, explanation quality, traceability to source material, and reviewer confidence, rather than simply making the workflow look more formal.
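The evidence items listed for the baseline run could be captured in a single structured record. The following is a minimal sketch in Python, not a RAIDT-defined schema: the class name `RunRecord`, every field name, and the placeholder values are assumptions introduced here for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RunRecord:
    """Evidence for one run. Field names are illustrative, not a RAIDT schema."""
    run_id: str
    condition: str                     # e.g. "baseline" or "structured_prompt"
    prompt_text: str                   # the exact prompt as sent to the model
    model_name: str
    model_version: str
    task_context: str
    input_notes: str                   # material made available to the model
    output_text: str
    reviewer_comments: Tuple[str, ...] = ()
    compared_with: Optional[str] = None  # run_id of the more controlled run


# All values below are placeholders, not real run data.
baseline = RunRecord(
    run_id="run-001",
    condition="baseline",
    prompt_text="Draft a discharge letter from the supplied notes.",
    model_name="example-model",
    model_version="0.0",
    task_context="Discharge letter drafting for clinician review",
    input_notes="<supplied notes>",
    output_text="<model output>",
    reviewer_comments=("<reviewer comment>",),
    compared_with="run-002",
)

# Serialise the record so it can be stored alongside the evidence pack.
evidence_json = json.dumps(asdict(baseline), indent=2)
```

Freezing the record is a design choice in this sketch: once a baseline run is captured, it should not be silently edited, because the comparison depends on it staying fixed.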
Detailed link to RAIDT
Baseline prompting links to RAIDT in four ways.
First, it supports RAIDT's core idea that governance should be grounded in evidence about actual configured uses of generative AI rather than general claims about models.
Second, it links directly to the run because the baseline must be specified for a particular task, time, model configuration, and organisational context.
Third, it strengthens both the evidence pack and the score profile by making it possible to compare a minimally controlled condition with a more governed one.
Fourth, it improves reviewability, contestability, audit readiness, and organisational learning because reviewers can see what changed, why it changed, and whether the change improved governance performance.
Baseline prompting → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness
The chain matters because baseline prompting is the starting comparator that turns later intervention claims into assessable governance evidence.
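One way to make the comparison between a baseline run and a governed run concrete is to compute the per-pillar difference between their score profiles. The sketch below assumes each run has a numeric score per pillar. The 0-4 scale, the function name `profile_delta`, and the example scores are hypothetical and are not taken from the RAIDT rubric.

```python
PILLARS = (
    "Responsibility",
    "Auditability",
    "Interpretability",
    "Dependability",
    "Traceability",
)


def profile_delta(baseline: dict, governed: dict) -> dict:
    """Return governed minus baseline for each of the five pillars."""
    missing = [p for p in PILLARS if p not in baseline or p not in governed]
    if missing:
        raise ValueError(f"score missing for pillars: {missing}")
    return {p: governed[p] - baseline[p] for p in PILLARS}


# Illustrative scores on an assumed 0-4 scale; not real results.
baseline_scores = {
    "Responsibility": 2, "Auditability": 1, "Interpretability": 1,
    "Dependability": 2, "Traceability": 0,
}
governed_scores = {
    "Responsibility": 2, "Auditability": 3, "Interpretability": 2,
    "Dependability": 3, "Traceability": 3,
}

delta = profile_delta(baseline_scores, governed_scores)
```

A delta of zero on a pillar is informative too: it suggests the intervention added procedure without changing the evidence for that pillar.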
Link to the five RAIDT pillars
Responsibility
Baseline prompting helps clarify what level of human and organisational responsibility is needed when the system is used with minimal control. It exposes whether governance claims rely too heavily on trust in the model rather than accountable oversight.
Example evidence / implication:
- Documentation of who approved use of the baseline condition and for what task.
- Evidence that reviewers understood the risks of using a minimally structured prompt.
Auditability
Baseline prompting improves auditability because it gives auditors a reference run that can be reconstructed and compared with later governed runs. Without this baseline, claims about improvement are harder to test.
Example evidence / implication:
- Stored record of the exact baseline instruction, run metadata, and resulting output.
- Comparison notes showing what changed after a governance intervention was applied.
Interpretability
Baseline prompting often reveals where outputs are hard to interpret because the model has not been guided to explain structure, uncertainty, or reasoning boundaries. This makes interpretability gaps visible.
Example evidence / implication:
- Reviewer notes identifying ambiguous wording or unexplained output structure in the baseline run.
- Comparative evidence showing whether later prompting improves clarity and explanation quality.
Dependability
Baseline prompting is particularly important for dependability because it shows how consistent or inconsistent the model is before stronger controls are introduced. It should not be assumed to be dependable merely because it can produce plausible text.
Example evidence / implication:
- Repeated baseline runs showing variance in completeness, format, or factual stability.
- Error patterns that justify escalation to stronger controls for high-stakes tasks.
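Variance in completeness across repeated baseline runs can be summarised with a simple coverage check. This is a rough sketch only: the section names, the placeholder outputs, and the substring-matching rule are assumptions chosen for illustration, and a real review would need a more careful definition of completeness.

```python
def section_coverage(outputs, required_sections):
    """Fraction of outputs in which each required section name appears."""
    if not outputs:
        raise ValueError("at least one output is required")
    coverage = {}
    for section in required_sections:
        # Case-insensitive substring match; a deliberately crude heuristic.
        hits = sum(1 for text in outputs if section.lower() in text.lower())
        coverage[section] = hits / len(outputs)
    return coverage


# Placeholder texts standing in for four repeated baseline outputs.
outputs = [
    "Diagnosis: ...\nMedication: ...\nFollow-up: ...",
    "Diagnosis: ...\nFollow-up: ...",
    "Diagnosis: ...\nMedication: ...",
    "Diagnosis: ...\nMedication: ...\nFollow-up: ...",
]
required = ["Diagnosis", "Medication", "Follow-up"]

coverage = section_coverage(outputs, required)
# Sections that are missing from at least one run.
unstable = [name for name, fraction in coverage.items() if fraction < 1.0]
```

Here `coverage` records that "Diagnosis" appears in every placeholder output while "Medication" and "Follow-up" each appear in three of four, so both are flagged as unstable.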
Traceability
Baseline prompting often affects traceability most strongly when outputs are generated without explicit links to sources, provenance constraints, or retrieval boundaries. It therefore helps identify where traceability is weakest.
Example evidence / implication:
- Evidence that the baseline output cannot be cleanly traced back to authoritative input materials.
- Comparative records showing that provenance-focused interventions improve traceability beyond the baseline.
Baseline prompting affects all five pillars, but its strongest practical value is often in Auditability, Dependability, and Traceability because these are the areas where the absence of a comparator most clearly weakens governance claims.
Why this item is more than a generic concept
In general AI governance, baseline prompting may simply mean the first or simplest prompt used in testing. In RAIDT, it means a documented run-level comparator that helps determine whether a governance intervention has produced observable improvement in evidence, scoring, and reviewability.
The RAIDT meaning is more operational because it is not satisfied by an informal example prompt. The baseline has to be tied to a specific run, preserved as evidence, and used in relation to the evidence pack and five-pillar profile. This turns baseline prompting from a casual evaluation habit into a governance-relevant method for comparison and judgement.
Common misunderstanding
Misunderstanding
Baseline prompting is just a weak prompt, so it has little governance value.
Correction
The governance value of baseline prompting lies in its comparative role, not in its sophistication. A baseline is useful because it shows what happens before stronger controls are introduced. For instance, if a public-sector summarisation tool appears reliable under a structured prompt, that is informative only if reviewers can compare it with a baseline run and demonstrate what the structure improved. Without that baseline, the organisation cannot clearly attribute the governance benefit.
Boundary and limitation
Baseline prompting does not by itself make a system governed, safe, or suitable for high-stakes deployment. It does not prove that the minimally controlled condition is acceptable, nor does it replace domain review, policy controls, provenance measures, or monitoring. In some cases, especially in high-risk settings, a baseline may reveal such poor performance that it should be treated only as an evaluation reference and not as a deployment option.
It may also fail if the baseline is defined inconsistently across runs, if reviewers quietly add undocumented instructions, or if comparison conditions are not kept stable enough for interpretation. RAIDT handles this limitation by requiring run-specific documentation, comparison-ready evidence, and explicit links between prompting conditions and pillar-level assessment.
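Undocumented changes to the baseline can be detected by fingerprinting the exact prompt text. The sketch below uses SHA-256 from Python's standard `hashlib` module. The function names and the idea of a "registered" baseline are assumptions made for illustration, not a RAIDT requirement.

```python
import hashlib


def prompt_fingerprint(prompt_text: str) -> str:
    """SHA-256 hex digest of the exact prompt text, with no normalisation."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def baseline_is_stable(registered_prompt: str, used_prompts: list) -> bool:
    """True only if every prompt actually used matches the registered baseline."""
    expected = prompt_fingerprint(registered_prompt)
    return all(prompt_fingerprint(p) == expected for p in used_prompts)


registered = "Draft a discharge letter from the supplied notes."

# A reviewer quietly appends an instruction in the second run.
drifted_runs = [registered, registered + " Use bullet points."]

stable = baseline_is_stable(registered, [registered, registered])
drifted = baseline_is_stable(registered, drifted_runs)
```

The text is deliberately not normalised before hashing, so even a whitespace change counts as a different baseline. Whether that strictness is appropriate depends on how sensitive the model is to such changes.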
Implementation levels
Manual implementation
A researcher or small team can implement baseline prompting manually by storing the exact minimal prompt used for a task, capturing the output, and comparing it with a more controlled run using a simple review template.
Semi-automated implementation
Semi-automated implementation can use metadata fields, prompt templates, evaluation sheets, and evidence-pack forms that require a baseline condition to be recorded before stronger interventions are assessed.
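The requirement that a baseline be recorded before stronger interventions are assessed can be expressed as a simple gate. In this sketch, runs are plain dictionaries with assumed keys `run_id`, `task_id`, and `condition`. These names and the function names are hypothetical and would need to match whatever metadata fields an organisation actually uses.

```python
def require_baseline(runs: list, task_id: str) -> dict:
    """Return the baseline run for task_id, or raise if none was recorded."""
    for run in runs:
        if run["task_id"] == task_id and run["condition"] == "baseline":
            return run
    raise LookupError(f"no baseline run recorded for task {task_id!r}")


def comparison_pairs(runs: list, task_id: str) -> list:
    """Pair the baseline run with each non-baseline run for the same task."""
    baseline = require_baseline(runs, task_id)
    return [
        (baseline["run_id"], run["run_id"])
        for run in runs
        if run["task_id"] == task_id and run["condition"] != "baseline"
    ]


# Placeholder run metadata, not real records.
runs = [
    {"run_id": "run-001", "task_id": "discharge-letter", "condition": "baseline"},
    {"run_id": "run-002", "task_id": "discharge-letter", "condition": "structured_prompt"},
    {"run_id": "run-003", "task_id": "compliance-summary", "condition": "structured_prompt"},
]

pairs = comparison_pairs(runs, "discharge-letter")
```

Calling `comparison_pairs(runs, "compliance-summary")` raises `LookupError`, because that task has a governed run but no recorded baseline to compare it with.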
Fully automated implementation
At scale, a platform or orchestration layer can automatically log the baseline prompt, model version, task metadata, outputs, reviewer actions, and comparison metrics, then feed those records into a governance dashboard or RAIDT scoring pipeline.
Practical use in the RAIDT project
Within the RAIDT project, baseline prompting is useful in Paper 08 Foundations because it clarifies why governance interventions should be assessed against a documented reference condition rather than described abstractly. In Paper 09 Empirical Validation, it provides a practical comparison state for testing whether structured prompting or other influence methods improve run-level scores and reviewer confidence. In Paper 10 Policy Pathways, it helps explain to organisations and policymakers why evidence of improvement requires a baseline and not merely a claim of good practice.
It also supports sector playbooks by showing practitioners how to compare minimally controlled use with more governed use in context. For the evidence pack and scoring rubric, baseline prompting provides a defensible starting point for judgement. For supervisor explanation, viva defence, and journal positioning, it helps articulate that RAIDT evaluates governance interventions empirically at the level of real runs rather than relying on principle statements alone.
Key audience questions to prepare for
Q1. Is baseline prompting the same as zero-shot prompting?
No. A baseline may be zero-shot, but the key feature is not shot count. The key feature is that it acts as the minimally enhanced comparison condition against which stronger governance interventions are assessed.
Q2. Why not evaluate only the best prompt configuration?
Because governance requires evidence about improvement, not only evidence about peak performance. Without a baseline, it is difficult to show what the intervention changed or whether the added complexity is justified.
Q3. Does baseline prompting matter in high-risk settings if it would never be deployed as-is?
Yes. Even if the baseline condition would never be used operationally, it still matters as an evaluation reference because it reveals the size and nature of the governance gap that stronger controls must close.
Q4. What kind of evidence should accompany a baseline prompt in RAIDT?
At minimum, the exact prompt text, run metadata, model configuration, input context, output, reviewer observations, and comparison with a more governed condition should be preserved so that the effect of the intervention can be evaluated.
Q5. What does baseline prompting add to a governance discussion that principles alone do not?
It adds empirical comparison. Principles can state that governance controls are desirable, but baseline prompting helps show whether a particular control changed run-level evidence in a way that improves auditability, dependability, or traceability.
Suggested citation concepts to support this item
- baseline condition in prompt engineering evaluation
- control condition in human-centred AI assessment
- comparative evaluation of prompting strategies in large language models
- prompt sensitivity and output variability in generative AI
- run-level auditing of generative AI systems
- evidence-based AI governance and reviewability
- traceability and provenance in organisational GenAI use
- reproducibility and audit trails for language model deployments
- governance evaluation of retrieval and prompting interventions
- operationalising responsible AI through documented workflow evidence
Short explanation for presentation
Baseline prompting is the reference condition used to show what a generative AI system produces before stronger governance interventions are added. In RAIDT, that matters because the framework is not satisfied by broad claims that a control improves responsible use. It asks what happened in a specific run, under a specific prompt condition, with what evidence. A baseline prompt therefore gives us a disciplined comparison point. It helps us distinguish ordinary model capability from the effect of structured prompting, RAG, or other influence methods. That strengthens the evidence pack, improves the credibility of the five-pillar score profile, and makes review more contestable and auditable. In short, baseline prompting is valuable in RAIDT because it turns claims of improvement into documented comparisons at run level.
One-line takeaway
Baseline prompting is the minimally enhanced reference condition that gives RAIDT the run-level comparator it needs to evidence governance improvement.
Related items in influence methods as governance interventions
- S6.01 - Governance interventions
- S6.03 - Prompting
- S6.04 - Structured prompting
- S6.05 - Role-based prompting
- S6.06 - Zero-shot prompting
- S6.07 - Chain-of-thought controlled use
- S6.08 - RAG
- S6.09 - Provenance-first RAG
- S6.10 - PEFT / LoRA
- S6.11 - Adapter lineage
- S6.12 - RLHF-type / DPO controls
- S6.13 - Stacked influence
Anchored questions
- Audience question: Why include baseline? Answer: it shows what is missing when no structured governance intervention is applied.