S2.06 - Continuous improvement

flowchart LR
    A["Static governance problem<br/>post-hoc review only<br/>recurring evidence gaps"] --> B["RAIDT<br/>run-level evidence framework"]
    B --> C[["Continuous improvement<br/>evidence-guided revision of future runs"]]
    C --> D[Evidence pack strengthened]
    C --> E[Score profile becomes actionable]
    C --> F[Reviewer reconstruction improved]
    C --> G[Organisational learning]
    C --> H[Governance readiness]
    I[Healthcare triage support] --> C
    J[Public-service case summaries] --> C
    K[Finance drafting] --> C
    L[Education feedback] --> C
    M[Cybersecurity analysis] --> C

Star S2 - Governance Meaning and Problem Context

Star context: Clarifies governance as oversight, control, accountability, reviewability, contestability and continuous improvement, so RAIDT treats governance as an evidence-guided organisational practice rather than a vague ethics label.


Academic picture
Definition / background

Continuous improvement in RAIDT means using run-level evidence to refine how generative AI is configured, governed, reviewed, and used over time. In practical terms, weak scores, recurring evidence gaps, repeated reviewer concerns, or unstable task performance should trigger a change in the socio-technical arrangement around a run: for example, better prompt design, stricter logging, stronger retrieval governance, revised human-review checkpoints, clearer role accountability, or targeted user training.

The concept has roots in quality improvement, audit learning, safety management, and organisational learning, but RAIDT gives it a more specific governance meaning. It is not simply the idea that systems should get better. It is the claim that improvement should be traceable to documented runs, evidenced weaknesses, and defensible governance interventions. This matters because many AI-governance discussions stop at principles, while RAIDT asks what an organisation can actually review, reconstruct, challenge, and improve after a specific use of a GenAI system.

Inside RAIDT, continuous improvement belongs naturally with run-level evidence, evidence packs, and score profiles. A run-level evidence pack makes weaknesses visible in context. The five-pillar score profile makes patterns comparable across runs. Continuous improvement is the organisational response that converts those findings into better future practice. In that sense, it closes the loop between evidence collection and governance action.

It also differs from adjacent ideas such as optimisation or model fine-tuning. Continuous improvement in RAIDT may involve model or prompt changes, but it can also involve process redesign, reviewer assignment, access restrictions, escalation criteria, provenance capture, or documentation standards. The focus is therefore not only system performance, but governance readiness.

Why this concept matters

Continuous improvement matters because governance fails if evidence is collected but never used. An organisation may produce logs, scores, and review comments, yet still repeat the same weaknesses if there is no structured pathway from findings to intervention. RAIDT avoids that failure by treating review outputs as inputs to future governance design.

This concept also prevents a common confusion in GenAI governance: the assumption that auditability is enough on its own. Auditability tells an organisation what happened and how to inspect it. Continuous improvement adds the next step, namely how documented weaknesses alter future runs. Without that step, governance remains descriptive rather than corrective.

For organisations using GenAI in meaningful work, the absence of continuous improvement creates practical risk. Known prompt failures persist, documentation gaps recur, weak review rules remain unaddressed, and pillar scores become a reporting ritual instead of a governance mechanism. Continuous improvement makes RAIDT operational because it converts run-level evidence into concrete changes in controls, practices, and organisational learning.

Key idea: Continuous improvement matters in RAIDT because governance should not end with documenting a run; it should use run-level evidence to improve the next run.

What this item enables
Practical example / likely audience question

Audience question

Is RAIDT only post-hoc audit?

Answer

No. The concern behind the question is that a run-level evidence framework might appear to operate only after the event, producing documentation about completed use without changing future behaviour. RAIDT does include post-hoc review, but it is not limited to it. Its design makes review actionable.

A weak score or recurring evidence gap can justify a specific intervention before the next comparable run takes place. For example, if a drafting assistant repeatedly produces unsupported claims because retrieval provenance is missing, RAIDT does not merely record that weakness. It can justify a change in retrieval rules, a new requirement for source capture, an added reviewer checkpoint, or a narrower task boundary for future use.

This is stronger than a generic AI-governance approach because generic frameworks often identify principles such as accountability or safety without specifying the operational unit through which learning occurs. RAIDT uses the run as that unit. That makes the pathway from evidence to improvement much more concrete, reviewable, and governable.

Practical example in RAIDT terms

Consider a public-service team using a GenAI assistant to draft case summaries for housing-support assessments. In one run, the system produces a fluent summary, but the reviewer notices that important claimant circumstances were compressed and the provenance of key statements is unclear. The run-level issue is not merely that the output could have been better; it is that the evidence pack shows weak source traceability, an incomplete review note, and an over-confident final summary.

The evidence needed includes the prompt, the input materials used, retrieval or source references, timestamps, reviewer comments, the final edited output, and the RAIDT pillar scores for that run. The most affected pillars are Auditability, Traceability, and Dependability, with Responsibility also implicated because reviewer roles and escalation thresholds may need tightening.
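
The evidence elements listed above can be sketched as a simple record. This is a minimal illustration rather than a RAIDT-defined schema: the class name `EvidencePack`, its field names, the sample values, and the 0 to 4 score scale are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime

# The five RAIDT pillars named in this item.
PILLARS = ("Responsibility", "Auditability", "Interpretability",
           "Dependability", "Traceability")


@dataclass
class EvidencePack:
    """Hypothetical run-level evidence record; all field names are illustrative."""
    run_id: str
    prompt: str
    input_materials: list[str]
    source_references: list[str]
    timestamp: datetime
    reviewer_comments: list[str]
    final_output: str
    # Assumed scale: 0 (no evidence) to 4 (strong evidence) per pillar.
    pillar_scores: dict[str, int] = field(default_factory=dict)

    def missing_elements(self) -> list[str]:
        """Return evidence elements that are empty, plus pillars left unscored."""
        content_fields = ("prompt", "input_materials", "source_references",
                          "reviewer_comments", "final_output")
        gaps = [name for name in content_fields if not getattr(self, name)]
        gaps += [f"score:{p}" for p in PILLARS if p not in self.pillar_scores]
        return gaps


# Made-up run resembling the housing-support example: provenance was not captured.
pack = EvidencePack(
    run_id="housing-summary-001",
    prompt="Summarise the case file for a housing-support assessment.",
    input_materials=["case_file.pdf"],
    source_references=[],
    timestamp=datetime(2024, 1, 1, 9, 0),
    reviewer_comments=["Claimant circumstances compressed."],
    final_output="Edited summary text.",
    pillar_scores={"Auditability": 1, "Traceability": 1, "Dependability": 2},
)
print(pack.missing_elements())
# ['source_references', 'score:Responsibility', 'score:Interpretability']
```

Listing gaps explicitly, instead of silently accepting an incomplete record, is one way to make weak source traceability visible at the point of review.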

Continuous improvement then appears as a governance response. The organisation updates the summarisation prompt, requires explicit citation of case-file passages, adds a mandatory reviewer checklist for vulnerable-case indicators, and records that future runs of the same task should trigger escalation when provenance is incomplete. Governance readiness improves because the next run is better structured for review, challenge, and safe organisational use.
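
The revised rule for future runs could be expressed as a pre-release check. The sketch below is illustrative only: the function name `review_decision`, the three status strings, and the checklist item name are assumptions, not part of RAIDT.

```python
def review_decision(cited_passages, checklist):
    """Decide what happens to a drafted case summary before release.

    cited_passages: case-file passage references captured for this run.
    checklist: dict mapping reviewer checklist items to True (done) or False.
    """
    # Incomplete provenance triggers escalation, as in the example above.
    if not cited_passages:
        return "escalate"
    # The reviewer checklist is mandatory, so an empty one also blocks release.
    if not checklist or not all(checklist.values()):
        return "hold_for_review"
    return "proceed"


# Hypothetical inputs for three runs of the same task.
print(review_decision([], {"vulnerable_case_indicators": True}))
print(review_decision(["case file, p. 3"], {"vulnerable_case_indicators": False}))
print(review_decision(["case file, p. 3"], {"vulnerable_case_indicators": True}))
```

The first call returns "escalate", the second "hold_for_review", and the third "proceed".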

Detailed link to RAIDT

Continuous improvement links to RAIDT in four ways.

First, it supports RAIDT's core idea that responsible GenAI governance should be based on evidence rather than assertion. Improvement becomes credible only when it is tied to documented weaknesses in actual runs.

Second, it depends on the run as the unit of governance. RAIDT does not ask organisations to improve AI use in the abstract; it asks them to examine a specific configured use at a specific time in a specific context, and to learn from that case.

Third, it uses RAIDT outputs directly. The evidence pack captures what happened, while the score profile helps identify where governance weaknesses are concentrated. Together, they provide a practical basis for redesigning prompts, workflows, review rules, and accountability arrangements.

Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning. When a reviewer, supervisor, or external stakeholder asks what changed after a weak run, RAIDT can show not only the evidence but also the improvement response.

Continuous improvement -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

Link to the five RAIDT pillars

Continuous improvement affects all five RAIDT pillars, but it has especially strong implications for Auditability, Dependability, and Traceability because those pillars often reveal where organisational adjustment is needed.

Responsibility

Continuous improvement supports Responsibility by making it clear who must act when weaknesses are found and who owns the resulting change.

Example evidence / implication:

Auditability

Continuous improvement strengthens Auditability because an auditor can see not only the original run, but also the documented response to its weaknesses.

Example evidence / implication:

Interpretability

Continuous improvement supports Interpretability when confusing outputs, opaque reasoning, or unclear prompt effects lead to clearer instructions, explanation requirements, or reviewer guidance.

Example evidence / implication:

Dependability

Continuous improvement is central to Dependability because repeated weaknesses should lead to interventions that stabilise future performance.

Example evidence / implication:

Traceability

Continuous improvement strengthens Traceability by making provenance gaps visible and by driving better capture of inputs, sources, transformations, and review steps.

Example evidence / implication:

Why this item is more than a generic concept

In general AI governance, continuous improvement may simply mean updating policy over time, learning from incidents, or periodically revising controls. In RAIDT, it has a more operational meaning: improvement is tied to the evidence produced by specific runs and to the governance signals generated by evidence packs and score profiles.

That RAIDT meaning is more concrete because it defines what improvement should be responding to. It is not an aspiration to improve eventually. It is a structured response to identifiable weaknesses in prompts, documentation, review processes, retrieval settings, output quality, or accountability arrangements. This makes the concept far more usable in supervision, evaluation, and organisational implementation.

Common misunderstanding

Misunderstanding

Continuous improvement means the model itself is automatically getting better after every run.

Correction

Not necessarily. In RAIDT, continuous improvement refers to improving the governance of use, not just the technical system. A weak run may justify a model change, but it may equally justify a tighter workflow, a clearer reviewer instruction, a stronger provenance requirement, or a narrower permitted use case. For example, if a legal-drafting assistant produces plausible but weakly sourced text, the most effective improvement may be mandatory citation checks and reviewer sign-off rather than changing the underlying model.

Boundary and limitation

Continuous improvement does not prove that a system is safe, fair, or compliant in every future context. It also does not replace formal assurance, domain regulation, expert oversight, or deeper causal analysis of why failures occur. An organisation can document lessons learned yet still implement weak or superficial changes.

The concept also depends on conditions that may not always hold. Improvement requires reasonably good evidence capture, meaningful review, and organisational willingness to act on findings. If runs are poorly documented, if scores are inconsistent, or if known weaknesses are not escalated, then the improvement loop becomes symbolic rather than effective.

RAIDT handles this limitation by tying improvement to evidence quality, reviewer reconstruction, and repeated scoring across runs. In other words, RAIDT does not assume that learning has happened; it creates a structure in which learning can be evidenced, challenged, and reviewed.

Implementation levels

Manual implementation

A researcher or small team can apply continuous improvement manually by reviewing completed runs, noting recurring evidence gaps, comparing pillar scores across similar tasks, and maintaining a simple improvement log that records what changed before the next run.
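
A simple improvement log of this kind can be kept in a spreadsheet, or sketched in a few lines of code. The example below is one possible shape, not a prescribed format: the column names, the sample entries, and the helper names are invented for illustration.

```python
import csv
import io
from collections import Counter

# Each row links a run's observed weakness to the change made before the next run.
FIELDS = ["run_id", "task", "evidence_gap", "change_made"]

# Made-up entries for illustration only.
log = [
    {"run_id": "run-01", "task": "case summary",
     "evidence_gap": "missing source references",
     "change_made": "prompt now requires case-file citations"},
    {"run_id": "run-02", "task": "case summary",
     "evidence_gap": "incomplete review note",
     "change_made": "reviewer checklist added"},
    {"run_id": "run-03", "task": "case summary",
     "evidence_gap": "missing source references",
     "change_made": "escalation rule for incomplete provenance"},
]


def recurring_gaps(entries, min_count=2):
    """Return evidence gaps that appear in at least `min_count` logged runs."""
    counts = Counter(entry["evidence_gap"] for entry in entries)
    return sorted(gap for gap, n in counts.items() if n >= min_count)


def to_csv(entries):
    """Serialise the log as CSV text so it can be shared or archived."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


print(recurring_gaps(log))
print(to_csv(log))
```

With these sample entries, `recurring_gaps(log)` returns only "missing source references", the one gap logged in more than one run.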

Semi-automated implementation

Semi-automated implementation can use structured templates, metadata fields, scoring rubrics, and review dashboards to flag low-scoring runs or repeated weaknesses. This supports more consistent detection of improvement targets while keeping humans responsible for judgment and intervention design.
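
The flagging step might look like the following sketch. The 0 to 4 score scale, the threshold of 2, and the function names are assumptions chosen for the example; humans remain responsible for deciding what intervention, if any, a flag should lead to.

```python
from collections import Counter

THRESHOLD = 2  # Assumed: scores at or below this value count as weak (0-4 scale).


def weak_pillars(scores, threshold=THRESHOLD):
    """Return the pillars in one score profile that fall at or below the threshold."""
    return sorted(p for p, s in scores.items() if s <= threshold)


def flag_runs(score_profiles, threshold=THRESHOLD, repeat_after=2):
    """Flag low-scoring runs and pillars that are weak in several runs.

    score_profiles: dict mapping run_id -> {pillar: score}.
    Returns (flagged, repeated): weak pillars per flagged run, and pillars
    that are weak in at least `repeat_after` runs.
    """
    flagged = {}
    for run_id, scores in score_profiles.items():
        weak = weak_pillars(scores, threshold)
        if weak:
            flagged[run_id] = weak
    counts = Counter(p for weak in flagged.values() for p in weak)
    repeated = sorted(p for p, n in counts.items() if n >= repeat_after)
    return flagged, repeated


# Made-up score profiles for three comparable runs.
profiles = {
    "run-01": {"Responsibility": 3, "Auditability": 2, "Interpretability": 3,
               "Dependability": 3, "Traceability": 1},
    "run-02": {"Responsibility": 3, "Auditability": 3, "Interpretability": 3,
               "Dependability": 3, "Traceability": 2},
    "run-03": {"Responsibility": 4, "Auditability": 3, "Interpretability": 3,
               "Dependability": 4, "Traceability": 3},
}
flagged, repeated = flag_runs(profiles)
print(flagged)   # {'run-01': ['Auditability', 'Traceability'], 'run-02': ['Traceability']}
print(repeated)  # ['Traceability']
```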

Fully automated implementation

At scale, a platform or governance pipeline can automatically collect run metadata, generate evidence-pack components, surface recurring low-score patterns, trigger policy workflows, and require sign-off before a modified configuration is reused. In this model, orchestration layers, logging systems, and governance dashboards make continuous improvement systematic across teams and use cases.
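
One part of such a pipeline, the sign-off requirement for modified configurations, could be sketched as below. This is an assumption-laden illustration rather than a description of any real platform: `SignOffGate`, `config_fingerprint`, and the configuration keys are hypothetical names.

```python
import hashlib
import json


def config_fingerprint(config):
    """Return a stable hash of a run configuration (prompt, retrieval rules, etc.)."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SignOffGate:
    """Hypothetical gate: a configuration may run only once someone has signed it off."""

    def __init__(self):
        self._approved = {}  # fingerprint -> approver

    def approve(self, config, approver):
        """Record that `approver` has signed off this exact configuration."""
        self._approved[config_fingerprint(config)] = approver

    def may_run(self, config):
        """Any change to the configuration alters its fingerprint and needs new sign-off."""
        return config_fingerprint(config) in self._approved


gate = SignOffGate()
v1 = {"prompt": "Summarise the case file.", "require_citations": False}
gate.approve(v1, approver="reviewer-a")

# A modified configuration is blocked until it is signed off in turn.
v2 = {**v1, "require_citations": True}
print(gate.may_run(v1), gate.may_run(v2))  # True False
gate.approve(v2, approver="reviewer-b")
print(gate.may_run(v2))  # True
```

Hashing a canonical serialisation means that even a small prompt or policy change produces a new fingerprint, so reuse cannot bypass review unnoticed.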

Practical use in the RAIDT project

In Paper 08 Foundations, this item helps explain that RAIDT is not only a taxonomy of governance concepts but a framework for evidence-based learning across runs. In Paper 09 Empirical Validation, it supports analysis of whether weak scores and evidence gaps actually identify actionable improvement points in practice. In Paper 10 Policy Pathways, it helps position RAIDT as a bridge between high-level governance expectations and operational change mechanisms.

The item is also useful in sector playbooks because supervisors, reviewers, and practitioners often ask what organisations are meant to do once a weak run has been identified. Continuous improvement provides that answer. It supports the evidence pack by turning findings into interventions, the scoring rubric by making low scores actionable, and governance interventions by showing how RAIDT can influence prompts, workflows, review design, and escalation rules.

For viva defence and journal positioning, this concept is especially valuable because it demonstrates that RAIDT is not merely diagnostic. It offers a structured account of how organisations can learn from documented GenAI use and improve governance readiness over time.

Key audience questions to prepare for

Q1. How is continuous improvement different from ordinary quality assurance?

Ordinary quality assurance may review outputs or processes in general terms. RAIDT makes improvement more specific by tying it to a documented run, an evidence pack, and a pillar-based score profile. That gives improvement a clearer evidential basis and makes it easier to justify, compare, and audit.

Q2. Does continuous improvement only happen after failures?

No. Failures are one trigger, but so are weak scores, recurring ambiguities, incomplete provenance, near misses, or repeated reviewer friction. RAIDT supports improvement before serious harm occurs by treating these signals as governance-relevant evidence.

Q3. Why not just improve the model and leave governance aside?

Because many important weaknesses are socio-technical rather than purely model-based. Prompt design, source selection, reviewer capacity, task boundaries, and escalation rules often determine whether GenAI use is governable in practice. RAIDT captures these wider factors.

Q4. How can you show that improvement actually occurred?

You can compare successive runs, document prompt or policy changes, record revised review procedures, and examine whether pillar scores or evidence quality improve over time. RAIDT makes those comparisons more defensible because it structures what evidence is collected for each run.
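
A comparison of successive runs can be as simple as a per-pillar difference. The sketch below assumes numeric pillar scores on a shared scale; the function name and the sample values are invented for illustration.

```python
def score_change(before, after):
    """Return the per-pillar change between two runs of the same task.

    Only pillars scored in both runs are compared.
    """
    return {p: after[p] - before[p] for p in before if p in after}


# Made-up scores before and after a governance intervention.
before = {"Auditability": 1, "Traceability": 1, "Dependability": 2}
after = {"Auditability": 3, "Traceability": 3, "Dependability": 2}
print(score_change(before, after))
# {'Auditability': 2, 'Traceability': 2, 'Dependability': 0}
```

A positive difference is a signal to examine alongside the documented changes and evidence quality, not proof of improvement on its own.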

Q5. Why is continuous improvement important for responsible AI claims?

Because responsible-AI claims are weak if they cannot show what the organisation changed after identifying weaknesses. Continuous improvement turns responsibility from a statement of intent into a documented pattern of learning and intervention.

Short explanation for presentation

Continuous improvement in RAIDT means that evidence from one GenAI run should improve the governance of the next one. If a run shows weak provenance, poor reviewer reconstruction, unstable output quality, or low pillar scores, RAIDT treats those findings as triggers for action rather than as static observations. The organisation can then revise prompts, strengthen review rules, improve documentation, tighten task boundaries, or add training and escalation steps. This matters because governance is weak if it only records problems after the fact. RAIDT makes continuous improvement operational by linking it to the run as the unit of analysis, the evidence pack as the record of what happened, and the score profile as a structured signal of weakness. In short, it closes the loop between evaluation and organisational learning.

One-line takeaway

Continuous improvement is the evidence-guided revision of future GenAI use: RAIDT turns run-level findings into governance action.
