S9.11 - Post-market monitoring

flowchart LR
    A["Post-deployment problems:<br/>drift, weak evidence, incidents,<br/>changing prompts, false assurance"] --> B["RAIDT:<br/>run-level evidence framework"]
    H["Practical fields:<br/>healthcare, public services,<br/>enterprise drafting, dashboards"] --> C[[Post-market monitoring]]
    B --> C
    C --> D["Evidence pack refresh<br/>and completeness checks"]
    C --> E["Score profile trends<br/>across five pillars"]
    C --> F["Reviewer reconstruction,<br/>escalation, intervention"]
    D --> G["Governance readiness,<br/>organisational learning,<br/>policy alignment"]
    E --> G
    F --> G

Star S9 - Policy, Standards and Assurance

Star context: Connects RAIDT to policy instruments, standards, assurance, procurement, audit and organisational accountability, with post-market monitoring showing how governance continues after deployment through structured evidence, review, and corrective action.


Academic picture
Definition / background

Post-market monitoring is the ongoing observation and review of a system after it has entered operational use. In conventional governance language, it refers to the set of activities through which an organisation watches for performance changes, emerging harms, compliance failures, and shifts in real-world conditions after deployment. In the context of generative AI, the idea is especially important because system behaviour is shaped not only by the model itself, but also by prompts, wrappers, users, connected data sources, workflow pressures, and changing institutional expectations.

Within RAIDT, post-market monitoring is not limited to broad service-level surveillance. It is anchored in the run as the unit of governance. A run is one configured use of a GenAI system for a specific task, at a specific time, in a specific context. Monitoring therefore means examining whether runs continue to produce evidence that is sufficiently complete, whether scores shift across the five pillars, whether exceptions recur, and whether incidents or near misses reveal weaknesses in governance controls.
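To make the run as a unit of governance concrete, the sketch below models one run as a record and computes how complete its evidence is. It is illustrative only: the field names, the `REQUIRED_EVIDENCE` list, and the numeric score scale are assumptions made for this example, not a schema defined by RAIDT.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical evidence fields; RAIDT does not prescribe this exact list.
REQUIRED_EVIDENCE = ("prompt_version", "output", "reviewer_action", "trace_id")
PILLARS = ("responsibility", "auditability", "interpretability",
           "dependability", "traceability")


@dataclass
class RunRecord:
    """One configured use of a GenAI system: a task, a time, a context."""
    run_id: str
    task: str
    timestamp: datetime
    context: str
    evidence: dict = field(default_factory=dict)
    scores: dict = field(default_factory=dict)  # one score per pillar (assumed scale)


def evidence_completeness(run: RunRecord) -> float:
    """Fraction of required evidence fields that are present and non-empty."""
    present = sum(1 for key in REQUIRED_EVIDENCE if run.evidence.get(key))
    return present / len(REQUIRED_EVIDENCE)


run = RunRecord(
    run_id="run-001",
    task="draft citizen reply",
    timestamp=datetime(2024, 1, 15, 9, 30),
    context="public service enquiries",
    # reviewer_action is deliberately missing from this example run
    evidence={"prompt_version": "v1", "output": "draft text", "trace_id": "t-1"},
    scores={pillar: 3 for pillar in PILLARS},
)
print(evidence_completeness(run))  # 0.75
```

Monitoring then amounts to asking how values like this one move across many runs, rather than judging any single record in isolation.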

This distinguishes post-market monitoring from adjacent terms such as incident response, performance evaluation, or periodic audit. Incident response is typically triggered by a specific adverse event. Audit may occur at defined intervals and often reconstructs what has already happened. Performance evaluation may focus narrowly on output quality. Post-market monitoring in RAIDT is broader and more continuous: it tracks the evolving condition of use, the sufficiency of run-level evidence, and the stability of the governance claim attached to ongoing deployment.

It belongs inside RAIDT because RAIDT aims to move GenAI governance from principle statements toward reviewable operational evidence. A run-level evidence pack can show what happened in one instance; post-market monitoring shows whether many instances, over time, still justify trust, approval, and organisational reliance. The score profile then becomes more than a static rating. It becomes a monitored signal of governance quality across Responsibility, Auditability, Interpretability, Dependability, and Traceability.

Why this concept matters

Many organisations can describe how a GenAI system was approved, but far fewer can show how they know it remains acceptable in use. That gap is precisely where post-market monitoring matters. It addresses the problem of governance decay: controls that looked credible at launch may weaken as prompts change, staff adapt the workflow, evidence capture becomes incomplete, or operational demands encourage shortcuts.

The concept also avoids a common confusion between deployment and assurance. Deployment is the point at which a tool enters practice. Assurance is the continuing ability to demonstrate that its use remains governed. Without monitoring, organisations often rely on narrative reassurance, occasional anecdotes, or delayed incident reporting. With monitoring, they can track trends, investigate deviations, compare runs over time, and decide whether retraining, redesign, tighter controls, or withdrawal is necessary.

For GenAI in organisational work, this matters because uncertainty is not confined to the development phase. A model may behave acceptably in testing but fail under real prompts, real time pressure, real users, and real institutional consequences. RAIDT operationalises this by linking monitoring to concrete evidence packs and score movements rather than leaving it as an abstract compliance aspiration.

Key idea: Post-market monitoring matters because RAIDT turns continuing oversight after deployment into structured, reviewable evidence about whether real-world use still deserves trust.

What this item controls
Practical example / likely audience question

Audience question

If RAIDT already creates an evidence pack for each run, why is post-market monitoring still needed?

Answer

The concern behind this question is that one might treat run-level documentation as sufficient in itself. The direct answer is that individual evidence packs are necessary but not sufficient for lifecycle governance. A single run can be well documented while the broader pattern of use is deteriorating. For example, an organisation may show good evidence for sampled runs, yet over several months the proportion of incomplete traces rises, reviewers stop recording overrides, and incident reports begin to cluster around a particular task type.

Post-market monitoring addresses that wider temporal pattern. It asks whether evidence quality is stable, whether pillar scores are moving, whether recurrent exceptions indicate control weakness, and whether new contexts of use are emerging without corresponding governance updates. RAIDT handles this better than a generic AI governance approach because it does not rely only on annual review or policy declaration. It can inspect repeated runs, compare evidence over time, and connect observed changes directly to evidence packs, score profiles, and intervention decisions.

Practical example in RAIDT terms

Consider a public service team using a generative AI assistant to draft responses to citizen enquiries. Each run produces a draft letter, supporting trace data, user edits, and reviewer sign-off. Early pilots show acceptable performance, so the organisation adopts the tool more widely.

The run-level issue appears three months later. Staff begin reusing unofficial prompt templates to save time, some reviews become superficial during peak periods, and several drafted responses start overstating entitlement conditions. No single run looks catastrophic in isolation, but the pattern is changing. RAIDT-based post-market monitoring would require evidence on prompt versions, reviewer actions, override frequency, missing fields in the evidence pack, incidents or complaints, and score trends across affected tasks.
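The pattern in this example only becomes visible when runs are grouped and compared. The sketch below groups run summaries by prompt version and reports override and incomplete-evidence rates for each group. The record keys and the sample data are hypothetical, chosen only to mirror the scenario described above.

```python
from collections import defaultdict

# Illustrative run summaries; the keys are assumptions, not a RAIDT schema.
runs = [
    {"prompt_version": "official-v1", "overridden": False, "missing_fields": 0},
    {"prompt_version": "official-v1", "overridden": True, "missing_fields": 0},
    {"prompt_version": "unofficial-a", "overridden": True, "missing_fields": 2},
    {"prompt_version": "unofficial-a", "overridden": True, "missing_fields": 1},
]


def summarise_by_prompt(runs):
    """Group runs by prompt version and compute simple monitoring rates."""
    groups = defaultdict(list)
    for r in runs:
        groups[r["prompt_version"]].append(r)

    summary = {}
    for version, items in groups.items():
        n = len(items)
        summary[version] = {
            "runs": n,
            # share of runs where the reviewer overrode the draft
            "override_rate": sum(r["overridden"] for r in items) / n,
            # share of runs with at least one missing evidence field
            "incomplete_rate": sum(r["missing_fields"] > 0 for r in items) / n,
        }
    return summary


for version, stats in summarise_by_prompt(runs).items():
    print(version, stats)
```

In this made-up data the unofficial template shows both a higher override rate and more incomplete evidence packs, which is the kind of cluster a reviewer would want surfaced before any single run looks catastrophic.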

The most affected pillars are Responsibility, Dependability, and Traceability, with secondary effects on Auditability and Interpretability. Monitoring improves governance readiness because it lets the organisation show not only what one run looked like, but how the system behaves as a governed socio-technical process over time. That is precisely the difference between static compliance and operational assurance.

Detailed link to RAIDT

Post-market monitoring links to RAIDT in four ways.

First, it extends RAIDT's core idea that responsible GenAI governance should be grounded in evidence rather than principle-only claims.
Second, it uses the run as the observational unit, allowing governance actors to inspect how specific uses accumulate into broader operational patterns.
Third, it turns evidence packs and score profiles into longitudinal governance instruments by showing whether evidence quality and pillar performance remain stable over time.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning because changes in use can be reconstructed, discussed, and acted upon.

Post-market monitoring -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

In other words, RAIDT does not treat monitoring as a separate downstream compliance activity. It embeds monitoring into the same evidence architecture that supports assessment, explanation, and intervention.

Link to the five RAIDT pillars

Responsibility

Post-market monitoring supports Responsibility by checking whether the organisation continues to exercise accountable oversight after deployment rather than assuming that approval was enough.

Example evidence / implication:

Auditability

This item has a strong effect on Auditability because monitoring depends on preserving enough structure to inspect changes over time and reconstruct why decisions about continued deployment were made.

Example evidence / implication:

Interpretability

Monitoring contributes to Interpretability by revealing where users or reviewers repeatedly struggle to understand outputs, confidence boundaries, or appropriate use conditions.

Example evidence / implication:

Dependability

This item has a particularly strong effect on Dependability because post-market monitoring helps detect reliability decline, context drift, and recurring failure modes after deployment.

Example evidence / implication:

Traceability

Post-market monitoring strengthens Traceability by ensuring that changing prompts, configurations, reviewers, and operational contexts remain visible over time.

Example evidence / implication:

Why this item is more than a generic concept

In general AI governance, post-market monitoring may simply mean watching for problems after release and producing periodic assurance reports. In RAIDT, it means something more operational and more defensible. Monitoring is tied to run-level evidence, evidence-pack completeness, score movements, and explicit governance interventions.

That shift matters because RAIDT narrows the distance between observation and accountability. Instead of merely stating that the organisation monitors deployed AI, RAIDT provides a basis for showing what was monitored, which runs or clusters of runs were affected, how the evidence changed, and what follow-up action resulted. The concept therefore becomes a practical governance mechanism rather than a broad managerial intention.

Common misunderstanding

Misunderstanding

Post-market monitoring is just incident logging after something has already gone wrong.

Correction

Incident logging is only one part of the picture. Post-market monitoring also tracks weak signals before major harm occurs, such as declining evidence completeness, rising override rates, prompt drift, reviewer inconsistency, or repeated low-confidence use in unsuitable contexts. For example, if a hospital department notices that clinicians increasingly rewrite AI-generated discharge summaries, monitoring should treat that as an early governance signal even if no formal patient safety incident has yet been recorded. RAIDT makes this practical because those signs can be tied back to specific runs, evidence gaps, and changing pillar scores.
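One simple way to treat a rising rewrite or override rate as an early signal is to compare a recent window of periods against an earlier baseline. The sketch below does this with plain averages. The window sizes, the tolerance, and the weekly figures are illustrative assumptions; a real deployment would need to calibrate them to its own context.

```python
from statistics import mean


def weak_signal(rates, baseline_n=4, recent_n=4, tolerance=0.10):
    """Return True when the recent mean exceeds the baseline mean by more than tolerance.

    rates is an ordered list of per-period rates (oldest first), for example
    the weekly share of AI-generated summaries that clinicians rewrote.
    """
    if len(rates) < baseline_n + recent_n:
        raise ValueError("not enough periods to compare")
    baseline = mean(rates[:baseline_n])
    recent = mean(rates[-recent_n:])
    return (recent - baseline) > tolerance


# Made-up weekly rewrite rates: stable at first, then climbing.
weekly_rewrite_rate = [0.10, 0.12, 0.11, 0.09, 0.15, 0.20, 0.26, 0.31]
print(weak_signal(weekly_rewrite_rate))  # True
```

A flag from a check like this is a prompt for human review of the affected runs, not a finding in itself.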

Boundary and limitation

Post-market monitoring does not prove that a system is universally safe, fair, or lawful. It also does not replace careful design, pre-deployment evaluation, human oversight, or formal audit. Monitoring can fail when evidence capture is weak, when staff bypass logging, when review thresholds are poorly calibrated, or when organisations collect data but do not act on what it shows.

RAIDT handles this limitation by treating monitoring as one component of a broader evidence framework. The value of monitoring depends on disciplined evidence grammar, meaningful thresholds, credible review processes, and organisational willingness to intervene. In short, monitoring can reveal changing governance conditions, but it cannot compensate for absent accountability or poor implementation.

Implementation levels

Manual implementation

A researcher or small team can implement post-market monitoring manually by sampling completed runs, checking evidence-pack completeness, recording notable incidents or near misses, and reviewing score changes in a spreadsheet or note template. Even at this level, the key requirement is consistency in what is reviewed and how deviations are documented.
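Consistency in manual sampling is easier to defend when the sample can be reproduced. The sketch below draws a fixed-size random sample of run identifiers from a seeded generator, so a second reviewer can regenerate the same list. The function name, the identifier format, and the choice of seed are assumptions for illustration.

```python
import random


def sample_runs(run_ids, k, seed):
    """Draw a reproducible random sample of up to k run IDs for manual review."""
    rng = random.Random(seed)      # seeded generator: same seed, same sample
    k = min(k, len(run_ids))       # never ask for more runs than exist
    return sorted(rng.sample(list(run_ids), k))


# Hypothetical identifiers for one month of completed runs.
march_runs = [f"run-{i:03d}" for i in range(1, 201)]
review_list = sample_runs(march_runs, k=10, seed=2024)
print(review_list)
```

Recording the seed and sample size alongside the review notes documents how the sample was chosen as well as what was found.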

Semi-automated implementation

A semi-automated approach can use structured metadata, templated evidence forms, scheduled review prompts, and dashboards that aggregate run statistics. This makes it easier to detect missing evidence, recurring exceptions, or score drift without requiring fully bespoke governance infrastructure.
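A dashboard that aggregates run statistics could, for instance, report how the average score on each pillar moves between two review periods. The sketch below is one minimal way to compute that. The numeric score scale and the sample data are assumptions, since the source does not specify how pillar scores are represented.

```python
from statistics import mean

PILLARS = ("responsibility", "auditability", "interpretability",
           "dependability", "traceability")


def pillar_means(runs):
    """Average score per pillar across a set of runs."""
    return {p: mean(r["scores"][p] for r in runs) for p in PILLARS}


def score_drift(previous_runs, current_runs):
    """Change in mean pillar score from the previous period to the current one."""
    prev = pillar_means(previous_runs)
    curr = pillar_means(current_runs)
    return {p: curr[p] - prev[p] for p in PILLARS}


def make_run(**overrides):
    """Helper for the example: a run scoring 4 on every pillar unless overridden."""
    scores = {p: 4 for p in PILLARS}
    scores.update(overrides)
    return {"scores": scores}


previous = [make_run(), make_run()]
current = [make_run(), make_run(dependability=2)]

drift = score_drift(previous, current)
print(drift["dependability"])  # -1
```

Negative values indicate a pillar whose average has fallen since the last period and may warrant a closer look at the underlying evidence packs.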

Fully automated implementation

At scale, post-market monitoring can be embedded in a wrapper, orchestration layer, or governance pipeline that automatically logs run metadata, captures prompt and output versions, flags policy-sensitive patterns, tracks score trends, and routes anomalies to reviewers. In this form, RAIDT becomes a continuous monitoring architecture in which operational use, evidence generation, and governance intervention are tightly linked.
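As a rough illustration of the wrapper idea, the sketch below decorates a generation function so that every call is logged as a run record and runs using an unapproved prompt version are routed to a review queue. Everything here is a stand-in: the in-memory lists, the `APPROVED_PROMPT_VERSIONS` set, the single routing rule, and `fake_model` are assumptions, not part of any RAIDT implementation or real model API.

```python
import uuid
from datetime import datetime, timezone

run_log = []        # stand-in for a persistent evidence store
review_queue = []   # stand-in for a reviewer routing mechanism

APPROVED_PROMPT_VERSIONS = {"official-v1"}  # hypothetical allow-list


def monitored(generate):
    """Wrap a generation function so each call leaves a run record."""
    def wrapper(prompt, prompt_version):
        output = generate(prompt)
        record = {
            "run_id": str(uuid.uuid4()),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prompt_version": prompt_version,
            "prompt": prompt,
            "output": output,
        }
        run_log.append(record)
        # Example routing rule: unapproved prompt templates go to a reviewer.
        if prompt_version not in APPROVED_PROMPT_VERSIONS:
            review_queue.append(record["run_id"])
        return output
    return wrapper


@monitored
def fake_model(prompt):
    """Placeholder for a real model call."""
    return f"DRAFT: {prompt}"


fake_model("Reply about a parking permit", "official-v1")
fake_model("Reply about a parking permit", "unofficial-a")
print(len(run_log), len(review_queue))  # 2 1
```

The design point is that logging happens in the call path itself, so evidence capture does not depend on staff remembering to record each run.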

Practical use in the RAIDT project

This item is important across the RAIDT project because it connects the framework to lifecycle governance rather than one-off assessment. In Paper 08 Foundations, it helps justify why run-level evidence must remain useful after deployment, not only at the point of design. In Paper 09 Empirical Validation, it provides a basis for examining whether RAIDT can surface drift, weak controls, or improvement opportunities in real organisational settings. In Paper 10 Policy Pathways, it links RAIDT to the policy expectation that AI governance should continue across the operational lifecycle.

It is also relevant to sector playbooks and governance interventions because different settings will need different monitoring thresholds, sampling frequencies, and escalation rules. For supervision meetings, viva defence, and journal positioning, this item helps explain that RAIDT is not merely a scoring model. It is a governance architecture for sustained oversight, evidence accumulation, and contestable organisational learning.

Key audience questions to prepare for

Q1. How is post-market monitoring different from ordinary system maintenance?

Ordinary maintenance focuses on keeping a system functioning. Post-market monitoring focuses on whether the use of the system remains governable, evidenced, and acceptable in practice. In RAIDT, the difference is visible through run-level evidence, score trends, and review triggers.

Q2. Why not rely on annual audit instead of continuous monitoring?

Annual audit is valuable, but it is periodic and retrospective. Post-market monitoring provides earlier visibility into drift, recurrent exceptions, and weakening controls. RAIDT uses this to support intervention before governance failure becomes entrenched.

Q3. Does monitoring create excessive administrative burden?

It can, if implemented badly. RAIDT addresses this by structuring evidence around the run and by allowing monitoring to scale from manual sampling to automated dashboards. The aim is proportionate evidence, not indiscriminate data collection.

Q4. What exactly is being monitored in RAIDT?

RAIDT monitors more than output quality. It monitors evidence completeness, pillar scores, prompt and configuration changes, review actions, override patterns, incidents, and the suitability of a system for a given task context over time.

Q5. Why is this especially important for generative AI?

Generative AI is highly context-sensitive and can degrade through workflow change, prompt drift, user adaptation, or policy misalignment even when the underlying model is unchanged. Post-market monitoring is therefore essential for showing that operational use remains justified.

Suggested citation concepts to support this item
Short explanation for presentation

Post-market monitoring is the part of RAIDT that keeps governance active after deployment. Instead of assuming that a generative AI system remains acceptable because it passed an initial review, RAIDT checks whether real runs continue to produce complete evidence, stable scores, and manageable risk signals over time. This matters because organisational use changes: prompts drift, reviewers behave differently, workloads increase, and incidents may appear gradually rather than all at once. By tying monitoring to run-level evidence packs and the five-pillar score profile, RAIDT makes lifecycle governance practical. It allows supervisors, auditors, and organisations to see whether a system is still being used responsibly, not just whether it once looked compliant at launch.

One-line takeaway

Post-market monitoring is the continuing review of deployed GenAI use, which RAIDT ties to run-level evidence, score trends, and governance readiness over time.

Related items in policy, standards and assurance
Anchored questions
Mentioned in reference-paper summaries (2)

Paper summaries live in Port/93-References/pdf_summaries/. Each file listed below contains the key term at least once.
