S9.11 - Post-market monitoring


```mermaid
flowchart LR
    A[Post-deployment problems:
drift, weak evidence, incidents,
changing prompts, false assurance] --> B[RAIDT:
run-level evidence framework]
    H[Practical fields:
healthcare, public services,
enterprise drafting, dashboards] --> C[[Post-market monitoring]]
    B --> C
    C --> D[Evidence pack refresh
and completeness checks]
    C --> E[Score profile trends
across five pillars]
    C --> F[Reviewer reconstruction,
escalation, intervention]
    D --> G[Governance readiness,
organisational learning,
policy alignment]
    E --> G
    F --> G
```

Star S9 - Policy, Standards and Assurance

Star context: Connects RAIDT to policy instruments, standards, assurance, procurement, audit and organisational accountability, with post-market monitoring showing how governance continues after deployment through structured evidence, review, and corrective action.

Academic picture

Definition / background

Post-market monitoring is the ongoing observation and review of a system after it has entered operational use. In conventional governance language, it refers to the set of activities through which an organisation watches for performance changes, emerging harms, compliance failures, and shifts in real-world conditions after deployment. In the context of generative AI, the idea is especially important because system behaviour is shaped not only by the model itself, but also by prompts, wrappers, users, connected data sources, workflow pressures, and changing institutional expectations.

Within RAIDT, post-market monitoring is not limited to broad service-level surveillance. It is anchored in the run as the unit of governance. A run is one configured use of a GenAI system for a specific task, at a specific time, in a specific context. Monitoring therefore means examining whether runs continue to produce evidence that is sufficiently complete, whether scores shift across the five pillars, whether exceptions recur, and whether incidents or near misses reveal weaknesses in governance controls.

This distinguishes post-market monitoring from adjacent terms such as incident response, performance evaluation, or periodic audit. Incident response is typically triggered by a specific adverse event. Audit may occur at defined intervals and often reconstructs what has already happened. Performance evaluation may focus narrowly on output quality. Post-market monitoring in RAIDT is broader and more continuous: it tracks the evolving condition of use, the sufficiency of run-level evidence, and the stability of the governance claim attached to ongoing deployment.

It belongs inside RAIDT because RAIDT aims to move GenAI governance from principle statements toward reviewable operational evidence. A run-level evidence pack can show what happened in one instance; post-market monitoring shows whether many instances, over time, still justify trust, approval, and organisational reliance. The score profile then becomes more than a static rating. It becomes a monitored signal of governance quality across Responsibility, Auditability, Interpretability, Dependability, and Traceability.
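
A minimal Python sketch of the run as the unit of governance may make the idea of "sufficiently complete" evidence concrete. The `Run` class, the `REQUIRED_FIELDS` tuple and the `missing_fields` and `completeness` helpers are hypothetical names chosen for illustration; RAIDT does not prescribe them, and a real evidence pack would define its own required fields. The point is only that completeness can be measured per run, which gives monitoring something to aggregate over time.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical minimum evidence fields; a real RAIDT evidence pack would define its own.
REQUIRED_FIELDS = ("prompt_version", "model_version", "output", "reviewer_id", "sign_off")


@dataclass
class Run:
    """One configured use of a GenAI system for a specific task, time and context."""
    run_id: str
    task_type: str
    run_date: date
    context: str
    evidence: dict = field(default_factory=dict)


def missing_fields(run: Run) -> list[str]:
    """Required evidence fields that are absent or empty for this run."""
    return [name for name in REQUIRED_FIELDS if run.evidence.get(name) in (None, "")]


def completeness(run: Run) -> float:
    """Share of required evidence fields present, from 0.0 (none) to 1.0 (all)."""
    total = len(REQUIRED_FIELDS)
    return (total - len(missing_fields(run))) / total


# Example: a drafted reply whose reviewer identity and sign-off were never recorded.
run = Run(
    "r-001", "citizen_reply", date(2024, 3, 1), "benefits enquiries",
    {"prompt_version": "reply-v3", "model_version": "m-1", "output": "Dear ...",
     "reviewer_id": "", "sign_off": None},
)
print(missing_fields(run))  # ['reviewer_id', 'sign_off']
print(completeness(run))    # 0.6
```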

Why this concept matters

Many organisations can describe how a GenAI system was approved, but far fewer can show how they know it remains acceptable in use. That gap is precisely where post-market monitoring matters. It solves the problem of governance decay: controls that looked credible at launch may weaken as prompts change, staff adapt the workflow, evidence capture becomes incomplete, or operational demands encourage shortcuts.

The concept also clears up a common confusion between deployment and assurance. Deployment is the point at which a tool enters practice. Assurance is the continuing ability to demonstrate that its use remains governed. Without monitoring, organisations often rely on narrative reassurance, occasional anecdotes, or delayed incident reporting. With monitoring, they can track trends, investigate deviations, compare runs over time, and decide whether retraining, redesign, tighter controls, or withdrawal is necessary.

For GenAI in organisational work, this matters because uncertainty is not confined to the development phase. A model may behave acceptably in testing but fail under real prompts, real time pressure, real users, and real institutional consequences. RAIDT operationalises this by linking monitoring to concrete evidence packs and score movements rather than leaving it as an abstract compliance aspiration.
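
One way to picture "score movements" is to average each pillar score per period and compare a current period with a baseline. The sketch below is a minimal Python illustration under stated assumptions: the function names, the numeric 0-5 scale and the drift tolerance of 0.5 are hypothetical, and a simple fall in the mean is only one possible drift rule, not RAIDT's defined scoring method.

```python
from collections import defaultdict
from statistics import mean

PILLARS = ("Responsibility", "Auditability", "Interpretability", "Dependability", "Traceability")


def profile_by_period(scored_runs):
    """Average each pillar score per period.

    scored_runs: iterable of (period, scores) pairs, where period is a sortable label
    such as "2024-03" and scores maps every pillar name to a number (assumed 0-5 scale).
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for period, scores in scored_runs:
        for pillar in PILLARS:
            buckets[period][pillar].append(scores[pillar])
    return {period: {pillar: mean(values) for pillar, values in pillars.items()}
            for period, pillars in sorted(buckets.items())}


def drifted_pillars(profile, baseline, current, tolerance=0.5):
    """Pillars whose mean score fell by more than `tolerance` between two periods."""
    return [p for p in PILLARS if profile[baseline][p] - profile[current][p] > tolerance]


# Example: Traceability weakens between January and April while other pillars hold.
steady = {p: 4 for p in PILLARS}
runs = [("2024-01", steady), ("2024-01", steady),
        ("2024-04", {**steady, "Traceability": 2.5, "Dependability": 3.8})]
profile = profile_by_period(runs)
print(drifted_pillars(profile, "2024-01", "2024-04"))  # ['Traceability']
```

The small fall in Dependability (0.2) stays within tolerance, so only Traceability is flagged; choosing the tolerance is a governance decision, not a technical one.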

Key idea: Post-market monitoring matters because RAIDT turns continuing oversight after deployment into structured, reviewable evidence about whether real-world use still deserves trust.

What this item controls

The continuity of governance after a GenAI system moves from approval into routine use.
The completeness and freshness of run-level evidence collected during operational deployment.
The detection of score drift across Responsibility, Auditability, Interpretability, Dependability, and Traceability.
The escalation path from weak signals and near misses to formal review or intervention.
The distinction between isolated acceptable runs and stable, governable use over time.
The organisational capacity to justify continued deployment to reviewers, auditors, procurers, or regulators.
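
The escalation path from weak signals to formal review or intervention can be sketched as a small decision rule. In this minimal Python sketch, the `Signals` fields, the three levels (`routine`, `review`, `intervene`) and every threshold are hypothetical and purely illustrative; the signals are assumed to have been computed already from evidence packs and score profiles. An organisation would set its own thresholds and document them as part of its governance controls.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Monitoring signals for one task type in one period; all thresholds below are illustrative."""
    mean_completeness: float  # 0.0-1.0, average share of required evidence fields present
    drifted_pillars: int      # pillars whose score fell beyond an agreed tolerance
    incidents: int            # incidents or complaints linked to the task type
    near_misses: int          # near misses recorded by reviewers


def escalation_level(s: Signals) -> str:
    """Map monitoring signals to 'routine', 'review' or 'intervene'."""
    weakening_controls = s.drifted_pillars >= 2 or s.mean_completeness < 0.7
    if s.incidents > 0 and weakening_controls:
        return "intervene"  # harm plus weakening controls: restrict, redesign or withdraw
    if s.incidents > 0 or s.near_misses >= 3 or s.drifted_pillars >= 1 or s.mean_completeness < 0.9:
        return "review"     # weak signal: formal review by the accountable owner
    return "routine"        # no signal: continue scheduled monitoring


print(escalation_level(Signals(0.95, 0, 0, 1)))  # routine
print(escalation_level(Signals(0.85, 0, 0, 0)))  # review
print(escalation_level(Signals(0.60, 1, 2, 0)))  # intervene
```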

Practical example / likely audience question

Audience question

If RAIDT already creates an evidence pack for each run, why is post-market monitoring still needed?

Answer

The concern behind this question is that one might treat run-level documentation as sufficient in itself. The direct answer is that individual evidence packs are necessary but not sufficient for lifecycle governance. A single run can be well documented while the broader pattern of use is deteriorating. For example, an organisation may show good evidence for sampled runs, yet over several months the proportion of incomplete traces rises, reviewers stop recording overrides, and incident reports begin to cluster around a particular task type.

Post-market monitoring addresses that wider temporal pattern. It asks whether evidence quality is stable, whether pillar scores are moving, whether recurrent exceptions indicate control weakness, and whether new contexts of use are emerging without corresponding governance updates. RAIDT is better suited to this than governance approaches that rely only on annual review or policy declarations: it can inspect repeated runs, compare evidence over time, and connect observed changes directly to evidence packs, score profiles, and intervention decisions.
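
The deterioration described in the answer (rising incomplete traces, overrides no longer recorded, incidents clustering around one task type) can be detected with simple aggregations over run records. The following is a minimal Python sketch under stated assumptions: runs are plain dicts with hypothetical `month`, `complete` and `override_logged` keys, "rising" means the last two month-on-month changes are both increases, and a cluster is any task type accounting for at least half of incidents. These are illustrative rules, not RAIDT definitions.

```python
from collections import Counter


def monthly_share(runs, condition):
    """Share of runs in each month for which `condition(run)` is true; runs carry a 'month' key."""
    totals, hits = Counter(), Counter()
    for run in runs:
        totals[run["month"]] += 1
        hits[run["month"]] += bool(condition(run))
    return {month: hits[month] / totals[month] for month in sorted(totals)}


def rising(series, steps=2):
    """True if the last `steps` month-on-month changes in `series` are all increases."""
    values = list(series.values())
    if len(values) < steps + 1:
        return False
    recent = values[-(steps + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def incident_clusters(incidents, share=0.5):
    """Task types that account for at least `share` of all recorded incidents."""
    counts = Counter(incident["task_type"] for incident in incidents)
    total = sum(counts.values())
    return [task for task, n in counts.most_common() if n / total >= share]


# Hypothetical data: each run is assumed to be one where the reviewer changed the draft.
runs = [
    {"month": "2024-01", "complete": True, "override_logged": True},
    {"month": "2024-01", "complete": True, "override_logged": True},
    {"month": "2024-02", "complete": False, "override_logged": True},
    {"month": "2024-02", "complete": True, "override_logged": False},
    {"month": "2024-03", "complete": False, "override_logged": False},
    {"month": "2024-03", "complete": False, "override_logged": False},
]
incomplete = monthly_share(runs, lambda r: not r["complete"])  # 0.0, 0.5, 1.0
logged = monthly_share(runs, lambda r: r["override_logged"])   # 1.0, 0.5, 0.0
print(rising(incomplete))  # True
incidents = [{"task_type": "eligibility"}, {"task_type": "eligibility"}, {"task_type": "general"}]
print(incident_clusters(incidents))  # ['eligibility']
```

Each sampled run could still look acceptable on its own; the signal only appears once the shares are compared across months.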

Practical example in RAIDT terms

Consider a public service team using a generative AI assistant to draft responses to citizen enquiries. Each run produces a draft letter, supporting trace data, user edits, and reviewer sign-off. Early pilots show acceptable performance, so the organisation adopts the tool more widely.

Three months later, problems begin to show up in individual runs. Staff reuse unofficial prompt templates to save time, some reviews become superficial during peak periods, and several drafted responses overstate entitlement conditions. No single run looks catastrophic in isolation, but the pattern is changing. RAIDT-based post-market monitoring would require evidence on prompt versions, reviewer actions, override frequency, missing fields in the evidence pack, incidents or complaints, and score trends across affected tasks.

The most affected pillars are Responsibility, Dependability, and Traceability, with secondary effects on Auditability and Interpretability. Monitoring improves governance readiness because it lets the organisation show not only what one run looked like, but how the system behaves as a governed socio-technical process over time. That is precisely the difference between static compliance and operational assurance.
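
Several of the evidence signals in this example can be checked mechanically for each run and then summarised as shares. The Python sketch below is a minimal illustration, assuming hypothetical run fields (`prompt_version`, `review_seconds`, `edited`, `override_logged`), a hypothetical registry `APPROVED_PROMPTS`, and an illustrative 60-second review threshold. Review time is only a crude proxy for review depth; it can prompt a closer look but cannot show that a review was adequate.

```python
from collections import Counter

# Hypothetical registry of prompt versions approved through governance review.
APPROVED_PROMPTS = {"reply-v3", "reply-v4"}


def run_flags(run: dict, min_review_seconds: int = 60) -> list[str]:
    """Warning flags for one run; field names and the review-time threshold are illustrative."""
    flags = []
    if run.get("prompt_version") not in APPROVED_PROMPTS:
        flags.append("unapproved_prompt")  # e.g. an unofficial template reused to save time
    if run.get("review_seconds", 0) < min_review_seconds:
        flags.append("shallow_review")     # sign-off after very little review time
    if run.get("edited") and not run.get("override_logged"):
        flags.append("unlogged_override")  # draft changed, but the change was not recorded
    return flags


def flag_summary(runs: list[dict]) -> dict[str, float]:
    """Share of runs carrying each flag; track this per task type and period."""
    if not runs:
        return {}
    counts = Counter(flag for run in runs for flag in run_flags(run))
    return {flag: n / len(runs) for flag, n in counts.items()}


runs = [
    {"prompt_version": "reply-v4", "review_seconds": 180, "edited": True, "override_logged": True},
    {"prompt_version": "my-template", "review_seconds": 20, "edited": True, "override_logged": False},
    {"prompt_version": "reply-v3", "review_seconds": 30, "edited": False},
    {"prompt_version": "my-template", "review_seconds": 240, "edited": False},
]
print(flag_summary(runs))
# {'unapproved_prompt': 0.5, 'shallow_review': 0.5, 'unlogged_override': 0.25}
```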

Detailed link to RAIDT

Post-market monitoring links to RAIDT in four ways.

First, it extends RAIDT's core idea that responsible GenAI governance should be grounded in evidence rather than principle-only claims.
Second, it uses the run as the observational unit, allowing governance actors to inspect how specific uses accumulate into broader operational patterns.
Third, it turns evidence packs and score profiles into longitudinal governance instruments by showing whether evidence quality and pillar performance remain stable over time.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning because changes in use can be reconstructed, discussed, and acted upon.

Post-market monitoring -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

In other words, RAIDT does not treat monitoring as a separate downstream compliance activity. It embeds monitoring into the same evidence architecture that supports assessment, explanation, and intervention.
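
The chain from run-level evidence to governance readiness can be read as a sequence of functions applied to each monitoring window. The sketch below is a minimal Python illustration under stated assumptions: the function names, the mean-based score profile and the readiness floor of 3.0 on an assumed 0-5 scale are hypothetical choices, not RAIDT's defined scoring method. Post-market monitoring corresponds to calling `monitor` repeatedly, once per window, rather than once at approval.

```python
from statistics import mean

PILLARS = ("Responsibility", "Auditability", "Interpretability", "Dependability", "Traceability")


def build_evidence_pack(run: dict) -> dict:
    """Run-level evidence -> evidence pack: keep captured evidence and per-pillar scores together."""
    return {"run_id": run["run_id"], "evidence": run["evidence"], "scores": run["scores"]}


def score_profile(packs: list[dict]) -> dict[str, float]:
    """Evidence packs -> score profile: mean score per pillar across the monitored runs."""
    return {p: mean(pack["scores"][p] for pack in packs) for p in PILLARS}


def governance_readiness(profile: dict[str, float], floor: float = 3.0) -> tuple[bool, list[str]]:
    """Score profile -> governance readiness: ready only if every pillar meets `floor`."""
    weak = [p for p in PILLARS if profile[p] < floor]
    return (not weak, weak)


def monitor(runs: list[dict]) -> tuple[bool, list[str]]:
    """Apply the whole chain to the runs of one monitoring window."""
    packs = [build_evidence_pack(run) for run in runs]
    return governance_readiness(score_profile(packs))


good = {p: 4 for p in PILLARS}
runs = [{"run_id": "a", "evidence": {}, "scores": good},
        {"run_id": "b", "evidence": {}, "scores": {**good, "Traceability": 1}}]
print(monitor(runs))  # (False, ['Traceability'])
```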

Link to the five RAIDT pillars

Responsibility

Post-market monitoring supports Responsibility by checking whether the organisation continues to exercise accountable oversight after deployment rather than assuming that approval was enough.
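
A simple proxy for continued accountable oversight is the share of runs in each period that carry both a named reviewer and a recorded sign-off. This minimal Python sketch assumes hypothetical run fields (`reviewer_id`, `sign_off`), a hypothetical reviewer ID, and an illustrative 95% threshold; a falling rate does not prove that oversight has lapsed, but it identifies periods that need explanation.

```python
def oversight_gaps(runs_by_period, min_signoff=0.95):
    """Periods in which the share of runs signed off by a named reviewer fell below `min_signoff`.

    runs_by_period maps a period label to a list of run dicts with 'reviewer_id' and 'sign_off'
    (hypothetical field names). Periods with no runs are skipped: there was no use to oversee.
    """
    gaps = {}
    for period, runs in sorted(runs_by_period.items()):
        if not runs:
            continue
        signed = sum(1 for run in runs if run.get("reviewer_id") and run.get("sign_off"))
        rate = signed / len(runs)
        if rate < min_signoff:
            gaps[period] = rate
    return gaps


history = {
    "2024-01": [{"reviewer_id": "k.lee", "sign_off": True}] * 4,
    "2024-02": [{"reviewer_id": "k.lee", "sign_off": True},
                {"reviewer_id": "", "sign_off": True}],
}
print(oversight_gaps(history))  # {'2024-02': 0.5}
```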