Q020 — What is the compliance-measurement gap in GenAI governance?

Organisations face compliance pressure, but often cannot measure whether a single GenAI run is genuinely audit-ready.

Answer

The compliance-measurement gap in GenAI governance is the disconnect between what governance instruments expect organisations to demonstrate and what many current practices can actually measure at the level of a real use event. The papers argue that contemporary regimes increasingly expect demonstrable documentation, oversight, and traceability in operation, not only declarations of principle. Yet many responsible AI approaches remain principle-heavy but operationally thin: they describe desirable properties such as transparency or accountability without specifying a standard run-level evidence object that can be inspected later. As a result, organisations may have policies, model cards, or episodic audits, but still lack a way to prove what happened in one specific run.

For GenAI, this gap is intensified by run-time variability and configuration sensitivity. Prompts, tool chains, retrieved context, PEFT or LoRA adapters, and preference-based alignment can materially alter outputs, but they are often not captured in a replayable record. Without such a record, a compliance claim about a given run cannot easily be translated into a measurement. RAIDT addresses the gap by defining the run as the unit of governance, requiring a run-level evidence pack, and assessing the five pillars (Responsibility, Auditability, Interpretability, Dependability, Traceability) through evidence rather than narrative assurance. In effect, it provides a measurement object that turns broad compliance expectations into inspectable run-level artefacts and comparable scoring.
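
To make the idea of a replayable run-level record more concrete, here is a minimal Python sketch of what an evidence pack could capture. It is an illustration under stated assumptions, not a schema taken from the RAIDT papers: the class name `RunEvidencePack`, its field names, and the use of a SHA-256 fingerprint are all hypothetical choices made for this example.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunEvidencePack:
    """Hypothetical run-level record; field names are illustrative, not a RAIDT schema."""
    run_id: str
    prompt: str
    model_deployment: str
    adapter_ids: tuple = ()        # e.g. PEFT or LoRA adapter identifiers
    tool_calls: tuple = ()         # ordered tool-chain steps
    retrieved_context: tuple = ()  # documents or chunks supplied to the model
    output: str = ""

    def fingerprint(self) -> str:
        """Hash a canonical JSON form so that any change to the record is detectable."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two runs that differ only in the adapter produce different fingerprints,
# so the configuration change is visible in the record itself.
base = RunEvidencePack(run_id="r1", prompt="Summarise the file.", model_deployment="deploy-a")
tuned = RunEvidencePack(run_id="r1", prompt="Summarise the file.", model_deployment="deploy-a",
                        adapter_ids=("lora-x",))
print(base.fingerprint() != tuned.fingerprint())
```

The point of the sketch is only that the configuration elements listed above become fields of one inspectable object per run, which is what allows them to be checked later.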

Practical example

A bank may state that its GenAI-assisted adverse-action explanations are transparent and accountable. However, if it cannot show the exact prompt template, model deployment, decision criteria version, and review checks used for one disputed letter, those compliance claims are difficult to measure in practice. The organisation has policy language, but no replayable proof object for the contested use.

Under RAIDT, the same case would be assembled into a run-level evidence pack and scored. The bank could then measure whether Interpretability and Traceability are actually present for that specific run, instead of relying on a general policy statement about responsible use.
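
As a rough illustration of measuring pillars for one run, the Python sketch below checks which of the artefacts named in the bank example are present for a disputed letter. Everything in it is an assumption made for illustration: the mapping of artefacts to pillars, the `pillar_coverage` function, the fraction-based score, and the sample values are hypothetical and do not reproduce RAIDT's actual scoring method.

```python
# Hypothetical mapping from two RAIDT pillars to the artefacts named in the example.
REQUIRED_EVIDENCE = {
    "Interpretability": ["decision_criteria_version", "review_checks"],
    "Traceability": ["prompt_template", "model_deployment"],
}


def pillar_coverage(pack: dict) -> dict:
    """Return, per pillar, the fraction of required artefacts that are present and non-empty."""
    coverage = {}
    for pillar, keys in REQUIRED_EVIDENCE.items():
        present = sum(1 for key in keys if pack.get(key))
        coverage[pillar] = present / len(keys)
    return coverage


# Made-up record for one disputed letter; the model deployment was never captured.
disputed_letter = {
    "prompt_template": "adverse_action_v3",
    "model_deployment": "",
    "decision_criteria_version": "2024-02",
    "review_checks": ["human_review_signed_off"],
}

print(pillar_coverage(disputed_letter))
# {'Interpretability': 1.0, 'Traceability': 0.5}
```

In this made-up case the missing deployment identifier shows up as a Traceability shortfall for that specific run, which a general policy statement could not reveal.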
