S9.06 - Evidence grammar

flowchart LR
    A["Fragmented governance evidence<br/>policies, logs, checklists, review notes"] --> B["RAIDT<br/>run-level evidence framework"]
    B --> C[["Evidence grammar<br/>shared evidential syntax for a governed run"]]
    H[Prompt ID] --> C
    I[Model or system version] --> C
    J[Retrieval snapshot] --> C
    K[Output hash or ID] --> C
    L[Reviewer decision] --> C
    M[Audit and policy tools] --> C
    C --> D[Evidence pack]
    C --> E[RAIDT score profile]
    C --> F[Reviewer reconstruction]
    C --> G["Governance readiness<br/>reviewability, contestability, audit readiness"]
    D --> N[Policy and standards alignment]
    E --> N
    F --> G

Star S9 - Policy, Standards and Assurance

Star context: Connects RAIDT to policy instruments, standards, assurance, procurement, audit and organisational accountability by giving them a common evidential language at run level.

Academic picture

Definition / background

Evidence grammar is the shared structure used to express, relate, and interpret governance-relevant evidence about a generative AI run. The term "grammar" is useful because it implies more than a list of data fields. A grammar defines what counts as a valid evidential statement, how different elements connect, and how a reviewer can move from a governance claim to the underlying artefacts that support or challenge it.

In RAIDT, evidence grammar specifies the minimum meaningful syntax for run-level governance. It links claims such as "this run was reviewed", "this output used approved retrieval inputs", or "this result was released under human oversight" to concrete fields such as prompt ID, model version, retrieval snapshot, output identifier, timestamp, reviewer identity or role, review decision, and justification. That shared syntax matters because otherwise evidence remains local, inconsistent, and difficult to compare across teams, tools, or policy regimes.
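The minimum syntax described above can be pictured as a typed record. The following is a minimal Python sketch, assuming hypothetical field names (`prompt_id`, `model_version`, `retrieval_snapshot` and so on) that mirror the fields listed in the paragraph; RAIDT does not prescribe these identifiers, and the example values are illustrative only.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RunEvidence:
    """Hypothetical run-level evidence record mirroring the fields named in the text."""
    prompt_id: Optional[str]
    model_version: Optional[str]
    retrieval_snapshot: Optional[str]
    output_id: Optional[str]
    timestamp: Optional[datetime]
    reviewer_role: Optional[str]
    review_decision: Optional[str]
    justification: Optional[str]

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are absent or empty, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]


# Illustrative values only: a review decision recorded without its justification.
run = RunEvidence(
    prompt_id="prompt-017",
    model_version="model-2024-06",
    retrieval_snapshot="snapshot-42",
    output_id="out-9f3a",
    timestamp=datetime(2024, 6, 1, 9, 30, tzinfo=timezone.utc),
    reviewer_role="senior practitioner",
    review_decision="approved",
    justification="",
)
print(run.missing_fields())  # ['justification']
```

Treating an empty justification as missing reflects the point above: a recorded decision without its basis does not let a reviewer move from the claim back to what supports it.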

This concept differs from related ideas. A taxonomy classifies categories. An ontology defines conceptual relationships. A template provides a document format. Evidence grammar is narrower and more operational: it defines how governance claims are expressed in relation to verifiable run artefacts. Within RAIDT, this matters because the run is the unit of governance, the evidence pack is the unit of review, and the score profile is the unit of organisational interpretation. Evidence grammar is what makes those parts connect coherently.

Why this concept matters

Generative AI governance often fails not because organisations have no evidence, but because they have no common structure for interpreting it. Different teams may log different fields, describe the same event in incompatible ways, or record review decisions without enough context to reconstruct what was assessed. In that situation, policies appear stronger than the evidence base that supports them.

Evidence grammar addresses that problem by creating a stable evidential language across runs, systems, reviewers, and governance settings. It reduces ambiguity about what should be captured, how claims are linked to artefacts, and how evidence can be reused across audit, assurance, procurement, incident response, and policy crosswalks. Without it, organisations risk producing documentation that is verbose but not reviewable.

For RAIDT, the concept is central because it helps move governance from principle statements towards operational evidence. It makes it easier to compare runs, challenge conclusions, identify missing fields, and explain to supervisors, auditors, or regulators why a score or governance decision was assigned.

Key idea: Evidence grammar matters because RAIDT can only make governance reviewable if run-level evidence is expressed in a consistent and reconstructable form.

What this item captures

The minimum evidential fields needed to reconstruct a governed GenAI run.
The relationship between governance claims, technical artefacts, reviewer actions, and final decisions.
The shared syntax that allows different organisational actors to read evidence in the same way.
The translation of raw traces and logs into governance-ready evidence packs.
The reuse of one evidential structure across standards, assurance processes, procurement, and audit.
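The translation of raw traces into a shared syntax can be sketched as a mapping from tool-specific log keys onto one common vocabulary. This is a minimal Python sketch under stated assumptions: the alias table, tool logs and field names are all hypothetical, chosen only to show two differently shaped logs resolving to the same evidential fields, with gaps made visible.

```python
from typing import Any

# Hypothetical aliases: each shared field name maps to keys that different tools might use.
FIELD_ALIASES: dict[str, tuple[str, ...]] = {
    "prompt_id": ("prompt_id", "promptRef", "template"),
    "model_version": ("model_version", "model", "engine"),
    "retrieval_snapshot": ("retrieval_snapshot", "kb_snapshot", "index_version"),
    "output_id": ("output_id", "output_hash", "response_id"),
    "reviewer_role": ("reviewer_role", "approver"),
    "review_decision": ("review_decision", "decision"),
}


def normalise(raw: dict[str, Any]) -> dict[str, Any]:
    """Map one raw trace onto the shared field names; unmatched fields become None."""
    return {
        field: next((raw[key] for key in aliases if key in raw), None)
        for field, aliases in FIELD_ALIASES.items()
    }


def build_pack(run_id: str, traces: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge several traces for one run; the first non-None value per field is kept."""
    merged: dict[str, Any] = {field: None for field in FIELD_ALIASES}
    for trace in traces:
        for field, value in normalise(trace).items():
            if merged[field] is None:
                merged[field] = value
    gaps = [field for field, value in merged.items() if value is None]
    return {"run_id": run_id, "evidence": merged, "gaps": gaps}


# Two hypothetical sources: the AI interface log and a separate review workflow log.
ai_log = {"promptRef": "p-17", "engine": "model-2024-06", "response_id": "r-88"}
review_log = {"approver": "team lead", "decision": "approved"}
pack = build_pack("run-001", [ai_log, review_log])
print(pack["gaps"])  # ['retrieval_snapshot']
```

Keeping the first value silently discards any conflicting value from a later trace; a fuller version would probably record such conflicts as evidence gaps in their own right.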

Practical example / likely audience question

Audience question

Why call this a grammar rather than just a template or a checklist?

Answer

The concern behind the question is that evidence grammar may sound more abstract than it needs to be. A template or checklist is useful, but neither guarantees that governance claims are expressed in a consistent, reviewable way. A template can be completed badly. A checklist can confirm that something was done without showing what was actually evidenced. Grammar is the stronger term because it points to structure, validity, and relationships.

In RAIDT terms, evidence grammar specifies how a claim must be connected to run-level artefacts. If a reviewer states that a run was safe to release, the grammar helps show which prompt version was used, which retrieval context was active, which output instance was reviewed, who reviewed it, when the review occurred, and what judgement was reached. That is more rigorous than merely ticking a box marked "review completed".
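The contrast between a ticked box and a grounded claim can be sketched as a validity check. The following minimal Python sketch assumes a hypothetical rule that a release claim must cite each artefact named in the paragraph, and that the reviewed output must be the output actually released; the `Claim` type, artefact keys and identifiers are illustrative, not a RAIDT specification.

```python
from dataclasses import dataclass

# Hypothetical rule: artefacts a release claim must cite, following the list in the text.
REQUIRED_FOR_RELEASE = (
    "prompt_version",
    "retrieval_context",
    "reviewed_output_id",
    "reviewer",
    "review_time",
    "judgement",
)


@dataclass
class Claim:
    statement: str
    artefacts: dict[str, str]


def check_release_claim(claim: Claim, released_output_id: str) -> list[str]:
    """Return problems with a release claim; an empty list means it is well formed."""
    problems = [f"missing {name}" for name in REQUIRED_FOR_RELEASE
                if not claim.artefacts.get(name)]
    reviewed = claim.artefacts.get("reviewed_output_id")
    if reviewed and reviewed != released_output_id:
        problems.append("reviewed output differs from released output")
    return problems


# A checklist-style claim: a box ticked, almost nothing cited.
ticked = Claim("review completed", {"judgement": "ok"})
# A claim tied to run-level artefacts (illustrative values).
grounded = Claim("safe to release", {
    "prompt_version": "prompt-017 v2",
    "retrieval_context": "snapshot-42",
    "reviewed_output_id": "out-9f3a",
    "reviewer": "duty manager",
    "review_time": "2024-06-01T09:30Z",
    "judgement": "approved for use in meeting",
})
print(len(check_release_claim(ticked, "out-9f3a")))  # 5
print(check_release_claim(grounded, "out-9f3a"))  # []
```

The output-identity check is the part a checklist cannot express: a review of one output instance says nothing about a different instance that was later released.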

A generic AI governance approach may stop at documenting that controls exist. RAIDT goes further by tying the control statement to the governed run itself. The result is evidence that can be reconstructed, challenged, compared across cases, and reused for assurance, rather than recorded only to create an appearance of compliance.

Practical example in RAIDT terms

Consider a social care team using a generative AI tool to draft summaries of case notes before a safeguarding review meeting. The run-level issue is not simply whether the model performed well in general. The issue is whether this specific run used the correct case-note snapshot, whether sensitive material was handled under the right access conditions, whether the summary was checked by a practitioner, and whether the output was suitable for use in a high-stakes meeting.

The evidence needed includes the prompt ID, the configured task, the relevant case-note snapshot or retrieval reference, the model or system version, the output hash or identifier, the reviewer role, the review decision, and any override or escalation note. Responsibility is affected because a professional must remain accountable for use. Auditability and Traceability are affected because reviewers need to reconstruct the run. Dependability is affected because unstable or incomplete retrieval inputs could change the summary materially. Interpretability is affected because the basis for the summary must be understandable enough for professional scrutiny.
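The link between missing evidence and affected pillars in this example can be sketched in a few lines. This minimal Python sketch assumes a hypothetical field-to-pillar mapping based on one reading of the paragraph above (RAIDT does not fix this mapping here), and uses a SHA-256 digest from Python's standard `hashlib` module as one possible way to produce the output hash; all field names and values are illustrative.

```python
import hashlib

# Hypothetical mapping from evidence fields to the RAIDT pillars they bear on in this example.
PILLAR_FIELDS: dict[str, tuple[str, ...]] = {
    "Responsibility": ("reviewer_role", "review_decision"),
    "Auditability": ("prompt_id", "model_version", "output_hash"),
    "Interpretability": ("configured_task", "case_note_snapshot"),
    "Dependability": ("case_note_snapshot",),
    "Traceability": ("prompt_id", "case_note_snapshot", "model_version", "output_hash"),
}


def output_hash(summary_text: str) -> str:
    """Fingerprint the exact output text so the reviewed instance can be identified later."""
    return hashlib.sha256(summary_text.encode("utf-8")).hexdigest()


def pillars_at_risk(evidence: dict[str, str]) -> dict[str, list[str]]:
    """For each pillar, list the fields it relies on that are missing or empty."""
    at_risk: dict[str, list[str]] = {}
    for pillar, needed in PILLAR_FIELDS.items():
        missing = [f for f in needed if not evidence.get(f)]
        if missing:
            at_risk[pillar] = missing
    return at_risk


evidence = {
    "prompt_id": "safeguarding-summary-v3",
    "configured_task": "summarise case notes for review meeting",
    "case_note_snapshot": "",  # snapshot reference was not recorded for this run
    "model_version": "model-2024-06",
    "output_hash": output_hash("Example summary text"),
    "reviewer_role": "social worker",
    "review_decision": "approved with edits",
}
print(pillars_at_risk(evidence))
```

In this sketch a single unrecorded snapshot reference weakens three pillars at once, which illustrates why the case-note snapshot deserves explicit capture rather than an assumption that the right notes were used.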

Evidence grammar improves governance readiness here by ensuring that the evidence pack contains a consistent account of the run rather than scattered notes across the case system, AI interface, and supervision workflow. It gives the organisation a repeatable way to show what was used, what was generated, what was reviewed, and why the output was or was not relied upon.

Detailed link to RAIDT

Evidence grammar links to RAIDT in four ways.

First, it supports RAIDT's core idea that governance should attach to a specific run rather than to abstract system claims.
Second, it structures the run-level evidence needed to reconstruct what happened in a given task, context, and time.
Third, it makes the evidence pack and RAIDT score profile more defensible because claims can be traced back to standardised artefacts and reviewer judgements.
Fourth, it strengthens reviewability, contestability, audit readiness, and organisational learning because the same evidential language can be reused across oversight settings.

Evidence grammar → Run-level evidence → Evidence pack → RAIDT score profile → Governance readiness

That chain matters because RAIDT is not only collecting data. It is converting run-specific traces into an organisationally usable basis for judgement, assurance, and improvement.

Link to the five RAIDT pillars

Responsibility

Evidence grammar clarifies who made, reviewed, approved, or rejected a run-level decision and on what basis. It reduces the common problem of organisational accountability being asserted without an identifiable evidential trail.