S4.08 - Model/provider/version identifier

flowchart LR
    A[Vague model naming<br/>Generic labels only<br/>Hidden provider or version changes] --> B[RAIDT<br/>Run-level evidence framework]
    P[Model name<br/>Provider name<br/>Deployment ID<br/>API version or snapshot<br/>Checkpoint hash<br/>Timestamp linkage] --> C[[S4.08 Model/provider/version identifier]]
    B --> C
    C --> D[Run-level evidence pack]
    C --> E[RAIDT score profile]
    C --> H[Reviewer reconstruction]
    D --> F[Contestability and audit readiness]
    E --> G[Governance readiness]
    H --> I[Organisational learning and incident analysis]

Star S4 - Evidence Architecture and Artefacts

Star context: Identifies the concrete system instance behind a recorded run so that RAIDT evidence is inspectable, comparable, and reviewable rather than described only in generic terms.

Academic picture

Definition / background

A model/provider/version identifier records the specific generative AI system used in a given run by coupling three related elements: the model designation, the provider or operating environment, and the most stable version or deployment identifier available. Together, these elements identify the system that actually produced the output. In a hosted environment, this may involve a model label, a provider name, an API release or snapshot, and a deployment identifier. In a self-hosted or open-weight environment, it may instead involve a model family, an internal provider or platform team, and a checkpoint hash, container digest, or release tag.
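The coupling of the three elements can be sketched as a single record. This is a minimal sketch, assuming a Python dataclass representation. The field names (`model`, `provider`, `version_marker`, `marker_kind`) and all example values are hypothetical illustrations. They are not a RAIDT-mandated schema or any provider's real identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelIdentifier:
    """Hypothetical record coupling model designation, provider, and version marker."""

    model: str           # model designation or endpoint label
    provider: str        # provider, platform, or internal team serving the model
    version_marker: str  # most stable marker available (snapshot, deployment ID, hash, tag)
    marker_kind: str     # what the marker is, e.g. "api_snapshot" or "checkpoint_hash"

    def label(self) -> str:
        # One string naming the served system instance, not just the model family.
        return f"{self.provider}/{self.model}@{self.marker_kind}:{self.version_marker}"


# Hosted environment: provider name plus an API snapshot (illustrative values only).
hosted = ModelIdentifier(
    model="example-chat-model",
    provider="example-cloud",
    version_marker="2024-05-01",
    marker_kind="api_snapshot",
)

# Self-hosted open-weight environment: internal platform team plus a checkpoint hash.
self_hosted = ModelIdentifier(
    model="example-open-weights-8b",
    provider="internal-ml-platform",
    version_marker="sha256:3f2a9c",
    marker_kind="checkpoint_hash",
)

print(hosted.label())       # example-cloud/example-chat-model@api_snapshot:2024-05-01
print(self_hosted.label())
```

Because the record is frozen, it can be used as a dictionary key when grouping runs by served system.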

Conceptually, this item sits at the boundary between technical configuration management and governance evidence. It is not just a product name. It is the minimum evidence needed to distinguish one run from another when the apparent "same model" may behave differently across providers, regions, deployment wrappers, release dates, routing layers, or fine-tuned variants. That distinction matters because governance claims about safety, quality, fairness, and reliability are only meaningful if they are tied to the actual system instance used.

Within RAIDT, this item belongs inside Evidence Architecture and Artefacts because it supports run-level inspectability. RAIDT does not govern AI through generic organisational statements alone; it governs configured uses of systems at run level. If the run record does not specify which model, which provider, and which version were involved, then the evidence pack cannot fully support reconstruction, comparison, or challenge. This item therefore contributes directly to the evidential integrity of the run-level evidence pack and indirectly to the credibility of the five-pillar score profile.

This item also differs from closely related fields. It is not the same as prompt versioning: the prompt identifies what was asked, whereas the model/provider/version identifier identifies what answered. It is not the same as adapter lineage, which records additional modifications layered onto a base model. Nor is it the same as the timestamp, although the timestamp becomes especially important when version information is incomplete. The item is therefore a distinct part of the evidence architecture: it identifies the system endpoint or deployment context responsible for the generated result.
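Keeping each concern in its own field of a run record makes the distinction concrete. The sketch below uses a hypothetical schema; the names `RunRecord`, `prompt_version`, `adapter_lineage`, and `timestamp_utc` are assumptions for illustration. It shows that two runs can share a prompt version yet differ in the system that answered.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class RunRecord:
    # What answered: the model/provider/version identifier (this item).
    model: str
    provider: str
    version_marker: str
    # What was asked: prompt versioning is a separate field.
    prompt_version: str
    # Modifications layered onto the base model; empty tuple if none.
    adapter_lineage: Tuple[str, ...]
    # When the run happened; most useful when the version marker is incomplete.
    timestamp_utc: datetime


def system_identity(run: RunRecord) -> Tuple[str, str, str]:
    """Return only the fields that identify the answering system."""
    return (run.model, run.provider, run.version_marker)


# Hypothetical runs: same prompt version, different provider snapshot.
run_a = RunRecord("example-model", "provider-x", "snapshot-1", "prompt-v3", (),
                  datetime(2024, 6, 3, 9, 0, tzinfo=timezone.utc))
run_b = RunRecord("example-model", "provider-x", "snapshot-2", "prompt-v3", (),
                  datetime(2024, 6, 6, 9, 0, tzinfo=timezone.utc))

same_prompt = run_a.prompt_version == run_b.prompt_version
same_system = system_identity(run_a) == system_identity(run_b)
print(same_prompt, same_system)  # True False
```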

Why this concept matters

This concept solves a basic but consequential governance problem: many disputes about generative AI performance cannot be resolved if the organisation cannot say precisely which system produced the output. A reviewer may know the task, prompt, user role, and output, yet still be unable to explain why two runs diverged if the model/provider/version combination changed in the background. Without this item, comparisons across time become unreliable, accountability becomes diffuse, and improvement efforts risk targeting the wrong cause.

It also prevents a common category error in AI governance: treating a named model family as if it were a stable object. In practice, organisational use often depends on provider-specific implementations, hidden routing, deployment wrappers, and version updates. A generic label masks those differences. RAIDT avoids this confusion by requiring the run record to identify the actual served model context, not merely the marketing name or broad family.
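One way to enforce this at capture time is to flag any record whose model field is only a broad family name or that lacks a provider or version marker. This is a minimal sketch. It assumes a hypothetical, organisation-maintained set of labels considered too generic; each organisation would curate its own list.

```python
from typing import List

# Hypothetical, organisation-maintained labels too generic to identify a served system.
GENERIC_FAMILY_LABELS = {"gpt", "llama", "llm"}


def identifier_problems(model: str, provider: str, version_marker: str) -> List[str]:
    """Return reasons the identifier is too vague for run-level evidence (empty if none)."""
    problems = []
    if not model.strip():
        problems.append("missing model designation")
    elif model.strip().lower() in GENERIC_FAMILY_LABELS:
        problems.append(f"model label '{model}' names a family, not a served model")
    if not provider.strip():
        problems.append("missing provider or serving environment")
    if not version_marker.strip():
        problems.append("missing version marker (snapshot, deployment ID, hash, or tag)")
    return problems


print(identifier_problems("GPT", "", ""))
print(identifier_problems("example-model-2024-05", "provider-x", "deploy-17"))  # []
```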

For organisations, the absence of this item creates practical risk. Incidents become harder to investigate, drift becomes harder to detect, contractual and regulatory obligations become harder to evidence, and external review becomes harder to satisfy. By contrast, when the item is recorded well, organisations can distinguish a prompt issue from a provider issue, a user issue from a model-release issue, and a governance-control issue from a deployment-change issue. That is exactly the move RAIDT makes: from principles and assertions toward operational evidence.

Key idea: model, provider, and version must be recorded together because governance attaches to the actual system instance used in a run, not to a vague model label.

What this item captures

The model designation or endpoint label used for the run.
The provider, platform, or organisational environment serving that model.
The best available version marker, such as API version, snapshot date, deployment ID, checkpoint hash, release tag, or build identifier.
The distinction between apparently similar runs that in fact relied on different served systems.
The linkage needed to interpret output changes alongside prompts, decoding settings, tools, adapters, and timestamps.
The minimum provenance needed for reviewer reconstruction and run comparison.

Practical example / likely audience question

Audience question

Why is it not enough to record that the run used "GPT" or "Llama", especially if the task and prompt have already been documented?

Answer

The concern behind the question is that model naming appears, at first glance, to be sufficient shorthand. In practice, it is not. A generic model family name does not tell a reviewer which provider served the model, whether a platform wrapper changed the behaviour, or whether a version shift occurred between one run and the next. Two runs that both say "GPT" may differ materially because one used a provider-managed enterprise deployment and the other used a different hosted endpoint, API snapshot, safety layer, or routing policy.

The direct answer is that governance requires the actual served system to be identified, not just the broad family name. Suppose a finance team uses a generative AI assistant to summarise anti-money-laundering guidance. The prompt is unchanged, the user role is unchanged, and the task label is unchanged, yet a later run omits a material compliance caveat. If the run record says only "GPT", the organisation cannot tell whether the difference came from a model update, a provider switch, an internal deployment change, or something else in the stack.

RAIDT handles this better than generic AI governance because it connects the identifier to the rest of the run record. The model/provider/version identifier sits alongside prompt version, decoding parameters, tool-chain trace, timestamps, output hash, and reviewer notes. That structure lets a reviewer say, with evidence, whether a changed output is associated with a changed system instance or with some other factor. Generic governance often asks teams to document "the model used"; RAIDT asks for enough detail to support reconstruction and contestability.
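The comparison described above can be sketched as follows, assuming each run record is held as a flat dictionary. The field names, the grouping of fields into factors, and the values are hypothetical. The point is that a changed output hash can be read against which other recorded factors also changed between the two runs.

```python
from typing import Dict, List

# Hypothetical grouping of run-record fields by the factor they evidence.
FACTOR_FIELDS = {
    "system instance": ["model", "provider", "version_marker"],
    "prompt": ["prompt_version"],
    "decoding": ["temperature", "top_p"],
    "tool chain": ["tool_config"],
}


def changed_factors(run_a: Dict[str, object], run_b: Dict[str, object]) -> List[str]:
    """List the factors whose recorded fields differ between two runs."""
    changed = []
    for factor, fields in FACTOR_FIELDS.items():
        if any(run_a.get(f) != run_b.get(f) for f in fields):
            changed.append(factor)
    return changed


run_1 = {"model": "example-model", "provider": "provider-x", "version_marker": "snap-1",
         "prompt_version": "v3", "temperature": 0.2, "top_p": 1.0,
         "tool_config": "none", "output_hash": "aa11"}
# Same run context except for a provider snapshot change and a different output.
run_2 = dict(run_1, version_marker="snap-2", output_hash="bb22")

if run_1["output_hash"] != run_2["output_hash"]:
    print(changed_factors(run_1, run_2))  # ['system instance']
```

An empty result alongside a changed output hash would point reviewers toward factors the record does not capture. It would not exonerate the system.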

Practical example in RAIDT terms

Consider a healthcare trust using a generative AI tool to draft discharge-summary explanations for clinicians to review before patient release. One run on Monday produces an accurate draft that correctly highlights a medication interaction warning. A similar run on Thursday, using the same internal workflow, fails to foreground that warning. The immediate governance question is whether the failure reflects user error, prompt weakness, retrieval variation, or a change in the underlying model service.

In RAIDT terms, the run-level issue is that the organisation must be able to reconstruct the exact system context of each run. The required evidence includes the model/provider/version identifier, timestamp, prompt ID and version, prompt hash, decoding parameters, any retrieval details, and the output hash. If Monday and Thursday used different provider deployments or different version snapshots, that difference becomes a legitimate explanatory factor rather than speculation.
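The prompt hash and output hash can be produced with any stable cryptographic digest. This is a minimal sketch, assuming SHA-256 over UTF-8 text and hypothetical field names. It is not a prescribed RAIDT format, and retrieval details are omitted for brevity.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_text(text: str) -> str:
    """Hex SHA-256 digest of UTF-8 text, used here for prompt and output hashes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_run_evidence(prompt: str, output: str, model: str, provider: str,
                       version_marker: str, prompt_id: str, prompt_version: str,
                       decoding: dict) -> dict:
    """Assemble one hypothetical run-level evidence entry."""
    return {
        "model": model,
        "provider": provider,
        "version_marker": version_marker,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "prompt_id": prompt_id,
        "prompt_version": prompt_version,
        "prompt_hash": sha256_text(prompt),
        "decoding": decoding,
        "output_hash": sha256_text(output),
    }


# Illustrative values only.
entry = build_run_evidence(
    prompt="Draft a discharge explanation for clinician review.",
    output="Draft text...",
    model="example-model", provider="provider-x", version_marker="deploy-17",
    prompt_id="discharge-summary", prompt_version="v4",
    decoding={"temperature": 0.2},
)
print(json.dumps(entry, indent=2))
```

If the Monday and Thursday entries share prompt and decoding fields but differ in `version_marker`, the version change becomes an evidenced explanatory factor.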

The pillars most affected are Auditability, Dependability, and Traceability, with secondary implications for Responsibility and Interpretability. Recording the identifier improves governance readiness because the trust can demonstrate that it knows not merely that "an LLM was used" but which served system produced the draft, at what time, under which operational conditions, and with what implications for review and incident analysis.

Detailed link to RAIDT

The model/provider/version identifier links to RAIDT in four ways.

First, it operationalises RAIDT's core idea that governance should attach to the run as an evidential event rather than to abstract statements about AI use.
Second, it anchors the run-level record to the actual system instance that generated the output, making the run reconstructable and comparable.
Third, it strengthens both the evidence pack and the score profile because reviewers can assess whether evidence is sufficiently specific to support responsible judgement.
Fourth, it supports reviewability, contestability, audit readiness, and organisational learning by making system changes visible across runs rather than hidden behind generic labels.

Model/provider/version identifier -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness

In practical terms, this means RAIDT can distinguish between an organisation that merely claims to use an approved model family and an organisation that can evidence which served model context was actually used for a given task, by whom, when, and with what consequences.
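How reviewers might judge whether an evidence pack names its systems specifically enough can be sketched with ordinal levels. This is a hypothetical illustration. The levels `NONE`, `MODEL_ONLY`, `MODEL_AND_PROVIDER`, and `FULL`, and the rule that every run must reach `FULL`, are assumptions. They are not RAIDT's actual scoring method.

```python
from enum import IntEnum


class IdentifierSpecificity(IntEnum):
    """Hypothetical ordinal levels for how precisely a run names its system."""
    NONE = 0                # no model recorded
    MODEL_ONLY = 1          # a model or family label without a provider
    MODEL_AND_PROVIDER = 2  # model and provider, but no version marker
    FULL = 3                # model, provider, and a version marker


def specificity(record: dict) -> IdentifierSpecificity:
    if not record.get("model"):
        return IdentifierSpecificity.NONE
    if not record.get("provider"):
        return IdentifierSpecificity.MODEL_ONLY
    if not record.get("version_marker"):
        return IdentifierSpecificity.MODEL_AND_PROVIDER
    return IdentifierSpecificity.FULL


def pack_supports_reconstruction(runs: list) -> bool:
    """Hypothetical rule: every run in the pack must reach FULL specificity."""
    return all(specificity(r) == IdentifierSpecificity.FULL for r in runs)


pack = [
    {"model": "example-model", "provider": "provider-x", "version_marker": "snap-1"},
    {"model": "GPT"},
]
print([specificity(r).name for r in pack])  # ['FULL', 'MODEL_ONLY']
print(pack_supports_reconstruction(pack))   # False
```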

Link to the five RAIDT pillars

Responsibility

This item supports Responsibility by clarifying which external provider, internal platform team, or deployment owner was implicated in a run. Responsibility becomes more concrete when decision-makers can identify the specific system context rather than referring vaguely to "the AI".