S4.08 - Model/provider/version identifier
flowchart LR
A["Vague model naming<br/>Generic labels only<br/>Hidden provider or version changes"] --> B["RAIDT<br/>Run-level evidence framework"]
P["Model name<br/>Provider name<br/>Deployment ID<br/>API version or snapshot<br/>Checkpoint hash<br/>Timestamp linkage"] --> C[["S4.08 Model/provider/version identifier"]]
B --> C
C --> D[Run-level evidence pack]
C --> E[RAIDT score profile]
C --> H[Reviewer reconstruction]
D --> F[Contestability and audit readiness]
E --> G[Governance readiness]
H --> I[Organisational learning and incident analysis]
Star S4 - Evidence Architecture and Artefacts
Star context: Identifies the concrete system instance behind a recorded run so that RAIDT evidence is inspectable, comparable, and reviewable rather than described only in generic terms.
Academic picture
Definition / background
A model/provider/version identifier records the specific generative AI system used in a given run by coupling three related elements: the model designation, the provider or operating environment, and the most stable version or deployment identifier available. Together, these elements identify the system that actually produced the output. In a hosted environment, this may involve a model label, a provider name, an API release or snapshot, and a deployment identifier. In a self-hosted or open-weight environment, it may instead involve a model family, an internal provider or platform team, and a checkpoint hash, container digest, or release tag.
Conceptually, this item sits at the boundary between technical configuration management and governance evidence. It is not just a product name. It is the minimum evidence needed to distinguish one run from another when the apparent "same model" may behave differently across providers, regions, deployment wrappers, release dates, routing layers, or fine-tuned variants. That distinction matters because governance claims about safety, quality, fairness, and reliability are only meaningful if they are tied to the actual system instance used.
Within RAIDT, this item belongs inside Evidence Architecture and Artefacts because it supports run-level inspectability. RAIDT does not govern AI through generic organisational statements alone; it governs configured uses of systems at run level. If the run record does not specify which model, which provider, and which version were involved, then the evidence pack cannot fully support reconstruction, comparison, or challenge. This item therefore contributes directly to the evidential integrity of the run-level evidence pack and indirectly to the credibility of the five-pillar score profile.
This item also differs from closely related fields. It is not the same as prompt versioning, because the prompt identifies what was asked, whereas the model/provider/version identifier identifies what answered. It is not the same as adapter lineage, because adapter lineage records additional model modification layered onto a base system. It is not the same as the timestamp, although the timestamp becomes especially meaningful when version information is incomplete. The item is therefore a distinct part of the evidence architecture: it identifies the system endpoint or deployment context responsible for the generated result.
Why this concept matters
This concept solves a basic but consequential governance problem: many disputes about generative AI performance cannot be resolved if the organisation cannot say precisely which system produced the output. A reviewer may know the task, prompt, user role, and output, yet still be unable to explain why two runs diverged if the model/provider/version combination changed in the background. Without this item, comparisons across time become unreliable, accountability becomes diffuse, and improvement efforts risk targeting the wrong cause.
It also prevents a common category error in AI governance: treating a named model family as if it were a stable object. In practice, organisational use often depends on provider-specific implementations, hidden routing, deployment wrappers, and version updates. A generic label masks those differences. RAIDT avoids this confusion by requiring the run record to identify the actual served model context, not merely the marketing name or broad family.
For organisations, the absence of this item creates practical risk. Incidents become harder to investigate, drift becomes harder to detect, contractual and regulatory obligations become harder to evidence, and external review becomes harder to satisfy. By contrast, when the item is recorded well, organisations can distinguish a prompt issue from a provider issue, a user issue from a model-release issue, and a governance-control issue from a deployment-change issue. That is exactly the move RAIDT makes: from principles and assertions toward operational evidence.
Key idea: model, provider, and version must be recorded together because governance attaches to the actual system instance used in a run, not to a vague model label.
What this item captures
- The model designation or endpoint label used for the run.
- The provider, platform, or organisational environment serving that model.
- The best available version marker, such as API version, snapshot date, deployment ID, checkpoint hash, release tag, or build identifier.
- The distinction between apparently similar runs that in fact relied on different served systems.
- The linkage needed to interpret output changes alongside prompts, decoding settings, tools, adapters, and timestamps.
- The minimum provenance needed for reviewer reconstruction and run comparison.
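The fields listed above could be held in a small structured record. The following is a minimal sketch, assuming Python dataclasses; the class name `ModelIdentifier`, its field names, and all example values are hypothetical illustrations and are not prescribed by RAIDT.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelIdentifier:
    """Illustrative S4.08 record; the field names are assumptions, not RAIDT-defined."""

    model: str                   # model designation or endpoint label
    provider: str                # provider, platform, or internal team serving the model
    version: Optional[str]       # best available version marker, None if not exposed
    version_kind: Optional[str]  # e.g. "api_snapshot", "deployment_id", "checkpoint_hash"
    note: str = ""               # explicit statement of any uncertainty

    def key(self) -> tuple:
        """Tuple for deciding whether two runs relied on the same served system."""
        return (self.model, self.provider, self.version_kind, self.version)


# Placeholder values only; none of these names refer to a real service.
hosted = ModelIdentifier(
    model="example-chat-model",
    provider="example-hosted-provider",
    version="2024-01-15",
    version_kind="api_snapshot",
)
self_hosted = ModelIdentifier(
    model="example-chat-model",
    provider="internal-platform-team",
    version="sha256:<checkpoint-digest>",
    version_kind="checkpoint_hash",
)

# Same model label, different served system: the keys do not match.
same_system = hosted.key() == self_hosted.key()
record = asdict(hosted)  # plain dict, ready to store alongside the run ID
```

Keeping `version_kind` next to `version` is a design choice in this sketch: it lets a reviewer see whether the marker is a snapshot date, a deployment label, or a content hash, which affects how much weight the marker can bear.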
Practical example / likely audience question
Audience question
Why is it not enough to record that the run used "GPT" or "Llama", especially if the task and prompt have already been documented?
Answer
The concern behind the question is that model naming appears, at first glance, to be sufficient shorthand. In practice, it is not. A generic model family name does not tell a reviewer which provider served the model, whether a platform wrapper changed the behaviour, or whether a version shift occurred between one run and the next. Two runs that both say "GPT" may differ materially because one used a provider-managed enterprise deployment and the other used a different hosted endpoint, API snapshot, safety layer, or routing policy.
The direct answer is that governance requires the actual served system to be identified, not just the broad family name. Suppose a finance team uses a generative AI assistant to summarise anti-money-laundering guidance. The prompt is unchanged, the user role is unchanged, and the task label is unchanged, yet a later run omits a material compliance caveat. If the run record says only "GPT", the organisation cannot tell whether the difference came from a model update, a provider switch, an internal deployment change, or something else in the stack.
RAIDT handles this better than generic AI governance because it connects the identifier to the rest of the run record. The model/provider/version identifier sits alongside prompt version, decoding parameters, tool-chain trace, timestamps, output hash, and reviewer notes. That structure lets a reviewer say, with evidence, whether a changed output is associated with a changed system instance or with some other factor. Generic governance often asks teams to document "the model used"; RAIDT asks for enough detail to support reconstruction and contestability.
Practical example in RAIDT terms
Consider a healthcare trust using a generative AI tool to draft discharge-summary explanations for clinicians to review before patient release. One run on Monday produces an accurate draft that correctly highlights a medication interaction warning. A similar run on Thursday, using the same internal workflow, fails to foreground that warning. The immediate governance question is whether the failure reflects user error, prompt weakness, retrieval variation, or a change in the underlying model service.
In RAIDT terms, the run-level issue is that the organisation must be able to reconstruct the exact system context of each run. The required evidence includes the model/provider/version identifier, timestamp, prompt ID and version, prompt hash, decoding parameters, any retrieval details, and the output hash. If Monday and Thursday used different provider deployments or different version snapshots, that difference becomes a legitimate explanatory factor rather than speculation.
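The Monday and Thursday comparison can be sketched as a field-by-field difference between two run records. This is an illustrative sketch only: the dictionary keys, the helper `diff_runs`, and every value below are hypothetical and do not come from a real deployment.

```python
# Hypothetical field names for a run record; RAIDT does not prescribe these keys.
IDENTIFIER_FIELDS = ("model", "provider", "version")
OTHER_FIELDS = ("prompt_id", "prompt_version", "prompt_hash", "decoding", "retrieval_index")


def diff_runs(run_a: dict, run_b: dict) -> dict:
    """Return the fields that differ between two run records, grouped by kind."""

    def changed(fields):
        return [name for name in fields if run_a.get(name) != run_b.get(name)]

    return {"identifier": changed(IDENTIFIER_FIELDS), "other": changed(OTHER_FIELDS)}


monday = {
    "model": "example-clinical-drafter",
    "provider": "example-hosted-provider",
    "version": "snapshot-A",
    "prompt_id": "discharge-summary",
    "prompt_version": "3",
    "prompt_hash": "hash-of-prompt-v3",
    "decoding": {"temperature": 0.2},
    "retrieval_index": "index-01",
}
# Thursday's record is identical except for the version snapshot.
thursday = dict(monday, version="snapshot-B")

result = diff_runs(monday, thursday)
# result == {"identifier": ["version"], "other": []}
```

A result of this shape shows only that the changed output is associated with a changed version snapshot. It does not prove causation, but it gives the reviewer an evidenced starting point rather than speculation.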
The affected pillars are especially Auditability, Dependability, and Traceability, with secondary implications for Responsibility and Interpretability. Recording the identifier improves governance readiness because the trust can demonstrate that it knows not merely that "an LLM was used", but which served system produced the draft at a particular time, under which operational conditions, and with which implications for review and incident analysis.
Detailed link to RAIDT
The model/provider/version identifier links to RAIDT in four ways.
First, it operationalises RAIDT's core idea that governance should attach to the run as an evidential event rather than to abstract statements about AI use.
Second, it anchors the run-level record to the actual system instance that generated the output, making the run reconstructable and comparable.
Third, it strengthens both the evidence pack and the score profile because reviewers can assess whether evidence is sufficiently specific to support responsible judgement.
Fourth, it supports reviewability, contestability, audit readiness, and organisational learning by making system changes visible across runs rather than hidden behind generic labels.
Model/provider/version identifier -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness
In practical terms, this means RAIDT can distinguish between an organisation that merely claims to use an approved model family and an organisation that can evidence which served model context was actually used for a given task, by whom, when, and with what consequences.
Link to the five RAIDT pillars
Responsibility
This item supports Responsibility by clarifying which external provider, internal platform team, or deployment owner was implicated in a run. Responsibility becomes more concrete when decision-makers can identify the specific system context rather than referring vaguely to "the AI".
Example evidence / implication:
- The run record shows whether the system was served through a public API, enterprise environment, or internal deployment.
- Reviewers can assign follow-up actions to the correct owner when outputs raise concerns.
Auditability
This item strongly affects Auditability because audits depend on being able to reconstruct what system generated a given output. If the identifier is incomplete, audit trails lose explanatory value even when other metadata exists.
Example evidence / implication:
- An auditor can compare outcomes across runs and test whether changed behaviour aligns with a changed deployment or version.
- The evidence pack can show that a reviewed output came from a specific served model context, not an unspecified model family.
Interpretability
This item contributes to Interpretability by helping reviewers explain why outputs differ. It does not make model internals transparent, but it improves interpretive discipline by identifying the relevant system context for analysis.
Example evidence / implication:
- Analysts can relate output variation to provider or version changes rather than inferring unexplained inconsistency.
- Supervisors can discuss model behaviour in context, linking explanation to a traceable deployment state.
Dependability
This item strongly affects Dependability because dependable use requires awareness of version drift, deployment substitutions, and provider-level change. Stable governance depends on knowing when the underlying system is no longer the same in practice.
Example evidence / implication:
- Teams can monitor whether a quality regression coincides with a version update or deployment switch.
- Validation results can be linked to the actual model instance used in production-like runs.
Traceability
This item strongly affects Traceability because it is one of the primary links between a run and the technical system that produced it. It helps connect logs, wrappers, adapters, tools, and outputs into a coherent lineage.
Example evidence / implication:
- A reviewer can trace from output hash back to the specific served model context used at run time.
- Cross-run comparisons can identify whether differences arose from prompt changes, tool changes, or model/provider/version changes.
The item has its strongest direct effects on Auditability, Dependability, and Traceability, but it also reinforces Responsibility and Interpretability when combined with the rest of the RAIDT evidence architecture.
Why this item is more than a generic concept
In general AI governance, "model identification" often means documenting the broad model name used by a system or project. In RAIDT, the meaning is narrower and more operational: the identifier must be good enough to attach a specific run to the actual served model context that produced the output. That means model name alone is usually insufficient.
The RAIDT meaning is more operational because it is tied to run-level evidence. It is captured per run, evaluated alongside timestamps and other artefacts, and used to support reviewer reconstruction, scoring, and challenge. In other words, RAIDT turns model identification from a descriptive inventory field into a governance-ready provenance field.
Common misunderstanding
Misunderstanding
If the organisation has already approved a model family, there is no need to record the provider and version for each run.
Correction
Approval of a model family does not remove the need for run-level identification. A family-level approval says something about a category of systems; it does not prove which deployed instance actually produced a specific output. For example, an approved family might be accessed through different providers, regions, wrappers, or release snapshots, each with different operational characteristics. RAIDT therefore records provider and version at run level so that governance remains attached to the specific event being reviewed, not merely to a prior approval decision.
Boundary and limitation
This item does not, by itself, prove that the model internals are fully known, stable, or explainable. A closed provider may expose only partial version information. Even in a self-hosted environment, a version identifier does not guarantee that every surrounding component remained unchanged. The item therefore should not be treated as a complete substitute for broader configuration evidence.
Its effectiveness also depends on the availability and quality of platform metadata. Some providers do not expose immutable build identifiers, and some orchestration layers obscure the underlying route selection. In those cases, RAIDT uses the best available identifier combination and compensates through linkage with timestamp, prompt records, adapter lineage, tool traces, and output hashes. The limitation is real, but RAIDT handles it by making uncertainty explicit rather than pretending that generic naming is sufficient.
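One way to picture "best available identifier combination" is a fallback over whatever metadata the platform exposes, with the uncertainty written into the record. This is a minimal sketch under stated assumptions: the preference order, the set of markers treated as immutable, and the key names are illustrative choices, not RAIDT rules.

```python
# Assumed preference order, most stable marker first. Adjust to local policy.
PREFERENCE = (
    "checkpoint_hash", "container_digest", "api_snapshot",
    "release_tag", "deployment_id", "api_version",
)
# Assumption: only content-derived markers are treated as immutable here.
IMMUTABLE = {"checkpoint_hash", "container_digest"}


def best_available_version(metadata: dict, timestamp: str) -> dict:
    """Pick the most stable marker present and record any uncertainty explicitly."""
    for kind in PREFERENCE:
        value = metadata.get(kind)
        if value:
            uncertainty = None if kind in IMMUTABLE else (
                f"{kind} may not be immutable; interpret together with the timestamp"
            )
            return {"version_kind": kind, "version": value,
                    "timestamp": timestamp, "uncertainty": uncertainty}
    # Nothing exposed: keep the timestamp and say so, rather than leaving a silent gap.
    return {"version_kind": None, "version": None, "timestamp": timestamp,
            "uncertainty": "no version marker exposed; timestamp linkage only"}


partial = best_available_version(
    {"deployment_id": "dep-example", "api_version": "v2"}, "2024-01-15T09:00:00Z"
)
missing = best_available_version({}, "2024-01-15T09:00:00Z")
```

In this sketch `partial` selects the deployment ID over the API version and carries a caveat, while `missing` records that only timestamp linkage is available.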
Implementation levels
Manual implementation
A researcher or small team can record the model name, provider, and best available version field manually in a structured run template. Where exact version information is unavailable, the team should record the provider endpoint, deployment label, and run timestamp, and explicitly note any uncertainty.
Semi-automated implementation
A semi-automated approach can pull metadata from wrappers, prompt templates, notebooks, API clients, or review forms so that the identifier fields are pre-filled and standardised. Controlled vocabularies and validation rules can reduce inconsistent naming such as mixing marketing labels, endpoint names, and internal nicknames.
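A controlled vocabulary with simple validation rules might look like the following. This is a sketch only: the alias table, the allowed provider names, and the field names are hypothetical, and a real vocabulary would be maintained by the organisation.

```python
from typing import List

# Hypothetical controlled vocabulary: free-text variants mapped to one canonical name.
ALIASES = {
    "example provider": "example-provider",
    "exampleprovider": "example-provider",
    "internal platform": "internal-platform-team",
}
ALLOWED_PROVIDERS = {"example-provider", "internal-platform-team"}


def normalise_provider(raw: str) -> str:
    """Map a free-text provider label onto the controlled vocabulary."""
    key = " ".join(raw.strip().lower().split())  # trim, lowercase, collapse spaces
    canonical = ALIASES.get(key, key)
    if canonical not in ALLOWED_PROVIDERS:
        raise ValueError(f"unknown provider label: {raw!r}")
    return canonical


def validate_identifier(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in ("model", "provider"):
        if not (record.get(field) or "").strip():
            problems.append(f"missing {field}")
    # A missing version is acceptable only if the uncertainty is stated.
    if not record.get("version") and not record.get("uncertainty_note"):
        problems.append("version missing and no uncertainty note recorded")
    return problems


provider = normalise_provider("  Example   Provider ")
problems = validate_identifier({"model": "example-chat-model", "provider": provider})
```

Here `provider` becomes `"example-provider"`, and `problems` reports the missing version because no uncertainty note accompanies it.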
Fully automated implementation
At scale, a platform, orchestration layer, or governance pipeline can log model/provider/version identifiers automatically for every run, linking them to run IDs, prompt artefacts, decoding settings, tool traces, and output hashes. This enables dashboards, version-drift alerts, and evidence-pack generation that remain robust even when multiple providers or model routes are used across the organisation.
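A version-drift alert can be sketched as a scan over logged runs that flags any change of identifier between consecutive runs of the same task. The record keys and the helper `detect_drift` are hypothetical, and the sketch assumes timestamps are ISO 8601 strings in a single timezone so that they sort correctly as text.

```python
from typing import List, Tuple

ID_FIELDS = ("model", "provider", "version")


def detect_drift(runs: List[dict]) -> List[Tuple[str, str, List[str]]]:
    """Return (previous_run_id, run_id, changed_fields) for each identifier change."""
    last_seen = {}  # task label -> (run_id, identifier tuple) of the latest run
    alerts = []
    for run in sorted(runs, key=lambda r: r["timestamp"]):
        ident = tuple(run[name] for name in ID_FIELDS)
        previous = last_seen.get(run["task"])
        if previous is not None and previous[1] != ident:
            changed = [n for n, a, b in zip(ID_FIELDS, previous[1], ident) if a != b]
            alerts.append((previous[0], run["run_id"], changed))
        last_seen[run["task"]] = (run["run_id"], ident)
    return alerts


# Placeholder log entries; values are illustrative only.
base = {"task": "summarise-guidance", "model": "example-chat-model",
        "provider": "example-provider"}
runs = [
    dict(base, run_id="r1", timestamp="2024-01-15T09:00:00Z", version="snapshot-A"),
    dict(base, run_id="r2", timestamp="2024-01-16T09:00:00Z", version="snapshot-A"),
    dict(base, run_id="r3", timestamp="2024-01-18T09:00:00Z", version="snapshot-B"),
]
alerts = detect_drift(runs)
# alerts == [("r2", "r3", ["version"])]
```

Grouping by task label is a design choice in this sketch: it avoids raising alerts when different tasks are deliberately routed to different providers or models.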
Practical use in the RAIDT project
In Paper 08 Foundations, this item helps define what it means for a run to be evidentially specified rather than merely described. It sharpens the conceptual distinction between system use in general and a documented run as the unit of governance.
In Paper 09 Empirical Validation, it becomes a measurable field for testing whether reviewers can reconstruct runs, distinguish causes of variation, and rate evidence quality consistently across cases. It is likely to be especially salient in scenarios involving platform drift, provider switching, or changing model performance over time.
In Paper 10 Policy Pathways, the item supports governance recommendations that move beyond high-level calls for transparency and toward minimum record-keeping requirements for operational assurance. In sector playbooks, it provides a practical translation point between abstract governance expectations and the metadata fields that teams must actually capture.
For the evidence pack and scoring rubric, this item is part of the proof that RAIDT can support contestability and audit readiness in practice. For supervisor explanation, viva defence, and journal positioning, it shows that RAIDT is not merely a conceptual framework; it specifies concrete artefacts needed to govern real GenAI use in organisational settings.
Key audience questions to prepare for
Q1. Why must model, provider, and version be recorded together?
Because each element answers a different governance question. The model identifies the broad system type, the provider identifies who served or operated it, and the version identifies which release or deployment state was active. Without all three, the run may remain too vague to reconstruct or compare.
Q2. What if the provider does not expose a precise immutable version?
Then the organisation should record the best available combination of provider name, endpoint or deployment ID, API version, and timestamp, and note the limitation explicitly. RAIDT is designed to work with imperfect evidence, but it requires that uncertainty be documented rather than hidden.
Q3. How is this different from prompt versioning?
Prompt versioning identifies what instructions were supplied to the system. Model/provider/version identification specifies what system responded. Both are necessary because a changed output may be caused by either side of the interaction.
Q4. Does this matter for self-hosted open-weight models as well as hosted APIs?
Yes. In self-hosted settings, the provider may be an internal platform team and the version may be a checkpoint hash, container image digest, or release tag. The governance logic is the same: reviewers still need to know which actual system instance generated the output.
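For a self-hosted model, a checkpoint hash can be derived directly from the weights file. The following is a minimal sketch using Python's standard `hashlib`; the function name, the `sha256:` prefix convention, and the demo file are assumptions for illustration, and a multi-file checkpoint would need a scheme for combining per-file digests.

```python
import hashlib
import tempfile
from pathlib import Path


def checkpoint_sha256(path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a checkpoint file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


# Demo with a throwaway file standing in for real weights.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.ckpt"
    demo_path.write_bytes(b"placeholder weights")
    demo_digest = checkpoint_sha256(demo_path)
```

Because the digest is computed from the file contents, two runs that report the same value used byte-identical weights, although surrounding components such as wrappers or tools may still differ.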
Q5. How does this improve governance readiness rather than just documentation quality?
It improves governance readiness because it makes incident review, comparison, assurance, and challenge materially more robust. When outputs are disputed, the organisation can investigate with evidence instead of relying on memory, assumptions, or generic model labels.
Suggested citation concepts to support this item
- model versioning in generative AI governance
- AI system provenance and traceability
- deployment metadata for machine learning systems
- reproducibility challenges in large language model evaluation
- audit trails for generative AI services
- model lineage and release management in MLOps
- API versioning and hosted model governance
- evidence-based AI assurance and accountability
- configuration management for foundation model deployments
- provider-specific variation in large language model behaviour
Short explanation for presentation
This item records which model, which provider, and which version or deployment state were actually used in a given run. That sounds simple, but it is essential to RAIDT because governance depends on the actual system instance that produced an output, not on a vague label such as "GPT" or "Llama". If the organisation cannot identify the served model context, it becomes difficult to explain why outputs changed, investigate incidents, or compare runs fairly over time. In RAIDT, this field strengthens the evidence pack by making runs reconstructable and strengthens the score profile by improving auditability, traceability, and dependability. It is therefore a small metadata field with large governance consequences: it turns generic model naming into reviewable run-level evidence.
One-line takeaway
The model/provider/version identifier is the run-level provenance field that ties an output to the actual served GenAI system, because RAIDT governs evidence-bearing runs rather than abstract model labels.
Related items in evidence architecture and artefacts
- S4.01 - run_id
- S4.02 - Timestamp
- S4.03 - User role / operator role
- S4.04 - Task and domain label
- S4.05 - Prompt registry
- S4.06 - Prompt ID and version
- S4.07 - Prompt hash
- S4.09 - Decoding parameters
- S4.10 - Retrieval query and index ID
- S4.11 - Retrieved document IDs and hashes
- S4.12 - Tool-chain trace
- S4.13 - Adapter ID / PEFT lineage
- S4.14 - Alignment policy ID
- S4.15 - Output hash
- S4.16 - Review decision and reviewer notes
- ... and 1 more