S7.05 - Artefacts
flowchart LR
A1[High-level principles] --> B[RAIDT - run-level evidence framework]
A2[Scattered logs] --> B
A3[Isolated documents] --> B
A4[Weak reconstructability] --> B
B --> C[[Artefacts - designed governance objects]]
C --> D[Run-level evidence pack]
C --> E[Five-pillar score profile]
C --> F[Reviewer reconstruction]
C --> G[Policy alignment]
C --> H[Organisational learning]
C --> I["Evidence over assertion<br/>Reviewability<br/>Contestability<br/>Audit readiness"]
J1[Healthcare discharge drafting] --> C
J2[Financial review workflows] --> C
J3[Public-service case handling] --> C
J4[Prompt registry] --> C
J5[Policy crosswalk] --> C
J6[Governance dashboard] --> C
Star S7 - Academic Theory and Design Logic
Star context: Positions RAIDT as a design-science, mechanism-based mid-range theory by showing which governance artefacts make responsible GenAI use observable, reviewable, and operational at the level of a single run.
Academic picture
Definition / background
Artefacts are designed objects that embody a governance logic in a form that can be used, inspected, and evaluated. In design science research, artefacts are the tangible outputs through which a theoretical contribution becomes practically actionable. They may be models, methods, constructs, procedures, templates, or instantiated systems. In RAIDT, the term is used in this design-science sense, but with a stronger governance emphasis.
Within GenAI governance, artefacts matter because organisational oversight cannot rely only on abstract principles such as fairness, transparency, or accountability. Governance needs objects that carry evidence, structure judgement, and support review. In RAIDT, those objects include the run-level evidence pack, the five-pillar scoring rubric, prompt registries, policy crosswalks, reviewer checklists, exception records, and traceable summaries of how a specific run was configured and assessed.
Artefacts are therefore different from raw logs, isolated screenshots, or ad hoc notes. Raw traces may contain useful data, but they are not yet governance artefacts unless they are organised into a form that supports interpretation, review, and decision-making. Likewise, a policy statement is not by itself a RAIDT artefact unless it is operationally linked to the evidence generated by a run. RAIDT belongs in this discussion because it is a framework that deliberately produces governance artefacts rather than leaving evidence assembly to chance.
This makes artefacts central to the relationship between run-level evidence, evidence packs, score profiles, and the five RAIDT pillars. The evidence pack is an artefact that consolidates proof. The scoring rubric is an artefact that structures evaluation. A policy crosswalk is an artefact that links evidence to organisational and regulatory expectations. Together, these artefacts allow a single run to be translated into a reviewable governance object.
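As an illustration only, the relationship between the evidence pack, the scoring rubric, and the policy crosswalk can be sketched as a small data structure. Every class name, field name, and the 0-4 score scale below is a hypothetical choice made for this sketch; none of it is a published RAIDT schema.

```python
from dataclasses import dataclass, field
from typing import Optional

PILLARS = ("Responsibility", "Auditability", "Interpretability",
           "Dependability", "Traceability")


@dataclass
class PillarScore:
    """One rubric judgement: a score plus the evidence that justifies it."""
    pillar: str
    score: int  # assumed 0-4 scale, purely illustrative
    evidence_refs: list[str] = field(default_factory=list)


@dataclass
class EvidencePack:
    """Consolidates the proof for a single run (hypothetical structure)."""
    run_id: str
    prompt_registry_id: str
    model_version: str
    output_snapshot: str
    policy_refs: list[str] = field(default_factory=list)  # policy crosswalk links
    scores: list[PillarScore] = field(default_factory=list)

    def score_profile(self) -> dict[str, Optional[int]]:
        """Return the five-pillar profile; unscored pillars map to None."""
        by_pillar = {s.pillar: s.score for s in self.scores}
        return {p: by_pillar.get(p) for p in PILLARS}


# Usage: a pack with one scored pillar still yields a full five-entry profile.
pack = EvidencePack(
    run_id="run-001",
    prompt_registry_id="PR-12",
    model_version="model-v1",
    output_snapshot="draft text",
    policy_refs=["POL-7"],
    scores=[PillarScore("Traceability", 3, ["output_snapshot"])],
)
profile = pack.score_profile()
```

Keeping unscored pillars visible as None, rather than omitting them, is a design choice in this sketch: it makes gaps in the score profile inspectable instead of silent.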
Why this concept matters
Artefacts solve a practical governance problem: GenAI use is transient, context-sensitive, and often difficult to reconstruct after the event. Without well-designed artefacts, organisations may know that a model was used but still be unable to explain what happened in a specific run, what evidence supports the result, or whether the use was acceptable under policy. This produces a gap between governance rhetoric and operational proof.
The concept also avoids a recurring confusion. Many governance discussions treat documentation as an administrative afterthought. RAIDT treats artefacts as core design outputs. That distinction matters because the quality of governance depends on how evidence is structured, not only on whether data exists somewhere in the system. If artefacts are weak, inconsistent, or absent, reviewability and contestability collapse.
For organisations using GenAI, artefacts are the means by which principles become operational governance. They support internal review, external audit, model-risk conversations, incident response, and continuous improvement. They also help different audiences work from the same object: practitioners, managers, auditors, regulators, and researchers can all inspect the same run-level evidence pack rather than relying on inconsistent narratives.
Key idea: Artefacts matter because they convert fleeting GenAI activity into durable, reviewable governance objects that RAIDT can score, inspect, and use for organisational accountability.
What this item enables
- It converts transient run events into durable governance objects that can be inspected after the run has finished.
- It standardises how evidence, judgement, and policy alignment are recorded across different GenAI uses.
- It connects technical traces, human review, and organisational rules in a single evaluable structure.
- It enables the assembly of the run-level evidence pack and the justification of the five-pillar score profile.
- It supports comparison, escalation, learning, and audit preparation across many runs over time.
Practical example / likely audience question
Audience question
Are artefacts in RAIDT just extra paperwork added after the real AI work has already happened?
Answer
The concern behind this question is that governance artefacts may look like bureaucratic overhead rather than a substantive part of responsible AI use. The direct answer is no: in RAIDT, artefacts are not merely paperwork added after the fact. They are the designed objects through which a run becomes governable.
A run may involve a model, a prompt, a user, contextual instructions, source material, an output, and a review decision. If those elements remain scattered across logs, screenshots, memory, and separate documents, the organisation has activity but not governance. RAIDT addresses this by creating artefacts that deliberately assemble these elements into a coherent review object. The run-level evidence pack is the clearest example because it gathers the relevant traces, decisions, and contextual metadata into one inspectable structure.
Consider a manager asking whether a problematic GenAI output can be reconstructed six weeks later. A generic AI governance approach may point to a policy document or a broad assurance statement, but that does not show what happened in the specific case. RAIDT handles the issue better because its artefacts are designed around the run itself. The reviewer can inspect the prompt used, the model version, any human approval step, the evidence attached, the pillar scores, and the basis on which the run was judged acceptable or contestable.
Practical example in RAIDT terms
In a healthcare setting, a hospital uses a GenAI assistant to draft discharge summaries from clinician notes and structured patient records. The specific run concerns a patient with multiple medications and a recent change in dosage.
The run-level governance issue is not simply whether the model can draft text. It is whether this particular discharge-summary run can be justified, reviewed, and corrected if a dosage instruction is incomplete or misleading. RAIDT would require artefacts that capture the prompt template, model version, source inputs available to the system, generated draft, clinician edits, final approval status, and any policy or safety checks applied.
The evidence needed would include a prompt registry entry, an output snapshot, user and reviewer identifiers, timestamps, policy crosswalk notes for clinical safety, and rubric-based scoring across the five pillars. Responsibility is affected because a clinician must remain accountable for sign-off. Auditability and Traceability are affected because the run must be reconstructable. Interpretability matters because reviewers need to understand how the output was framed and whether it can be explained. Dependability matters because discharge documentation must be reliable enough for clinical use.
The artefact layer improves governance readiness because the hospital can review the case as a complete governance object rather than as a scattered set of logs. If a concern arises, reviewers can reconstruct the run, identify where oversight succeeded or failed, and feed the lesson back into template design, workflow controls, and future RAIDT scoring.
Detailed link to RAIDT
Artefacts link to RAIDT in four ways.
First, RAIDT is a design-science contribution, and artefacts are the designed outputs through which the framework becomes usable rather than remaining only conceptual.
Second, because RAIDT treats the run as the unit of governance, artefacts capture the configuration, context, evidence, and review decisions associated with a specific run.
Third, artefacts populate the run-level evidence pack and provide the structured inputs needed to justify the five-pillar RAIDT score profile.
Fourth, artefacts support reviewability, contestability, audit readiness, and organisational learning because they preserve what happened, how it was judged, and what should improve next time.
Artefacts -> Run-level evidence -> Evidence pack -> RAIDT score profile -> Governance readiness
This chain matters because RAIDT does not treat governance as a static policy layer above AI use. It treats governance as something assembled and evidenced through artefacts that make each run available for inspection, comparison, challenge, and learning.
Link to the five RAIDT pillars
Artefacts affect all five pillars, but they are especially central to Auditability and Traceability because those pillars depend on durable, reconstructable governance objects.
Responsibility
Artefacts make responsibility visible by recording who initiated, reviewed, approved, or rejected a run and on what basis. They prevent accountability from dissolving into vague organisational ownership.
Example evidence / implication:
- Named reviewer sign-off attached to the run-level evidence pack.
- Recorded rationale for exceptions, overrides, or escalations.
Auditability
Auditability depends on whether an independent reviewer can inspect the evidence supporting a governance claim. Artefacts provide the structured package that makes that inspection feasible.
Example evidence / implication:
- Evidence pack containing output snapshots, metadata, review notes, and policy mapping.
- Scoring rubric entries showing how each RAIDT pillar judgement was reached.
Interpretability
Interpretability is strengthened when artefacts explain the context of use, the framing of prompts, the role of source inputs, and the reasons for human judgement. They do not create full model transparency, but they improve practical intelligibility.
Example evidence / implication:
- Prompt registry entry showing task framing, constraints, and intended user role.
- Reviewer note explaining why the output was accepted, edited, or rejected.
Dependability
Dependability is supported when artefacts show whether the run was performed under stable, appropriate, and policy-compliant conditions. This helps organisations judge whether outputs are repeatable and reliable enough for the task.
Example evidence / implication:
- Versioned record of the model, prompt template, and workflow conditions used in the run.
- Evidence of quality checks, approval thresholds, or fallback procedures.
Traceability
Traceability is the pillar most directly tied to artefacts because traces become useful only when they are organised into reviewable objects. Artefacts connect inputs, outputs, actors, timestamps, and decisions.
Example evidence / implication:
- Timestamped linkage between the prompt, generated output, human edits, and final decision.
- Cross-reference from the run to applicable policies, controls, and downstream actions.
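The timestamped linkage described above can be sketched as follows. This is a minimal illustration, assuming that each trace is stored as an event with a run identifier, an event kind, an actor, and a timestamp; the names TraceEvent, trace_chain, is_reconstructable, and the required step order are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed minimum sequence a reviewer needs to reconstruct a run.
REQUIRED_ORDER = ("prompt", "output", "decision")


@dataclass(frozen=True)
class TraceEvent:
    run_id: str
    kind: str      # e.g. "prompt", "output", "human_edit", "decision"
    actor: str
    at: datetime


def trace_chain(events, run_id):
    """Return the events belonging to one run, ordered by timestamp."""
    return sorted((e for e in events if e.run_id == run_id), key=lambda e: e.at)


def is_reconstructable(chain):
    """True if every required step is present and first occurs in the required order."""
    kinds = [e.kind for e in chain]
    if any(k not in kinds for k in REQUIRED_ORDER):
        return False
    positions = [kinds.index(k) for k in REQUIRED_ORDER]
    return positions == sorted(positions)


# Usage: events arrive unordered and mixed across runs.
start = datetime(2025, 1, 1, 9, 0)
events = [
    TraceEvent("run-A", "decision", "reviewer-1", start + timedelta(minutes=30)),
    TraceEvent("run-A", "prompt", "user-1", start),
    TraceEvent("run-A", "human_edit", "reviewer-1", start + timedelta(minutes=20)),
    TraceEvent("run-A", "output", "model", start + timedelta(minutes=1)),
    TraceEvent("run-B", "prompt", "user-2", start),
    TraceEvent("run-B", "output", "model", start + timedelta(minutes=2)),
]
chain_a = trace_chain(events, "run-A")
chain_b = trace_chain(events, "run-B")
```

In this example run-A forms a complete chain, while run-B has no recorded decision and so would not count as reconstructable under the assumed rule.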
Why this item is more than a generic concept
In general AI governance, artefacts may mean almost any document, template, log, or assurance object associated with an AI system. That use is often broad and underspecified.
In RAIDT, artefacts have a more operational meaning. They are deliberately designed governance objects tied to a run, to evidence assembly, and to evaluative judgement. Their value does not come from their existence alone but from their ability to support reconstruction, scoring, contestation, and organisational action.
The RAIDT meaning is therefore more exacting than the generic meaning. A document counts as an artefact in the meaningful RAIDT sense only if it helps make a run inspectable and governable through run-level evidence.
Common misunderstanding
Misunderstanding
If the system already keeps logs, then the governance artefacts already exist.
Correction
Logs are raw traces, not finished governance artefacts. They may capture timestamps, prompts, or system events, but they usually do not organise those details into a form that supports review, explanation, policy mapping, and judgement.
For example, an orchestration log may show that a user submitted a prompt to a model at a certain time. That is useful, but it does not by itself show whether the task was permitted, which policy criteria applied, who reviewed the output, why the output was accepted, or how the run scored across RAIDT pillars. RAIDT turns raw traces into artefacts by structuring and contextualising them for governance use.
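The gap between a raw log entry and a governance artefact can be made concrete with a small check. The sketch below is an assumption-laden illustration: the field names in GOVERNANCE_FIELDS and the function unanswered_questions are hypothetical, chosen to mirror the questions listed in the paragraph above.

```python
# Hypothetical mapping from governance fields to the questions they answer.
GOVERNANCE_FIELDS = {
    "task_permitted": "Was the task permitted?",
    "policy_criteria": "Which policy criteria applied?",
    "reviewer": "Who reviewed the output?",
    "acceptance_rationale": "Why was the output accepted?",
    "pillar_scores": "How did the run score across RAIDT pillars?",
}


def unanswered_questions(record: dict) -> list[str]:
    """List the governance questions a record cannot answer.

    A field counts as unanswered when it is absent or empty. A recorded
    value of False (for example, task not permitted) still counts as an answer.
    """
    return [
        question
        for field_name, question in GOVERNANCE_FIELDS.items()
        if record.get(field_name) in (None, "", [], {})
    ]


# Usage: a raw orchestration log entry versus the same entry after enrichment.
raw_log = {
    "user": "u-17",
    "model": "model-v1",
    "prompt": "Summarise the attached notes",
    "timestamp": "2025-01-01T09:00:00Z",
}
artefact = {
    **raw_log,
    "task_permitted": True,
    "policy_criteria": ["POL-7"],
    "reviewer": "r-03",
    "acceptance_rationale": "Matches source notes",
    "pillar_scores": {"Traceability": 3},
}
gaps_in_log = unanswered_questions(raw_log)
gaps_in_artefact = unanswered_questions(artefact)
```

The raw log leaves all five questions open, whereas the enriched record leaves none, which is the sense in which structuring and contextualising turn a trace into an artefact.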
Boundary and limitation
Artefacts do not by themselves prove that a GenAI system is ethically sound, legally compliant, or substantively correct. A well-formatted evidence pack can still contain poor evidence, weak judgement, or incomplete records. Artefacts can also create a false sense of assurance if they are treated as a box-ticking exercise rather than as part of real review.
Their effectiveness depends on the quality of the underlying data, the adequacy of the review process, the appropriateness of the rubric, and the honesty with which evidence is assembled. Artefacts also work less well in contexts where logging is sparse, human oversight is absent, or organisational roles are unclear.
RAIDT handles this limitation by linking artefacts to scoring, review, and challenge processes rather than treating them as self-validating proof. The framework relies on artefacts as governance instruments, but it still requires evaluative discipline, human judgement, and organisational follow-through.
Implementation levels
Manual implementation
A researcher or small team can create artefacts manually using structured note templates, review checklists, and a standard evidence-pack format. Prompt text, model details, outputs, reviewer observations, and policy considerations can be collected by hand for each significant run.
Semi-automated implementation
Semi-automated implementation adds metadata capture, form-based review, templated policy crosswalks, and structured scoring sheets. Logs and prompts can be pulled into standard artefact templates, while reviewers complete controlled fields that make comparison easier across runs.
Fully automated implementation
At scale, a governance platform, orchestration layer, or wrapper system can generate artefacts automatically by capturing model metadata, run identifiers, prompts, outputs, review events, and scoring inputs. Dashboards can assemble evidence packs, update score profiles, flag missing artefacts, and route contested runs into governance workflows.
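The automated steps of flagging missing artefacts and routing contested runs might look like the sketch below. The required artefact set, the queue names, and the route_run function are hypothetical assumptions for illustration, not a description of any existing governance platform.

```python
# Assumed minimum artefact set for a complete run-level evidence pack.
REQUIRED_ARTEFACTS = frozenset({
    "prompt_registry_entry",
    "output_snapshot",
    "review_record",
    "policy_crosswalk",
    "score_profile",
})


def route_run(run: dict) -> tuple[str, list[str]]:
    """Decide where a run goes next and report any missing artefacts.

    Contested runs always go to governance review, even if artefacts are
    missing; the missing list travels with them so reviewers can see the gaps.
    """
    missing = sorted(REQUIRED_ARTEFACTS - set(run.get("artefacts", [])))
    if run.get("contested", False):
        return "governance_review", missing
    if missing:
        return "incomplete", missing
    return "archive", missing


# Usage: three runs in different states.
complete = route_run({"artefacts": list(REQUIRED_ARTEFACTS)})
partial = route_run({"artefacts": ["output_snapshot"]})
contested = route_run({"artefacts": list(REQUIRED_ARTEFACTS), "contested": True})
```

Giving contested runs priority over incomplete ones is one possible design choice; an organisation could equally decide to block review until the pack is complete.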
Practical use in the RAIDT project
Within the RAIDT project, artefacts provide the bridge between conceptual theory and deployable governance practice. In Paper 08 Foundations, they help specify what the framework actually produces as a design-science contribution: not only ideas, but structured governance objects. In Paper 09 Empirical Validation, artefacts provide the units through which runs can be inspected, compared, and validated across cases. In Paper 10 Policy Pathways, artefacts matter because they translate organisational and regulatory expectations into reviewable evidence structures rather than abstract commitments.
They are also central to sector playbooks, evidence-pack design, scoring rubrics, and governance interventions. For supervision or viva discussion, artefacts give a concrete answer to the question of what RAIDT contributes beyond a conceptual governance model: it contributes operational objects that allow principles to be evidenced, reviewed, challenged, and improved.
Key audience questions to prepare for
Q1. What is the main artefact produced by RAIDT?
The central artefact is the run-level evidence pack, because it acts as the core proof object for a specific run. Other artefacts such as scoring rubrics, prompt registries, and policy crosswalks support and enrich that pack.
Q2. Why are artefacts necessary if organisations already have AI policies?
Policies express expectations, but artefacts show how those expectations were applied in a concrete run. Without artefacts, a policy may exist without any inspectable evidence that it shaped actual practice.
Q3. Are artefacts only useful for auditors?
No. Auditors benefit from them, but so do practitioners, managers, risk teams, supervisors, and researchers. Artefacts create a shared governance object that different stakeholders can interpret for different purposes.
Q4. Do artefacts make governance too bureaucratic?
They can if badly designed. RAIDT addresses that risk by focusing on run-level relevance and structured reuse. The point is not to create maximum paperwork, but to create the minimum sufficient artefact set needed for reviewability and accountable use.
Q5. How do artefacts relate to the five-pillar RAIDT score?
Artefacts provide the evidence base and judgement structure from which the score profile can be justified. Without artefacts, pillar scores risk becoming assertions rather than evidence-backed evaluations.
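The difference between an evidence-backed score and an assertion can be checked mechanically. The sketch below assumes a score profile stored as a mapping from pillar to score and an evidence index stored as a mapping from pillar to a list of evidence references; both structures and the function name unsupported_pillars are hypothetical.

```python
def unsupported_pillars(score_profile: dict, evidence_index: dict) -> list[str]:
    """Return scored pillars that have no evidence reference attached."""
    return sorted(p for p in score_profile if not evidence_index.get(p))


# Usage: Auditability has no entry and Traceability has an empty one.
score_profile = {"Responsibility": 3, "Auditability": 2, "Traceability": 4}
evidence_index = {
    "Responsibility": ["reviewer-sign-off"],
    "Traceability": [],
}
assertions_only = unsupported_pillars(score_profile, evidence_index)
```

Under these assumptions the Auditability and Traceability scores would be flagged as assertions until evidence is attached to them.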
Suggested citation concepts to support this item
- design science research artefact
- information systems artefact evaluation
- AI governance documentation practices
- algorithmic accountability records
- model cards and system cards
- audit trails in machine learning operations
- socio-technical documentation for AI systems
- evidence-based governance of generative AI
- organisational memory in digital governance
- policy-to-practice translation in AI oversight
Short explanation for presentation
Artefacts are the designed governance objects through which RAIDT becomes operational. Rather than relying on broad AI principles or scattered logs, RAIDT produces structured artefacts such as the run-level evidence pack, scoring rubrics, prompt registries, and policy crosswalks. These objects allow a specific GenAI run to be reconstructed, reviewed, and challenged. That matters because governance claims are weak unless they can be tied to inspectable evidence from an actual use instance. In RAIDT, artefacts connect the run, the evidence, the five-pillar score profile, and the organisation's readiness to justify or contest an AI-supported action. They therefore show that RAIDT is not just a conceptual framework. It is a design-science contribution that produces practical objects for reviewability, audit readiness, and organisational learning.
One-line takeaway
Artefacts are the designed governance objects that make RAIDT operational because they turn run-level evidence into reviewable, scorable, and reusable proof.
Related items in academic theory and design logic
Mentioned in reference-paper summaries (5)
Paper summaries live in Port/93-References/pdf_summaries/. Each file listed below contains the key term at least once.
- REF-012__Ashmore-2021.md
- REF-019__Bodendorf-2025.md
- REF-022__Breck-2017.md
- REF-033__European-2025.md
- REF-035__European-2024.md