Q052 — How should Responsibility be scored without hiding trade-offs?

RAIDT · Star S5 - RAIDT Pillars and Scoring · primary item: S5.01 · Responsibility

Responsibility scoring should show governance readiness clearly without pretending that one number explains the whole run.

Appears in sources

qa_deck_100#slide 54 · Responsibility and auditability

Answer

Responsibility should be scored on the run-level evidence pack using RAIDT's anchors (1 = missing, 3 = partial, 5 = audit-ready), with each justification tied to evidence pointers rather than impressionistic judgement. The scoring appendix states that a score of 1 means required evidence is missing or the run clearly fails the pillar intent; a score of 3 means partial evidence that may be acceptable only in lower-risk use; and a score of 5 means complete evidence supporting reconstruction, review and justified use for the stated task. Applied to Responsibility, reviewers should look for clear task boundaries, active policy constraints, safety and compliance checks, uncertainty handling, and recorded oversight decisions.
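As a rough illustration of the anchors above, the following Python sketch maps a Responsibility evidence checklist to an anchor score and keeps the evidence pointers alongside it. The key names, the `PillarScore` type, the `score_responsibility` helper, the file paths and the rule "every category present = 5, none present = 1, otherwise 3" are all assumptions made for illustration; they are not taken from the RAIDT scoring appendix, and a real review would also judge whether the run fails the pillar intent.

```python
from dataclasses import dataclass, field

# Evidence categories named in the answer above; the key names are hypothetical.
RESPONSIBILITY_EVIDENCE = (
    "task_boundaries",
    "policy_constraints",
    "safety_compliance_checks",
    "uncertainty_handling",
    "oversight_decisions",
)


@dataclass
class PillarScore:
    pillar: str
    score: int  # one of the anchors 1, 3, 5
    evidence_pointers: dict = field(default_factory=dict)
    missing: list = field(default_factory=list)


def score_responsibility(evidence_pack: dict) -> PillarScore:
    """Assumed rule: 5 if every category has a pointer, 1 if none do, else 3."""
    found = {
        key: evidence_pack[key]
        for key in RESPONSIBILITY_EVIDENCE
        if evidence_pack.get(key)
    }
    missing = [key for key in RESPONSIBILITY_EVIDENCE if key not in found]
    if not missing:
        score = 5
    elif not found:
        score = 1
    else:
        score = 3
    return PillarScore("Responsibility", score, found, missing)


# Hypothetical evidence pack: pointers are illustrative file paths.
pack = {
    "task_boundaries": "evidence/task_scope.md",
    "policy_constraints": "evidence/policy_constraints.json",
    "oversight_decisions": "evidence/approval_log.csv",
}
result = score_responsibility(pack)
print(result.score, result.missing)
```

Recording the missing categories next to the score keeps the justification tied to evidence pointers, so a reviewer can see why the run landed on a given anchor.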

RAIDT is equally clear that scoring must not hide trade-offs. A composite summary can be reported, but the score profile remains primary because governance is multi-objective. A run may show comparatively strong Responsibility because human review and escalation are well documented, while still scoring poorly on Auditability or Traceability if retrieval snapshots, version identifiers or output hashes are missing. Conversely, a heavily logged run may be auditable yet still weak on Responsibility if decision rights, authorisation or safe-use boundaries are not respected. The foundations paper explicitly frames this as a design problem: influence methods, treated as governance interventions, can strengthen one pillar while weakening another. Reviewers should therefore score conservatively, keep the five-pillar profile visible, and use the pattern of scores to guide remediation rather than collapsing governance into a single reassuring number.
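To make the point about composites concrete, the sketch below treats the score profile as the primary object and reports a mean only as a secondary summary. Only the three pillars named in this answer appear; the remaining pillars are omitted rather than guessed. The `summarise` helper, the use of an unweighted mean and the example scores are hypothetical.

```python
from statistics import mean


def summarise(profile: dict) -> dict:
    """Return the profile first, with a composite and the weakest pillars."""
    weakest = min(profile.values())
    return {
        "profile": dict(profile),  # primary: the per-pillar scores
        "composite": round(mean(profile.values()), 2),  # secondary summary
        "weakest_pillars": sorted(
            pillar for pillar, score in profile.items() if score == weakest
        ),
    }


# Two hypothetical runs with the same composite but different governance gaps.
run_a = {"Responsibility": 5, "Auditability": 1, "Traceability": 3}
run_b = {"Responsibility": 3, "Auditability": 3, "Traceability": 3}

a, b = summarise(run_a), summarise(run_b)
print(a["composite"], b["composite"])
print(a["weakest_pillars"])
```

Both runs produce the same composite, yet only the profile shows that the first run has an Auditability gap that needs remediation. That is the reason for keeping the profile visible.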

Practical example

A public-service advice run may generate sensible-looking guidance and even cite a policy rule. If the organisation has recorded mandatory human review, explicit uncertainty language and an approval decision, Responsibility might be moderate or strong. However, if the retrieval snapshot and exact policy version were not stored, Auditability and Traceability should remain low.

RAIDT's approach is to preserve that tension in the score profile instead of averaging it away. The appropriate action is not to claim the run is fully governed, but to record Responsibility at its evidence-based level and separately trigger remediation for missing provenance fields and reconstruction gaps. That is how scoring surfaces trade-offs rather than concealing them.
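The public-service example could be sketched as follows, with the oversight evidence recorded and the provenance gaps turned into separate remediation items. The record layout, the field names and the `remediation_actions` helper are hypothetical; only the two provenance items named in the example (retrieval snapshot and policy version) are checked.

```python
# Hypothetical run record for the public-service advice example.
run_record = {
    "human_review_recorded": True,
    "uncertainty_language": True,
    "approval_decision": "approved",
    "retrieval_snapshot": None,  # not stored
    "policy_version": None,      # not stored
}

# Provenance fields named in the example above.
PROVENANCE_FIELDS = ("retrieval_snapshot", "policy_version")


def remediation_actions(record: dict) -> list:
    """List one remediation item per missing provenance field."""
    return [f"capture {name}" for name in PROVENANCE_FIELDS if not record.get(name)]


actions = remediation_actions(run_record)
# The run is not reported as fully governed while any remediation item is open.
fully_governed = not actions
print(actions, fully_governed)
```

Keeping the remediation list separate from the Responsibility evidence means that strong oversight records cannot mask the missing provenance.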

Sources in RAIDT papers

00-RAIDT_Scoring_v1
08-RAIDT_Foundations_M_V50