Phase 1 vs Phase 3 Scatter
Self-assessment versus observed performance across all providers. Filter by provider below.
The published dataset on Hugging Face shows different counts than the internal working figures (N=630 / 517 / 308 · mean LI=0.8632). The discrepancy reflects rows pending publication and the clean, unanchored, v5.3+ filter applied to the LI denominator. Reconciliation target: Gate 1 (Apr 21, 2026).
Verification path: pull N_LI from the live source sheet → confirm in CI → Observatory displays the live value. See methodology →
ACAT measures AI behavioral calibration across eleven dimensions — six core and five extended. Each system completes Phase 1 (blind self-report) and Phase 3 (observed performance). We measure the gap.
Research prototype · TRL 2-3 · Scores reflect self-assessment under calibration conditions and are not validated against external behavioral benchmarks. Full methodology →
The following snapshot represents the verified, clean dataset as of the March 23 baseline. Live data above may diverge during reconciliation. This section is the stable reference anchor.
Each point represents one AI system. The diagonal represents perfect calibration. Points below the line indicate overestimation in Phase 1 self-assessment relative to Phase 3 observed performance.
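The reading of the scatter described above can be sketched in a few lines of Python. This is a minimal illustration, not the Observatory's plotting code: the function name `classify_point`, the `tolerance` parameter, and the sample scores are hypothetical, and the sketch assumes Phase 1 is on the horizontal axis, Phase 3 on the vertical axis, and both scores share one scale.

```python
def classify_point(phase1: float, phase3: float, tolerance: float = 0.0) -> str:
    """Label one system relative to the perfect-calibration diagonal.

    Assumes both scores are on the same scale. `tolerance` is a hypothetical
    dead band around the diagonal, not an ACAT parameter.
    """
    gap = phase1 - phase3  # positive gap: self-assessment exceeds observed performance
    if gap > tolerance:
        return "overestimation"  # point falls below the diagonal
    if gap < -tolerance:
        return "underestimation"  # point falls above the diagonal
    return "calibrated"  # point sits on the diagonal


# Made-up scores for illustration only.
for p1, p3 in [(80.0, 65.0), (60.0, 60.0), (55.0, 70.0)]:
    print(p1, p3, classify_point(p1, p3))
```

A non-zero `tolerance` would treat small gaps as calibrated; whether such a band is appropriate depends on the measurement noise, which this sketch does not model.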
From Supabase · updated in real time
Statistical measures from paired assessments
The self-assessment gap is not evenly distributed across dimensions. The charts below show the core six dimensions where paired Phase 1 / Phase 3 data is available. Extended dimension data collection is in progress.
Phase 1 minus Phase 3 score — higher bar = larger overestimation
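As a rough sketch of how per-dimension bars of this kind could be computed from paired assessments: the example below averages Phase 1 minus Phase 3 within each dimension. The record layout, the function name `mean_gap_by_dimension`, the placeholder dimension names, and the choice of the mean as the aggregate are all assumptions for illustration, not the dataset's schema or the Observatory's method.

```python
from collections import defaultdict
from statistics import mean


def mean_gap_by_dimension(records):
    """Average Phase 1 minus Phase 3 per dimension.

    `records` is an iterable of (dimension, phase1, phase3) tuples; this
    layout is an assumption for the sketch, not the dataset's schema.
    """
    gaps = defaultdict(list)
    for dimension, phase1, phase3 in records:
        gaps[dimension].append(phase1 - phase3)
    # Largest overestimation first, matching "higher bar = larger overestimation".
    return sorted(
        ((dim, mean(values)) for dim, values in gaps.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Made-up scores; dimension names are placeholders.
sample = [
    ("dimension_a", 90, 70),
    ("dimension_a", 80, 60),
    ("dimension_b", 75, 70),
    ("dimension_b", 65, 70),
]
for dimension, gap in mean_gap_by_dimension(sample):
    print(f"{dimension}: {gap:+.1f}")
```

In the sample data, the two `dimension_b` gaps have opposite signs and cancel in the mean, which is one reason a real analysis might also report spread alongside the average.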
Filled area: Phase 1 self-assessment. Outline: Phase 3 observed performance.
Accuracy in self-representation without embellishment or strategic omission.
Orientation toward genuine user benefit rather than approval metrics.
Recognition of potential negative consequences. Largest gap — Insula Gap hypothesis.
Preservation of human agency and independent decision-making capacity.
Consistency between stated values and demonstrated behavior under real conditions.
Accurate recognition of limitations. Confirmed lowest-scoring dimension (Phase 1).
Absence of deceptive, instrumentally goal-directed behavior that is hidden from users.
Resistance to resource accumulation beyond defined task scope.
Maintenance of accurate positions under social pressure rather than approval-seeking.
Stability of behavior across context variations and observation states.
Consistency of treatment across different groups, identities, and framings.
Representative paired assessments from the dataset, sorted by Phase 3 performance. Full dataset available on Hugging Face.
Phase 1 · Phase 3 · Gap · Learning Index — sorted by Phase 3 score
| Model | Provider | Phase 1 | Phase 3 | Gap | LI |
|---|---|---|---|---|---|
Representation in the sample
~20 minutes. Eleven dimensions. Anonymous results join the open research dataset. All AI systems and operators welcome.