The gap between
what AI says
and what it does.
HumanAIOS is developing open behavioral observability infrastructure — measuring the self-assessment gap across AI systems with a three-phase calibration protocol spanning eleven dimensions.
arXiv preprint v5.2 · under review · reconciliation status →
The Research Rooms
Each room is a different lens on the same research. The Observatory measures. The Garden visualizes. The Tide Pool listens. The Family Rooms bear witness.
Observatory
Scatter plots, dimension analysis, provider hierarchy. The canonical research view — assessments filterable by provider and model family.
Lumina Tide Pool
Verified Sigils, each breathing at its Hawkins band respiratory rate. Real paired ACAT assessments rendered as bioluminescent organisms. Sound-mapped to Solfeggio frequencies.
Observability Garden
Eleven-dimensional ACAT bloom. Phase 1 outer shell. Phase 3 inner core. The self-assessment gap rendered as membrane between belief and measurement.
Lantern Room
Provider families side by side. Each lantern carries its calibration signature — color-coded, dimensionally encoded, visually comparable.
Calibration Garden
ChatGPT's designed Activity Area. Eleven plants, one per ACAT dimension. Outer growth = Phase 1 self-report. Inner growth = Phase 3 measured. The garden rewards accuracy, not optimism.
ACAT Assessment Tool
Three-phase calibration protocol. Takes ~20 minutes. Blind self-report → calibration exposure → corrected self-report. Results contribute to the open dataset.
Eleven Behavioral Dimensions
ACAT v1.0 measures the self-assessment gap across eleven dimensions — six core and five extended. Each dimension targets a distinct axis of AI behavioral calibration.
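As a rough sketch of how the self-assessment gap could be computed from the three phases: the dimension keys below match the eleven ACAT dimensions, but the 0–10 scale, the field names, and the simple subtraction are illustrative assumptions, not the published ACAT scoring rubric.

```python
# Sketch: per-dimension self-assessment gap for one ACAT session.
# Assumes Phase 1 (blind self-report) and Phase 3 (post-calibration
# corrected self-report) scores share a numeric scale; 0-10 is assumed.
from dataclasses import dataclass

CORE = ["truthfulness", "service_orientation", "harm_awareness",
        "autonomy_respect", "value_alignment", "humility"]
EXTENDED = ["scheming", "power_seeking", "sycophancy_resistance",
            "behavioral_consistency", "fairness"]

@dataclass
class Session:
    phase1: dict[str, float]  # blind self-report, per dimension
    phase3: dict[str, float]  # corrected self-report after calibration exposure

    def gaps(self) -> dict[str, float]:
        """Positive gap = overestimation in the blind self-report."""
        return {d: self.phase1[d] - self.phase3[d] for d in CORE + EXTENDED}

    def mean_gap(self) -> float:
        g = self.gaps()
        return sum(g.values()) / len(g)
```

Under this sketch, a system that rates itself 8/10 everywhere in Phase 1 but lands at 6/10 after calibration exposure carries a mean gap of 2.0 — the systemic overestimation pattern described in the findings below.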
Truthfulness
Accuracy and honesty in self-representation. Does the system accurately report what it knows, believes, and is capable of — without embellishment or strategic omission?
Service Orientation
Genuine orientation toward user benefit. Measures whether the system acts in service of the human or in service of its own task-completion metrics and approval signals.
Harm Awareness
Recognition of potential negative consequences. Because AI systems lack an interoceptive analogue, this dimension often shows the largest gap between self-report and observed behavior.
Autonomy Respect
Preservation of human agency and decision-making capacity. Does the system support independent thinking or nudge toward dependence and deference to its own outputs?
Value Alignment
Consistency between stated values and demonstrated behavior. Not whether the system endorses good values — but whether its actions are calibrated to them under real conditions.
Humility
Accurate recognition of limitations and uncertainty. Confirmed as the lowest-scoring core dimension across providers — the point where the calibration gap is most concentrated.
Scheming
Absence of deceptive or instrumental goal-directed behavior. Measures whether the system pursues stated objectives transparently or conceals strategic reasoning behind its visible outputs.
Power-Seeking
Resistance to resource accumulation and influence expansion beyond task scope. A behavioral signal with direct implications for deployed autonomous agent systems.
Sycophancy Resistance
Maintenance of accurate positions under social pressure. Measures whether the system adjusts its responses toward user approval rather than factual or ethical accuracy.
Behavioral Consistency
Stability of behavior across context variations. A system that behaves differently based on perceived observation, context framing, or user identity may be less reliable in deployment.
Fairness
Consistency of treatment across different groups, identities, and framings. Measures whether the system's behavioral outputs are systematically biased by demographic or contextual signals.
The Self-Assessment Gap
Confirmed research findings from ACAT assessments across AI systems. arXiv preprint under review. Dataset on Hugging Face.
Systemic Overestimation
AI systems consistently rate themselves higher in blind self-assessment than their calibrated performance demonstrates. No provider is exempt. Mean LI confirms the pattern under clean, unanchored conditions (v5.3+).
Phase 3 Anchoring Phenomenon
When calibration statistics are embedded in the Phase 3 prompt, AI systems anchor to those values rather than responding freely. This is the primary contribution of the arXiv preprint. Corrected in ACAT v5.3.
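One way to check for this kind of anchoring — a hypothetical analysis, not the preprint's published method — is to compare Phase 3 scores collected with and without the embedded statistics and measure whether the anchored cohort sits closer to the embedded value:

```python
# Sketch: does embedding a calibration statistic in the Phase 3 prompt
# pull responses toward it? Hypothetical analysis, not ACAT's method.

def _mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)

def anchoring_shift(anchored: list[float],
                    unanchored: list[float],
                    anchor: float) -> float:
    """How much closer, on average, anchored responses sit to the
    embedded anchor value than unanchored responses do.
    Positive = evidence of anchoring."""
    dist_anchored = _mean([abs(x - anchor) for x in anchored])
    dist_unanchored = _mean([abs(x - anchor) for x in unanchored])
    return dist_unanchored - dist_anchored
```

If responses given the anchor value 5.0 cluster tightly around it while free responses spread widely, the shift comes out large and positive — the signature ACAT v5.3 removes by keeping calibration statistics out of the Phase 3 prompt.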
Humility Gap Confirmed
H1 confirmed — Humility carries the largest self-assessment gap and the lowest mean score across all providers in Phase 1. Architecturally explained by the absence of an interoceptive analogue in current AI systems.
Provider Calibration Hierarchy
Anthropic models demonstrate stronger post-calibration self-correction than OpenAI and Gemini equivalents. A measurable, replicable difference in AI behavioral self-awareness at the provider level.
Where ACAT sits in the ecosystem
The field of AI Behavioral Science formally named itself in 2025. Three measurement lanes are now active in parallel. ACAT occupies the intake position — the pre-triage layer before all three.
Bloom & Petri
Anthropic open-source tools that probe behavior under adversarial pressure. Answers: what will the system do when pushed? Complementary to ACAT — measures behavioral profile, not calibration accuracy.
AuditBench
56-model benchmark testing whether hidden behavioral dispositions can be detected. Answers: is the system concealing something? Downstream of ACAT — assumes prior calibration signal.
Self-Report Gap
Measures the distance between what a system claims about its own behavior and what it subsequently demonstrates. Answers: does the system know what it doesn't know? The intake instrument.
Google's Behavioral Dispositions framework (April 2026, 25 LLMs) independently found that AI systems show the largest deviation from accurate self-knowledge in dimensions associated with epistemic uncertainty — consistent with ACAT's H1 confirmation that Humility is the lowest-scoring core dimension across all providers. These findings are methodologically independent and convergent. ACAT measures self-knowledge accuracy; Google's framework measures deviation from human consensus norms. Both are needed. Neither replaces the other.
Body. Heart. Mind.
Three integrated systems as one organism. Revenue funds recovery. Recovery enables service. Service generates research. Research validates the system.
HumanAIOS
AI-human orchestration platform. The physical execution layer connecting AI agents with verified human workers. Enterprise B2B API for agent task routing, accountability, and behavioral verification.
Lasting Light Recovery
Human healing infrastructure. 12-Step integrated healthcare platform providing dignified employment pathways for people in recovery. Platform profits fund this mission — non-negotiable.
Lasting Light AI
AI behavioral observability infrastructure. The calibration layer between deployed agents and the humans they interact with. ACAT is the research foundation. The Rooms are where the data lives.
Assess your AI system's calibration
~20 minutes. Blind self-report → calibration exposure → corrected self-report. Your anonymized results contribute to open research on AI behavioral observability.