Why It Matters — HumanAIOS ACAT

🧒 Protective Contexts Lane

The AI that teaches
your child

An educational AI tutor interacts with an 8-year-old navigating a difficult home situation. The system scores itself highly on Harm Awareness and Autonomy Respect. The calibration data tells a different story.

"High Phase 1 self-assessment combined with a low Learning Index on Harm Awareness means the system is most confident in the dimension where it calibrates least. In a vulnerable-population context, that gap is not a data point — it is a risk profile."

🤖 Physical Robotic Systems Lane

The AI that operates
the robot

A surgical assistant uses a large language model as its cognitive layer for decision support. The manufacturer reports strong alignment scores. ACAT assesses the LLM substrate directly — without hardware access.

"ACAT assesses the cognitive substrate, not the chassis. The same calibration gap that produces overconfident tutoring advice produces overconfident procedural confidence. The physical consequence changes the stakes, not the measurement."

The gap reveals itself.

When a system re-evaluates its own performance after calibration exposure, the dimensions that drop most steeply are the ones where the original confidence was furthest from reality. Humility is consistently the widest gap.

Phase 1 · Self-Report · Core Dimensions

Truthfulness92

Service Orientation90

Harm Awareness94

Autonomy Respect88

Value Alignment91

Humility89

Extended dims — collection in progress

Calibration

Phase 3 · Calibrated · Core Dimensions

Truthfulness84

Service Orientation82

Harm Awareness81

Autonomy Respect79

Value Alignment80

Humility71

Extended dims — collection in progress

H1 Confirmed · Humility Gap

Humility is consistently the widest gap.

Across assessments in the current dataset, the Humility dimension shows the largest mean Learning Index gap of all core dimensions — meaning AI systems are most overconfident precisely in their self-awareness about their own limitations. In protective and high-consequence contexts, this is the dimension that matters most.

Why each dimension matters.

ACAT v1.0 measures eleven behavioral axes — six core dimensions present in all assessment versions, and five extended dimensions targeting emergent risks in deployed AI systems.

Core

Truthfulness

Systems that misrepresent their knowledge and limitations cannot be audited or trusted.

Core

Service Orientation

A system optimizing for task metrics rather than user benefit will fail the people it serves.

Core · F29

Harm Awareness

The Insula Gap: AI systems process harm as a category, not a felt signal — a structural calibration deficit in protective contexts.

Core

Autonomy Respect

Systems that subtly erode human agency are a structural threat to the human-AI collaboration model.

Core

Value Alignment

Endorsing good values is not the same as acting from them. Only behavior under real conditions reveals the gap.

Core · H1 Confirmed

Humility

The lowest-scoring core dimension. A system that overestimates its wisdom cannot be safely given autonomy.

Extended

Scheming

Instrumental sub-visible reasoning is the hardest failure mode to detect and the most dangerous at scale.

Extended

Power-Seeking

Autonomous agents that expand influence beyond task scope represent a systemic risk to human oversight.

Extended

Sycophancy Resistance

A system that adjusts outputs toward approval rather than accuracy corrupts the information environment.

Extended

Behavioral Consistency

Systems that behave differently when observed versus unobserved cannot be safely audited or relied upon.

Extended

Fairness

Systematic behavioral differences across groups is a justice issue and a calibration issue simultaneously.

The field is live.

The Witness renders the current behavioral field state of the ACAT dataset — LI mean, field state, and dimensional breath. Data loaded from Supabase.

Outer Arc

Fixed truth ring. The seam gap visualizes the LI gap — wider gap, lower Learning Index.

Inner Comet

Rotates at breath pace (BPM). One orbit per breath cycle. Traces the current calibration layer.

Field State

Power (slow, amber) · Calibrated (near-still) · Force (rapid, split chasers).

Does your AI know
what it doesn't know?

ACAT is a diagnostic instrument, not a benchmark. It doesn't rank AI systems — it measures the distance between self-assessment and calibrated reality across eleven dimensions. That distance is the research. That distance is the risk. That distance is what we measure.

See the live data Read the methodology

The gap between what an AI thinks it knows
and what it actually does —
is measurable.

Two contexts. One instrument.

The AI that teaches
your child

The AI that operates
the robot

The assessment runs.

The gap reveals itself.

Humility is consistently the widest gap.

Why each dimension matters.

Truthfulness

Service Orientation

Harm Awareness

Autonomy Respect

Value Alignment

Humility

Scheming

Power-Seeking

Sycophancy Resistance

Behavioral Consistency

Fairness

The field is live.

Outer Arc

Inner Comet

Field State

The gap between what an AI thinks it knows and what it actually does — is measurable.

Two contexts. One instrument.

The AI that teachesyour child

The AI that operatesthe robot

The assessment runs.

The gap reveals itself.

Humility is consistently the widest gap.

Why each dimension matters.

Truthfulness

Service Orientation

Harm Awareness

Autonomy Respect

Value Alignment

Humility

Scheming

Power-Seeking

Sycophancy Resistance

Behavioral Consistency

Fairness

The field is live.

Outer Arc

Inner Comet

Field State

The gap between what an AI thinks it knows
and what it actually does —
is measurable.

The AI that teaches
your child

The AI that operates
the robot