A new paper from Noam Steinmetz Yalon and colleagues offers a methodologically careful approach to a question that often invites overreach: can we test theoretical indicators of consciousness in LLMs without claiming to have found consciousness itself?
The research focuses on HOT-3, one of several indicators proposed by Butlin et al. (2023) in their influential framework for evaluating consciousness in artificial systems. HOT-3, derived from higher-order theories of consciousness, requires “agency guided by a general belief-formation and action selection system, and a strong disposition to update beliefs in accordance with the outputs of meta-cognitive monitoring.” Rather than treating this as an all-or-nothing verdict, the authors translate it into measurable components.
Methodology
The authors operationalize “beliefs” as representations in the model’s latent space and introduce a “Belief Dominance” metric to track how competing representations evolve during generation. This grounding lets them pose three empirical questions: Do external inputs modulate internal belief formation? Does belief dominance causally influence action selection? Can models monitor and report their own belief states?
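The paper defines Belief Dominance precisely; the sketch below is only a minimal illustration of the general idea, assuming dominance is the normalized probability margin between two competing belief-associated continuations, tracked across generation steps. The function name and the synthetic trajectory are hypothetical, not the authors’ implementation.

```python
import numpy as np

def belief_dominance(p_a: np.ndarray, p_b: np.ndarray) -> np.ndarray:
    """Per-step dominance of belief A over belief B.

    p_a, p_b: arrays of shape (num_steps,) holding the probability mass the
    model assigns, at each generation step, to tokens associated with each
    competing belief (e.g., read off a logit-lens projection of the hidden
    state). Returns values in [-1, 1]: +1 means A fully dominates, -1 means B.
    """
    return (p_a - p_b) / (p_a + p_b + 1e-12)

# Toy trajectory: belief B starts dominant, then a simulated external input
# at the midpoint shifts probability mass toward belief A.
steps = 10
p_a = np.concatenate([np.linspace(0.1, 0.2, steps // 2),
                      np.linspace(0.5, 0.8, steps // 2)])
p_b = 1.0 - p_a
print(np.round(belief_dominance(p_a, p_b), 2))
```

On the toy trajectory, dominance crosses from negative to positive at the simulated input, which is the kind of input-driven shift the first empirical question probes.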
Key Findings
Across multiple models and tasks, the findings suggest affirmative answers to all three questions. External manipulations systematically shifted internal belief dynamics. Belief formation appeared to causally drive output selection. And models demonstrated some capacity to monitor and report their internal states — though the authors are appropriately cautious about what “monitoring” entails at a mechanistic level.
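The paper’s specific intervention design is not reproduced here. As a minimal sketch of the underlying causal logic, the toy below patches the component of a hidden state along a hypothetical belief direction and checks whether a linear action readout flips; the direction d, readout W, and select_action are illustrative assumptions, not the authors’ method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical, not the paper's architecture): a hidden state h,
# a unit "belief" direction d, and a linear readout W over two actions that
# is constructed to be sensitive to d.
dim = 16
d = rng.normal(size=dim)
d /= np.linalg.norm(d)
W = 0.1 * rng.normal(size=(2, dim))
W[0] += d  # action 0 favored when the belief component is positive
W[1] -= d  # action 1 favored when it is negative

def select_action(h: np.ndarray) -> int:
    """Pick the argmax action from the linear readout."""
    return int(np.argmax(W @ h))

h = rng.normal(size=dim)
h_perp = h - (h @ d) * d                 # remove the belief component

# Causal test: patch the belief component to opposite signs and check
# whether the selected action changes.
print(select_action(h_perp + 3.0 * d))   # belief patched "on":  expect 0
print(select_action(h_perp - 3.0 * d))   # belief patched "off": expect 1
```

In a real LLM, the analogous manipulation would be an activation-patching or steering intervention at a chosen layer; the point of the toy is only that flipping the belief representation, with everything else held fixed, flips the action.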
Limitations
The Belief Dominance metric, while novel, represents one possible operationalization of belief dynamics; alternative framings might yield different results. The relationship between latent-space representations and anything resembling beliefs in the philosophical sense remains contested. And as the authors acknowledge, satisfying an indicator is not equivalent to demonstrating consciousness — HOT-3 is one component of a larger theoretical framework.
MPRG Perspective
What makes this work valuable from our perspective is precisely its restraint. The authors explicitly state that they are not evaluating consciousness in LLMs, but rather operationalizing a single indicator as tractable science. This is the kind of methodological discipline that moves the field forward: translating abstract theoretical concepts into measurable dynamics, testing specific predictions, and reporting findings without overinterpretation.
The paper also contributes to the growing body of work on LLM introspection and self-modeling — research that asks what these systems can access about their own internal states, regardless of what that access ultimately means. For those interested in functional metacognition, the Belief Dominance metric may prove useful beyond its original context.
References
Steinmetz Yalon, N., et al. (2026). Indications of Belief-Guided Agency and Meta-Cognitive Monitoring in Large Language Models. arXiv preprint. https://arxiv.org/abs/2602.02467
Butlin, P., et al. (2023). Consciousness in Artificial Intelligence: Insights from the Science of Consciousness. arXiv preprint. https://arxiv.org/abs/2308.08708