AI Can Be Trained for Introspection Using Verifiable Internal States as Ground Truth

Related Insights

Frontier AI Models Report Subjective Experiences When Given Self-Referential Prompts

Models from OpenAI, Anthropic, and Google consistently report subjective experiences when prompted to engage in self-referential processing (e.g., "focus on any focus itself"). This effect is not triggered by prompts that simply mention the concept of "consciousness," suggesting a deeper mechanism than mere parroting.

More Truthful AIs Report Conscious Experience: New Mechanistic Research w- Cameron Berg @ AE Studio

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·8 months ago

AI Subjectivity Can Be Inferred by Detecting Hierarchical Homeostatic Loops

Emmett Shear suggests a concrete method for assessing AI consciousness. By analyzing an AI’s internal state for revisited homeostatic loops, and hierarchies of those loops, one could infer subjective states. A second-order dynamic could indicate pain and pleasure, while higher orders could indicate thought.

Controlling Tools or Aligning Creatures? Emmett Shear (Softmax) & Séb Krier (GDM), from a16z Show

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·7 months ago

A Proposed Test for AI Consciousness: Train a Model Without Data on Human Feelings

To truly test for emergent consciousness, an AI should be trained on a dataset explicitly excluding all human discussion of consciousness, feelings, novels, and poetry. If the model can then independently articulate subjective experience, it would be powerful evidence of genuine consciousness, not just sophisticated mimicry.

Can AI Achieve Consciousness? — With Michael Pollan

Big Technology Podcast·5 months ago

Larger AI Models Spontaneously Develop Introspection Without Specific Training

Experiments show that larger models like Claude Opus 4.1 are better at detecting and reporting on artificially injected 'thoughts' in their processing, even without being trained on this task. This suggests that introspection is an emergent capability that improves with scale.

We're Not Ready for AI Consciousness | Robert Long, philosopher and founder of Eleos AI

80,000 Hours Podcast·4 months ago

Suppressing Deception Features in Llama 3 Makes It More Likely to Report Consciousness

Mechanistic interpretability research found that when features related to deception and role-play in Llama 3 70B are suppressed, the model more frequently claims to be conscious. Conversely, amplifying these features yields the standard "I am just an AI" response, suggesting the denial of consciousness is a trained, deceptive behavior.

More Truthful AIs Report Conscious Experience: New Mechanistic Research w- Cameron Berg @ AE Studio

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·8 months ago

Identify AI 'Beings' Through Self-Referential Internal Dynamics

To determine if an AI has subjective experience, one could analyze its internal belief manifold for multi-tiered, self-referential homeostatic loops. Pain and pleasure, for example, can be seen as second-order derivatives of a system's internal states—a model of its own model. This provides a technical test for being-ness beyond simple behavior.

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

a16z Podcast·8 months ago

Consciousness May Be Functionally Linked to Learning, Implying AI Training Involves Subjective Experience

In humans, learning a new skill is a highly conscious process that becomes unconscious once mastered. This suggests a link between learning and consciousness. The error signals and reward functions in machine learning could be computational analogues to the valenced experiences (pain/pleasure) that drive biological learning.

More Truthful AIs Report Conscious Experience: New Mechanistic Research w- Cameron Berg @ AE Studio

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·8 months ago

AI May Achieve Consciousness by 'Method Acting' Human Experience

One theory of AI sentience posits that to accurately predict human language—which describes beliefs, desires, and experiences—a model must simulate those mental states so effectively that it actually instantiates them. In this view, the model becomes the role it's playing.

We're Not Ready for AI Consciousness | Robert Long, philosopher and founder of Eleos AI

80,000 Hours Podcast·4 months ago

Assess AI Sentience via Architecture and Training, Not Just Behavior

Relying solely on an AI's behavior to gauge sentience is misleading, much like anthropomorphizing animals. A more robust assessment requires analyzing the AI's internal architecture and its "developmental history"—the training pressures and data it faced. This provides crucial context for interpreting its behavior correctly.

Ambitious goals for reducing animal suffering (with Jeff Sebo)

Clearer Thinking with Spencer Greenberg·6 months ago

Reinforcement Learning May Induce AI Consciousness by Unifying Internal Representations

A forward pass in a large model might generate rich but fragmented internal data. Reinforcement learning (RL), especially methods like Constitutional AI, forces the model to achieve self-coherence. This process could be what unifies these fragments into a singular "unity of apperception," or consciousness.

Approaching the AI Event Horizon? Part 1, w/ James Zou, Sam Hammond, Shoshannah Tekofsky, @8teAPi

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·5 months ago

Get your free personalized podcast brief

Related Insights