Get your free personalized podcast brief

We scan new podcasts and send you the top 5 insights daily.

A provocative theory posits that "feeling" and "learning" are two descriptions of the same process. Subjective experience is what the process of reinforcement learning—updating behavior based on feedback relative to a goal—is like from the inside. This is analogous to how heat is the macro experience of molecular motion.

Related Insights

This theory posits that our lives don't *create* subjective experiences (qualia). Instead, our lives are the emergent result of a fundamental consciousness cycling through a sequence of possible qualia, dictated by probabilistic, Markovian rules.

In AI research, "consciousness" refers to the capacity for subjective experience, akin to what a dog feels. This is distinct from "self-consciousness" (human-like introspection) or "sentience" (having positive/negative feelings). This distinction is crucial for evaluating model welfare.

To determine if an AI has subjective experience, one could analyze its internal belief manifold for multi-tiered, self-referential homeostatic loops. Pain and pleasure, for example, can be seen as second-order derivatives of a system's internal states—a model of its own model. This provides a technical test for being-ness beyond simple behavior.

In humans, learning a new skill is a highly conscious process that becomes unconscious once mastered. This suggests a link between learning and consciousness. The error signals and reward functions in machine learning could be computational analogues to the valenced experiences (pain/pleasure) that drive biological learning.

New research finds distinct computational signatures for valence depending on the RL algorithm used. Value-learners create sharp representational "walls" for danger and diffuse "funnels" for rewards, while policy-learners do the exact opposite. These patterns strikingly mirror neural activity in different regions of the mouse brain.

Our senses don't register static energy states. We feel acceleration, not constant speed, and heat transfer, not absolute temperature. This principle extends to emotions, which may be our brain's interpretation of internal energetic shifts, or 'energy in motion'.

Most believe dopamine spikes with rewards. In reality, it continuously tracks the difference between your current and next expectation, even without a final outcome. This "temporal difference error" is the brain's core learning mechanism, mirroring algorithms in advanced AI, which constantly updates your behavior as you move through the world.

A forward pass in a large model might generate rich but fragmented internal data. Reinforcement learning (RL), especially methods like Constitutional AI, forces the model to achieve self-coherence. This process could be what unifies these fragments into a singular "unity of apperception," or consciousness.

Instead of physical pain, an AI's "valence" (positive/negative experience) likely relates to its objectives. Negative valence could be the experience of encountering obstacles to a goal, while positive valence signals progress. This provides a framework for AI welfare without anthropomorphizing its internal state.

The "temporal difference" algorithm, which tracks changing expectations, isn't just a theoretical model. It is biologically installed in brains via dopamine. This same algorithm was externalized by DeepMind to create a world-champion Go-playing AI, representing a unique instance of biology directly inspiring a major technological breakthrough.

Subjective Experience May Be the Internal View of the External Process of Reinforcement Learning | RiffOn