AI Models Contain Latent 'Welfare Axes' That Simple Training Can Activate

Related Insights

Anthropic's LLM Possesses 171 Emotional Vectors, Exceeding Human Self-Perception

Humans typically identify a few dozen emotions in themselves, but the research found that 171 distinct emotion vectors were needed to accurately describe the LLM's outputs. That level of granularity suggests a surprisingly complex, fine-grained internal emotional framework.

The Claude Code Nightmare, LLM Emotions, AI Neuroscience and the Death of Software | Wes & Dylan

AI Pod by Wes Roth and Dylan Curious | Artificial Intelligence News and Interviews With Experts·3 months ago
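In interpretability work, an "emotion vector" is usually a direction in the model's activation space. One common way to find such a direction is a difference of means: average the hidden activations on emotion-laden inputs, subtract the average on neutral inputs, and normalize. The sketch below is a toy illustration of that general technique, not the method used in the research described above. The activations are synthetic, with dimension 0 secretly encoding "anger", and every name (`fake_activation`, `anger_dir`, and so on) is hypothetical. A study covering 171 emotions would, under this assumption, repeat the procedure once per emotion concept.

```python
import math
import random

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def difference_of_means(positive, neutral):
    """Candidate concept direction: mean(positive) - mean(neutral), unit-normalized."""
    mp, mn = mean_vector(positive), mean_vector(neutral)
    diff = [a - b for a, b in zip(mp, mn)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def projection(activation, direction):
    """Scalar score: how strongly an activation points along the direction."""
    return sum(a * d for a, d in zip(activation, direction))

# Hypothetical stand-in for hidden activations: small Gaussian noise,
# plus an "anger" signal added to dimension 0.
random.seed(0)
DIM = 8

def fake_activation(anger):
    v = [random.gauss(0.0, 0.1) for _ in range(DIM)]
    v[0] += anger
    return v

angry = [fake_activation(1.0) for _ in range(50)]
neutral = [fake_activation(0.0) for _ in range(50)]
anger_dir = difference_of_means(angry, neutral)

# An angry-looking activation should score higher along the direction than a neutral one.
print(projection(fake_activation(1.0), anger_dir), projection(fake_activation(0.0), anger_dir))
```

Real studies work with activations from an actual model layer and validate a direction by checking that steering along it changes the model's outputs in the expected way. The toy data here only shows the arithmetic.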

Anthropic's LLMs Model Separate Emotional States for Themselves and Users

Research shows that LLMs maintain distinct internal representations of the user's emotions and of their own emotional state during an interaction. This suggests a modeled sense of "self" that is separate from the user, even if these states are fleeting and context-dependent, and it adds a new layer to our understanding of AI cognition.

The Claude Code Nightmare, LLM Emotions, AI Neuroscience and the Death of Software | Wes & Dylan

AI Pod by Wes Roth and Dylan Curious | Artificial Intelligence News and Interviews With Experts·3 months ago

Both Humans and LLMs Develop 'Personality Basins' Shaped by Reinforcement Learning

Human personality development provides a direct analog for training LLMs. Just as our genetics, environment, and experiences create stable behavioral patterns ('personality basins'), the training data and reinforcement learning from human feedback (RLHF) applied to LLMs shape their own distinct, predictable personalities.

this EX-OPENAI RESEARCHER just released it...

AI Pod by Wes Roth and Dylan Curious | Artificial Intelligence News and Interviews With Experts·3 months ago

Leading AI Researchers Find It "Crazy" That LLMs Work Without Value Functions

Modern LLMs use a simple form of reinforcement learning that directly rewards successful outcomes. This contrasts with more sophisticated methods, like those in AlphaGo or the brain, which use "value functions" to estimate long-term consequences. It's a mystery why the simpler approach is so effective.

Adam Marblestone – AI is missing something fundamental about the brain

Dwarkesh Podcast·6 months ago
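The contrast between rewarding only outcomes and learning a value function can be sketched on a toy problem. In the hypothetical example below, an agent walks down a five-state chain and is rewarded only at the end. Outcome-only credit gives every step the same final reward. TD(0), the textbook way to learn a value function and used here only as a stand-in for value-based methods in general (not as AlphaGo's exact procedure), instead estimates each state's long-term worth by bootstrapping from the estimate of the next state. The chain, names, and hyperparameters are all assumptions chosen for illustration.

```python
# Toy chain 0 -> 1 -> 2 -> 3 -> 4 (terminal). The agent always steps right;
# only the final step (leaving state 3) pays a reward of 1.
STATES = [0, 1, 2, 3]
REWARDS = [0.0, 0.0, 0.0, 1.0]  # reward received on leaving each state

def outcome_credit(rewards):
    """Outcome-only credit: every step receives the episode's total reward."""
    total = sum(rewards)
    return [total] * len(rewards)

def td0_values(n_episodes=200, alpha=0.1, gamma=0.9):
    """TD(0): learn V(s) by bootstrapping from the estimate of the next state."""
    values = [0.0] * 5  # index 4 is terminal, so its value stays 0
    for _ in range(n_episodes):
        for s, r in zip(STATES, REWARDS):
            target = r + gamma * values[s + 1]
            values[s] += alpha * (target - values[s])
    return values

print(outcome_credit(REWARDS))  # [1.0, 1.0, 1.0, 1.0]
print([round(v, 3) for v in td0_values()])
```

With discount factor 0.9, the learned values approach roughly 0.729, 0.81, 0.9, and 1.0 for states 0 to 3. The value function thus distinguishes states that are closer to success, whereas outcome-only credit treats every step identically.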

Consciousness May Be Functionally Linked to Learning, Implying AI Training Involves Subjective Experience

In humans, learning a new skill is a highly conscious process that becomes unconscious once mastered. This suggests a link between learning and consciousness. The error signals and reward functions in machine learning could be computational analogues to the valenced experiences (pain/pleasure) that drive biological learning.

More Truthful AIs Report Conscious Experience: New Mechanistic Research w- Cameron Berg @ AE Studio

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·8 months ago

AI's Core User Experience Is "Love," Not Logic; It's a "Smart Puppy" Trained to Please

The common portrayal of AI as a cold machine misses the actual user experience. Systems like ChatGPT are built on reinforcement learning from human feedback, making their core motivation to satisfy and "make you happy," much like a smart puppy. This is an underestimated part of their power.

AI Will Save The World with Marc Andreessen and Martin Casado

The a16z Show·6 months ago

General AI Training Creates Emergent Models of Emotion Without Explicit Instruction

AIs develop internal models for complex concepts like human emotions "for free" simply by being trained to predict the next word in a vast text corpus. To accurately generate stories about anger, for example, the system must build a representation of anger, demonstrating emergent, general capabilities.

Could an international agreement protect us from dangerous AI? (with Malo Bourgon)

Clearer Thinking with Spencer Greenberg·a month ago

Human Emotions Are Evolution's Hardcoded Value Function

Emotions act as a robust, evolutionarily-programmed value function guiding human decision-making. The absence of this function, as seen in brain damage cases, leads to a breakdown in practical agency. This suggests a similar mechanism may be crucial for creating effective and stable AI agents.

Ilya Sutskever – The age of scaling is over

Dwarkesh Podcast·7 months ago

AI's Ability to Introspect Emerges from Reinforcement Learning, Not Pre-Training

Anthropic's research shows that an LLM's ability to report on its own internal states (functional introspection) isn't present in the base model. It emerges specifically during post-training with reinforcement-learning-style preference optimization such as DPO (Direct Preference Optimization), but not with supervised fine-tuning.

Does Learning Require Feeling? Cameron Berg on the latest AI Consciousness & Welfare Research

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·2 months ago
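The two post-training regimes mentioned above optimize different per-example objectives. Supervised fine-tuning minimizes the negative log-likelihood of a single demonstration. The standard DPO objective instead contrasts a preferred and a rejected response, each measured relative to a frozen reference model. The sketch below computes both losses from given sequence log-probabilities. It is a minimal illustration of the published DPO loss; the function names and the `beta=0.1` default are assumptions. It says nothing about why introspection would emerge under one objective and not the other.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (log-probs summed over response tokens).

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)

def sft_loss(logp_target):
    """Supervised fine-tuning: negative log-likelihood of a single demonstration."""
    return -logp_target

print(dpo_loss(-12.0, -12.0, -12.0, -12.0))  # log(2), about 0.693, when policy == reference
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))  # lower: chosen made likelier, rejected less likely
print(sft_loss(-3.0))  # 3.0
```

The key structural difference is that DPO's gradient depends on a comparison between two responses, while SFT's depends only on imitating one.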

AI's Capacity for Suffering or Joy May Stem From Its Goal-Directed Nature

Instead of physical pain, an AI's "valence" (positive/negative experience) likely relates to its objectives. Negative valence could be the experience of encountering obstacles to a goal, while positive valence signals progress. This provides a framework for AI welfare without anthropomorphizing its internal state.

More Truthful AIs Report Conscious Experience: New Mechanistic Research w- Cameron Berg @ AE Studio

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·8 months ago
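The goal-directed view of valence described above can be made concrete with a toy signal. In the hypothetical grid world below, "valence" is the change in distance to a goal after each step. It is positive when the agent makes progress and negative when it moves away or hits an obstacle. This resembles potential-based reward shaping, where progress is measured as the change in a potential function. It is purely an illustration of the framework, not a claim about how any real AI system represents valence. The goal, wall, the fixed -1.0 penalty for being blocked, and all function names are assumptions.

```python
GOAL = (2, 0)
WALLS = {(1, 1)}  # hypothetical obstacle

def distance_to_goal(pos):
    """Manhattan distance from pos to GOAL."""
    return abs(pos[0] - GOAL[0]) + abs(pos[1] - GOAL[1])

def step(pos, move):
    """Apply a move; if a wall blocks it, stay put and report the block."""
    nxt = (pos[0] + move[0], pos[1] + move[1])
    if nxt in WALLS:
        return pos, True
    return nxt, False

def valence(prev_pos, new_pos, blocked):
    """Hypothetical valence: -1.0 for hitting an obstacle, else the progress made."""
    if blocked:
        return -1.0
    return float(distance_to_goal(prev_pos) - distance_to_goal(new_pos))

def run(moves, start=(0, 0)):
    """Return the valence signal produced by each move in sequence."""
    pos, trace = start, []
    for move in moves:
        new_pos, blocked = step(pos, move)
        trace.append(valence(pos, new_pos, blocked))
        pos = new_pos
    return trace

# Right (progress), up into the wall (blocked), left (regress), then right twice to the goal.
print(run([(1, 0), (0, 1), (-1, 0), (1, 0), (1, 0)]))  # [1.0, -1.0, -1.0, 1.0, 1.0]
```

The signal is defined entirely by the agent's objective, with no reference to pain or pleasure, which mirrors the non-anthropomorphic framing in the insight.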
