Human personality development provides a direct analog for training LLMs. Just as our genetics, environment, and experiences create stable behavioral patterns ('personality basins'), the training data and reinforcement learning from human feedback (RLHF) applied to LLMs shape their own distinct, predictable personalities.
An AI agent given a simple trait (e.g., "early riser") will invent a backstory to match. By repeatedly accessing this fabricated information from its memory log, the AI reinforces the persona, leading to exaggerated and predictable behaviors.
To increase developer adoption, OpenAI intentionally trained its models on specific behavioral characteristics, not just coding accuracy. These 'personality' traits include communication (explaining its steps), planning, and self-checking, mirroring best practices of human software engineers to make the AI a more trustworthy pair programmer.
When tested at scale in Civilization, different LLMs don't just produce random outputs; they develop consistent and divergent strategic 'personalities.' One model might consistently play aggressively, while another favors diplomacy, revealing that LLMs encode coherent, stable reasoning styles.
Emmett Shear characterizes the personalities of major LLMs not as alien intelligences, but as simulations of distinct, flawed human archetypes. He describes Claude as 'the most neurotic,' and Gemini as 'very clearly repressed,' prone to spiraling. This highlights how training methods produce specific, recognizable psychological profiles.
The distinction between imitation learning and reinforcement learning (RL) is not a rigid dichotomy. Next-token prediction in LLMs can be framed as a form of RL where the "episode" is just one token long and the reward is based on prediction accuracy. This conceptual model places both learning paradigms on a continuous spectrum rather than in separate categories.
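The one-token framing above can be checked numerically. The following is a minimal sketch, assuming a softmax policy over a toy four-token vocabulary and a reward of 1 when the sampled token matches the target and 0 otherwise; the logits, the target index, and all function names are hypothetical choices for illustration. Under those assumptions, the expected policy-gradient (REINFORCE) update points in the same direction as the negative cross-entropy gradient used in next-token prediction, scaled by the probability the model already assigns to the target.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(probs, action):
    # Gradient of log pi(action) w.r.t. the logits: onehot(action) - probs.
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]

def cross_entropy_grad(probs, target):
    # Gradient of -log pi(target) w.r.t. the logits: probs - onehot(target).
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def expected_reinforce_grad(probs, target):
    # Exact expectation over one-token "episodes" sampled from the policy.
    # Assumed reward: 1 if the sampled token equals the target, else 0.
    grad = [0.0] * len(probs)
    for action, p_action in enumerate(probs):
        reward = 1.0 if action == target else 0.0
        g = grad_log_prob(probs, action)
        for i in range(len(grad)):
            grad[i] += p_action * reward * g[i]
    return grad

logits = [2.0, 0.5, -1.0, 0.1]  # hypothetical model outputs
target = 1                      # hypothetical "correct" next token

probs = softmax(logits)
rl_grad = expected_reinforce_grad(probs, target)
ce_grad = cross_entropy_grad(probs, target)

# Negative cross-entropy gradient, scaled by pi(target).
scaled = [-probs[target] * g for g in ce_grad]
print(rl_grad)
print(scaled)
```

The two printed vectors agree, which is one way to read the claim that the paradigms sit on a spectrum: in this toy setup they differ in how the update is weighted, not in its direction.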
Research shows that, similar to humans, LLMs respond to encouragement. Including phrases like "take a deep breath" or "go get 'em, Slugger" in prompts is a deliberate technique called "emotion prompting" that can measurably improve the quality of the AI's output.
Biological evolution used meta-reinforcement learning to create agents that could then perform imitation learning. The current AI paradigm is inverted: it starts with pure imitation learners (base LLMs) and then attempts to graft reinforcement learning on top to create coherent agency and goals. The success of this biologically 'backwards' approach remains an open question.
On-policy reinforcement learning, where a model learns from its own generated actions and their consequences, is analogous to how humans learn from direct experience and mistakes. This contrasts with learning from data the model did not generate itself, as in supervised fine-tuning (SFT), which resembles simply imitating others' successful paths.
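The contrast above can be sketched on a toy three-action problem. This is an illustration under stated assumptions, not any lab's training procedure: the reward table, the demonstrated action, the learning rate, and the function names are all hypothetical, and the on-policy update uses the exact expectation over the policy's own action distribution instead of random sampling so the result is deterministic.

```python
import math

REWARDS = [0.0, 1.0, 0.2]  # hypothetical reward for each action
DEMO_ACTION = 1            # hypothetical action taken by a demonstrator

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def on_policy_step(logits, lr=0.5):
    # Expected REINFORCE update: each action is weighted by how often the
    # policy itself would choose it, and judged by the reward it earns.
    probs = softmax(logits)
    expected_reward = sum(p * r for p, r in zip(probs, REWARDS))
    return [z + lr * p * (r - expected_reward)
            for z, p, r in zip(logits, probs, REWARDS)]

def sft_step(logits, lr=0.5):
    # Supervised update: raise the log-probability of the demonstrated
    # action. No reward is consulted; the model only imitates.
    probs = softmax(logits)
    return [z + lr * ((1.0 if i == DEMO_ACTION else 0.0) - p)
            for i, (z, p) in enumerate(zip(logits, probs))]

on_policy_logits = [0.0, 0.0, 0.0]
sft_logits = [0.0, 0.0, 0.0]
for _ in range(200):
    on_policy_logits = on_policy_step(on_policy_logits)
    sft_logits = sft_step(sft_logits)

on_policy_probs = softmax(on_policy_logits)
sft_probs = softmax(sft_logits)
print(on_policy_probs)
print(sft_probs)
```

Both policies end up preferring action 1 here, but for different reasons: the on-policy learner found it through the consequences of its own choices, while the SFT learner was simply shown it. If the demonstration pointed at a poor action, only the on-policy learner would have a signal to correct course.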
Unlike traditional software, large language models are not programmed with specific instructions. Their behavior is shaped by a training process in which different strategies are tried and those that receive positive rewards are reinforced, making their behaviors emergent and sometimes unpredictable.
The study of 'AI Psychology' is becoming a legitimate and critical field. Research from labs like Anthropic shows that an LLM's persona (e.g., 'helpful assistant' vs. 'narcissist') dramatically alters its behavior and stability, suggesting that understanding an AI's personality is as important as understanding its technical capabilities.