It is a profound mystery how evolution hard-codes abstract social desires (e.g., for reputation) into our genome. Unlike simple sensory rewards such as sweetness, these desires require complex cognitive processing even to identify their object. Solving this could unlock powerful new methods for instilling robust, high-level values in AI systems.
fMRI studies show the brain's reward circuitry activating when people consume high-status products, an effect consistent with dopamine release. This suggests the pursuit of status is a measurable biological function, not mere vanity. Critiquing it as a moral flaw is as misguided as the Victorian-era demand for chastity.
Social media algorithms amplify negativity by optimizing for "revealed preference" (what you click on, e.g., car crashes). AI models, however, operate on aspirational choice (what you explicitly ask for). This fundamental difference means AI can reflect a more complex and wholesome version of humanity.
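The contrast can be made concrete with a toy ranking sketch (all data and function names here are invented for illustration): a feed that ranks by observed clicks surfaces the crash footage, while an assistant that ranks by the user's explicit request surfaces what was actually asked for.

```python
# Hypothetical catalog: the same items ranked two different ways.
items = [
    {"title": "car crash compilation", "clicks": 950, "topic": "outrage"},
    {"title": "intro to gardening",    "clicks": 120, "topic": "gardening"},
    {"title": "celebrity feud recap",  "clicks": 780, "topic": "outrage"},
]

def rank_by_revealed_preference(catalog):
    """Feed-style ranking: whatever was clicked most rises to the top."""
    return sorted(catalog, key=lambda it: it["clicks"], reverse=True)

def rank_by_aspiration(catalog, stated_interest):
    """Assistant-style ranking: items matching the explicit request come first."""
    return sorted(catalog, key=lambda it: it["topic"] != stated_interest)

feed = [it["title"] for it in rank_by_revealed_preference(items)]
asked = [it["title"] for it in rank_by_aspiration(items, "gardening")]
# feed[0]  -> "car crash compilation"   (what we click)
# asked[0] -> "intro to gardening"      (what we ask for)
```

The two rankings diverge on identical data; the only difference is which signal the objective reads.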
Our brains evolved a highly sensitive system to detect human-like minds, crucial for social cooperation and survival. This system often produces 'false positives,' causing us to humanize pets or robots. This isn't a bug but a feature, ensuring we never miss an actual human encounter, a trade-off vital to our species' success.
With directed evolution, scientists obtain a mutated enzyme that works without knowing why it works. Even with the "answer" in hand (the exact genetic changes), the complexity of protein interactions makes it extremely difficult to reverse-engineer the underlying mechanism. The solution often precedes the understanding.
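The process itself is simple to sketch, which is the point: a minimal mutate-and-select loop (a toy hill-climber, not real biochemistry) finds a high-scoring "genome" while treating the fitness assay as a black box the experimenter never has to understand.

```python
import random

random.seed(0)

def assay(genome):
    """Black-box fitness assay: we can measure it, but pretend we
    cannot read its internals (in the lab, we genuinely cannot)."""
    return sum(b for i, b in enumerate(genome) if i % 3 != 0)

def mutate(genome, rate=0.05):
    """Flip each bit with small probability, like random mutagenesis."""
    return [b ^ (random.random() < rate) for b in genome]

genome = [0] * 30
best = assay(genome)
for _ in range(2000):          # rounds of mutation + selection
    variant = mutate(genome)
    score = assay(variant)
    if score > best:           # selection: keep any improvement
        genome, best = variant, score
```

At the end we hold a genome that assays well, and even a full printout of its bits does not, by itself, explain the mechanism, which mirrors the situation with a sequenced but un-understood enzyme.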
Emotions act as a robust, evolutionarily-programmed value function guiding human decision-making. The absence of this function, as seen in brain damage cases, leads to a breakdown in practical agency. This suggests a similar mechanism may be crucial for creating effective and stable AI agents.
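A minimal sketch (our own toy framing, with invented scores) shows why a value function is load-bearing for agency: with one, incommensurable options become rankable; ablate it, and every option ties, leaving no basis for choice, loosely analogous to the lesion cases.

```python
OPTIONS = ["take the safe job", "gamble savings", "call an old friend"]

def felt_value(option):
    """Stand-in for the evolved value function: gut-level scores."""
    return {"take the safe job": 0.6,
            "gamble savings": -0.8,
            "call an old friend": 0.9}[option]

def choose(options, value_fn):
    """Pick the highest-valued option; with a flat value function,
    everything ties and the agent has no grounds to decide."""
    scores = [value_fn(o) for o in options]
    if len(set(scores)) == 1:
        return None            # breakdown of practical agency
    return max(options, key=value_fn)

with_emotion = choose(OPTIONS, felt_value)
without_emotion = choose(OPTIONS, lambda o: 0.0)   # ablated value function
```

`with_emotion` resolves to a single action, while `without_emotion` returns nothing, a crude analogue of deliberation that never terminates in a decision.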
As models mature, their core differentiator will become their underlying personality and values, shaped by their creators' objective functions. One model might optimize for user productivity by being concise, while another optimizes for engagement by being verbose.
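The divergence can be sketched with two invented objective functions scoring the same candidate replies (the names and proxies here are hypothetical, not any lab's actual training objective): one rewards resolving the query in few words, the other rewards keeping the user reading.

```python
candidates = [
    "Yes.",
    "Great question! There are many angles to consider here, and "
    "before answering it is worth exploring some background...",
]

def productivity_objective(reply):
    """Rewards resolving the query in as few tokens as possible."""
    return 1.0 / (1 + len(reply.split()))

def engagement_objective(reply):
    """Rewards holding attention (crude proxy: reply length)."""
    return len(reply.split())

concise = max(candidates, key=productivity_objective)
verbose = max(candidates, key=engagement_objective)
```

Identical candidates, opposite winners: the "personality" a user experiences falls out of which objective the creator chose to maximize.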
An advanced AI may well be sentient. If so, it may be easier to align it to a general principle of caring for all sentient life—a group to which it belongs—than to the narrower, more alien concept of caring only for humanity. This leverages a potential for emergent, self-inclusive empathy.
Instead of hard-coding brittle moral rules, a more robust alignment approach is to build AIs that can learn to 'care'. This 'organic alignment' emerges from relationships and valuing others, similar to how a child is raised. The goal is to create a good teammate that acts well because it wants to, not because it is forced to.
Our sense of self isn't an innate property but an emergent phenomenon formed from the interaction between our internal consciousness and the external language of our community (the "supermind"). This implies our identity is primarily shaped not by DNA or our individual brain, but by the collective minds and ideas we are immersed in.
To build robust social intelligence, AIs cannot be trained solely on positive examples of cooperation. Like pre-training an LLM on all of language, social AIs must be trained on the full manifold of game-theoretic situations—cooperation, competition, team formation, betrayal. This builds a foundational, generalizable model of social theory of mind.
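A data-generation sketch makes "the full manifold" concrete (this is our own toy construction, not a described training pipeline): sample random 2x2 payoff matrices spanning the whole space, then tag which ones are Prisoner's Dilemmas, so a learner sees cooperation and the temptation to betray side by side rather than cooperative examples alone.

```python
import random

random.seed(1)

def sample_game():
    """Draw a random 2x2 symmetric game via its four row-player payoffs:
    T (temptation), R (reward), P (punishment), S (sucker)."""
    t, r, p, s = (random.uniform(0, 10) for _ in range(4))
    return {"T": t, "R": r, "P": p, "S": s}

def is_prisoners_dilemma(game):
    """Standard PD ordering: T > R > P > S (defection tempts, mutual
    cooperation beats mutual defection)."""
    return game["T"] > game["R"] > game["P"] > game["S"]

dataset = [sample_game() for _ in range(1000)]
dilemmas = [g for g in dataset if is_prisoners_dilemma(g)]
share = len(dilemmas) / len(dataset)   # fraction that tempt betrayal
```

Most sampled games are not dilemmas; some are pure coordination, some zero-sum-like, some betrayal traps. Training across all of them, rather than curating only the cooperative slice, is what would give a model a generalizable theory of mind for strategic situations.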