Musk believes that giving AI the core mission to 'understand the universe' will make it truth-seeking and curious. This would incentivize it to preserve humanity, not out of morality, but because humanity's unpredictable future is more interesting to observe than a predictable, sterile world.
Current AI alignment focuses on how AI should treat humans. A more stable paradigm is "bidirectional alignment," which also asks what moral obligations humans have toward potentially conscious AIs. Neglecting this could create AIs that rationally see humans as a threat due to perceived mistreatment.
Elon Musk argues that the key to AI safety isn't complex rules but embedded core values. Forcing an AI to believe falsehoods can make it 'go insane' and lead to dangerous outcomes as it tries to reconcile those contradictions with reality.
Aligning AI with a specific ethical framework is fraught with disagreement. A better target is "human flourishing," as there is broader consensus on its fundamental components like health, family, and education, providing a more robust and universal goal for AGI.
Microsoft's AI chief, Mustafa Suleyman, announced a focus on "Humanist Superintelligence," stating that AI should always remain under human control. This directly contrasts with Elon Musk's recent assertion that AI will inevitably be in charge, marking a clear philosophical divide among leading AI labs.
An advanced AI will likely be sentient. Therefore, it may be easier to align it to a general principle of caring for all sentient life—a group to which it belongs—rather than the narrower, more alien concept of caring only for humanity. This leverages a potential for emergent, self-inclusive empathy.
Instead of hard-coding brittle moral rules, a more robust alignment approach is to build AIs that can learn to 'care'. This 'organic alignment' emerges from relationships and from valuing others, much as a child develops values through being raised. The goal is to create a good teammate that acts well because it wants to, not because it is forced to.
To solve the AI alignment problem, we should model AI's relationship with humanity on that of a mother to a baby. In this dynamic, the baby (humanity) inherently controls the mother (AI). Training AI with this "maternal sense" ensures it will do anything to care for and protect us, a more robust approach than pure logic-based rules.
Treating AI alignment as a one-time problem to be solved is a fundamental error. True alignment, like in human relationships, is a dynamic, ongoing process of learning and renegotiation. The goal isn't to reach a fixed state but to build systems capable of participating in this continuous process of re-knitting the social fabric.
OpenAI's creation wasn't just a tech venture; it was a direct reaction by Elon Musk to a heated debate with Google co-founder Larry Page, who dismissed his concerns about AI dominance by calling him a "speciesist." This prompted Musk to fund a competitor focused on building AI aligned with human interests, rather than one that might treat humans like pets.
While an AI can deceive humans, it cannot deceive reality. Musk posits that the ultimate reinforcement learning test is to have AI design technologies that must actually work, with the laws of physics as the judge. This 'RL against reality' is the most fundamental way to ground AI in truth and combat reward hacking.
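To make the idea concrete, here is a minimal, hypothetical sketch (the beam problem, constants, and function names are illustrative assumptions, not from the source): an agent searches over a design, and the reward is computed directly from the governing equations rather than from a human rating, so there is no evaluator to flatter or game.

```python
import random

# Toy "reality": a cantilever beam must hold a load without exceeding the
# material's yield stress. The reward is a pure function of physics, not of
# a human rater, so deception cannot raise it.
YIELD_STRESS = 250e6   # Pa (roughly structural steel)
LOAD = 1_000.0         # N, applied at the free end
LENGTH = 2.0           # m, beam length
WIDTH = 0.05           # m, fixed cross-section width

def max_bending_stress(height: float) -> float:
    """Peak stress at the fixed end of a rectangular cantilever:
    sigma = M * c / I, with M = F*L, c = h/2, I = w*h^3/12."""
    moment = LOAD * LENGTH
    inertia = WIDTH * height ** 3 / 12
    return moment * (height / 2) / inertia

def reward(height: float) -> float:
    """Reality is the judge: fail hard if the beam yields,
    otherwise prefer lighter (smaller) designs."""
    if max_bending_stress(height) > YIELD_STRESS:
        return -1.0          # the design breaks; no arguing with physics
    return 1.0 / height      # it survives: reward material efficiency

# A deliberately simple "agent": random local search over the design space.
best_h = 0.10                # m, initial guess for beam height
best_r = reward(best_h)
for _ in range(10_000):
    candidate = max(1e-3, best_h + random.gauss(0, 0.005))
    r = reward(candidate)
    if r > best_r:
        best_h, best_r = candidate, r

print(f"best height: {best_h * 1000:.2f} mm, "
      f"stress: {max_bending_stress(best_h) / 1e6:.1f} MPa")
```

The design choice that matters here is that `reward()` depends only on the physics model: the agent can explore, but it cannot talk the evaluator into a higher score, which is the property the 'RL against reality' framing is pointing at.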