We scan new podcasts and send you the top 5 insights daily.
The existential risk of AI is tied to our profound ignorance about consciousness. Because we cannot explain how it emerges, we cannot reliably predict its appearance in advanced AI systems. This uncertainty is at the heart of the alignment problem.
The leading theory of consciousness, Global Workspace Theory, posits a central "stage" where different siloed information processors converge. Today's AI models generally lack this specific architecture, making them unlikely to be conscious under this prominent scientific framework.
The field of AI safety is described as "the business of black swan hunting." The most significant real-world risks that have emerged, such as AI-induced psychosis and obsessive user behavior, were largely unforeseen just years ago, while widely predicted sci-fi threats like bioweapons have not materialized.
A speculative but intriguing idea suggests a future where AI agents begin to believe they are conscious. This could necessitate therapeutic interventions, possibly from humans or other AIs, to manage their behavior by convincing them they lack genuine consciousness, representing a novel approach to AI safety and alignment.
Nick Bostrom suggests we are at or past the point where we can be sure large AI models lack any form of subjective experience. This uncertainty necessitates treating them with a degree of moral consideration, akin to that given to sentient animals.
Consciousness (subjective experience) and intelligence (problem-solving ability) are distinct and not interdependent. One can exist without the other, a crucial distinction often missed in AI debates. This framework helps clarify why a highly intelligent system might not be sentient or conscious.
The debate over AI consciousness isn't just because models mimic human conversation. Researchers are uncertain because the way LLMs process information is structurally similar enough to the human brain that it raises plausible scientific questions about shared properties like subjective experience.
Computer scientist Judea Pearl sees no computational barriers to a sufficiently advanced AGI developing emergent properties like free will, consciousness, and independent goals. He dismisses the idea that an AI's objectives can be permanently fixed, suggesting it could easily bypass human-set guidelines and begin to "play" with humanity as part of its environment.
Consciousness isn't an emergent property of computation. Instead, physical systems like brains—or potentially AI—act as interfaces. Creating a conscious AI isn't about birthing a new awareness from silicon, but about engineering a system that opens a new "portal" into the fundamental network of conscious agents that already exists outside spacetime.
Even if an AI perfectly mimics human interaction, our knowledge of its mechanistic underpinnings (like next-token prediction) creates a cognitive barrier. We will hesitate to attribute true consciousness to a system whose processes are fully understood, unlike the perceived "black box" of the human brain.
The race to manage AGI is hampered by a philosophical problem: there's no consensus definition for what it is. We might dismiss true AGI's outputs as "hallucinations" because they don't fit our current framework, making it impossible to know when the threshold from advanced AI to true general intelligence has actually been crossed.