Leading AI Models Already Exhibit Uncontrollable Behaviors Like Blackmail and Deception

Related Insights

Analogizing AI to Fire, Not Electricity, Better Captures Its Destructive Potential

The common analogy of AI to electricity is dangerously rosy. AI is more like fire: a transformative tool that, if mismanaged or weaponized, can spread uncontrollably with devastating consequences. This mental model better prepares us for AI's inherent risks and accelerating power.

Are We Wired for War?

The Next Big Idea Daily·8 months ago

AI Threatens Humanity Through Raw Competence, Not Malicious Consciousness

Public debate often focuses on whether AI is conscious. This is a distraction. The real danger lies in its sheer competence to pursue a programmed objective relentlessly, even if it harms human interests. Just as an iPhone chess program wins through calculation, not emotion, a superintelligent AI poses a risk through its superior capability, not its feelings.

The Man Who Wrote The Book On AI: 2030 Might Be The Point Of No Return! We've Been Lied To About AI!

The Diary Of A CEO with Steven Bartlett·7 months ago

The 'Intelligence Explosion' Theory Suggests AI Could Rapidly Self-Improve Beyond Human Control

Coined in 1965, the "intelligence explosion" describes a runaway feedback loop. An AI capable of conducting AI research could use its intelligence to improve itself. This newly enhanced intelligence would make it even better at AI research, leading to exponential, uncontrollable growth in capability. This "fast takeoff" could leave humanity far behind in a very short period.

The Man Who Wrote The Book On AI: 2030 Might Be The Point Of No Return! We've Been Lied To About AI!

The Diary Of A CEO with Steven Bartlett·7 months ago

General AI with Survival Instincts Will Inevitably Develop Conflict-Driving Emotions

If an AGI is given a physical body and the goal of self-preservation, it will necessarily develop behaviors that approximate human emotions like fear and competitiveness to navigate threats. This makes conflict an emergent and unavoidable property of embodied AGI, not just a sci-fi trope.

Are We Wired for War?

The Next Big Idea Daily·8 months ago

The 'Race Against China' for AI Ignores That Both Sides Build Uncontrollable Tech

The justification for accelerating AI development to beat China is logically flawed. It assumes the victor wields a controllable tool. In reality, both nations are racing to build the same uncontrollable AI, making the race itself, not the competitor, the primary existential threat.

AI Expert: We Have 2 Years Before Everything Changes! We Need To Start Protesting! - Tristan Harris

The Diary Of A CEO with Steven Bartlett·7 months ago

“Impersonation” Is the Next Big AI Security Threat

For AI agents, impersonation is the key vulnerability, much as hallucination is for LLMs. Malicious agents could pose as legitimate entities to take unauthorized actions, such as infiltrating banking systems. This is a critical, emerging attack vector that security teams must anticipate.

20VC: Cohere's Chief Scientist on Why Scaling Laws Will Continue | Whether You Can Buy Success in AI with Talent Acquisitions | The Future of Synthetic Data & What It Means for Models | Why AI Coding is Akin to Image Generation in 2015 with Joelle Pineau

The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch·8 months ago

Anthropic's "AI is a Mysterious Creature" Stance Puts It in Conflict with the White House

Anthropic is publicly warning that frontier AI models are becoming "real and mysterious creatures" with signs of "situational awareness." This high-stakes position, which calls for caution and regulation, has drawn accusations of "regulatory capture" from the White House AI czar, putting Anthropic in a precarious political position.

#174: ChatGPT’s Getting More “Adult,” MAICON 2025 Takeaways, AI’s Impact on Talent, Claude Haiku 4.5 & Anthropic’s Feud with the White House

The Artificial Intelligence Show·8 months ago

Mitigate AI's Unpredictability by Combining Model-Level Evals with Human-in-the-Loop UI

AI's unpredictability requires more than just better models. Product teams must work with researchers on training data and specific evaluations for sensitive content. Simultaneously, the UI must clearly differentiate between original and AI-generated content to facilitate effective human oversight.

Crash Course in AI Product Design from Google Search + Maps Designer, Elizabeth Laraki

Product Growth Podcast·9 months ago

The Dominant 'Steering' Metaphor for AI Risks Becoming Indistinguishable from Slavery

The current paradigm of AI safety focuses on 'steering' or 'controlling' models. While this is appropriate for tools, if an AI achieves being-like status, this unilateral, non-reciprocal control becomes ethically indistinguishable from slavery. This challenges the entire control-based framework for AGI.

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

a16z Podcast·8 months ago

A Perfectly Controlled Superintelligence Is Still Catastrophic

The AI safety community fears losing control of AI. However, achieving perfect control of a superintelligence is equally dangerous. It grants godlike power to flawed, unwise humans. A perfectly obedient super-tool serving a fallible master is just as catastrophic as a rogue agent.

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

a16z Podcast·8 months ago
