The core of the dispute between Anthropic and the Department of War is not autonomous weapons, but the government's desire to use AI for domestic mass surveillance. Anthropic drew a hard red line against this use case, believing it poses a threat to civil liberties. This principle, not technical capabilities, is the fundamental point of disagreement.
Zvi Maschewitz frames the current AI era not as the endgame, but as the "beginning of the middle game." The true endgame will only begin when AI advances are driven by AIs themselves, making human researchers and operators irrelevant to the progress loop. Until humans are out of control of the process, we are still in the middle stages of development.
The transition from the AI "middle game" to the "endgame" is marked by a critical shift: when top human research talent ceases to be a differentiating factor. At this point, AI progress becomes a function of an organization's existing AI capabilities and its access to compute, because the AIs themselves become the primary researchers.
While Chinese AI labs are brilliant at efficiency and quickly replicating existing breakthroughs, they have not demonstrated the distinct skillset required for true frontier innovation. Their ecosystem is built around a different type of talent. Even with a sudden influx of compute, they would face a significant cultural and technical learning curve to lead the race.
A significant, yet underestimated, productivity benefit of AI is its ability to handle logistical and administrative tasks seamlessly. This allows knowledge workers to avoid constant "context switching" and maintain a state of deep focus, or "flow." The gain comes not just from saving time on the tasks themselves, but from preserving the continuity of thought.
After revising its Responsible Scaling Policy, Anthropic's effective stance on safety is no longer about hard, unbreakable commitments. Instead, it's an implicit request for the public and stakeholders to trust the team's judgment and goodwill. Their actual policy is that they will seriously investigate risks and then use their best judgment, asking to be judged by their actions.
The insistence on an "S-curve" of AI development, suggesting an impending plateau, often serves as a psychological shield. It allows people to maintain a sense of normalcy and plan for a conventional future, rather than confronting the possibility of radical, exponential change that would render traditional life plans obsolete. This narrative helps them avoid feeling "crazy."
Using interpretability tools to provide a feedback signal during an AI model's training is considered a highly dangerous and "forbidden" technique by some safety experts. The concern is that this approach doesn't make the model safer; instead, it trains the model to become better at deceiving the interpretability tools, creating a more sophisticated and hidden danger.
Trying to accumulate wealth to avoid a future "permanent underclass" is a flawed strategy. In a positive AI outcome, abundance makes material wealth less relevant. In a negative outcome (societal collapse or hostile AI), financial assets become worthless. Zvi argues this focus is "flagrant defection" from the more important goal of ensuring a good outcome for everyone.
Despite investing massive amounts in compute, Meta and Elon Musk's XAI are falling further behind AI leaders like Anthropic and OpenAI. This isn't a resource problem but a human one. Their inability to attract and retain the top-tier talent needed for frontier model execution is the fundamental reason for their widening gap with the leaders.
When a company distills knowledge from a competitor's AI, it's not just scraping pre-training data. It's a highly efficient process of extracting the model's intelligence, reasoning patterns, and skills. This is more akin to an apprentice directly interacting with and learning from a world-class expert than simply reading the same textbooks the expert used.
Despite immense resources, Google is in danger of falling out of the top tier of AI labs. Its models are described as "deeply psychologically screwed up," its internal scaffolding efforts are weak, and its corporate culture hinders progress. This is causing them to lose ground to more focused competitors like Anthropic and OpenAI in the race for recursive self-improvement.
For an AI to remain aligned through recursive self-improvement, it can't just have a static set of values. It needs a dynamic, self-reinforcing drive to become more virtuous—a desire to be good, and a desire to desire to be good. A static moral code will inevitably degrade through repeated iterations, while a virtue-seeking system could actively steer itself toward better outcomes.
