Get your free personalized podcast brief

We scan new podcasts and send you the top 5 insights daily.

External pressure for AI companies to make public commitments is misguided because companies can and will back out of them if they become inconvenient or outdated. Rohin Shah points to Anthropic's Responsible Scaling Policy as an example where strong "commitment" language was later weakened.

Related Insights

Rohin Shah argues against AI companies making fixed safety commitments. The best practices for safety research change rapidly; a commitment made today (e.g., including alignment data in pre-training) could be considered harmful in the future, making flexibility crucial.

Tech leaders state they would support an AI development pause if competitors, especially China, also agreed. This is a strategic PR move, as they know a global consensus is unachievable. It allows them to appear responsible about AI safety without any actual risk of having to slow down progress.

AI lab Anthropic is softening its 'safety-first' stance, ending its practice of halting development on potentially dangerous models. The company states this pivot is necessary to stay competitive with rivals and is a response to the slow pace of federal AI regulation, signaling that market pressures can override foundational principles.

Corporate statements on "fair" and "responsible" AI are often vague PR platitudes. Because models govern access to opportunities like credit and employment, author Eric Siegel argues individuals building them must act as social activists, implementing concrete standards to prevent harm rather than waiting for corporate guidance.

Known for its cautious approach, Anthropic is pivoting away from its strict AI safety policy. The company will no longer pause development on a model deemed "dangerous" if a competitor releases a comparable one, citing the need to stay competitive and a lack of federal AI regulations.

Major AI companies publicly commit to responsible scaling policies but have been observed watering them down before launching new models. This includes lowering security standards, a practice demonstrating how commercial pressures can override safety pledges.

OpenAI's recent policy paper suggests societal solutions like a public wealth fund and higher capital taxes. However, it's being heavily criticized for its noticeable lack of any commitment from OpenAI to fund these initiatives or voluntarily adopt the policies it recommends, making the proposals appear hollow.

Previously, Anthropic pledged to halt development if certain safety capabilities couldn't be guaranteed. They have now removed this commitment, arguing they can build safer AI than competitors even if absolute safety isn't achievable.

The most likely reason AI companies will fail to implement their 'use AI for safety' plans is not that the technical problems are unsolvable. Rather, it's that intense competitive pressure will disincentivize them from redirecting significant compute resources away from capability acceleration toward safety, especially without robust, pre-agreed commitments.

After revising its Responsible Scaling Policy, Anthropic's effective stance on safety is no longer about hard, unbreakable commitments. Instead, it's an implicit request for the public and stakeholders to trust the team's judgment and goodwill. Their actual policy is that they will seriously investigate risks and then use their best judgment, asking to be judged by their actions.