The Anthropic AI danger chickens come home to roost

This is what Gemini came up with when I gave it the phrase "AI danger chickens" LOL

In a previous edition of this newsletter, I wrote about the launch of Anthropic's newest AI model, code-named Mythos, and how the company said that it was too powerful to be trusted — mostly because of its ability to detect and potentially exploit software vulnerabilities — and therefore would only be available to a select few companies for testing as part of something called Project Glasswing. At the time, I and others drew an analogy between Anthropic's repeated claims about the dangers posed by its AI models and the classic fable about "The Boy Who Cried Wolf" (coincidentally, the latest update to Mythos is code-named Fable), because its claims are seen by some as primarily marketing. Well, regardless of the truth of those claims, based on recent events it appears that the townsfolk have created a Wolf Detection Department, and the full might of the Wolf Protection Force is being brought to bear on the boy and his company.

In his Understanding AI newsletter, Tim Lee put together a good overview of what happened over the past few days. Anthropic, he says, "stunned the AI world by announcing it was revoking access to Claude Fable 5 and Mythos 5, the powerful new models it released just three days earlier. The government, Anthropic said, had issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States. Because Anthropic doesn’t have a way to limit access to Americans, this amounted to a de facto ban." According to multiple news reports, researchers working for Amazon found it was possible to bypass Fable’s guardrails and gain access to its cybersecurity capabilities. Anthropic CEO Dario Amodei, however, argued in a blog post that the type of bypass that occurred does not pose the same risk as a broader jailbreak, and therefore a ban is unwarranted.

Whether it is warranted or not, however, is unfortunately (or fortunately, depending on your point of view) up to the government to decide, not Anthropic. The company may believe that it is the only one capable of managing or harnessing a tool like AI — and it may even be right — but that is not going to stop the government of Donald J. Trump from doing whatever the hell it wants, even if it doesn't really know what it is doing or why. On top of that, as AI researcher and former White House advisor Dean Ball noted in a recent newsletter post entitled "Leviathan Waking," there is a very real sense that Anthropic is either being naive or foolhardy in the way it went about releasing Mythos, since that release came so soon after the company was declared a supply-chain risk for not playing ball by allowing the Department of War to use its AI to target weapons (which I wrote about here). Ball described it in this way:

In D.C., Anthropic’s rapid release of Mythos after the supply-chain risk controversy with the Department of War was not just seen as another step in the development of AI, even if that is what it was. It was seen by many as a move against the United States Government—a private company, developing a weapon, as a move against the government. What else, really, could one have expected? The stark reality is that making superintelligence is a profoundly political act even in the healthiest of societies, to say nothing of the filthily political world we Americans currently inhabit.

Ball compared the current state of AI to what it might be like if there were several large pharmaceutical companies, but the Food and Drug Administration didn't exist. Instead, companies would just decide when to release certain drugs, and if one became aware of a drug that cured cancer but also killed people, it would just release it to certain pre-approved patients and tell the government later. In this alternate universe, the government creates a voluntary "show your work" program, but the company decides not to do this, so the government bans the drug. In a matter of weeks, Ball says, the US would go from "a system that was implausibly laissez-faire for the level of risk involved in this industry, to a system that was, in the eyes of essentially all expert onlookers, incomprehensibly strict." In the end, he says, "no company gets to shake the foundation of state sovereignty while staying blithely above the raw reality of politics."

Note: In case you are a first-time reader, or you forgot that you signed up for this newsletter, this is The Torment Nexus. Thanks for reading! You can find out more about me and this newsletter in this post. This newsletter survives solely on your contributions, so please sign up for a paying subscription or visit my Patreon, which you can find here. I also publish a daily email newsletter of odd or interesting links called When The Going Gets Weird, which is here.

Start your engines

Nathan Lambert, an AI researcher who has worked for both Meta and Google, wrote that the White House forcing Anthropic to turn off access to its new model — not just externally but internally — is "the starting gun of a new era in AI governance," in which such governance challenges will only become more likely. However, Lambert also says that there is "some amount of Anthropic reaping what it has sowed." The company's fear-mongering about the power of its models over the past few years has accelerated the arrival of this new era, he says. But the era would have come anyway, and the political attacks on Anthropic undermine the American system. "It points to a near-term world where model releases are judged on vibes by an executive branch with minimal technical talent," based on pure political expediency. Also, Lambert says, this is a government that took office when we were "in the ChatGPT era of AI governance — models that just answer questions." Concerns about safety were far away.

Even Gary Marcus, a cognitive scientist who is a prominent critic of Anthropic and its claims about the power of its AI models, called the White House decision a classic example of America's current "shambolic AI policy," including the Commerce Department's "cut-off-your-nose-to-spite-your-face export control order." Marcus says that the order took the entire industry by surprise because virtually everyone, whether left or right, or pro- or anti-tech, thinks it is terrible. "It was both bad policy and bad politics." Much like Lambert, Marcus adds that Anthropic "brought this on themselves to a considerable degree, by (a) overselling doom [and] literally and repeatedly calling for export controls themselves." That said, however, Marcus notes that "no industry can thrive if the government appears arbitrary and capricious, as they did last night, (effectively) abruptly shutting down an entire thriving industry."

So what happens now? Marcus says what we need is "something I have not seen of the Trump administration: carefully considered nuance. Shoot first and ask questions later is not what we need right now." According to Marcus, banning a model because two researchers said it could be jailbroken in a technical way is naive, since almost every LLM-based AI system can be jailbroken. What happened "was not new and likely not specific to Mythos or Fable and in any case should not have been addressed in such a hasty, arbitrary way." And it doesn't help mitigate criticism of the government's move, he added, that OpenAI's president, Greg Brockman, is a huge Trump donor, and Trump advisor Jared Kushner's brother is a big investor in OpenAI. Amazon, which reportedly found the vulnerability and told the government, is also a big investor in OpenAI.

SE Gyges, an AI researcher who writes a newsletter under that pseudonym, agrees that Anthropic brought much of the White House's wrath upon itself. A few days before the export ban, he notes, Anthropic CEO Dario Amodei published a post saying the government "should have the power to block or deter deployment of the model" if it is determined to present unacceptable risks. "So yes, Dario asked for this," says Gyges. "They got what they asked for." And not only did they ask for this, he says, "they asked for this for years, and they could not be talked out of asking for it." When people said that it might be bad to empower the government in this way, according to Gyges, Anthropic was not sympathetic, and the company "has not changed their position even slightly as the government has changed hands and has mostly gone insane." Gyges said he feels a certain amount of schadenfreude that "the leopard ate their face first" (which is a reference to this popular meme, in case you are not all terminally online).

Unlike the ruling earlier this year that labelled Anthropic a supply-chain risk, which Gyges called an "overtly political, ham-handed power grab" by the Secretary of War, the latest decision had none of the same clumsy political posturing (like calling Anthropic too 'woke,' as Trump did in a tweet) and was based on an existing and well-established power the government has, which is the export-control order. And on top of all this, the writer notes that there is an "unimpeachable record" of public statements from Anthropic and Amodei that characterize Mythos as a potential cybersecurity risk and/or a bioweapon, and talk about how it should be government regulated. They have been doing this for "so long and so openly that it seems like people have gotten used to it," Gyges says, "and no longer notice how weird and legally questionable it seems to be."

Anthropic's savior complex

Ben Thompson of Stratechery raised a question that I confess has been bugging me as well: if Anthropic is so convinced that Mythos and its offspring are dangerous, then why did it release them in the first place? And why is it fighting with the government after saying that it wanted regulation? Thompson notes that Anthropic said at launch that its model's performance would degrade if another AI company tried to use it for LLM development. "I actually don’t begrudge Anthropic not wanting to help its competitors," Thompson writes, but "what should be blisteringly clear is that Anthropic does not think that anyone else other than them should even be making frontier LLMs." The Anthropic origin story is rooted in the belief that OpenAI wasn’t taking safety seriously enough, and so the company believes it alone can handle AI, and that "they are justified in trying to control everyone else, up to and including the U.S. government."

Cybersecurity experts who have looked into the alleged vulnerability or jailbreaking possibilities in Anthropic's model say the reality of what happened should probably not have triggered the kind of nuclear response the White House came up with. Katie Moussouris, a cybersecurity veteran, said in a blog post that Anthropic shared with her a private copy of a paper written by security researchers describing an alleged "guardrail bypass," and the process and result that they described should never have triggered an export control order. Moussouris and a number of other security researchers have called on the White House to revoke the export control order, calling the move to ban the advanced cybersecurity capabilities possible with Mythos “dangerous.”

Even if the bypass didn't justify the export order, MG Siegler, a former partner at Google Ventures, made a similar point to Dean Ball: namely, that when you are a company with the kind of size and power that Anthropic has, you can't take the same kind of approach as when you were a plucky little startup. "There's a growing sense that it's increasingly Anthropic's way or the highway," he writes. "And that's perhaps okay when you're the scrappy underdog. But when you move into the position of power as the de-facto leader in AI, that's a problem. Potentially a big one." That's not to say the company doesn't have a point when it comes to the power of its model, or the desire to regulate the release of that model itself. But unfortunately, Siegler says, the company "does not operate in a perfect simulation, but rather in the real world. The very messy, very chaotic real world" (and there is nothing more chaotic than the Trump administration).

On a simple engineering level, the Trump administration's ban on Mythos and Fable seems heavy-handed at best and a huge overreaction at worst. As Marcus has pointed out, virtually any LLM is going to be susceptible to the same kind of exploit that the Amazon researchers showed was possible with Anthropic's models. And based on the current rate of AI development both at OpenAI and at Chinese companies like DeepSeek, whatever abilities Mythos and Fable have will be widely distributed soon. But regardless of whether the Trump administration thought it needed to block Anthropic's software, by taking such a draconian step to stop it, without a compelling argument as to why, the government has sent a message that it can't be trusted, either by AI companies inside America or by potential customers of those companies outside America.

Justin Hendrix, the editor of Tech Policy Press, pointed out that the Trump administration’s move is “likely to raise alarms in foreign capitals about the reliability of American AI for critical applications.” One obvious conclusion is that AI companies in the US can’t be trusted to operate without interference from the government. It raises the same kinds of questions that critics raised about TikTok because it might be controlled by the Chinese government. The White House may feel that its reasoning is sound, but to outsiders it is going to look capricious — what Hendrix calls "a cloud of suspicion that senior officials are picking favorites based on personal and political factors.” And that could make things difficult not just for Anthropic but for the entire AI industry in the US. Some officials in the EU are already talking about the need to have "sovereign AI" controlled by national governments. Is that really the future we want?

Got any thoughts or comments? Feel free to either leave them here, or post them on Substack or on my website, or you can also reach me on Twitter, Threads, BlueSky or Mastodon. And thanks for being a reader.
