Does Cloudflare want to protect the web or control it?

When AI engines or LLMs (large language models) such as ChatGPT started to become popular, one of the main concerns some people had was that the widespread crawling and indexing of content that companies like OpenAI and Anthropic engaged in while training these models amounted to a form of theft. As some readers are probably aware, I've written about this before in previous editions of The Torment Nexus, and I've argued that instead of theft or copyright infringement, this kind of indexing should be considered fair use under US copyright law, just as Google's scanning of millions of physical books was, because of the transformative nature of that use (one of the four factors that judges have to consider when making a fair-use ruling). There have been a couple of court decisions suggesting this might become legal precedent -- with certain restrictions -- but the Supreme Court has yet to weigh in on the question.
In addition to the numerous copyright-infringement lawsuits that have been launched by authors, publishers, and newspaper owners like the New York Times (which tried to negotiate a payment-for-indexing deal with OpenAI but failed to reach an agreement on price), there have been a number of attempts to foil the scraping and indexing bots that AI companies use to hoover up content. Some are of the "spanner in the spokes" variety, like the Iocaine Project, named after the tasteless poison in the movie The Princess Bride (Never go against a Sicilian when death is on the line!). It is an open-source program that allows website owners to trap AI bots and scrapers in a kind of garbage-infested dead-end, where they are forced to go around and around indexing nonsense in a maze with no exit. Other attempts to foil the bots are a little more sophisticated, and Cloudflare has come up with several. But is the solution worse than the problem?
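Before getting to Cloudflare's answer, a quick aside on how those tarpits work. Here is a minimal sketch of the idea (in Python, and purely illustrative -- this is not the Iocaine project's actual code): a tiny server that answers every URL with procedurally generated filler, where every link on the page points at yet another generated page, so a crawler that blindly follows links never runs out of worthless content.

```python
# A toy version of the tarpit idea: every path returns generated nonsense, and
# every link points at another generated path, so a crawler that follows links
# never escapes. Illustrative only; not the Iocaine project's implementation.
import hashlib
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = ["inconceivable", "iocane", "battle", "wits", "cliffs", "odorless", "dread"]

def garbage_page(path: str) -> str:
    # Seed the generator from the path so each URL is stable but meaningless.
    rng = random.Random(int(hashlib.sha256(path.encode()).hexdigest(), 16))
    text = " ".join(rng.choice(WORDS) for _ in range(300))
    links = " ".join(
        f'<a href="/{rng.choice(WORDS)}/{rng.randint(0, 999999)}">more</a>'
        for _ in range(10)
    )
    return f"<html><body><p>{text}</p><p>{links}</p></body></html>"

class Tarpit(BaseHTTPRequestHandler):
    def do_GET(self):
        body = garbage_page(self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Tarpit).serve_forever()
```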
In case you're not familiar with Cloudflare, it is what's known as a CDN, or content-delivery network: essentially a giant middleman that stands between your blog or website (or the sites and services of hundreds of thousands of corporations, governments, and other entities) and the festering swamp of hackers and ne'er-do-wells that is the open internet. If you use Cloudflare, your website will load a lot faster – because the company caches it and serves it from its own superfast network of servers – and will also be protected from hacking attempts, DDoS (distributed denial of service) attacks, and a host of other nasty behavior. In the interests of full disclosure, I should point out that I use Cloudflare for my website, but just for DNS hosting and rerouting, so that when you type in mathewingram dot com you get taken to the right website.
Cloudflare's services definitely make the internet a safer place, and make it easier to operate a website without an entire security team. And co-founder and CEO Matthew Prince genuinely seems like a nice guy. As Fred Vogelstein notes at Crazy Stupid Tech (which he runs with my former boss Om Malik), Prince isn't your typical tech founder. He may be worth billions, since Cloudflare has a market cap of about $70 billion, but he is clearly a man of principle. He and his wife bought their local newspaper because he wanted the people of Park City, Utah, to have high-quality news – which is not that surprising when you find out that his major in university was not computer science but English. Prince says that concern for journalism was a key reason for Cloudflare's first AI scraper-related offering to website publishers and Cloudflare users, which the company calls "pay per crawl." Here's how Prince announced it on July 1, a date clearly chosen to evoke Independence Day:
Instead of being a fair trade, the web is being stripmined by AI crawlers with content creators seeing almost no traffic and therefore almost no value. That changes today, July 1, what we’re calling Content Independence Day. Cloudflare, along with a majority of the world's leading publishers and AI companies, is changing the default to block AI crawlers unless they pay creators for their content. That content is the fuel that powers AI engines, and so it's only fair that content creators are compensated directly for it. Next, we'll work on a marketplace where content creators and AI companies, large and small, can come together. Traffic was always a poor proxy for value. We think we can do better. We believe that if we can begin to score and value content not on how much traffic it generates, but on how much it furthers knowledge we not only will help AI engines get better faster, but also potentially facilitate a new golden age of high-value content creation.
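To make the mechanics a little more concrete: Cloudflare has described pay per crawl as reviving the long-dormant HTTP 402 "Payment Required" status code, so from a crawler's point of view the exchange might look roughly like the sketch below. The header names ("crawler-price", "crawler-max-price") are my own illustrative stand-ins, not Cloudflare's production API.

```python
# A sketch of a pay-per-crawl exchange from the crawler's side, assuming the
# publisher's edge answers with HTTP 402 and a price header when payment is
# expected. Header names are illustrative assumptions, not Cloudflare's real API.
import urllib.request
from urllib.error import HTTPError

def fetch_with_budget(url: str, max_price_usd: float) -> bytes:
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()  # no payment demanded, content comes back as usual
    except HTTPError as err:
        if err.code != 402:
            raise  # some other failure; not a payment demand
        price = float(err.headers.get("crawler-price", "inf"))
        if price > max_price_usd:
            raise RuntimeError(f"publisher asks ${price}, over our budget")
        # Retry, signalling the price we are prepared to be charged for this fetch.
        req = urllib.request.Request(url, headers={"crawler-max-price": f"{price:.4f}"})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

# Example: a crawler willing to pay at most a tenth of a cent per page.
# page = fetch_with_budget("https://example.com/article", max_price_usd=0.001)
```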
Note: In case you are a first-time reader, or you forgot that you signed up for this newsletter, this is The Torment Nexus. You can find out more about me and this newsletter in this post. This newsletter survives solely on your contributions, so please sign up for a paying subscription or visit my Patreon, which you can find here. I also publish a daily email newsletter of odd or interesting links called When The Going Gets Weird, which is here.
Color me skeptical

If you want to read a little more about how Prince is thinking about this problem, and also get more details on how a lawyer built a giant technology company, Ben Thompson interviewed him for a recent edition of his newsletter Stratechery. In a nutshell, Prince believes that the entire structure of the content-based internet we have grown used to thanks to Google is under immediate threat from AI-powered search, which supplies answers instead of links. As he describes it, "Google set this expectation that everybody can scrape the Internet for free, but it was never free. Google paid for it for a really long time and the quid pro quo with the content creators was, we get a copy of your content and in exchange we’ll send you traffic and help you monetize that traffic." However, he goes on to say, that quid pro quo breaks down as we shift from search engines to answer engines, and the potential outcome is that, without some kind of alternative business model for content (advertising isn't enough), "all of the journalists, academics, and researchers in the world will starve to death."
Will Prince's "pay per crawl" idea work? Let's just say I'm skeptical. And I'm not the only one. For one thing, I don't think there is enough coherence or agreement among media companies or publishers to ever make a model like the one Prince is describing work. As someone who has spent decades writing about and for media companies, I find the idea that any of them would willingly co-operate, even when faced with their own imminent destruction, wildly implausible. And on the other side of the coin, there is virtually no chance that AI companies en masse are going to adopt the kind of payment structure Prince is advocating, especially not if it only covers a small selection of content (which it likely will, for the reasons given above). I also don't think the kinds of payments Prince is talking about – even if they did actually come to pass – would be enough to significantly change the business model of most media companies.
As I mentioned earlier, pay-per-crawl is just one of Cloudflare's ideas for fixing the internet in the age of AI. Another is what the company calls "signed agents." Cloudflare wants to create an index of AI crawlers and agents that it will verify and approve as trusted, in much the same way that its CAPTCHA system, in use on many websites, verifies and approves a person as being human. It's an extension of an existing Cloudflare program for verifying and indexing bots and scrapers, like the ones Google uses to index content. The difference with agents is that in many cases, the AI-powered agents interacting with websites – to book flights, research topics, or accomplish other tasks – are not coming from companies like Google or OpenAI or Anthropic. They are being sent out to accomplish those tasks on behalf of internet users of all kinds, some harmless and some, well... not. Here's how Cloudflare puts it:
Agents are changing the way that humans interact with the Internet. Websites need to know what tools are interacting with them, and for the builders of those tools to be able to easily scale. Message signatures help achieve both of these goals, but this is only step one. Cloudflare will continue to make it easier for agents and websites to interact (or not!) at scale, in a seamless way.
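In practice, "message signatures" means the agent signs each request with a private key, and the receiving site (or Cloudflare, sitting in front of it) checks the signature against a public key the agent's operator has published. Cloudflare's scheme builds on the HTTP Message Signatures standard; the sketch below only captures the gist, is not a conforming implementation, and assumes the third-party "cryptography" package.

```python
# The gist of request signing: the agent signs the parts of the request the site
# cares about, ships the signature as a header, and the site verifies it with the
# operator's public key. Simplified sketch, not a conforming HTTP Message
# Signatures implementation.
import base64
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

def signature_base(method: str, host: str, path: str, agent: str) -> bytes:
    # Cover the fields that matter, so a tampered request no longer verifies.
    return f"{method} https://{host}{path} signature-agent={agent}".encode()

# Agent operator: generate a keypair and publish the public half somewhere
# verifiers can find it (in Cloudflare's scheme, its registry of signed agents).
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Agent side: sign the request and send the signature along as a header value.
base = signature_base("GET", "example.com", "/article", "agent.example")
signature_header = base64.b64encode(private_key.sign(base)).decode()

# Site side: rebuild the same base from the incoming request and check it.
def verify(pub: Ed25519PublicKey, sig_b64: str, rebuilt_base: bytes) -> bool:
    try:
        pub.verify(base64.b64decode(sig_b64), rebuilt_base)
        return True
    except InvalidSignature:
        return False

assert verify(public_key, signature_header, base)
```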
As I mentioned above, Prince seems like a genuinely good guy, and both the pay-per-crawl and signed-agents proposals appear to come from a sincere desire to help content companies and the web in general (although some have speculated that they are also driven by self-interest). But while Prince may have good intentions, that doesn't mean the system he sets up would necessarily be good for the open web. Several years ago, Prince cut off a website known as KiwiFarms from using his service, leaving it open to DDoS attacks and other security breaches. KiwiFarms is a terrible place, and Prince wrote a blog post in which he said that doing so was "an extraordinary decision for us to make and, given Cloudflare's role as an Internet infrastructure provider, a dangerous one that we are not comfortable with." But he removed them anyway. Will he do so again, and if so, will we all agree with that decision?
A wolf in sheep's clothing?

When I first read about Cloudflare's pay-per-crawl and signed-agents proposals, I thought they were a well-intentioned effort to help the open web, but a recent Substack post on the signed-agents idea got me thinking about it a little differently. The post was written under the name Positive Blue by someone who appears to be Jordi Montes, a developer who runs a service that allows AI agents to make and accept payments. It seems fairly obvious that someone like that might be predisposed to like AI agents, and to be interested in seeing them proliferate, and therefore might be critical of efforts like Cloudflare's. But despite all that, I still think some of his comments are worth considering. Here's how he describes the downsides of the signed-agents system that Cloudflare is promoting:
Do you register with Google, Amazon or Microsoft to use the web? Cloudflare’s new “signed agents” pitch sounds like safety but it’s a wolf in sheep’s clothing. They’ve built an allowlist for the open web and told builders to apply for permission. That’s not how the internet works. Yes, identity for agents is a real problem. But Cloudflare is solving it like a border checkpoint. Get on their list or get treated like a trespasser. That’s vendor approval not an internet protocol. An allowlist run by ONE company? Authentication for that world isn’t “ask Cloudflare for a hall pass.” It’s verifiable chains of delegation and request-level proof: open, portable, and independent of any one company.
There are definitely going to be (and already are) malicious AI agents, Montes argues, and we will need ways of identifying them and controlling them. But is the solution to give one company – as well-intentioned as it might be – the ability to set the standard for how to do that, or to control the keys? The challenge we are facing, Montes says, is bigger than Cloudflare, Google, Microsoft or any single company: "The future of the web cannot hinge on who controls the keys. We need protocols, not gatekeepers. If we let a handful of companies decide which agents are valid, the agentic web will collapse into walled gardens. We’ve seen this movie before." Garry Tan, the president and CEO of YCombinator, and a number of others have raised similar concerns on X, arguing that having Cloudflare become the gatekeeper for AI agents is not the right solution, even if the company's motive is to help the web remain free and protect it from attack.
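To make Montes's alternative a little more concrete, "request-level proof... independent of any one company" could look something like the sketch below: the site verifies a signed request against public keys the agent's operator publishes on its own domain, so any website or CDN can run the check without consulting a central allowlist. The well-known URL and JSON layout here are my assumptions for illustration, not an existing standard.

```python
# One shape the "protocols, not gatekeepers" idea could take: verify a signed
# request against keys the agent's operator publishes itself, rather than asking
# a single company's allowlist. The path and JSON shape are hypothetical.
import base64
import json
import urllib.request
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_with_operator_keys(agent_domain: str, sig_b64: str, signed_bytes: bytes) -> bool:
    # Fetch the operator's published keys, e.g. {"keys": ["<base64 ed25519 key>", ...]}
    key_url = f"https://{agent_domain}/.well-known/agent-keys.json"
    with urllib.request.urlopen(key_url) as resp:
        published = json.load(resp)
    for encoded in published.get("keys", []):
        pub = Ed25519PublicKey.from_public_bytes(base64.b64decode(encoded))
        try:
            pub.verify(base64.b64decode(sig_b64), signed_bytes)
            return True  # a key the operator controls produced this signature
        except InvalidSignature:
            continue
    return False
```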
In many ways, this argument reminded me of my friend Mike Masnick's 2019 paper "Protocols, Not Platforms," in which he argued that the social web – just like the internet itself – should be made up of open protocols, not proprietary, for-profit, siloed platforms like Facebook and Twitter. Masnick's paper famously helped inspire former Twitter CEO Jack Dorsey to create the project that eventually gave birth to Bluesky, a theoretically decentralized Twitter alternative (but is it really decentralized? Not in some important ways, as I wrote in a previous edition of The Torment Nexus). The point of the protocol approach is that it can't be – or shouldn't be – controlled by a single company. Instead, there are interrelated and open standards like HTML and SMTP and too many others to mention. Can people use these standards to create things that are bad, or to set up services that take advantage of unsuspecting users? Of course. But that is the price we pay for the freedom to create the services and content we want.
I'm not trying to make Matthew Prince or Cloudflare out to be the bad guy. As I've tried to point out, their motives appear to be (mostly) pure. And it's possible that the tools Cloudflare is proposing could eventually become an open standard, or a decentralized network that anyone can plug into with an API. I'm not suggesting the company even wants to control these kinds of solutions. But as we think about what the internet of the future looks like, I think we should be cautious about handing too much power to a company that already controls a crucial piece of internet infrastructure, even if it has the best of intentions.
Got any thoughts or comments? Feel free to either leave them here, or post them on Substack or on my website, or you can also reach me on Twitter, Threads, BlueSky or Mastodon. And thanks for being a reader.