4 court cases that shaped the future of web scraping

Ruben Herrera

Tech builder focused on infrastructure, automation, backend systems, and scalable SaaS development

The war over data: why Meta, X, Google, and Reddit are suing scrapers

Meta, X Corp., Google, and Reddit are putting increasing pressure on companies in the data collection business. In this piece, we break down the major lawsuits and analyze how the courts could reshape access to public data, and with it the entire automated data collection industry.


Platforms built anti-bot systems. Scrapers rotated proxies, changed browser fingerprints, bypassed rate limits, solved captchas, and assumed the whole game came down to one thing: who could out-engineer whom.

That picture is already outdated.

The main battle is no longer happening in the browser or in the console. It is happening in court. And the question being decided there is more uncomfortable than any anti-bot system: can the public web still be treated as public if a large platform wants to declare it controlled territory? Can the collection of open data be banned with a single line in a ToS? Can an anti-bot system be turned from a technical barrier into a legal trap?

At its simplest, the story looks like this. First, platforms tried to lock down the internet through contract: “these are our rules, and you broke them.” When that strategy started to fall apart in court, they reached for heavier weapons, such as the DMCA’s anti-circumvention provisions. In other words, the conflict shifted from “did you violate the site’s terms?” to “did you have the right to automate access at all?”

1. Meta v. Bright Data

The case that made one thing clear: you cannot lock down a public website with a single reference to the ToS.

Meta tried to push a very convenient theory for platforms: if you came to the site and sent requests to its servers, then you were already operating under its rules — even if you never signed anything and were only viewing a public page. On paper, that sounds powerful. In court, not so much.

The claim was built around the idea that Bright Data was collecting and selling data from Facebook and Instagram in violation of the platforms’ terms. But the court looked past Meta’s rhetoric to the actual mechanics of access, and those mechanics were uncomfortably simple: Bright Data collected public data without logging in, a practice known as logged-off scraping. That detail is where Meta’s theory started to crack. The court concluded that scraping public pages while logged off was not “use” of Meta’s services in the sense the platform was arguing, so the contract claim failed.

The court effectively rejected Meta’s core premise: an open page on the internet does not become a closed platform simply because the platform wants it to.

The importance of that ruling went far beyond the dispute with Meta itself. The court drew a clear line: if the data is available without authorization, and the scraper is not entering a closed part of the product or using an account, then the ToS no longer function as a universal ban. A mere reference to site rules is not enough.

The court also rejected another claim convenient for platforms: that captchas and other anti-bot mechanisms automatically turn an open page into a closed one. That point matters. Had the court accepted that logic, almost any serious technical defense could have been treated as the equivalent of a login wall. Instead, the court made clear that anti-bot protection by itself does not make data private.

After Meta v. Bright Data, one thing became obvious: you cannot shut down the open web with a piece of paper in a user agreement. It was the first serious blow to the platforms’ strategy.

2. X Corp. v. Bright Data

It was not just the ToS theory that broke down here, but the broader idea of platform control over public content.

After the Meta case, it became clear that user rules alone would not stop scraping. In X Corp., the question became more serious: can a platform use private claims to build a regime around public data that, in practice, works like its own copyright system?

X tried to strengthen its position with multiple claims at once: from violations of site terms to allegations of interference with infrastructure, unfair competition, and unlawful use of data. But the court did not accept that framework. The reason was not technical, but principled: a platform cannot use private lawsuits to obtain a level of control over public user content that effectively replaces federal copyright law.

The court took a clear position here: the mere fact that the data sits on X’s servers does not make it X’s property. Users own what they post; X holds only a license to that content, and the license does not give it more control over public posts than copyright law allows.

For practitioners, there was another important takeaway. X tried to frame proxies, IP rotation, and bypassing rate limits as misconduct in their own right. The court disagreed. IP rotation alone was not enough to prove deception. Nor did X’s argument about the cost of defending its systems help: extra expenses and platform frustration do not, by themselves, establish legally sufficient harm.

For platforms, this was a bad signal. After X Corp. v. Bright Data, it became clear that public content could no longer simply be brought under platform control through ToS, technical restrictions, and a bundle of private claims. The court made it clear: that formula no longer works on its own.

And it was after this turn that it became obvious that corporations would start looking for a heavier tool. They found one.

3. Google v. SerpApi

This is where the fight stops being about site rules and turns into something much more dangerous, and where the platforms’ approach really changed.

Meta and X tried to lock down the public web through ToS and private claims, but that strategy did not deliver the result they wanted. Google then shifted the subject of the dispute itself: the question was no longer whether site rules had been violated, but whether technical protection had been bypassed — and whether that could be treated as a DMCA violation.

That is exactly why Google’s lawsuit against SerpApi may end up being the most dangerous of the four.

Google builds its case around one key thesis: SearchGuard, the protection system Google places in front of its search results, is not just an anti-bot system but a technical access-control system. If the court agrees, bypassing that protection will no longer look like a matter of platform rules. It will look like a possible violation of federal law, which is an entirely different level of risk and consequence.

The risk is no longer theoretical. Google relies on the DMCA’s anti-circumvention provisions (17 U.S.C. §1201), describes the circumvention as large-scale, and seeks damages for each individual act. Because the DMCA allows statutory damages of $200 to $2,500 per act of circumvention, counting violations one by one at that scale is what makes the potential exposure so large. Even without a final ruling, that is already enough to understand the scale of the threat: if this approach holds, it will become a major precedent for the data collection industry and for SEO tools and services.

What makes this case dangerous is that it changes the entire frame of the dispute. Until recently, an anti-bot system was just a technical defense around a public resource. Now Google is trying to argue that it is not merely a filter against unwanted traffic, but a control system governing access to protected content. If that interpretation stands, bypassing an anti-bot system will no longer be treated as disputed scraping, but as circumvention of technical protection under the law.

If that proposed framework works, the market will be operating in a very different reality. Any advanced anti-bot system around public data could be reframed as an access-control system. And then the dispute would no longer be about scraping an open page, but about obtaining data through a method the law deems impermissible.

That is exactly what SerpApi is contesting. Its position targets the weakest part of the whole theory: search results remain open and available without authorization, so the anti-bot system around them should not automatically be treated as an access-control system for protected content. And there is also the separate question of whether Google can legitimately claim such a broad scope of protection in the first place.

Even so, the mere existence of a lawsuit like this already changes the rules of the game. After Google v. SerpApi, scraping can no longer be described as just an ordinary ToS violation. The question is now different: is a court willing to treat an anti-bot system as a legally significant access-control mechanism, and its bypass not merely as disputed scraping, but as an unlawful method of obtaining data?

And if the answer is yes, the consequences will go far beyond a single company like SerpApi.

4. Reddit v. SerpApi / Oxylabs / AWMProxy / Perplexity

It only escalates from here: the target is no longer a single scraper, but the entire route by which data travels from a website to a finished product.

If Google is testing whether an anti-bot system can be turned into a legal access-control tool, Reddit is expanding the attack even further. The issue is no longer one scraper. It is the entire chain through which content is collected, transferred, and ultimately ends up inside AI services.

This is a completely different kind of dispute. It is no longer about one bot or one downloaded page. Reddit describes an entire chain — from proxies and scraping services to APIs, search results, AI models, and finished user-facing answers. And the key move is that Reddit treats this whole chain as a mechanism for using someone else’s content without the platform’s permission.

The main risk for the market is that the question is now being raised this directly for the first time: not only about who scraped the data, but about everyone who helped move that data further down the line, all the way to the AI model.

That is exactly why the list of defendants looks the way it does. Reddit is not going only after the companies that ultimately deliver the final answer to the user. It is also going after the companies that make the data access chain possible. That is why, alongside Perplexity, the case names SerpApi, Oxylabs, and AWMProxy. Reddit’s logic is straightforward: these are not separate, isolated acts by unrelated companies, but a connected market in which some players collect data, others provide bypass and delivery infrastructure, and others turn the result into a commercial product.

That is what makes the case new. Older scraping disputes focused on a single scraper, the party that actually pulled the data. Now the attention shifts to the whole chain: who provided the infrastructure, who handled the network layer, who exposed the API, and who turned raw content into a commercial answer. If that approach takes hold, legal risk will no longer fall only on data collectors, but on everyone who once considered themselves “just infrastructure.”

At its core, Reddit is trying to expand the subject of the dispute itself: not to limit it to the question of who directly collected the data, but to put the entire route by which it moves toward an AI product under legal pressure.

If that approach works, the next front of the dispute will shift. The question will no longer be whether scraping itself is permissible, but how far liability can be extended across the chain — from the party that obtained the data to those who enabled its delivery and use.

Final takeaways

| Case | Platform approach | Core question in dispute | What it changed for the market |
|---|---|---|---|
| Meta v. Bright Data | Attempt to ban scraping through user rules | Does collecting open data without logging in bind a scraper to the platform’s terms? | It became clear that visiting an open page does not by itself mean automatic agreement to every restriction in the site rules |
| X Corp. v. Bright Data | Reliance on ToS plus a set of private claims about harm and unfair competition | Can a platform impose its own control regime over public user content? | The court showed the limits of that strategy and refused to let public data be turned into a private zone of control |
| Google v. SerpApi | Attempt to move the dispute into the realm of bypassing technical protection | Can an anti-bot system around an open search results page be treated as an access-control system? | The dispute shifts away from site rules toward a possible violation of federal law, a completely different level of risk |
| Reddit v. SerpApi / Oxylabs / AWMProxy / Perplexity | Attack not only on data collection itself, but on the whole chain of intermediaries | Can liability extend beyond the collector to those who enable the transfer and use of the data? | The target is no longer one scraper, but the whole infrastructure through which data reaches an AI product |

How the scraping dispute moved from ToS to the DMCA

```mermaid
flowchart TD
    A[Public data on the web] --> B[Platforms try to restrict access through ToS]
    subgraph contract_era [Contract era]
        B --> C[Meta v Bright Data]
        B --> D[X Corp v Bright Data]
        C --> E[ToS do not work against logged-off scraping]
        D --> F[Public content cannot be privatized through private claims]
    end
    E --> G[The contract-based model starts to crack]
    F --> G
    G --> H[Platforms move to a heavier legal theory]
    H --> I[Google v SerpApi]
    H --> J[Reddit v the AI ecosystem]
    I --> K[DMCA §1201 and the anti-bot system as access control]
    J --> L[The attack shifts from one scraper to the entire data supply chain]
    K --> M[The right to scrape is no longer decided by code alone]
    L --> M
```

This is where the real shift happened. At first, platforms tried to ban scraping through user agreements. Now they are trying to go further: to argue that even open data is still under platform control, and that bypassing anti-bot protection is no longer just a violation of site rules, but a violation of the law.

Why these cases, specifically, are defining the future of scraping

Because together they show a complete change in the logic of the dispute. First, platforms tried to stop scraping through site rules. Then through private lawsuits. Now they are trying to go even further: to present anti-bot systems as access-control mechanisms, and to extend liability not only to the data collector, but across the whole chain of intermediaries. That is the real shift that now defines the future of scraping.

That is exactly why these four cases are changing the market. They showed that the scraping debate can no longer be reduced to technology alone — proxies, limits, anti-bot systems, and bypass techniques. It is now a dispute about control over access to data, which means it is also a dispute about where technical protection ends and legal prohibition begins.

The old core question sounded like this:

“Can we collect this data technically?”

That is no longer the main question.

What matters now is something else:

“Will the court treat this barrier as ordinary technical protection — or as an access-control system whose circumvention carries legal consequences?”

The answer to that question will affect far more than the fate of individual services.

If the approach now being tested by Google and Reddit takes hold, the internet will remain open only in part. To a human user, it will still look like open space. To machines, it will increasingly look like leased territory: licensed, accessed through approved channels, and governed by someone else’s rules.

At this point, the dispute has narrowed to one core issue: who gets to control access to public data on the internet.