In a striking example of corporate double standards, Meta was an active customer of Bright Data—using its services to scrape e-commerce sites—at the same time it was suing the company to prevent others from scraping its own platform.
Legal battles won by data firm Bright Data against platforms like Meta and X set a key precedent: public information not behind a login is fair game. A federal judge's declaration, "You do not own the internet," solidifies the right to collect this data responsibly.
The NYT's seemingly contradictory AI strategy is a deliberate two-pronged approach. Lawsuits enforce intellectual property rights and prevent unauthorized scraping, while licensing deals demonstrate a clear, sustainable market and fair value exchange for its journalism.
When dealing with tech giants like Google or OpenAI, publishers should not rely on goodwill. These companies are self-interested capitalists that prioritize their own profits. The only reliable strategy is for publishers to build mutually beneficial economic ecosystems or create direct relationships with their audiences.
As AI makes it trivial to scrape data and bypass native UIs, companies will retaliate by shutting down open APIs and creating walled gardens to protect their business models. This mirrors the early web's shift away from open standards like RSS once monetization was threatened.
Amazon's lawsuit against Perplexity's shopping agents is more than a web-scraping dispute; it's a strategic move to control how users access its marketplace. A win for Amazon could set a legal precedent allowing platforms to block third-party agents and force customers into proprietary AI ecosystems, stifling competition in agentic commerce.
Rather than simply failing to police fraud, Meta perversely profits from it by charging higher rates for ads its systems suspect are fraudulent. This 'scam tax' creates a direct financial incentive to allow illicit ads, turning a blind eye into a lucrative revenue stream.
To prove unauthorized data use, Reddit created a fake post visible only within Google's search results. When Perplexity's AI incorporated this "honeypot" content, it provided irrefutable evidence that Perplexity was scraping Google for Reddit data against its terms. The tactic offers content owners a clever legal strategy for proving unauthorized use.
The concept of charging AI agents to crawl web content highlights a fundamental conflict. While content creators see it as a way to monetize their IP, growth-focused businesses want to open the floodgates to bots for maximum exposure and lead generation.
Meta is removing ads from law firms attempting to recruit plaintiffs for class-action lawsuits against the company. It justifies this by citing a ToS clause that allows content removal to mitigate adverse legal impacts. This is a powerful example of a platform using its own policies as a defensive legal strategy.
Medium's CEO frames the AI training data issue as a classic prisoner's dilemma. Because AI companies chose an "antisocial" path of scraping without collaboration, platforms are now forced to defect as well—blocking crawlers and threatening data poisoning to create leverage and bring them to the negotiating table.