Google hungers for all that content produced by the wealth of digital publishers creating text, video, and images on a daily basis. To deal with the sticky copyright issues at the heart of AI training, Google is proposing that all those companies who don’t want their content gobbled up will need to “opt-out” to ensure Google’s open maw doesn’t swallow all their juicy data.
The tech giant offered this raw deal to the Australian government in response to the country’s recent proposal to ban “high-risk” AI applications, including those used to create deepfakes, disinformation, and discrimination. As first reported by The Guardian, Google argued that publishers should be able to say no to having their content copied for the purpose of training AI.
Google released its Bard chatbot in the land down under back in May, and since then, the company has been trying to entice the country into allowing it to scrape ever more data. Google has already written to the Australian government asking it to relax copyright laws to allow more AI training. Now it’s being open about establishing an AI-friendly internet that allows scraping by default. The proposal would force publishers both big and small to educate themselves about the opt-out and implement it on their own sites, rather than putting the onus on Google.
The company did not explicitly say how this opt-out function would work, and Google did not immediately respond to Gizmodo’s request for comment. In a July blog post, Google called for new “standards and protocols” about how web publishers participate in the internet. The company pointed to the 30-year-old, community-developed robots.txt standard, a protocol that indicates to web crawlers and bots which portions of a site they’re allowed to visit.
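To see how robots.txt-style opt-outs work in practice, here’s a minimal sketch using Python’s standard-library robots.txt parser. The crawler name “AI-TrainingBot” and the example URLs are hypothetical, since Google hasn’t specified what an AI opt-out token would actually look like:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks an imagined AI-training crawler
# ("AI-TrainingBot" is made up for illustration) while allowing all
# other bots to visit the site normally.
robots_txt = """\
User-agent: AI-TrainingBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The AI-training crawler is told to stay out entirely...
print(parser.can_fetch("AI-TrainingBot", "https://example.com/articles/"))  # False

# ...while any other crawler is still welcome.
print(parser.can_fetch("SomeOtherCrawler", "https://example.com/articles/"))  # True
```

The catch, as the protocol’s history shows, is that nothing in this file is enforced: the parser only tells a bot what the site *asks* for, and compliance is entirely voluntary.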
Of course, that robots.txt protocol only works with nice bots that agree to comply voluntarily. It doesn’t impede any company that decides not to obey the standard. Plus, it doesn’t take back any data that was already scraped without publishers’ consent. Google has multiple large language models, including its recently announced PaLM 2. Google’s Bard chatbot was originally based on the LaMDA LLM, and researchers have noted that 50% of its training data comes from public forums, with a good chunk of the rest scraped from Wikipedia and other websites.
ChatGPT creator OpenAI has been hit with similar lawsuits over its alleged abuse of copyright. Essentially, these companies have already scraped up massive amounts of the internet to train their models. Much of that data comes from Wikipedia entries and Reddit posts, but these models also make use of articles, books, and other online text. Just consider that the GPT-4 language model is reportedly trained on 45 terabytes of data, so there’s a bounty of published material locked inside. OpenAI has its own designs on industry-friendly regulation, and it has called for a whole new federal agency meant to oversee the tech. Google, on the other hand, has lobbied against that proposal.
Google’s opt-out idea wouldn’t be localized to just Australia, of course. The company has been trying to court the largest news organizations like The New York Times and The Washington Post with new AI tools, all while implying it’s A-OK to scrape up all those published articles for use in training its AI.