
Trump Admin Accuses China of ‘Industrial-Scale’ Theft of AI Tech. What Does That Even Mean?


The Trump administration (which has argued that scraping an infinite amount of copyright-protected data for training AI models should be considered fair use) is accusing China of “industrial-scale” theft of intellectual property from American AI companies and is threatening to crack down on actors accused of doing the jacking.

According to the Financial Times, the accusations of theft were apparently levied in a memo from Michael Kratsios, the director of the White House Office of Science and Technology Policy, and distributed to multiple government agencies. It directed those working in the departments to share information with AI companies to make them aware of any attempts by foreign actors to access their sensitive information.

At issue is the apparent practice of distillation, in which a larger model’s outputs are used to train a smaller one, enabling the smaller model to mimic some of the larger model’s performance more cheaply and with less computational overhead. It’s the practice that Anthropic accused several China-based AI labs of doing earlier this year. OpenAI also claimed that DeepSeek used distillation techniques to train its open-source model, arguing that the lab was trying to “free-ride” on the back of work done by US firms.
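For the curious, the distillation described above can be sketched in miniature. This is a toy illustration, not any lab's actual pipeline: a fixed "teacher" distribution over three classes stands in for a large model's outputs, and a "student" adjusts its own logits by gradient descent to match the teacher's softened probabilities. All names and numbers here are invented for illustration.

```python
import math

# Toy "teacher": fixed logits over a 3-class vocabulary for one input.
# In real distillation these would come from querying a large model.
teacher_logits = [2.0, 0.5, -1.0]

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative preferences between classes to the student.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the student's distribution q is from the teacher's p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Soft targets produced by the teacher at temperature 2.
targets = softmax(teacher_logits, temperature=2.0)

def loss(logits):
    return kl_divergence(targets, softmax(logits, temperature=2.0))

# Student starts from uniform logits and nudges them toward the teacher's
# soft targets via simple finite-difference gradient descent.
student_logits = [0.0, 0.0, 0.0]
lr, eps = 0.5, 1e-5
for _ in range(300):
    grads = []
    for i in range(len(student_logits)):
        bumped = list(student_logits)
        bumped[i] += eps
        grads.append((loss(bumped) - loss(student_logits)) / eps)
    student_logits = [l - g * lr for l, g in zip(student_logits, grads)]

# After training, the student's distribution closely mimics the teacher's,
# even though the student never saw the teacher's weights or training data.
```

The point of the toy: the student needs only the teacher's *outputs*, which is why distillation across API boundaries is both cheap and hard to police.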

Pot, have you met kettle? Most of these companies have been very protective of stuff in the black box of AI, the weights and data that make their models tick. But we know much of that material is (or was at one point) protected by copyright. Dubious practices, like Anthropic buying, scanning, and destroying millions of books, give tech giants potential legal cover as courts continue to debate whether or not AI model training constitutes fair use, but the argument that something transformational is happening under the hood gets complicated when models regularly reproduce training material nearly verbatim.

In that context, it’s interesting to see how far these companies will go to protect their own material. When Anthropic recently saw the source code for its Claude Code product leak online, the company reportedly issued thousands of copyright takedown requests to prevent people from republishing and sharing it. According to the New York Times, there was one notable exception to the widespread removal attempt: a rewritten version hosted on GitHub. One quick-thinking leaker used AI agents to translate the leak into another programming language, which was apparently enough for Anthropic to deem it transformative and outside its claim of ownership.

Given that the Copyright Office has determined that works created by AI systems without human input are not eligible for copyright protections, and companies like Anthropic brag about how much of their code is now written by autonomous AI agents, it’s increasingly interesting to learn what is and isn’t covered. But it does seem that AI companies believe they should get special treatment. Copyright doesn’t apply when they want to bypass it, but it’s an essential protection when it means someone might get their hands on their work.
