The Myth of AI Democratization

My thoughts on Meta’s unscrupulous appropriation of copyrighted works to feed their AI hit a nerve on social media. Some still insist that training large language models (LLMs) on copyrighted content is a step toward “democratizing knowledge.” But that narrative is more myth than reality—especially when you look closely at what these models actually do with the material they ingest.

Image: Robin Hood statue, Nottingham by David Dixon. Licensed under CC BY-SA 2.0.

This kind of myth-making is particularly absurd in the context described by The Atlantic, where Big Tech (in this case, Meta) downloads texts from a pirated database using torrent systems. By design, torrenting uploads data in return, so the company actively contributes to the piracy infrastructure.

This argument often carries a Robin Hood romanticism, as if the “theft” were a noble act of stealing from the rich to give to the poor. That is a stretch at the best of times – and even more so when the actors involved are Meta and LibGen.

But even in less blatantly illegal settings, the idea that LLMs help “democratize” knowledge quickly falls apart. Let’s take a closer look at how these models actually process and transform the content they ingest.

As privacy lawyer David Rosenthal explains, copyright generally applies to uses where humans can “enjoy” the work. This typically doesn’t apply to LLM training – especially because the models are designed not to memorize content.

Rosenthal notes: “It’s a common misconception that LLMs remember everything they see during pre-training; also, there are anti-memorization techniques.” Even where some memorization occurs, works are fragmented and blended with vast amounts of other data, “causing the work to fade when looking at the model as a whole.”

In other words: the very reason why this practice is not considered a copyright violation is the same reason why it has nothing to do with making ideas accessible.

LLMs are not designed to share knowledge. They’re designed to dissolve it. They prevent attribution. They obfuscate the source. They do not “democratize” anything. They repackage fragments of intellectual labor into a probabilistic fog of language patterns.

This is not Robin Hood. It’s reverse Robin Hood. Value is extracted from underpaid authors, artists, and researchers in order to fuel systems that are controlled and monetized by some of the wealthiest and most powerful companies on the planet. And attribution – or any other kind of recognition – is not just missing, it is actively prevented.

Democratization would mean participation, visibility, and credit. What we are seeing here is something else entirely.

I first shared a version of these reflections on LinkedIn on March 27, 2025.