It is widely acknowledged that AI companies use web articles to train their models without compensating creators or obtaining permission. Publishers such as The New York Times, the Chicago Tribune, and the Toronto Star have already filed lawsuits against this practice. Now, another prominent organization has joined the legal proceedings.
TechCrunch has reported that Encyclopedia Britannica and its subsidiary Merriam-Webster have filed a lawsuit against OpenAI, alleging that the AI giant committed “massive copyright infringement” by scraping nearly 100,000 of their online articles and using them to train its LLMs without permission.
What’s this lawsuit about?
Britannica claims that ChatGPT generates responses that substitute for its content, reducing web traffic and potential revenue. If users can ask ChatGPT a question and receive an answer based on Britannica’s articles, there may be less incentive to visit the website directly.
The complaint also targets OpenAI’s use of Britannica content in ChatGPT’s retrieval-augmented generation (RAG) workflow, in which the AI searches the web for up-to-date information when answering questions. Britannica alleges that ChatGPT reproduces its content, in full or in part, in those answers.
Additionally, Britannica alleges that OpenAI is violating trademark law. The company has argued that ChatGPT hallucinates information and then falsely attributes it to the publisher. According to Britannica, ChatGPT’s hallucinations jeopardize “the public’s continued access to high-quality and trustworthy online information.”
What’s going to happen next?
That’s the big question. There is no strong legal precedent establishing whether training an AI on copyrighted content constitutes copyright infringement. Anyone can tell you that it’s not right to use someone else’s work to train your model, but the law around it is murky at best.
In a recent case involving Anthropic, a federal judge ruled that using copyrighted content as training data was transformative enough to qualify as fair use. However, the same judge found that Anthropic had illegally downloaded millions of books, a finding that led to a $1.5 billion settlement with affected writers.
As this issue continues to evolve, lawmakers have significant ground to cover. The outcome of these cases will likely shape how AI companies can legally use web content in the future.

