We Are All Complicit In Digital Theft of Copyright

ChatGPT, Gemini, Llama, CoPilot and Mistral are about to kill an entire industry in order to build a new one that primarily benefits the already well-padded tech companies. In many ways it is unfair to blame the users of these services, but by using them, we become complicit in the digital theft of artists’ copyrighted work.

“The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted.”

These are the words in a new petition led by famous musicians and writers. Journalists, photographers, designers, graphic artists and all other artists whose work is normally protected by copyright are also signing – yet again – while a myriad of lawsuits have been filed against the companies behind generative AI services.

OpenAI has used the content on the web to train, create and develop ChatGPT. Google has done the same with Gemini. Meta with Llama. Anthropic with Claude. And then there’s Mistral, Perplexity and many, many more doing the same. Microsoft is a major investor in both OpenAI and Mistral, and its CoPilot is built on ChatGPT. None of these companies asked permission in advance to use other people’s content. Most say that they use ‘publicly available content’ and believe this is legal and fair. All of them – including Llama, which Meta mendaciously calls open source – are closed black boxes when asked what content they have specifically trained their services on.

It’s pretty wild. An over-hyped new industry is being established on top of other people’s work, and it has been going on for over two years. Many content creators and artists have changed their legal terms and tried to block further training technically, but none have succeeded in stopping the abuse. On the contrary, generative AI has a strong tailwind. A Danish minister believes that all government employees should have access to CoPilot, Danish municipalities are experimenting, and even companies that are typically concerned about abuse of their own trademarks are using GenAI services. Lots of consultants have emerged as preachers who want to teach us to prompt, because it’s about being first – not best – and they have found a good way to make a living.

Copyright Infringement
This is a huge copyright infringement – outright theft, as Naomi Klein calls it in The Guardian – against a whole lot of creators who are forced to spend time and often very scarce resources defending their right to control their own content and works. There are plenty of lawsuits underway, and luckily the big players are leading the way, even if they are small compared with the tech giants. The most high-profile case is The New York Times v. OpenAI and Microsoft. Getty Images has sued Stability AI, the company behind Stable Diffusion, and the Authors Guild has also filed a lawsuit against OpenAI. The list is unbelievably long and is collected in the US ‘Generative AI – Intellectual property cases and policy tracker’.

In the US, there is a certain risk that the cases will be lost. US copyright law contains a ‘fair use’ doctrine, under which Google once prevailed when it created Google Books. The difference between that case and now, however, is that Google Books didn’t cost money. That’s why it’s so important for AI companies to offer free versions of their services (where you pay with your data instead). The fair-use doctrine is Microsoft’s defense. The giant, which for the past several years has marketed itself on the ethically responsible use of artificial intelligence and sits inside most government and corporate computers, argues that this is fair use and is working to get as many people as possible using CoPilot before the lawsuits are settled.

In Europe, we don’t have a fair-use clause in copyright law. But big AI tech’s skilled lawyers will do everything they can to find good arguments for using other people’s works in the name of innovation. No European copyright holders have yet filed a lawsuit against them. Danish media outlets have announced in Wired that they intend to, but pursuing such cases is expensive and time-consuming.

Human-generated Content is Gold
It is very important for text-based GenAI companies to train their services on human-generated, fact-checked content rather than, for example, loose rumors on social media, which is why some media outlets have entered into licensing agreements with, for example, OpenAI for the legal use of their content. However, these are typically short-term agreements, in which OpenAI gets access to full archives that it can train on for as long as the agreement lasts. Time will tell whether it’s a good idea for media companies to enter into these agreements. They risk an ugly version of the Spotify solution, where more and more people stop paying for digital content because they can get it all via e.g. ChatGPT or Google.

Even if the Silicon Valley companies were to win the copyright cases (unlike in China, where judges have ruled that there is copyright infringement), one could argue that it is deeply unethical to use other people’s content to build something new without asking permission and sharing the revenue. And what about all of us who use these products because we have to follow our employer’s demands, or for some other reason such as FOMO? We become complicit in digital theft.

Copyright is only one Ethical Issue
Amid the list of other ethical problems, the copyright issue risks drowning. Just to name a few: GenAI creates misinformation (the services hallucinate – a fancy word for lying) and is used by bad actors to create deepfakes, so we can no longer tell fake from true. The services violate privacy, are riddled with bias and discrimination, are deliberately designed to seem human so that some users are seduced and manipulated, and they are huge climate sinners.

Fortunately, many organisations are building ethically responsible language models. The organisation Fairlytrained.org points to those that do not violate copyright. The image generator Firefly.adobe.com has in most cases obtained permission from the creators and can actually compete with OpenAI’s Dall-E. In Finland, Silo.ai is working on a Germanic model, there are several German initiatives with German government support, and in Denmark we have the Danish Language Model Cross Consortium, which is supported by the Danish Business Association and the Alexandra Institute. So, at some point, there will be models that we can all use without infringing anyone’s copyright. The question is whether they will ever be able to compete with the unethical services.

This article was first published in Danish at Copenhagen Review of Communication

Translated partly with the help of DeepL.com (free version)

Photo: Andrej Lišakov at Unsplash.com