(The illustration is AI-generated with Adobe Firefly, which Adobe says is trained only on consented material.)
Media outlets want a fair revenue share with tech companies that earn ad revenue from their content. With AI services, the conflict has intensified.
Analysis. Google and Facebook owe publishers a lot of money because they earn ad revenue from their content. In the US alone, Google would have to pay 17.5% of its ad sales – about $10 billion a year – while Meta would have to cough up 6.6% – about $2 billion a year – if there were a fair revenue split. This is the conclusion of a new independent report, Paying for News: What Google and Meta Owe US Publishers, from Columbia University.
This only concerns search engines and social media using journalistic content to generate traffic and engagement. Many countries, including Denmark, have introduced laws that give media outlets the right to demand payment, and negotiations are underway. Still, there are no set amounts, and it’s an uphill battle as Google and Facebook try to avoid paying. According to Semafor Media, Google says that at most 2% of all searches are related to news. Even at this low percentage, the struggling media outlets could share a lot of Google’s money.
However, before this issue is resolved, a new one is looming. AI companies like Google, Meta, and OpenAI/Microsoft have launched popular generative AI services like Bard, Llama, and ChatGPT. To build them, they have vacuumed up (scraped) all content from the internet, including media content, without asking permission first. All this content is used to train and develop the AI services.
The US AI companies believe they should have the content for free to ensure progress and innovation and to keep the US competitive with China. They say that it is fair use to scrape content from the open internet – just as Google once got a US ruling that it was ‘fair use’ to scan millions of books and make them searchable for free.
Media: Not Fair Use
The media argue the exact opposite. In a White Paper written by their trade association, they argue that the AI companies have used media content 5 to 100 times more frequently for their AI services than other general content on the web, and that this is not fair use, because the AI companies
- profit from media content via commercial products,
- use media content to compete with the media, and
- do not use media content in a way significantly different from how the media themselves use it.
The arguments turn on US copyright law. The EU is, as always, stricter and has no general ‘fair use’ doctrine of the same breadth as the American one.
License Agreement
Many media outlets believe that AI companies should also enter into license agreements and give them a fair revenue share. But unlike before, when most news media put all their content online for free, they are a little smarter this time. As far as I know, most of the big news outlets have blocked further scraping. It’s the right thing to do, because why help them improve their product when you get nothing in return?
The media hopes that blocking the scraping will place them in a better position when negotiating license agreements. And this is where I think it will be more than difficult. When it comes to putting a price tag on those license agreements, it’s going to be peanuts at best. Having followed the big tech companies closely over a couple of decades, it’s hard to imagine that there will be a fair share.
In a consultation process for a new law that might give media a fair share, Meta has told the US government that media content is ‘completely insignificant’ to its big-data AI services, so a reasonable royalty would be incredibly small.
AI companies are fighting hard not to pay for something they believe they have a right to access for free. That’s why news media should use their good, fact-checked content to build better, trusted AI services, developed legally and fairly. It doesn’t matter if it takes a while. Legal, democratic, and ethical processes usually do.