Generative AI deployment is about to get more expensive
Reddit, Twitter and major news publishers to charge for using their content for AI training
At a Glance
- Major U.S. publishers join Reddit and Twitter in asking for payment for their data used to train generative AI models.
- There is a growing chorus of concern over IP rights, also voiced by artists, musicians and developers.
Deploying ChatGPT and other generative AI tools is about to get more expensive as a growing chorus of content publishers asks to be paid for the data used to train AI models, with developers who do not pay at risk of being sued.
The latest comes from a nonprofit alliance representing 2,000 U.S. print and digital publishers, which includes storied publications such as The Wall Street Journal, The New York Times and The Washington Post.
The News/Media Alliance, in a detailed explanation of its new policy, made clear that the “unlicensed use of content” is an “intellectual property infringement.”
Generative AI developers and deployers “should not use publisher IP without permission, and publishers should have the right to negotiate for fair compensation for use of their IP by these developers,” the group said.
Further, generative AI developers must get “explicit permission” from the publishers. They are asking for payment for the following uses of their data:
- Training: Use of content to train and test generative AI systems
- Surfacing: Responses created from user prompts, which could include a note explaining what is in the surfaced content
- Synthesizing: Summaries, explanations, analyses and the like, of source content in response to a prompt
The alliance pointed out that use of content for training, surfacing and synthesizing “is not authorized by most publishers’ terms and conditions.”
Importantly, the group also said that “authorization for search” is not the same as approval to use the data for generative AI, since the latter displays more content than traditional search.
"Negotiating written, formal agreements is therefore necessary," according to the alliance.
Reddit, Twitter, artists and devs
The alliance joins companies and other entities that are asking for payment for use of their data.
Reddit recently introduced a premium tier for “third parties who require additional capabilities, higher usage limits, and broader usage rights.” In its updated terms of use, the online community said developers looking to build a large language AI model “may not use content on Reddit as an input for any model training without explicit consent.”
Using Reddit data for pure research is generally free, but Reddit will charge fees to cover its costs when applications need “large volumes of data.”
Twitter in February announced that it will “no longer support free access” to its API and a “paid basic tier will be available instead.” The social media platform said its API is “unique” from data shared by other platforms because it “reflects information that users choose to share publicly.”
Artists, developers and stock image sites are already embroiled in, or have threatened, lawsuits over the use of their content in generative AI models.
Two weeks ago, major record label Universal Music Group warned music streaming services about using copyrighted songs to train AI models. It said it would “not hesitate to take steps to protect our rights and those of our artists.”
Earlier this year, Getty Images sued Stability AI, the parent of text-to-image generator Stable Diffusion, in the U.S. and U.K. over alleged copyright infringement of its image library. Last November, developers sued GitHub, Microsoft and OpenAI over alleged copyright violations involving the AI-powered coding tool Copilot.