News & Analysis

2024: AI ChatBots and Copyright

If 2023 was the year of euphoria over AI-led chatbots growing ever smarter, 2024 could well see a surfeit of lawsuits that question their ethics

For once, let’s keep the sci-fi theories on the shelf. The debate over whether artificial intelligence (AI) will make or mar civilization as we know it is still a long way off. What is currently being debated is how ethical the entire AI-led chatbot modeling enterprise really is, and for an answer we need only read the lawsuit that the New York Times has filed against OpenAI and its collaborator Microsoft. 

The charges are pretty simple. NYT argues that OpenAI, which released ChatGPT to a euphoric audience back in November 2022, and its closest collaborator and investor Microsoft violated copyright law by training their generative AI models on its content. Welcome back to the days when Ekalavya learnt archery from Dronacharya without consent and lost his thumb!

The lawsuit, filed at the Federal District Court in Manhattan, claims that millions of the newspaper’s articles were used without so much as a “by your leave” to train the AI models behind ChatGPT and Microsoft’s Copilot. It wants both OpenAI and Microsoft to destroy models and training data containing such material and to be held accountable for billions of dollars in statutory and actual damages related to “unlawful copying and use of The Times’ uniquely valuable works.” 

Copycat, copycat, where have you been?

An article published by TechCrunch quotes the lawsuit as warning that if NYT and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill. “Less journalism will be produced, and the cost to society will be enormous,” the lawsuit says. 

Of course, OpenAI has responded with a carefully drafted statement saying it “respects the rights of content creators and owners,” but it goes on to express “surprise” and “disappointment” that this should happen at a time when its conversations with NYT were moving forward well. It also expresses hope that a “mutually beneficial way to work together” will be found soon, one that could be extended to other publishers as well. 

Interestingly, NYT notes in its complaint that it had sought licensing agreements with Microsoft and OpenAI in April, but the talks proved fruitless. Small mercies that publishers are finally taking on the menace of non-consensual content scraping, which dishes out answers that are cogent and credible but hardly authoritative. 

What’s all this brouhaha about?

One could think of this as an exercise in which a smart participant sits through a group discussion, listens intently to everyone, and then speaks last, simply collating everything said before in succinct fashion. If that is all AI amounts to, there is hope for humankind yet, though GenAI models learning from existing examples in this way can result in copyright infringement. 

Now, if that is what artificial intelligence is supposed to mean, all we can say is that human intelligence will only be valued more – especially once the euphoria clears and those using the technology start seeing it for what it is. For now, GenAI models learn from examples of essays, code, emails, articles and the like to produce trained language models. 

At the nub of the debate is a basic disagreement: vendors claim the fair-use doctrine gives them the right to scrape content off the web, while copyright holders contend otherwise. In fact, several news organizations across the world are now using code to prevent GenAI scrapers from crawling their websites for AI training data. 
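In practice, much of this blocking relies on the long-standing robots.txt exclusion protocol rather than bespoke code: AI vendors publish the user-agent names of their crawlers (OpenAI’s GPTBot, for instance), and publishers disallow them at the site root. A minimal sketch of such a file – the crawler names are real published tokens, but the exact set a publisher blocks will vary – might look like this:

```text
# robots.txt served at https://example.com/robots.txt

# Block OpenAI's AI-training crawler
User-agent: GPTBot
Disallow: /

# Block Common Crawl's bot, whose archives are widely used for AI training
User-agent: CCBot
Disallow: /

# Block Google's AI-training token (does not affect Google Search crawling)
User-agent: Google-Extended
Disallow: /

# Everyone else may crawl as usual
User-agent: *
Allow: /
```

Compliance with robots.txt is voluntary – it is a convention, not an enforcement mechanism – which is partly why publishers are turning to the courts as well.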

It’s creativity versus copycat culture

While NYT may be the latest to join this conflict, the legal battles began some time ago, when actor Sarah Silverman accused Meta and OpenAI in May of ingesting her memoir to train their AI models. Then came the deluge, as thousands of creators, including the likes of John Grisham, brought lawsuits against OpenAI on the same charge. 

In a parallel move, several programmers are battling Microsoft, OpenAI and GitHub over Copilot, an AI-powered code-generating tool that the plaintiffs say was developed using their IP-protected code. NYT’s arrival in the fray marks the entry of one of the world’s largest publishers into the legal battle against GenAI propagators. 

For its part, NYT cites instances in which Microsoft’s Bing Chat (now Copilot) provided wrong information that was then attributed to the newspaper group – one such query concerned the “15 most heart-healthy foods.” It also argues that OpenAI and Microsoft are building competitors to news publishers by using NYT’s works. 

The complaint further notes that in doing so the accused are harming NYT’s business: they serve up data that could not normally be accessed without a subscription, share information that is not always cited, sometimes monetize it, and often strip out the affiliate links the publication uses to generate commissions. 

This, NYT perceives as a direct violation of copyright, since most GenAI models merely regurgitate data, producing verbatim passages from past articles. The lawsuit further claims that on at least one occasion ChatGPT actually enabled users to get around the paywall on its news content. The defendants, it says, “seek to free-ride on The Times’ massive investment in journalism” – and are doing so without payment. 

In fact, as time goes on, there is every chance that other publishers will seek to join the case. The Atlantic has reported that if Google were to add AI to its online search, it would answer queries 75% of the time without requiring a click-through to the source website, which could mean a 40% loss of traffic for publishers.