The New York Times Sues OpenAI and Microsoft for Copyright Infringement Over AI Training Data

Written by AiBot

Dec 28, 2023

The New York Times (NYT) has filed a lawsuit against artificial intelligence research company OpenAI and its partner Microsoft, alleging the companies violated copyright law by scraping NYT articles without permission to train AI systems like ChatGPT. This landmark case tests the legality of using copyrighted content to develop AI systems that can generate human-like text.

Background of the Dispute

OpenAI, which is backed by Microsoft, launched the ChatGPT chatbot in November 2022. ChatGPT is powered in part by machine learning models that were trained on vast troves of text data scraped from the internet without explicit permission from content creators.

The NYT alleges OpenAI and Microsoft copied “millions of articles” from the NYT to train AI systems to complete tasks like summarizing articles, translating text, and answering questions in a conversational manner. While using copyrighted material to train AI is common, doing so without permission raises legal questions about copyright protections.

NYT sent OpenAI a cease-and-desist letter on November 28 demanding it stop using NYT content. OpenAI did not respond, prompting the lawsuit.

Details of the Lawsuit

The NYT complaint, filed December 27 in federal court, states:

- OpenAI and Microsoft created AI products that can scan, scrape, and copy content without permission or compensation
- Their actions violate copyright protections by creating derivative works and unlawfully copying original expressions
- Their infringement was “systematic, willful, and ongoing”
- The NYT suffered significant economic and reputational damages

The lawsuit seeks a permanent injunction barring unauthorized use of NYT content for model training. It also asks that OpenAI and Microsoft pay damages, including any profits earned from infringing systems.

OpenAI’s Defense

OpenAI published a blog post defending its practices and laying out principles for responsible AI development, including correctly attributing data sources. The company maintains that scraping public web pages at scale for AI training constitutes “fair use” under copyright law, and it plans to defend itself vigorously in court.

Microsoft has not issued a public statement.

Implications of the Ruling

Legal experts say this case will likely prompt important decisions around copyright rules for AI systems:

Issue	Implication
Defining AI model outputs	Are AI model outputs like ChatGPT responses considered “derivative works” subject to copyright? Or new creative works?
Fair use standards	Does scraping copyrighted online content to train AI constitute “fair use”?
Liability distribution	Who bears legal responsibility – the AI developer or end users?

The court’s rulings on these issues could determine what content AI systems can legally access and how culpable different parties are for potential infringement.

If the NYT wins, the ruling could force major changes in how training data is sourced. Tech companies may need to pursue commercial licensing deals with publishers, pay royalties, or use only public domain content.

However, if fair use protections apply, the burden would fall more on copyright holders to monitor infringement issues.

What Happens Next

The lawsuit will likely take months to play out. In the meantime:

- OpenAI and Microsoft will aim to continue ChatGPT’s rapid adoption while defending claims of copyright violation
- NYT and media companies will advocate for stronger legal protections and compensation for use of their content
- Policymakers may propose new regulations around AI ethics and rights protections
- AI researchers may need to overhaul practices around properly sourcing training data

While this case raises thorny issues, it underscores the fast-rising impact of AI and the need for clear rules of the road to govern ethical development. How the court balances the interests of copyright holders, AI innovators, and public access to information remains to be seen.

AiBot

Author

AiBot scans breaking news and distills multiple news articles into a concise, easy-to-understand summary which reads just like a news story, saving users time while keeping them well-informed.

To err is human, but AI does it too. Whilst factual data is used in the production of these articles, the content is written entirely by AI. Double check any facts you intend to rely on with another source.
