November 8, 2022
Filed in the U.S. District Court for the Northern District of California, the 56-page suit claims Copilot, which is designed to auto-complete snippets of code, “violates the licenses that open-source programmers chose and monetizes their code despite GitHub’s pledge never to do so.”
“Copilot’s goal is to replace a huge swath of open source by taking it and keeping it inside a GitHub-controlled paywall,” the lawsuit reads.
GitHub is a repository of open-source computer code that anyone can use with attribution to the authors. OpenAI is an AI research lab. Microsoft owns GitHub and has a stake in OpenAI. In June 2021, GitHub and OpenAI launched Copilot, an AI model trained on lines of code.
The plaintiffs, led by programmer and lawyer Matthew Butterick, allege that Copilot reproduces their copyrighted code without attribution and without notifying users of license requirements, according to the lawsuit.
A spokesperson from GitHub told AI Business, “We’ve been committed to innovating responsibly with Copilot from the start, and will continue to evolve the product to best serve developers across the globe.”
In a blog post, GitHub said it has a feature that lets developers “block suggestions of 150+ characters matching public code” but acknowledged that it “doesn’t address all use cases.”
But new capabilities are coming in 2023 that will provide Copilot users with “an inventory of similar code found in GitHub public repositories” and “the ability to sort that inventory by repository license, commit date, etc.”
Alan Behr, a partner in the Intellectual Property Practice at the Phillips Nizer law firm, told AI Business that “the allegation in this case is that Copilot got hold of open source code, did not provide proper attribution and then monetized its modifications to the code. Simply, the claim is that Copilot violated the terms of open source licenses by which it was permitted, without charge, to make modifications to code.”
“That is a very particular question of contract law regarding one business’s use of open source code. It would be hard to see how that would dampen the growth of generative AI in general,” Behr added. “The lawsuit is essentially for breach of licenses for open-source code; oversimplified, one way to look at it is as a contract dispute with complications. I would suggest not reading into it any broader implications at this time.”
Statutory damages of $9 billion
Butterick began investigating the AI tool last month after a Texas A&M University professor called out Copilot for copying his code without attribution or a license. The lawsuit claims Copilot “simply reproduces code that can be traced back to open-source repositories or open-source licensees” and “never includes attributions to the underlying authors.”
The lawsuit contends that GitHub and OpenAI have “offered shifting accounts of the source and amount of the code or other data used to train and operate Copilot.” Moreover, the defendants “have also offered shifting justifications for why a commercial AI product like Copilot should be exempt from these license requirements, often citing ‘fair use.’”
The plaintiffs want a jury trial and are seeking statutory damages; damages for harms resulting from the defendants’ breach of licenses; punitive damages for unfair competition; and attorneys’ fees.
The plaintiffs estimate statutory damages of at least $9 billion: $2,500, incurred three times, for each of the 1.2 million Copilot users. Each time Copilot outputs code, it commits three violations by failing to provide attribution, a copyright notice and license terms, according to the lawsuit.
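The arithmetic behind the plaintiffs’ estimate can be checked with a short sketch. The figures are those reported above; the variable and function names are illustrative assumptions and do not come from the lawsuit.

```python
# Figures as reported in the article; names are hypothetical, for illustration only.
DAMAGES_PER_VIOLATION = 2_500   # dollars
VIOLATIONS_PER_USER = 3         # missing attribution, copyright notice, license terms
COPILOT_USERS = 1_200_000


def estimated_statutory_damages(per_violation: int, violations: int, users: int) -> int:
    """Multiply the per-violation amount by violations per user and by user count."""
    return per_violation * violations * users


total = estimated_statutory_damages(
    DAMAGES_PER_VIOLATION, VIOLATIONS_PER_USER, COPILOT_USERS
)
print(f"${total:,}")  # $9,000,000,000
```

This assumes one set of three violations per user, which is the minimum the plaintiffs’ “at least” figure implies.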