AI comes for the coder – but still has a long way to go
Microsoft's GitHub and OpenAI have launched a technical preview of an AI-based tool called Copilot, capable of auto-completing code snippets.
GitHub said that Copilot does not just fill in lines based on what it has previously seen, but examines the code the developer has already written to generate new code – although early tests suggest it mostly does the former.
GitHub is the world's largest code hosting service, home to much of the public and private code that makes the modern world possible.
Code for code
"GitHub Copilot draws context from the code you’re working on, suggesting whole lines or entire functions," GitHub CEO Nat Friedman said.
"It helps you quickly discover alternative ways to solve problems, write tests, and explore new APIs without having to tediously tailor a search for answers on the Internet. As you type, it adapts to the way you write code – to help you complete your work faster."
On its project page, GitHub claims that Copilot can convert comments to code, autofill repetitive code, and suggest alternatives.
To pull this off, Copilot was trained on billions of lines of public code, a GitHub document explained. The post also linked to "a recent thought-provoking paper" on the relationship between the suggested code and the code that informed it.
The paper in question, called On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, was co-authored by former Google AI ethics co-lead Timnit Gebru, and ostensibly led to her departure from the company. Further members of the team have since resigned in protest or been forced out.
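To make "converting comments to code" concrete, the sketch below pairs a descriptive comment with a function body of the kind a completion tool might propose. It is hand-written for illustration and is not actual Copilot output; the function name and its behavior are hypothetical.

```python
from collections import Counter

# Return the n most frequent words in a text, ignoring case.
def most_common_words(text, n):
    # Lowercase the text and split on whitespace to get the words.
    words = text.lower().split()
    # Counter.most_common returns (word, count) pairs, most frequent first.
    return [word for word, _ in Counter(words).most_common(n)]

print(most_common_words("the cat and the hat and the bat", 2))
```

The developer writes only the comment and perhaps the function signature; the body is the part a tool of this kind would be expected to fill in, and it still needs review like any other code.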
"One company fires us for it and another company cites it," Gebru said, welcoming its inclusion in the GitHub project.
Nearly 300 GitHub employees have already used the tool in an internal trial. The 453,780 suggestions Copilot made were categorized, with most representing simple completions of code or repetitions. When asked to fill in Python functions, the system got them right 43 percent of the time on the first try, and 57 percent after 10 attempts. That dataset is available as open source for others to analyze.
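The gap between the first-try and ten-attempt figures is easier to read with a small worked example. The sketch below shows one simple way such a "solved within k attempts" rate could be tallied; it is not GitHub's evaluation code, and the function name and the toy results are invented for illustration.

```python
def solved_within(attempts_per_task, k):
    """Fraction of tasks with at least one passing result in the first k attempts."""
    if not attempts_per_task:
        raise ValueError("need at least one task")
    solved = sum(1 for attempts in attempts_per_task if any(attempts[:k]))
    return solved / len(attempts_per_task)

# Made-up outcomes for four tasks, three attempts each (True = passed its tests).
toy_results = [
    [True, False, True],
    [False, False, True],
    [False, False, False],
    [False, True, True],
]

first_try = solved_within(toy_results, 1)
three_tries = solved_within(toy_results, 3)
print(f"{first_try:.0%} on the first try, {three_tries:.0%} within three attempts")
```

Because a task counts as solved once any attempt passes, the rate can only stay level or rise as more attempts are allowed, which is why the ten-attempt figure is the higher of the two.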
Currently, when Copilot quotes code it has learned, it does not say where the code came from. However, GitHub plans to add an origin tracker and a duplication search in the coming days and weeks.
There are other limitations: "The code it suggests may not always work, or even make sense," the company admitted.
"While we are working hard to make GitHub Copilot better, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted, like any other code. As the developer, you are always in charge."
Copilot "doesn’t actually test the code it suggests, so the code may not even compile or run... GitHub Copilot may suggest old or deprecated uses of libraries and languages. You can use the code anywhere, but you do so at your own risk."
This was evidenced in one of the few examples of Copilot in action. Developer Nick Shearer pointed out that one of the promotional images of the tool "uses a float to store a monetary value, which is just plain wrong and would cause you no end of bugs and pain."
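Shearer's objection can be shown in a few lines. The sketch below uses Python as an assumption (the promotional image is not reproduced here), and the amounts are invented for illustration. Binary floating point cannot represent most decimal fractions exactly, so sums of cash amounts drift, whereas the standard library's decimal.Decimal keeps exact decimal values.

```python
from decimal import Decimal

# Hypothetical prices, chosen only to expose the rounding problem.
float_total = 0.10 + 0.20
decimal_total = Decimal("0.10") + Decimal("0.20")

# Binary floats cannot store 0.10 or 0.20 exactly, so the sum is slightly off.
float_matches = float_total == 0.30

# Decimals built from strings keep the exact decimal digits.
decimal_matches = decimal_total == Decimal("0.30")

print(float_matches, decimal_matches)
```

Storing amounts as integer counts of the smallest currency unit, such as cents, is another common way to avoid the same problem.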
And there are other dangers, according to GitHub: "The technical preview includes filters to block offensive words and avoid synthesizing suggestions in sensitive contexts. Due to the pre-release nature of the underlying technology, GitHub Copilot may sometimes produce undesired outputs, including biased, discriminatory, abusive, or offensive outputs."
Copilot is built on OpenAI Codex, a machine learning system created by startup OpenAI. Since pivoting from a non-profit model, OpenAI has grown close to Microsoft: raising $1 billion from the company, using a custom-made Azure supercomputer, investing in companies together, and giving the software giant first dibs on its commercial output.
Microsoft acquired GitHub in 2018 for $7.5 billion.