Hugging Face, ServiceNow Launch Open-Source Coding LLM

New open source code generation model could lead to a 'wide range of interesting applications'

Ben Wodecki

May 8, 2023


At a Glance

  • Hugging Face and ServiceNow researchers unveil StarCoder LLM: a small but powerful open source code generation model.
  • StarCoder was found to outperform Google’s PaLM and Meta’s LLaMA on popular benchmarks despite its small size.

Enterprise workflows company ServiceNow and Hugging Face, an ML tools developer, have developed an open source generative AI large language model for coding.

The pair unveiled StarCoder LLM, a 15 billion-parameter model designed to responsibly generate code for the open-scientific AI research community.

StarCoder was the result of ServiceNow and Hugging Face researchers taking the StarCoderBase model, which was trained on permissively licensed data from GitHub spanning over 80 programming languages, and fine-tuning it on 35 billion Python tokens.

The result is a model that the pair contend outperforms existing open code generation models as well as closed models such as OpenAI’s code-cushman-001, the original Codex model that powered early versions of GitHub Copilot.

According to ServiceNow and Hugging Face, the model boasts a context length of over 8,000 tokens, meaning it can process a sizable amount of input that could “enabl[e] a wide range of interesting applications.”

The Hugging Face researchers explain in a blog post: “For example, by prompting the StarCoder models with a series of dialogues, we enabled them to act as a technical assistant. In addition, the models can be used to autocomplete code, make modifications to code via instructions, and explain a code snippet in natural language.”

StarCoder was put up against several similar models, including Google’s PaLM and LaMDA, and Meta’s LLaMA. The models were evaluated using several benchmarks including HumanEval.

Despite being significantly smaller in size, both StarCoder and the underlying StarCoderBase outperformed their rivals, the researchers found.

The team also observed that forcing the model to generate an actual coding solution increased its performance score.

StarCoder was also tested on MultiPL-E, a multilingual benchmark, and achieved similarly impressive results, according to the research team behind it.
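
For readers unfamiliar with how code benchmarks such as HumanEval are typically scored, the sketch below shows the pass@k estimator commonly reported for that benchmark: for each problem, the model generates n candidate solutions, c of which pass the unit tests, and pass@k estimates the chance that at least one of k sampled candidates is correct. The article does not say which metric or sample counts the StarCoder team used, so the function names and the numbers in the usage note are illustrative assumptions, not the researchers' evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: candidate solutions generated for the problem
    c: candidates that passed the problem's unit tests
    k: number of candidates the metric is allowed to draw
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k failing candidates: any draw of k includes a passing one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn candidates fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

As a hypothetical usage example, `benchmark_score([(10, 3), (10, 5)], k=1)` averages the per-problem pass rates of two made-up problems with 3 and 5 passing candidates out of 10 each.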

The joint effort will “enable the release of powerful base models that empower the community to build a wide range of applications more efficiently than a single company could come up with,” said Leandro von Werra, machine learning engineer at Hugging Face and co-lead of BigCode.

ServiceNow’s research arm and Hugging Face launched the joint BigCode Project in September 2022. The project continues to operate as an open scientific collaboration with working groups, task forces and meetups.

The launch of StarCoder follows Hugging Face’s announcement that it had developed an open source version of ChatGPT, called HuggingChat.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor for AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as Assistant Editor before being promoted in April 2023. He has previously written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others.
