June 23, 2023
At a Glance
- Microsoft researchers showcase phi-1, a new code generation model with just 1.3 billion parameters.
- Microsoft also unveils ZeRO++, an improved GPU communication scheme that boosts AI training and fine-tuning.
AI researchers from Microsoft have published a new code generation model, phi-1, that’s designed to be lightweight - and it outperforms GPT-3.5, the large language model behind ChatGPT.
It took Microsoft’s researchers just four days to train phi-1 on eight Nvidia A100 GPUs. The model was trained on six billion tokens from the web, plus a further one billion tokens generated using GPT-3.5, one of the underlying models used to build OpenAI’s ChatGPT.
On the HumanEval benchmark, phi-1 scored a pass@1 accuracy of 50.6%. The Microsoft model beat StarCoder from Hugging Face and ServiceNow (33.6%), OpenAI’s GPT-3.5 (47%) and Google’s PaLM 2-S (37.6%) despite being substantially smaller.
On the MBPP pass@1 test, phi-1 fared better, achieving a 55.5% score. A lot of the aforementioned models have yet to publish results on this benchmark, but WizardLM's WizardCoder scored 51.5% in a test conducted earlier this month. WizardCoder is a 15 billion parameter model vs. 1.3 billion for phi-1.
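The pass@1 scores above follow the standard pass@k metric for code benchmarks: the estimated probability that at least one of k generated samples passes the problem's unit tests. A minimal sketch of the widely used unbiased estimator (the function name and sample numbers here are illustrative, not taken from the phi-1 paper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n generated samples for a
    problem, of which c pass the tests, estimate the probability
    that at least one of k randomly drawn samples passes."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: always passes
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this reduces to the fraction of correct samples:
print(pass_at_k(10, 5, 1))  # 0.5
```

A benchmark-wide pass@1 score like phi-1's 50.6% is this quantity averaged over all problems in the suite.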
High-quality data makes the difference
Microsoft's researchers argue that the "power of high-quality data" is why phi-1 performs so well. To bring the point home, the researchers titled their model's paper 'Textbooks Are All You Need.'
“Just as a comprehensive, well-crafted textbook can provide a student with the necessary knowledge to master a new subject, our work demonstrates the remarkable impact of high-quality data in honing a language model’s proficiency in code-generation tasks," they wrote.
“By crafting ‘textbook quality’ data we were able to train a model that surpasses almost all open-source models on coding benchmarks such as HumanEval and MBPP despite being 10x smaller in model size and 100x smaller in dataset size.”
Phi-1 is limited to Python, unlike many other coding models available. The researchers also noted that the model lacks the domain-specific knowledge of larger models, such as programming with specific APIs.
To expand on their work, Microsoft’s researchers have suggested using GPT-4 rather than GPT-3.5 to generate synthetic data for the model’s training.
The researchers also plan to improve the diversity and non-repetitiveness of the dataset, although the team said they would have to find ways to “inject randomness and creativity into the data generation process, while still maintaining the quality and the coherence of the examples.”
ZeRO++: Accelerating large model fine-tuning
Microsoft’s researchers also announced this week ZeRO++, a new method designed to improve large model pre-training and fine-tuning.
Large AI models like ChatGPT and GPT-4 require vast memory and computing resources to train and fine-tune.
When training on a large number of GPUs relative to the global batch size, the per-GPU batch size becomes small, requiring frequent communication between devices.
To address this, Microsoft introduced ZeRO++, a system that leverages quantization – the process of mapping continuous values to a smaller set of discrete values – combined with data and communication remapping, to reduce total communication volume by 4x compared with ZeRO, without impacting model quality.
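The quantization idea can be sketched in a few lines: map each 32-bit float onto a small integer grid before sending it, then rescale on the receiving end. This is a hypothetical illustration of the general technique, not Microsoft's actual ZeRO++ implementation:

```python
def quantize_int8(values, bits=8):
    """Map continuous floats onto a discrete integer grid.
    Illustrative sketch of quantized communication; an int8
    payload is 4x smaller than the original fp32 values."""
    levels = 2 ** (bits - 1) - 1              # 127 for int8
    scale = max(abs(v) for v in values) / levels or 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the integer grid."""
    return [q * scale for q in quantized]

weights = [0.127, -0.254, 0.0635, 0.508]
q, s = quantize_int8(weights)
restored = dequantize(q, s)   # close to the originals, within s/2
```

Sending 8-bit integers instead of 32-bit floats is where the communication-volume reduction comes from; the cost is a small, bounded rounding error per value.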
Effectively, ZeRO++ is designed to improve communication among the GPUs training a model when the hardware is constrained relative to the model’s size.
According to Microsoft’s researchers, ZeRO++ enables low-bandwidth clusters to achieve similar throughput as those with 4x higher bandwidth.
The team behind the system claims it offers up to 2.2x higher throughput compared to ZeRO, Microsoft’s earlier training optimization system.
ZeRO++ is available for anyone in the AI community and can be accessed via GitHub. The researchers announced that a version for chat will be released “in the coming weeks.”