October 18, 2022
At Nvidia’s recent GTC conference, the company announced BioNeMo, a domain-specific cloud service and associated framework for building, training and tuning large language models (LLMs) for the life sciences industry.
LLMs can handle large amounts of data, and their transfer learning and self-supervised training methods allow them to be applied to large chemical and biological datasets.
Nvidia’s BioNeMo framework extends the NVIDIA NeMo Megatron framework for GPU-accelerated training of large-scale, self-supervised language models by adding support for industry data formats such as SMILES and FASTA strings, which are textual representations of chemical structures and of biological sequences (nucleotides or amino acids), respectively.
With this support, life science researchers can leverage ever-growing chemical and biological datasets to build larger models, resulting in better-performing neural networks for drug discovery.
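To make the point about text formats concrete, here is a minimal Python sketch of what SMILES and FASTA data look like and why a language model can treat them like any other text. It is illustrative only and is not BioNeMo or NeMo Megatron code: the helper names `parse_fasta` and `char_tokens`, the FASTA record and its identifier are all hypothetical, and the character-level tokenizer is a naive stand-in for the tokenizers real chemistry and protein models use.

```python
# Illustrative sketch only: plain-Python handling of FASTA and SMILES text.
# None of this is BioNeMo or NeMo Megatron code; all names are hypothetical.


def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict.

    A FASTA record is a header line starting with '>' followed by one
    or more lines of sequence letters.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines belonging to one record are concatenated.
            records[header] += line
    return records


def char_tokens(sequence):
    """Naive character-level tokenization (an assumption for illustration).

    Real SMILES tokenizers treat multi-character atoms such as 'Cl' or
    'Br' as single tokens; this sketch does not.
    """
    return list(sequence)


# Made-up protein record: the identifier and sequence are not real data.
fasta_text = """>example_protein hypothetical
MKTAYIAKQR
QISFVKSHFS
"""

# SMILES string for ethanol (CH3-CH2-OH).
ethanol_smiles = "CCO"

records = parse_fasta(fasta_text)
protein_tokens = char_tokens(records["example_protein hypothetical"])
molecule_tokens = char_tokens(ethanol_smiles)
```

Once a molecule or protein has been reduced to a token sequence like this, it can be fed to the same self-supervised training objectives used for natural-language text, which is the property the framework builds on.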
The BioNeMo cloud service will also host pre-trained, domain-specific models for biology and chemistry, optimized for inference, so that developers can get up and running quickly:
ESM-1: This protein LLM, based on the state-of-the-art ESM-1b model published by Meta AI, processes amino acid sequences to generate representations that can be used to predict a wide variety of protein properties and functions. It also improves scientists’ ability to understand protein structure.
OpenFold: The public-private consortium creating state-of-the-art protein modeling tools will make its open-source AI pipeline accessible through the BioNeMo service.
MegaMolBART: Built in collaboration with AstraZeneca and trained on 1.4 billion molecules, this generative chemistry model can be used for reaction prediction, molecular optimization and de novo molecular generation.
ProtT5: The model, developed in a collaboration led by the Technical University of Munich’s RostLab and including Nvidia, extends the capabilities of protein LLMs such as Meta AI’s ESM-1b to sequence generation.
These announcements by Nvidia have multiple implications and potential benefits for the industry. First and foremost, advancements in GPUs and associated software have increased the performance of processing AI workloads.
Nvidia is reporting significant reductions in the time required to train transformer models and, in turn, in the time required to perform industry-specific tasks such as genomic sequencing and analysis. This increased performance will reduce costs through faster compute times.
Secondly, the hosted cloud service itself is an interesting development. It reflects a rising trend within the industry of hardware vendors entering into the SaaS and PaaS space, and in a sense, competing with hyperscalers on compute provision.
Helps resolve two major constraints
This domain-specific cloud service also helps to address two major constraints within the industry.
The first is the skills shortage: the framework and hosted service are maintained by Nvidia and designed to work together, making it easier for enterprises to adopt AI. The second is the chip shortage: the service will run on Nvidia hardware and architecture, potentially alleviating the shortage for customers while also giving Nvidia another channel to market for its chips.
Additionally, offering the platform as a cloud service with GPU virtualization enhances hardware utilization and converts enterprises’ costs from capital expenditure (CAPEX) to operational expenditure (OPEX), as is customary in ‘as-a-service’ pricing models. This reduces the funding enterprises need to get up and running with LLMs in their drug discovery workflows.
The developments represent a democratization of these tools and resources to a certain extent. The faster processing performance reduces costs and the time needed to complete AI-related tasks. The cloud service will make Nvidia’s GPU-accelerated compute capability available to a wider audience and offer an alternative to hyperscalers.
Furthermore, partnerships with organizations such as the Broad Institute will bring Nvidia’s hardware and software technology to additional industry R&D platforms.
While the total impact of these announcements is yet to be determined, they represent important developments for the industry. LLM support for biology and chemistry lets researchers build better models by enabling them to use larger datasets, which matters because the volume of life science data is massive and continually expanding.
Faster processing saves time and money, enabling further experimentation through lower costs and faster iteration. A domain-specific framework packaged and maintained by a large, well-resourced vendor, and a cloud service to go with it, are both steps in the right direction.