
Meta Scientist: How Large Language Models Work, AI Summit NY 2023

Meta's Kanika Narang peels back the veil on the workflow of large language models

Deborah Yao

December 11, 2023


Generative AI is a term that has been used a lot this past year, but how is it different from other AI? Kanika Narang, senior AI research scientist at Meta, demystified the technology in a session at the AI Summit New York 2023.

Unlike other forms of AI, which typically follow predetermined rules, use structured data and are built for specific tasks, generative AI uses neural networks to create new and original content without being explicitly programmed to do so.

As many users well know, generative AI models are "actually so good right now that they can do many different tasks," she said, such as writing poems or generating realistic images of animals on Mars.

The backbone of these models, especially for text, is the large language model (LLM). LLMs are called large because they are trained on vast amounts of data. "Think about everything which is out there on the web, all the books that have been printed," she said. "Humans would take 20,000 years to read all the knowledge encapsulated in these models."

Using a vast dataset to pre-train these LLMs enables them to do many tasks such as summarizing, translation or question-answering.

The training process

Behind the scenes, the training process works this way: The model looks at the training data, such as sample sentences, and will "tokenize them at a word level," Narang said. "These words are embedded into a representation … which can encapsulate the semantic meaning of these words."

"These embeddings along with the positions they occur in the sentence, they are fed to a model, which is popularly known as transformers," she continued. "What these transformer models are doing is they are trying to predict probability. They understand the probability of the next word given whatever the input context is."
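To make the "words plus positions" input concrete, here is a minimal sketch in plain Python. It assumes word-level tokenization (as Narang describes; production LLMs usually use learned subword tokenizers), random placeholder vectors in place of learned embeddings, and the sinusoidal position encoding from the original Transformer paper. The function names are illustrative, not from any library.

```python
import math
import random

def tokenize(sentence: str) -> list[str]:
    # Word-level tokenization: lowercase the text and split on whitespace.
    return sentence.lower().split()

def build_vocab(sentences: list[str]) -> dict[str, int]:
    # Give each distinct word an integer id, in order of first appearance.
    vocab: dict[str, int] = {}
    for sentence in sentences:
        for word in tokenize(sentence):
            vocab.setdefault(word, len(vocab))
    return vocab

def make_embedding_table(vocab_size: int, dim: int, seed: int = 0) -> list[list[float]]:
    # Random placeholder vectors; in a real model these are learned during training.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

def positional_encoding(pos: int, dim: int) -> list[float]:
    # Sinusoidal encoding: even indices use sin, odd indices use cos,
    # at geometrically spaced frequencies.
    enc = []
    for i in range(dim):
        angle = pos / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def embed_sentence(sentence: str, vocab: dict[str, int],
                   table: list[list[float]]) -> list[list[float]]:
    # Word embedding + position encoding for each token: what a transformer receives.
    dim = len(table[0])
    vectors = []
    for pos, word in enumerate(tokenize(sentence)):
        word_vec = table[vocab[word]]
        pos_vec = positional_encoding(pos, dim)
        vectors.append([w + p for w, p in zip(word_vec, pos_vec)])
    return vectors

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
vocab = build_vocab(corpus)
table = make_embedding_table(len(vocab), dim=8)
inputs = embed_sentence("the cat sat on the mat", vocab, table)
print(len(inputs), len(inputs[0]))  # 6 8
```

Adding the position vector is what lets the model tell the two occurrences of "the" apart: they share a word embedding but end up with different input vectors.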

“Even with a simple technique, because they are fed with so much data and they can look at much longer histories … (they can) figure out what word should come next” in a sentence, she said.
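The training objective itself, estimating the probability of the next word given the preceding context, can be illustrated without a neural network. The sketch below is a count-based n-gram model, a deliberately simple stand-in and not a transformer: it only looks at a fixed number of previous words (`context_len`, a hypothetical parameter), whereas transformers learn these probabilities with a neural network over much longer histories.

```python
from collections import Counter, defaultdict

def train_ngram(sentences: list[str], context_len: int = 2) -> dict[tuple[str, ...], Counter]:
    # For every position, record which word followed the preceding `context_len` words.
    counts: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            context = tuple(words[max(0, i - context_len):i])
            counts[context][word] += 1
    return counts

def next_word_probs(counts, context_words: list[str], context_len: int = 2) -> dict[str, float]:
    # Estimate P(next word | context) by normalizing the counts seen after that context.
    context = tuple(w.lower() for w in context_words[-context_len:])
    followers = counts.get(context, Counter())
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()} if total else {}

corpus = ["the cat sat on the mat", "the cat sat on the sofa", "the dog sat on the mat"]
counts = train_ngram(corpus)
probs = next_word_probs(counts, ["on", "the"])
print(max(probs, key=probs.get))  # mat (seen 2 of 3 times after "on the")
```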

She said many companies have created LLMs, the most famous being OpenAI with its GPT series. Meta has launched Llama 2, which is trained on 2 trillion tokens and comes in sizes of 7 billion, 13 billion and 70 billion parameters.

How to choose an LLM

To choose the best model for your company, Narang said, consider the following factors:

  • Performance: Make sure the LLM performs well for your use case.

  • Latency: Bigger models generally perform better, but they are also slower to respond. Depending on your resources, a smaller model may perform just as well for your use case.

  • Cost: Bigger models come with higher costs, since running inference on them takes more time and more compute.

  • Safety: How safe are the model generations? If your use case is for a younger audience you want to make sure the output is safe for them.

  • Reliability and interpretability
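One way to apply these factors is to treat latency and safety as hard requirements, demand a minimum task score, and then pick the cheapest model that qualifies. The sketch below assumes you have already measured each candidate on your own evaluation set; the `Candidate` fields, model names and numbers are hypothetical, and reliability and interpretability are left out because they rarely reduce to a single number.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    name: str
    task_score: float            # accuracy on your own evaluation set (0 to 1)
    latency_ms: float            # measured response time per request
    cost_per_1k_requests: float  # inference cost in whatever unit you budget in
    safety_pass_rate: float      # share of outputs passing your safety review (0 to 1)

def choose_model(candidates: list[Candidate], min_score: float,
                 max_latency_ms: float, min_safety: float) -> Optional[Candidate]:
    # Hard requirements first: drop any model that misses a threshold.
    eligible = [c for c in candidates
                if c.task_score >= min_score
                and c.latency_ms <= max_latency_ms
                and c.safety_pass_rate >= min_safety]
    # Among the rest, prefer the cheapest; break cost ties with the higher score.
    return min(eligible, key=lambda c: (c.cost_per_1k_requests, -c.task_score), default=None)

# Hypothetical numbers, for illustration only.
models = [
    Candidate("large-model", task_score=0.92, latency_ms=900,
              cost_per_1k_requests=12.0, safety_pass_rate=0.99),
    Candidate("small-model", task_score=0.88, latency_ms=150,
              cost_per_1k_requests=1.5, safety_pass_rate=0.98),
]
best = choose_model(models, min_score=0.85, max_latency_ms=500, min_safety=0.97)
print(best.name if best else "no model qualifies")  # small-model
```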

Take, for example, a pharmaceutical company that has developed a cure for COVID-19 and now wants to market the drug. The LLM is tasked with creating a marketing campaign for TV, social media and physical locations.

To start, write a prompt that gives the LLM relevant context, such as "Imagine you work at a marketing firm," Narang said. Why? "These strategies help the model ground its generation. This kind of prompting is called zero shot because it does not have any other context."

But even with a simple prompt, the LLM can come up with a “very decent” marketing campaign, including TV commercial ads, brochures and social media posts, she said.

To improve the output, give the LLM more context based on strategies you know have worked for you in the past. For example, tell it to focus more on social media posts highlighting real-life stories, or to partner with doctors who can talk more about the drug.

“This is called few shot prompting” where the model can learn and refine its results, Narang said.
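The difference between the two prompting styles comes down to what goes into the prompt text. In this sketch the zero-shot prompt holds only a role line and the task, while the few-shot prompt also includes worked examples of past campaigns, which is the usual meaning of "few shot." The function names and the example briefs are hypothetical, and the resulting string would be sent to whichever LLM you use.

```python
ROLE = "Imagine you work at a marketing firm."

def zero_shot_prompt(task: str, role: str = ROLE) -> str:
    # Only a role line for grounding plus the task; no worked examples.
    return f"{role}\n\nTask: {task}"

def few_shot_prompt(task: str, examples: list[tuple[str, str]], role: str = ROLE) -> str:
    # Prepend examples of past strategies that worked, so the model can imitate them.
    parts = [role, ""]
    for i, (brief, campaign) in enumerate(examples, start=1):
        parts.append(f"Example {i}\nBrief: {brief}\nCampaign: {campaign}\n")
    parts.append(f"Task: {task}")
    return "\n".join(parts)

task = "Create a marketing campaign for our new drug."
past_campaigns = [  # hypothetical past successes
    ("Launch A", "Social media posts highlighting real-life patient stories"),
    ("Launch B", "Partnership with doctors who explain how the drug works"),
]
print(few_shot_prompt(task, past_campaigns))
```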

But because the model still has limited context, it is prone to hallucination, which is when LLMs generate convincing but false answers.

What is RAG?

One way to minimize hallucinations is to ground the model in proprietary or private data using a technique called Retrieval Augmented Generation, or RAG.

Going back to the earlier example, suppose the pharmaceutical company wants to give the LLM access to clinical trials and patient reviews for its COVID-19 campaign. To use RAG, start by adding this data to the prompt itself so the model can ground its answers in this external information, Narang said.

How does it work behind the scenes? A retriever searches an external knowledge base, in this case enterprise data, retrieves the most similar documents that could be helpful and adds them to the prompt.
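The retrieve-then-prompt loop can be sketched in a few lines. To stay self-contained, this version swaps in a simple bag-of-words cosine similarity as the retriever; real systems typically use BM25 or dense embedding models instead. The documents, prompt wording and function names are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Lowercased word counts: a simple lexical stand-in for a real retriever's representation.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank every document by similarity to the query and keep the top k.
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_rag_prompt(question: str, documents: list[str], k: int = 2) -> str:
    # Put the retrieved passages into the prompt and ask the model to stay within them.
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return ("Answer using only the documents below.\n"
            f"Documents:\n{context}\n\n"
            f"Question: {question}")

# Hypothetical enterprise knowledge base.
docs = [
    "Clinical trial summary: the drug reduced symptoms in most patients.",
    "Patient review: recovery felt faster after the second dose.",
    "Cafeteria menu for the week.",
]
print(build_rag_prompt("What did the clinical trial show about symptoms?", docs, k=1))
```

Note that the model itself is untouched: only the prompt changes, which is why RAG is a comparatively fast path to production.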

"You can use any out-of-the-box retriever or even fine-tune your retriever for your use case," Narang said. Think of this process as giving more knowledge to the LLM, similar to adding an external plug-in.

"RAG architecture is a great tool if you want the model to have access to domain-specific or private information to perform tasks. It reduces model hallucinations because you are trying to constrain the model to answer questions based only on your documents or internal data," she said. Even though it adds a step, "it is still a faster path to production because you are not changing anything in the model but you are just changing the prompting."

RAG’s limitations

However, RAG does not help the model adapt to new domains, new languages or other styles. It also increases prompt length, since you are adding documents, which leaves less space in the context window. That means some of your data might get cut off, so you might need more efficient retrieval techniques or additional training.
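The context-window problem can be handled explicitly by budgeting tokens before building the prompt. This sketch assumes the documents arrive already ranked by a retriever and approximates token counts by whitespace-separated words; that is an assumption for illustration, since real subword tokenizers usually produce more tokens than words. All names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Rough stand-in: one token per whitespace-separated word. Real subword
    # tokenizers usually produce more tokens than this.
    return len(text.split())

def pack_context(ranked_docs: list[str], prompt_tokens: int,
                 context_window: int, reserve_for_answer: int) -> list[str]:
    # Tokens left for documents after the instructions/question and the model's reply.
    budget = context_window - prompt_tokens - reserve_for_answer
    kept = []
    for doc in ranked_docs:  # best-ranked first
        cost = count_tokens(doc)
        if cost <= budget:   # skip documents that would overflow; smaller ones may still fit
            kept.append(doc)
            budget -= cost
    return kept

docs = ["a b c d e", "f g h", "i j k l m n o", "p"]
print(pack_context(docs, prompt_tokens=2, context_window=20, reserve_for_answer=8))
# ['a b c d e', 'f g h', 'p']
```

Skipping an oversized document instead of stopping lets shorter, lower-ranked documents use the remaining room; stopping at the first overflow is the simpler alternative if strict rank order matters more.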

While RAG can be used in most use cases, in some it is not useful, she said.

For example, if you want the output in a specific format like Excel but the model has not seen this format, it will not be able to do it. Also problematic is wanting to get answers in a specific style – for example if you want a chatbot to answer questions in a certain manner.

As for function calling, such as accessing a Google API or internal APIs, you need to do something more: fine-tuning, or training the model further. However, there is no need to train the model from scratch; you can reuse a pretrained checkpoint and then further train it for your use case.
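The key idea, starting from saved pretrained weights and continuing training on your own data, is the same at any scale. This toy sketch uses a two-feature linear model and plain stochastic gradient descent in pure Python so it runs anywhere; the JSON "checkpoint," the data and the learning rate are hypothetical. Fine-tuning an actual LLM follows the same pattern with a deep learning framework and far larger models.

```python
import json

def predict(params: dict, x: list[float]) -> float:
    return sum(w * xi for w, xi in zip(params["w"], x)) + params["b"]

def mse(params: dict, data: list[tuple[list[float], float]]) -> float:
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def fine_tune(checkpoint_json: str, data, lr: float = 0.05, epochs: int = 500) -> dict:
    # Start from the saved (pretrained) weights rather than from scratch...
    params = json.loads(checkpoint_json)
    # ...and keep training on the new, task-specific data with plain SGD.
    for _ in range(epochs):
        for x, y in data:
            err = predict(params, x) - y
            # Gradient of the squared error (err^2) w.r.t. each weight is 2 * err * x_i.
            params["w"] = [w - lr * 2 * err * xi for w, xi in zip(params["w"], x)]
            params["b"] -= lr * 2 * err
    return params

# Hypothetical "pretrained" checkpoint and small task-specific dataset.
checkpoint = json.dumps({"w": [1.0, 0.5], "b": 0.0})
task_data = [([1.0, 0.0], 1.2), ([0.0, 1.0], 0.8), ([1.0, 1.0], 2.0)]
tuned = fine_tune(checkpoint, task_data)
print(mse(json.loads(checkpoint), task_data), "->", mse(tuned, task_data))
```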

“Another advantage of fine-tuning is a lot of the times smaller models fine-tuned can perform actually better than a larger model. Even if you have fixed training costs, which is one time, at inference time it will be much lower than the larger model,” she said.
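The cost argument in this quote can be written as a simple break-even condition. Let $C_{\text{ft}}$ be the one-time fine-tuning cost, $c_s$ and $c_L$ the per-request inference costs of the small fine-tuned model and the larger model, and $n$ the number of requests served (all hypothetical symbols). Assuming $c_L > c_s$:

```latex
C_{\text{ft}} + n\,c_s < n\,c_L
\;\Longleftrightarrow\;
C_{\text{ft}} < n\,(c_L - c_s)
\;\Longleftrightarrow\;
n > \frac{C_{\text{ft}}}{c_L - c_s}
```

For instance, with hypothetical values $C_{\text{ft}} = 5000$ cost units and $c_L - c_s = 0.01$ units per request, the fine-tuned model becomes cheaper overall after $5000 / 0.01 = 500{,}000$ requests.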

The RLHF technique

Another technique for fine-tuning is Reinforcement Learning from Human Feedback (RLHF). Here, you can fine-tune the model on additional metrics to generate outputs that are, for example, more helpful, safe and diverse.

How does it work? It starts with fine-tuning on a small sample and then asking the model to generate responses. Next, human annotators rank the generations against specific task metrics that have been defined. The model then receives feedback in the form of "rewards" or "penalties," allowing it to learn optimal strategies over time.

Access to positive and negative data makes the training more robust, Narang said.
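A common way to turn human rankings into a reward signal is to split each ranking into (preferred, rejected) pairs and train a reward model with a pairwise loss, `-log(sigmoid(r_preferred - r_rejected))`; the LLM is then optimized against that reward model with reinforcement learning. The sketch below covers only the reward-model loss and is an assumption about a typical setup, not Meta's specific recipe; the reward scores are hypothetical placeholders for a learned model's outputs.

```python
import math
from itertools import combinations
from typing import Callable

def ranking_to_pairs(ranked_responses: list[str]) -> list[tuple[str, str]]:
    # Responses are listed best-first, so in each pair the first item is the preferred one.
    # K ranked responses give K * (K - 1) / 2 comparison pairs.
    return list(combinations(ranked_responses, 2))

def pairwise_loss(reward_preferred: float, reward_rejected: float) -> float:
    # -log(sigmoid(r_preferred - r_rejected)), computed in a numerically stable way.
    # Low when the reward model scores the human-preferred response higher.
    d = reward_preferred - reward_rejected
    return math.log1p(math.exp(-d)) if d >= 0 else -d + math.log1p(math.exp(d))

def ranking_loss(ranked_responses: list[str], reward: Callable[[str], float]) -> float:
    # Average pairwise loss over all pairs implied by one human ranking.
    pairs = ranking_to_pairs(ranked_responses)
    return sum(pairwise_loss(reward(a), reward(b)) for a, b in pairs) / len(pairs)

scores = {"A": 2.0, "B": 1.0, "C": 0.0}  # hypothetical reward-model outputs
agrees = ranking_loss(["A", "B", "C"], lambda r: scores[r])
disagrees = ranking_loss(["C", "B", "A"], lambda r: scores[r])
print(agrees < disagrees)  # True: scores that match the human ranking give lower loss
```

Each ranking yields both positive and negative examples in every pair, which is one concrete reading of why access to both kinds of data makes training more robust.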

To fine-tune your model, keep in mind the following:

  • Make sure you collect high-quality data. The old adage GIGO (garbage in, garbage out) applies. If you do not have enough quality data, you can generate it with the LLM itself and add human annotation.

  • Make sure to evaluate the model, whether with human reviewers or automatic evaluation.

  • Use efficient training techniques to lower costs further.

  • Non-tech considerations: These models tend to degrade over time because the data keeps changing. Make sure you have feedback loops in place to collect feedback where the models are not performing well. Also, invest in staff who can train and maintain these models. Make sure your data is private: if you are fine-tuning on third-party servers, make sure the data is anonymized and encrypted. Finally, the model should be ethical, unbiased and safe.
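Automatic evaluation and regression monitoring can start very simply. This sketch scores a model on a held-out set by exact match after normalizing case and whitespace, and flags a regression when accuracy drops below a baseline by more than a tolerance. The `model` callable, the evaluation pairs and the 0.02 tolerance are hypothetical; exact match suits short factual answers, while open-ended generations usually need other metrics or human review.

```python
from typing import Callable

def exact_match_accuracy(model: Callable[[str], str],
                         eval_set: list[tuple[str, str]]) -> float:
    # Share of held-out prompts where the normalized output equals the reference answer.
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    hits = sum(norm(model(prompt)) == norm(ref) for prompt, ref in eval_set)
    return hits / len(eval_set)

def has_regressed(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    # Flag the model for attention if accuracy fell more than `tolerance` below the baseline.
    return current < baseline - tolerance

# Hypothetical held-out set and a stand-in "model" backed by a lookup table.
eval_set = [("2+2?", "4"), ("Capital of France?", "Paris"), ("Color of a clear sky?", "blue")]
answers = {"2+2?": "4", "Capital of France?": " paris ", "Color of a clear sky?": "green"}
acc = exact_match_accuracy(lambda p: answers[p], eval_set)
print(round(acc, 3), has_regressed(acc, baseline=0.9))  # 0.667 True
```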

New LLM trends: Multimodality and on-device

What is next for LLMs? Multimodal models are one such trend. These are models that can accept different modalities as inputs, such as images, video and audio, along with language.

“I’m very excited about this,” she said. “It can be used to power a lot of applications.”

For example, upload an image of an alcoholic beverage and ask the model not only to identify it but also to suggest recipes it can be used in, she said. Another example is a bicyclist asking the AI for directions: the model can understand that the rider is on a bike and identify bike-friendly lanes to use.

Other applications include health care, where medical imaging analysis can yield more holistic patient reporting and diagnosis, Narang said. Customer service is also ripe for multimodal models, since intelligent assistants can troubleshoot better when clients are able to upload images. Marketing campaigns can get a leg up as well, with an LLM creating images and videos along with text.

Another new trend is on-device LLMs, Narang said. Models are becoming larger, but larger models do not fit all use cases. Especially when it comes to handling sensitive data, users may not want the information to pass through a third-party server and would rather keep it on the device, at the edge, she said.

“So more and more effort is going into translating a lot of this knowledge into smaller models,” Narang said.
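Narang did not name a specific method, but one common way to transfer a large model's knowledge into a smaller one is knowledge distillation: the small "student" model is trained to match the softened output distribution of the large "teacher." The sketch below computes the standard soft-target loss from Hinton et al.'s distillation work (KL divergence at temperature T, scaled by T squared); the logits are hypothetical and this is an illustration of the general technique, not Meta's method.

```python
import math

def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits: list[float], student_logits: list[float],
                      temperature: float = 2.0) -> float:
    # KL(teacher || student) on temperature-softened distributions, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

teacher = [1.0, 2.0, 3.0]  # hypothetical next-token logits from a large model
print(distillation_loss(teacher, [1.0, 2.0, 2.5]) < distillation_loss(teacher, [3.0, 2.0, 1.0]))
# True: the student whose outputs resemble the teacher's gets the lower loss
```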

About the Author(s)

Deborah Yao

Editor

Deborah Yao runs the day-to-day operations of AI Business. She is a Stanford grad who has worked at Amazon, Wharton School and Associated Press.
