Alexa, Siri and Google Translate all use them. (First of a 2-part series.)
What ties Amazon’s Alexa, Apple’s Siri, HubSpot’s Service Hub and Google Translate together? They are all built using language models.
Alexa and Siri can process speech audio. You can use Service Hub to improve your customer service. And you can translate phrases at the touch of a button using Google Translate. But none of these natural language processing (NLP) tasks is possible without a language model.
But what is a language model? Why is it so crucial to NLP applications? What are parameters? How does one model differ from another? And what are some examples of need-to-know models? AI Business is here to help you navigate the model maze.
What is a language model?
A language model is a model that uses probabilistic methods to estimate how likely a given sequence is to occur. Most often the sequence is the words in a sentence, but the same idea can be extended to other sequences, such as those that describe protein structures.
By analyzing data such as text, audio or images, a language model can learn from context to predict what comes next in a sequence. Most often, language models are used in NLP applications that generate text-based outputs, such as machine translation.
For example, when you start typing an email, Gmail may complete the sentence for you. This happens because the model behind the feature has been trained on large volumes of text and has learned that when someone starts typing “It was very nice …” the next words are often “… to meet you.”
How do language models work?
Language models are trained on sizable amounts of data. A training algorithm analyzes the context in which words appear, in natural language rather than computer-speak, and learns which items tend to follow which. The model then applies what it has learned to produce the desired outcome, such as making a prediction or producing a response to a query.
Essentially, a language model learns the characteristics of basic language and applies what it has gleaned to new phrases.
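The train-then-predict loop can be sketched in a few lines of Python. This is a toy illustration only: the four training sentences and the function names `train` and `complete` are hypothetical, and production systems such as Gmail's are far more sophisticated than simple word counts.

```python
from collections import Counter, defaultdict

# Hypothetical training text; a real system learns from vastly more data.
CORPUS = [
    "it was very nice to meet you",
    "it was very nice to meet you today",
    "it was very nice to see you",
    "it was very kind of you",
]

def train(sentences):
    """Count which word follows each word in the training sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def complete(following, prompt, max_words=3):
    """Greedily append the most frequent next word, up to max_words times."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = following.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

model = train(CORPUS)
suggestion = complete(model, "it was very nice")
print(suggestion)
```

With this tiny corpus the sketch extends the prompt with the words it saw most often after each preceding word, which is the same idea, at a much smaller scale, as the email example above.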
There are several different probabilistic approaches to modeling language, which vary depending on the purpose of the language model. From a technical perspective, the various types differ by the amount of text data they analyze and the math they use to analyze it.
For example, a language model designed to generate sentences for an automated Twitter bot may use different math and analyze text data differently than a language model designed for determining the likelihood of a search query.
There are two main types of language models:
- Statistical Language Models – which use traditional statistical techniques, such as n-grams, to predict the next word in a sequence.
- Neural Language Models – which use neural networks to make predictions.
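To make the neural type a little more concrete, the sketch below shows the final step such a model typically performs: turning learned numbers into a probability for every word in its vocabulary. Everything here is an assumption for illustration, including the seven-word vocabulary, the embedding size and the random, untrained weights, so the probabilities it produces carry no meaning until a model is actually trained.

```python
import math
import random

VOCAB = ["it", "was", "very", "nice", "to", "meet", "you"]  # hypothetical
EMBED_DIM = 4  # hypothetical size of each word vector

random.seed(0)  # fixed seed so the untrained weights are reproducible

# The model's parameters: one vector per input word, one per output word.
embeddings = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]
output_weights = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]

def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_distribution(word):
    """Score every vocabulary word as a possible successor of `word`."""
    vector = embeddings[VOCAB.index(word)]
    scores = [sum(w * x for w, x in zip(row, vector)) for row in output_weights]
    return dict(zip(VOCAB, softmax(scores)))

distribution = next_word_distribution("nice")
for candidate, probability in distribution.items():
    print(f"{candidate}: {probability:.3f}")
```

Training would adjust the numbers in `embeddings` and `output_weights` so that likely next words receive higher probabilities; that step is omitted here.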
Statistical Language Model types
Unigram – Among the simplest types of language model. Unigram models evaluate each term independently and do not require any conditioning context to make a decision. They are used for NLP tasks such as information retrieval.
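As a rough illustration of a unigram model in information retrieval, the sketch below ranks two documents by how likely each one is to produce a query, treating every query word independently. The documents, the query and the add-one smoothing are assumptions made for this example, not details from any particular search system.

```python
from collections import Counter

# Hypothetical mini-collection of documents.
DOCUMENTS = {
    "doc_translation": "machine translation converts text from one language to another language",
    "doc_speech": "speech recognition converts audio into text",
}

def query_likelihood(query, text, vocab_size):
    """Multiply independent per-word probabilities, with add-one smoothing."""
    words = text.split()
    counts = Counter(words)
    score = 1.0
    for word in query.split():
        # Each word is scored on its own: no surrounding context is used.
        score *= (counts[word] + 1) / (len(words) + vocab_size)
    return score

vocab = {word for text in DOCUMENTS.values() for word in text.split()}
scores = {
    name: query_likelihood("language translation", text, len(vocab))
    for name, text in DOCUMENTS.items()
}
best = max(scores, key=scores.get)
print(best, scores)
```

The smoothing term keeps a document from scoring zero just because one query word is missing from it, which is a common but by no means the only choice.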
N-gram – N-gram models (also known as Q-grams) are simple and scalable. They estimate the probability of the next item in a sequence from the n-1 items that precede it. N-gram models are widely used in probability, communication theory and statistical natural language processing.
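A bigram model (an n-gram model with n = 2) is small enough to sketch in full. The example below estimates the probability of a word given the one before it by dividing counts, then scores a whole sentence by multiplying those probabilities. The three-sentence corpus and the `<s>` start marker are assumptions for illustration, and no smoothing is applied, so unseen word pairs get a probability of zero.

```python
from collections import Counter

# Hypothetical training corpus.
CORPUS = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

def train_bigrams(sentences):
    """Count each word pair and how often each word appears as context."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # <s> marks the sentence start
        for prev, word in zip(words, words[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts

def bigram_probability(prev, word, context_counts, bigram_counts):
    """Estimate P(word | prev) as count(prev, word) / count(prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]

def sentence_probability(sentence, context_counts, bigram_counts):
    """Multiply the conditional probabilities of each word in turn."""
    words = ["<s>"] + sentence.split()
    probability = 1.0
    for prev, word in zip(words, words[1:]):
        probability *= bigram_probability(prev, word, context_counts, bigram_counts)
    return probability

contexts, bigrams = train_bigrams(CORPUS)
p_cat = bigram_probability("the", "cat", contexts, bigrams)
p_sentence = sentence_probability("the cat sat on the mat", contexts, bigrams)
print(p_cat, p_sentence)
```

In this corpus "the" is followed by another word six times and by "cat" twice, so the sketch estimates the chance of "cat" after "the" as one in three.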
Exponential – Also referred to as maximum entropy models. These models evaluate text via an equation that combines feature functions and n-grams. Essentially, the modeler specifies which features to consider, and training sets a weight for each one while otherwise assuming as little as possible about the data. Most commonly found in computational linguistics applications.
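The equation behind an exponential model can be sketched as follows: each candidate word gets a score equal to the weighted sum of the features that fire for it, and the scores are exponentiated and normalized into probabilities. The two feature functions, the weights and the candidate words below are hypothetical and hand-picked; in practice the weights are learned from data.

```python
import math

CANDIDATES = ["you", "them", "banana"]  # hypothetical next-word candidates

def bigram_meet_you(history, word):
    """An n-gram style feature: fires for the specific pair 'meet you'."""
    return 1.0 if (history[-1], word) == ("meet", "you") else 0.0

def is_pronoun(history, word):
    """A more general feature: fires when the candidate is a pronoun."""
    return 1.0 if word in {"you", "them"} else 0.0

FEATURES = [bigram_meet_you, is_pronoun]
WEIGHTS = [1.5, 1.0]  # hand-picked for illustration, normally learned

def maxent_distribution(history):
    """P(word | history) = exp(sum of weight * feature) / normalizer."""
    scores = {
        word: sum(wt * f(history, word) for wt, f in zip(WEIGHTS, FEATURES))
        for word in CANDIDATES
    }
    normalizer = sum(math.exp(score) for score in scores.values())
    return {word: math.exp(score) / normalizer for word, score in scores.items()}

distribution = maxent_distribution(["nice", "to", "meet"])
print(distribution)
```

Because features are ordinary functions, this family of models can mix n-gram evidence with other signals, which is the flexibility the description above refers to.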
Neural Language Model types
Speech Recognition: The likes of Alexa and Siri rely on this model type to understand spoken requests. Essentially, it gives a device the ability to process speech audio.
Machine Translation: The conversion of one language to another via a machine. Google Translate and Microsoft Translator use this type.
Parsing Tools: This model type analyzes data or a sentence to determine whether it follows grammatical rules. Tools like Grammarly use parsing to understand how a word relates to others in a document.
How do models differ from one another?
One common differentiator is size – some language models have a few million parameters, while others have billions or even trillions.
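Parameters are the values that a model learns during the training process. In a neural model they are numbers, such as the weights on the connections between neurons. A fact such as “Ben Franklin born January 1706” or “Boston capital of Massachusetts” is not stored in any single parameter; what the model knows is captured collectively across many of them.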
A model's parameter count is a rough indicator of how capable it is at solving a problem or answering a query: broadly, the more parameters it has, the more it can learn from its training data. However, a larger model does not always perform better than a smaller one. A smaller model can come out ahead, for example, when it specializes in certain topics.
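To give a feel for how parameter counts grow, here is a minimal sketch that counts the learned numbers (weights and biases) in a small fully connected neural network. The layer sizes are hypothetical and chosen only for illustration; real language models use other layer types as well, so treat this as arithmetic rather than a description of any specific model.

```python
def dense_layer_parameters(inputs, outputs):
    """One weight per input-output connection, plus one bias per output."""
    return inputs * outputs + outputs

def count_parameters(layer_sizes):
    """Sum the parameters of each pair of consecutive layers."""
    return sum(
        dense_layer_parameters(a, b)
        for a, b in zip(layer_sizes, layer_sizes[1:])
    )

# Hypothetical networks: 1,000-word vocabulary in, hidden layer, 1,000 out.
small = count_parameters([1000, 64, 1000])
large = count_parameters([1000, 512, 1000])
print(f"small: {small:,} parameters")
print(f"large: {large:,} parameters")
```

Widening the hidden layer from 64 to 512 units takes the total from roughly 129,000 to just over a million parameters, which shows how quickly model size climbs as layers grow.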
Models can also differ based on their quality. To evaluate a model's quality, there are several human-created benchmark tests and datasets that can be used. Such datasets include the following:
Microsoft Research Paraphrase Corpus: a text file containing 5,800 pairs of sentences taken from news sources on the web.
Stanford Sentiment Treebank: consists of 11,855 single sentences extracted from movie reviews to gauge sentiment.
General Language Understanding Evaluation (GLUE) benchmark: a model-agnostic collection of nine sentence- or sentence-pair language tasks built on existing datasets and selected to cover a range of dataset sizes, text genres and degrees of difficulty.
Corpus of Linguistic Acceptability: consists of 10,657 sentences from 23 linguistics publications.
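Benchmarks like these are typically scored by comparing a model's answers against human-provided labels. The sketch below computes two widely used scores, accuracy and F1, for a paraphrase-style task. The labels and predictions are made up for illustration and do not come from any of the datasets above.

```python
# Hypothetical gold labels and model predictions (1 = paraphrase, 0 = not).
GOLD = [1, 0, 1, 1, 0, 1, 0, 1]
PREDICTIONS = [1, 0, 0, 1, 0, 1, 1, 1]

def accuracy(gold, predicted):
    """Fraction of examples where the prediction matches the label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

def f1_score(gold, predicted, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(g == positive and p == positive for g, p in pairs)
    false_pos = sum(g != positive and p == positive for g, p in pairs)
    false_neg = sum(g == positive and p != positive for g, p in pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

model_accuracy = accuracy(GOLD, PREDICTIONS)
model_f1 = f1_score(GOLD, PREDICTIONS)
print(f"accuracy: {model_accuracy:.2f}, F1: {model_f1:.2f}")
```

Running two models against the same labeled examples and comparing scores like these is one simple way to judge which is of higher quality on a given task.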
Uses of the technology
Chatbots – The example most people encounter on a daily basis. Models power chatbots to conduct basic tasks like answering queries and directing customers. NLP-powered chatbots can take basic tasks off human agents, who are brought in only for more complex cases.
Sentiment analysis – Using NLP to gauge the mood and tone of a document. Models can power sentiment analysis tools to identify the emotions and context conveyed in a text document. Commonly found in applications such as product review monitoring or grammar-checking tools.
Data mining – Used to find anomalies or correlations within large datasets, data mining can be used to extract information in an enterprise setting, such as customer relationship management, or to obtain relevant data to build an AI model.
Security – NLP models can be used to strengthen a business’s security arsenal. An enterprise can integrate model-powered algorithms that extract additional context from a user’s personal information and generate questions the user must answer to be granted access.