Maximizing neural networks with transfer learning

By Jelani Harper, 1 October 2019

The amount of training data required presents the most consistent challenge for deploying deep neural networks at scale to resolve everyday business problems. Most machine learning tasks require huge amounts of labeled training data that exceed the time, effort, and overall resources organizations can spend on operationalizing this technology.

Although there are multiple ways to reduce the quantity of training data required to leverage neural networks for process automation, transfer learning is likely the one most frequently used in production. It's particularly relevant to deep learning, where data demands are greatest.

According to Tom Wilde, CEO of intelligent automation specialist Indico, if organizations were to train intelligent bots to perform text analytics on unstructured data as part of business process management, they would need to teach them with “a hundred thousand to a million examples.”

Conversely, if they circumvented these huge data demands with transfer learning, “You only need to teach it using about 200 examples,” Wilde said. “Once you’ve shown [the neural network model] 200 examples of the task it’s trying to learn, you can build these bots in hours: not days, or months, or years.”

The reduced time and effort required to train models with transfer learning makes neural networks more viable in the enterprise, enhances their utility with deep learning, and makes process automation tasks associated with unstructured content horizontally accessible.

Training data complexities

Transfer learning overcomes the need for the massive amounts of training data typically used to teach models by applying a model’s “existing knowledge into some other application or domain,” explained Arpan Shrivastava, data scientist at Near. Pragmatically, organizations can leverage transfer learning with a vendor’s multi-purpose machine learning model to substantially accelerate the model training period. Transfer learning is frequently deployed for neural network tasks involving natural language processing and image recognition. It’s also gaining traction in Robotic Process Automation workflows, as demonstrated by Indico’s recent partnership with UiPath. Transfer learning enables the enterprise to overcome obstacles related to:

  • Training data shortages: Perhaps the greatest value transfer learning provides is empowering the enterprise to do more with less. In situations in which organizations simply don’t have enough data to train models, transfer learning enables them to still complete this task with the data quantities available.
  • Labeled training data shortages: Although some organizations might have plentiful datasets to train machine learning models, those datasets might be unlabeled. Transfer learning decreases the resources required for annotating this data, which often involves either time-consuming manual processes or outsourcing this task.
  • Imperfect training data: Transfer learning can account for myriad discrepancies in available training data for a specific job. For example, it enables organizations to utilize different languages for training data and for data the model encounters in production for text analytics. It can also accommodate training data imbalances so organizations can teach image recognition systems to classify pedestrians at night, when most of the training data is of pedestrians during the day.
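The economics behind these points can be sketched in miniature: freeze a feature extractor and train only a small head on a handful of labeled examples. The extractor below is a hand-built, deterministic stand-in for a genuinely pretrained network (an assumption made so the sketch runs anywhere); only the tiny logistic-regression head is trained.

```python
import math

# Stand-in for a pretrained feature extractor. In a real system these
# weights would come from a large model trained on a broad corpus; here
# they are fixed by hand so the sketch is deterministic. Either way,
# they stay frozen during transfer.
FROZEN_W = [
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1],
]

def pretrained_features(x):
    """Frozen layer: project a raw 4-dim input to 8 ReLU features."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def train_head(examples, epochs=500, lr=0.5):
    """Train only a small logistic-regression head on the frozen features."""
    w, b = [0.0] * len(FROZEN_W), 0.0
    for _ in range(epochs):
        for x, y in examples:
            f = pretrained_features(x)
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss w.r.t. the pre-activation
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    f = pretrained_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Only a handful of labeled examples is needed: the frozen extractor
# already did the featurization, so the trainable part is tiny.
examples = [([1, 0, 1, 0], 1), ([0, 1, 0, 1], 0),
            ([1, 1, 0, 0], 1), ([0, 0, 1, 1], 0)]
w, b = train_head(examples)
print([predict(w, b, x) for x, _ in examples])
```

The design point is the one Wilde makes: because the expensive, data-hungry part (the extractor) is reused rather than relearned, the remaining trainable parameters can be fit from a few hundred examples instead of hundreds of thousands.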

Neural network applicability

Once organizations overcome the difficulties of training neural networks by utilizing transfer learning, they can operationalize them with an astounding degree of effectiveness. Wilde observed: “The essence of neural networks is good at fuzzy problems because, what neural networks do is, they attempt to featurize the problem at a very, very fine-grained level. That’s why neural networks have been successful with machine vision, machine translation, speech to text, and modern NLP.”

Training neural networks with transfer learning is particularly beneficial for intelligent document processing of unstructured text. Conventionally, text analytics platforms require lengthy periods of defining vocabularies, taxonomies, parts of speech, and exhaustive attribute lists to classify and extract information from text. When neural networks are applied to the same job with the computational advantages of deep learning, they effectively say “I don’t need you to do that for me,” Wilde explained. “Just show me the outcome, show me the inputs you’re using, and I will reverse solve all of that historical featurization you used to have to give me.”

Consequently, organizations can conserve considerable time and manpower with intelligent process automation in verticals such as healthcare, in which documents may be emailed to an insurance carrier with pre-approval requests for surgery. Carriers must review these documents to approve the procedure. Based on neural networks’ ability to understand this unstructured text using the method Wilde described, they can automatically extract the relevant information: “the type of surgery, the diagnosis, the doctor, the patient—all the things they’re going to use to decide to come up with an initial stance to decide if [the procedure] should be pre-approved.”

Implementing transfer learning

The labeled-data shortages that transfer learning resolves relate directly to supervised learning, the variety of machine learning in which models are taught with labeled examples of the outputs they are expected to predict. Many transfer learning approaches are framed in terms of a source domain, where abundant data or a pretrained model already exists, and a target domain, the task at hand. They include:

  • Learning domain invariant representations: This transfer learning method is based on models learning the features that don’t change between source and target domains via an approach “that will leave only your unlabeled data of source and target domains as inputs,” Shrivastava explained. “Here we are creating some non-changing features in our source and target domain.” This approach has demonstrated success with NLP applications.
  • Using pre-trained features: Although this method is also based on using one model’s features (the source) to inform the features of another model (the target), “the challenge is to use those features that are very general,” Shrivastava commented. Therefore, for image recognition use cases in which models are trained to recognize cats using data about dogs, for example, “We have to learn the underlying structure of the images and we will be looking for more general features using pre-trained features,” he said.
  • Confusing domains: Transfer learning can also be achieved by deliberately confusing a model about its source and target domains. “Basically we’re training the internal structure of neural networks,” Shrivastava said. “In other words, we are intentionally confusing our model with respect to, say, cats and dogs. So, we’ll be keeping these two tasks together in a deep neural network. Then it will be confusing all the time whether it’s a cat or a dog, but it will be quite useful to us from a transfer learning perspective.”
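A much-simplified illustration of the “non-changing features” idea from the first bullet: transform the features so they no longer reveal which domain an example came from. Production systems use adversarial objectives or richer statistics-matching losses; this hypothetical sketch aligns only the first moments (each domain’s mean), which is the smallest version of the same principle.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def align_first_moments(source, target):
    """Center each domain's features on its own mean. Afterward the two
    domains share the same (zero) mean, so a classifier reading these
    features can no longer separate source from target by location alone."""
    ms, mt = mean(source), mean(target)
    src = [[x - m for x, m in zip(v, ms)] for v in source]
    tgt = [[x - m for x, m in zip(v, mt)] for v in target]
    return src, tgt

# Source-domain features (e.g., daytime pedestrian images) occupy a
# different region of feature space than target-domain features
# (e.g., nighttime images) -- the mismatch transfer learning must bridge.
source = [[2.0, 5.0], [4.0, 7.0]]
target = [[10.0, 1.0], [12.0, 3.0]]

src, tgt = align_first_moments(source, target)
print(mean(src), mean(tgt))  # both [0.0, 0.0]: the shift between domain means is gone
```

The same intuition drives the adversarial “domain confusion” setup in the last bullet: instead of subtracting statistics by hand, the network is trained so that a domain classifier cannot tell the two domains apart from the shared features.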

Adopting transfer learning

These different transfer learning methodologies are critical to reducing the amount of example data organizations need to train deep neural networks. When paired with deep learning compute, neural networks are suitable for most enterprise process automation tasks including complicated text analytics use cases. At present, transfer learning may be the most productive means of addressing the training data issues inhibiting machine learning adoption.

“I think that this problem with AI in terms of its reliance on vast amounts of training data is something that a lot of us are actively trying to solve,” Wilde opined. “Transfer learning as a particular approach has turned out to be scalable and particularly efficient. I have yet to see [any other techniques] in production. But I think, broadly speaking, there’s a recognition that this is a serious pain point that has to be addressed for AI to go mainstream.”


Jelani Harper is an editorial consultant servicing the information technology market, specializing in data-driven applications focused on semantic technologies, data governance and analytics.