Weak Supervision: How a New Technique in Machine Learning Makes AI Easy for Everyone

November 6, 2018

by Abby Levenberg and Adam Devine

A manager in claims underwriting at a large insurance company is tired of her team getting swamped by the tedious, time-consuming work of the claims appeal process. Budget cuts have led to headcount reductions, submission volume is rising, and customer satisfaction has fallen for the fifth consecutive quarter.

It’s time to change the way they work. So, she sends a link to a simple browser-based user interface to three people on her team, and through this UI they spend a few hours tagging and highlighting the key information in customer appeal submissions.

After the three of them have manually processed fewer than one hundred appeals, a machine learning (ML) model has learned enough about the process to teach itself the rest. By the end of the same workday, AI-driven automation software has taken over the claims appeals process, saving the company over $2 million a year and making customer responses ten times faster.

Sound like business science fiction? It isn’t.

The dream of self-service AI is now a reality

Even a year ago, it would have taken ten times as long for that insurance operation to use AI to automate the judgment work of insurance appeals, chiefly because of the time it takes to create a big enough data set to train machine learning models.

Moreover, it would have taken a dozen or more highly trained subject matter experts (SMEs) weeks of manually tagging various forms of documentation to create a dataset large enough to sufficiently train ML models to perform work with so many variables and conditions.

Recent advances in ML have made it faster and simpler to train models to automate judgment work with unstructured data, a kind of work that, according to Gartner, most providers of process automation are not well suited to handle.

Central to these advances is a new technique in the AI world called weak supervision. Despite its inauspicious name, it’s a capability that will democratize AI for business people by reducing the effort required to create the large data sets that ML models need for training.

The historical barriers to self-service AI

Weak supervision has eliminated one of many barriers between AI and business people. In addition to creating a large training data set with SMEs, companies had to create a large data science department to apply that training data to models.

After SMEs identified and classified valuable data within countless documents, PDFs, TIFFs and other forms of unstructured content, data scientists would then take clean data sets, select appropriate machine learning models, apply the data to models, select features and run experiments until model accuracy reached acceptable levels.

On top of SME and data science time and expertise, AI has also been a burden on IT departments. Third-party AI products have required extensive integration and configuration to work within the complex landscape of enterprise systems and applications. A common approach to automating processes that mix repetitive transaction tasks with unstructured work is to pair separate robotic process automation (RPA) and AI products. That pairing compromises data fidelity, which inhibits analytics, and it poses more risk by sending data into the cloud.

All of these barriers have been lowered by the combination of:

weak supervision for automating data tagging for ML training,
AutoML for automating model selection and training, and
the native integration of RPA and AI on a single platform.

Automating data tagging

Not all tasks are equally complex. Some are entirely repetitive and follow an invariable set of rules. For these tasks, RPA is ideal. Other tasks have fewer parameters and don’t require large training data sets, which makes hand-labeling data manageable. But many tasks within banks, insurance companies, healthcare organizations and other data-intensive enterprise operations have high volumes of unstructured information and require image and natural language processing.

This type of work has a large number of parameters and requires complex ML models. To ensure automation ROI, it is critical to automate the labeling of the high volumes of training data that complex models need for training.

The work of identifying and tagging the valuable fields in unstructured data like documents, email messages, and PDFs in processes such as customer onboarding in banking, claims handling in insurance and pharmacovigilance in the pharmaceutical industry could take a team of SMEs months.

This is where weak supervision comes in. Rather than tagging each and every sample, SMEs tag a small subset of the data. Weak supervision then derives heuristics from those first few samples and uses them to tag the rest of the data automatically.
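To make the idea concrete, here is a minimal Python sketch of how a handful of SME tags might be turned into labeling heuristics. It is an illustration under stated assumptions, not WorkFusion's implementation: the sample appeals, the labels ("coverage", "billing") and the function names are all hypothetical, and the heuristic used (a keyword that appears under only one label in at least two tagged samples) is deliberately simple.

```python
from collections import Counter, defaultdict

# Hypothetical seed set: a few appeals hand-tagged by SMEs.
SEED = [
    ("claim denied because policy lapsed before the accident", "coverage"),
    ("policy lapsed notice was never mailed to me", "coverage"),
    ("the invoice amount billed is wrong", "billing"),
    ("i was billed twice for the same invoice", "billing"),
]

# Hypothetical untagged appeals that nobody has looked at yet.
UNLABELED = [
    "my policy lapsed and the claim was denied",
    "please correct the invoice i was billed for",
    "thank you for your help",
]


def learn_keyword_heuristics(seed, min_docs=2):
    """Keep words seen under exactly one label in at least `min_docs` samples."""
    labels_by_word = defaultdict(Counter)
    for text, label in seed:
        for word in set(text.split()):
            labels_by_word[word][label] += 1
    heuristics = {}
    for word, counts in labels_by_word.items():
        if len(counts) == 1:
            label, n_docs = next(iter(counts.items()))
            if n_docs >= min_docs:
                heuristics[word] = label
    return heuristics


def weak_label(text, heuristics):
    """Majority vote over the heuristics that fire; abstain (None) on no vote or a tie."""
    votes = Counter(heuristics[w] for w in set(text.split()) if w in heuristics)
    ranked = votes.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


heuristics = learn_keyword_heuristics(SEED)
labels = [weak_label(text, heuristics) for text in UNLABELED]
for text, label in zip(UNLABELED, labels):
    print(f"{label!s:10} <- {text}")
```

The third appeal gets no label because no heuristic fires on it. Letting a heuristic abstain rather than guess is what keeps the automatically generated labels usable as training data; in practice the abstained samples would go back to a person or simply be left out.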

Automating ML training

Weak supervision on its own automates only the first part of the self-service AI assembly line. Critical to making AI easy for business people is automating the complex data science work of model selection and training.

AutoML, a term popularized by Google, automates the work of selecting and training the right models for a given business process. With sufficient training data, AutoML runs hundreds, sometimes thousands, of experiments in parallel to determine the optimal model and features to automate a cognitive task.
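The experiment loop at the heart of this approach can be sketched in a few lines of Python. This is a toy under loud assumptions: the data, the two features (page count and a noisy second column) and the "simple"/"complex" labels are invented, the only model family is a hand-written nearest-centroid classifier, and the experiments run one after another rather than in parallel. A real AutoML system would also search across model families and hyperparameters.

```python
from itertools import combinations

# Hypothetical data: ((page_count, noisy_feature), label).
TRAIN = [
    ((1, 50), "simple"), ((2, 10), "simple"), ((3, 40), "simple"),
    ((7, 20), "complex"), ((8, 45), "complex"), ((9, 15), "complex"),
]
VALID = [
    ((2, 35), "simple"), ((3, 20), "simple"),
    ((8, 40), "complex"), ((7, 25), "complex"),
]


def fit_centroid(rows, features):
    """Train a nearest-centroid classifier on the chosen feature indices."""
    sums, counts = {}, {}
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Pick the label whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda y: sum((x[f] - c) ** 2 for f, c in zip(features, centroids[y])),
        )

    return predict


def accuracy(predict, rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)


def run_experiments(train, valid, n_features):
    """One experiment per non-empty feature subset; keep the best validation score."""
    trials = []
    for size in range(1, n_features + 1):
        for features in combinations(range(n_features), size):
            model = fit_centroid(train, features)
            trials.append({"features": features, "valid_accuracy": accuracy(model, valid)})
    best = max(trials, key=lambda t: t["valid_accuracy"])
    return trials, best


trials, best = run_experiments(TRAIN, VALID, n_features=2)
for trial in trials:
    print(trial)
print("selected:", best)
```

Scoring every candidate on held-out validation data, rather than on the data it was trained on, is the design choice that matters here: it is what lets the search discard the noisy feature without a data scientist inspecting it.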

After bringing the model up to a minimum threshold of completeness and accuracy, AutoML turns the model into a bot that replaces manual effort. This allows someone with no knowledge of data science or ML to automate decision-based work. Google pioneered AutoML for image recognition with its Neural Architecture Search (NAS), and WorkFusion pioneered AutoML for automating knowledge work in its Smart Process Automation product.
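One way to picture the "minimum threshold" check is as a gate evaluated on held-out data before a model is allowed to run unattended. The Python sketch below is an assumption-laden illustration: it reads "completeness" as the share of items the model is confident enough to handle on its own, and the sample predictions, the function name `evaluate_gate` and all three cut-off values are hypothetical.

```python
def evaluate_gate(predictions, min_confidence, min_accuracy, min_coverage):
    """Decide whether a model is ready to automate work.

    predictions: list of (predicted_label, confidence, true_label) on held-out data.
    Items below `min_confidence` are assumed to be routed to a person.
    """
    automated = [(pred, truth) for pred, conf, truth in predictions if conf >= min_confidence]
    coverage = len(automated) / len(predictions)
    accuracy = (
        sum(pred == truth for pred, truth in automated) / len(automated)
        if automated
        else 0.0
    )
    return {
        "coverage": coverage,
        "accuracy": accuracy,
        "deploy": coverage >= min_coverage and accuracy >= min_accuracy,
    }


# Hypothetical held-out results for an appeals model.
HELD_OUT = [
    ("approve", 0.97, "approve"),
    ("deny", 0.91, "deny"),
    ("approve", 0.88, "approve"),
    ("deny", 0.62, "approve"),  # low confidence: would go to a person
    ("approve", 0.95, "approve"),
]

result = evaluate_gate(HELD_OUT, min_confidence=0.85, min_accuracy=0.95, min_coverage=0.75)
print(result)
```

In this sketch the one wrong prediction is also the one low-confidence prediction, so it never reaches the automated path. That is the behavior a gate like this is meant to encourage: the bot handles what it is sure about, and people keep the exceptions.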

What’s the benefit?

The faster and more efficiently a company can deliver a service, the wider its margins. Outsourcing increased margins in the 80s and 90s by moving functions to lower cost labor markets. IT projects in the late 90s and 00s automated functions by turning manual functions into custom lines of code.

Both delivered linear improvements to efficiency, but both required significant capital investments. RPA has grown in popularity because it requires less capital expense and lets business people integrate systems that don’t have usable APIs, but businesses have found that maintaining rules-based RPA is costly due to exceptions and constant bot retraining.

The last frontier in business efficiency is the automation of decision-based cognitive work, which represents 60 to 70% of the typical business process in most industries.

The combination of weak supervision and AutoML puts the ability to automate this more variable, expensive but ultimately repetitive work entirely in the hands of the business people who understand the process and the outcome.

It reduces the time and effort to create automation, and it makes intelligent automation more autonomous and more reliable. Leading businesses have already put weak supervision and AutoML to work through WorkFusion SPA, and analysts expect this new breed of intelligent automation to scale rapidly in 2019.

Adam Devine is Chief Evangelist for WorkFusion

Abby Levenberg is WorkFusion's VP of Data Science

About the Author(s)

Ciarán Daly
