Intelligent document imaging with natural language processing and optical character recognition

Max Smolaks

August 16, 2019

by Jelani Harper 16 August 2019

One of the core tenets of digital transformation is the ability to extract data from the physical world and feed it into digital systems. Document imaging technologies, especially intelligent ones that incorporate facets of natural language processing (NLP), optical character recognition (OCR), and advanced analytics, are critical to enabling downstream IT systems to understand and act on the large volume of data many organizations still hold on paper.

When aided by NLP and certain machine learning models, users can readily integrate information from documents into AI-powered systems for a variety of uses including fraud detection, regulatory compliance, and process automation. The result is not only a significantly enhanced capacity to effect digital transformation at scale, but also the ability to increase overall efficiency by embedding facets of AI into mission-critical workflows, as opposed to mere fringe use cases.

Natural language OCR

The financial services industry offers one of the more convincing examples of the efficacy of intelligent document imaging assisted by NLP and OCR. John Ahearn, global head of trade for Citi Treasury and Trade Solutions, explained that this pairing helped revolutionize his organization’s letter of credit business line which, as the name implies, was largely based on physical documents. “Banks are making financial commitments on behalf of importers and exporters around the world,” he said. “What we’re saying is, if you provide the following documents, the following goods, we’ll check it against the letter of credit, and if those terms and conditions match, we guarantee you’ll get paid. That’s how services move around the world.”

OCR technologies ensure that the information from such documents is scanned into IT systems for analysis. NLP enriches this process by enabling those systems to recognize relevant concepts in the resulting text, which is beneficial for machine learning analytics required for the items’ approval or denial. “Once we got [the papers] digitized, we started looking at using natural language processing as a way to strip off proper nouns of individuals and places, sending them to our filters, etc.,” Ahearn said.
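The step Ahearn describes, pulling proper nouns out of digitized text and passing them to a screening filter, can be sketched roughly as follows. This is a minimal illustration and not Citi's implementation: the capitalized-word regular expression is a crude stand-in for real named-entity recognition, and the watchlist, sample text, and function names are all invented for the example.

```python
import re

# Hypothetical watchlist of screened names (lowercase); invented for illustration.
WATCHLIST = {"acme shipping", "northport"}

# A run of capitalized words, e.g. "Acme Shipping". This is a crude stand-in
# for the named-entity recognition a real NLP library would perform.
PROPER_NOUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_proper_nouns(ocr_text):
    """Return candidate names from OCR output, in order, without duplicates."""
    # Collapse any irregular whitespace that OCR left inside a name.
    names = (" ".join(match.split()) for match in PROPER_NOUN.findall(ocr_text))
    return list(dict.fromkeys(names))


def screen(names):
    """Return the candidate names that appear on the watchlist."""
    return [name for name in names if name.lower() in WATCHLIST]


# Sample text standing in for the output of an OCR pass over a trade document.
text = "Goods shipped by Acme Shipping from Northport to Rotterdam."
candidates = extract_proper_nouns(text)
hits = screen(candidates)
print(hits)
```

The heuristic also picks up ordinary sentence-initial words such as "Goods", which is harmless here because they simply fail to match the watchlist. A production system would rely on a trained entity recognizer instead.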

Multiple advanced analytics models

The advanced analytics models Citi was able to run on information gleaned from these documents are directly tied to its use of NLP. Those models help the bank learn about its clients’ transactions in what traditionally was “a very paper-intensive…and complicated process,” SAS CTO and COO Oliver Schabenberger remarked. NLP processes the parts of speech in the text, which is essential for extracting relevant concepts for analysis.

Although there are different forms of analytics that can be used to better understand customers, their histories, and the purpose of their transactions, the basic premise is “technologies like natural language processing and auto advanced analytics techniques like random forest, gradient boosting, [are] very helpful in making this a more automated system,” Schabenberger said. The ensemble modeling techniques he referenced are useful for detecting the patterns in transactions that influence whether they’re ultimately sanctioned or not.
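To give a feel for what an ensemble model does, the sketch below trains many very simple classifiers on random resamples of the data and lets them vote. It is bagging of one-split "decision stumps", a simplified relative of random forest (which also randomizes the features considered and grows deeper trees), and it is not gradient boosting. The transaction features, labels, and function names are invented for illustration and do not come from Citi or SAS.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Find the (feature, threshold, label-above) split with the fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for above in (0, 1):  # label predicted when the value exceeds t
                errors = sum(
                    (above if r[f] > t else 1 - above) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    return best[1:]


def predict_stump(stump, row):
    f, t, above = stump
    return above if row[f] > t else 1 - above


def fit_bagged_stumps(rows, labels, n_models=25, seed=0):
    """Train each stump on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict(models, row):
    """Majority vote across all stumps."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Invented data: (amount in thousands, 1 if the trade route is new for the client).
rows = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0),
        (11, 1), (12, 0), (13, 1), (14, 1), (15, 1)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 1 = flag for review

models = fit_bagged_stumps(rows, labels)
large = predict(models, (20, 0))
small = predict(models, (0, 1))
print(large, small)
```

The appeal of the approach is that no single weak model has to be right: errors made by individual stumps on their particular resample tend to be outvoted by the rest.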

“The one thing we were never able to do before was, we always looked at a transaction on a very vertical level,” Ahearn said about the manual process previously in place for this use case. “We would look at the data elements that we had in that particular transaction, we would do comparisons and say, yeah it looks okay, or it doesn’t. We were never able to look horizontally. We didn’t really know what the trading patterns of some of these clients were.”

Fraud, regulatory implications

Today, however, because of the speed with which it can scan documents with OCR, intelligently process them with NLP, and run machine learning analytics on the results for transaction approval, Citi is able to get extensive, horizontal views of its client base for “intelligent decision-making, seeing beyond the silos and integrating information across,” Schabenberger said. The implications for fraud detection, regulatory compliance, and trade compliance are considerable.

“We had another example where it was supposedly a domestic leasing company that was selling high value cars to China because they were able to get around some of the import tariffs and some of the restrictions,” Ahearn recalled. “That wasn’t what we thought this customer was doing. So, it’s given us great vision into exactly what the clients are doing and how they’re doing it.”
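The difference between the "vertical" and "horizontal" views can be sketched in a few lines: rather than checking a transaction only against its own fields, compare it with the same client's prior activity. This is an illustrative sketch only. The record layout, client names, and helper functions are assumptions made for the example, and a real system would use statistical models in place of exact set membership.

```python
from collections import defaultdict


def build_profiles(history):
    """Summarize each client's past goods and destinations."""
    profiles = defaultdict(lambda: {"goods": set(), "destination": set()})
    for tx in history:
        profile = profiles[tx["client"]]
        profile["goods"].add(tx["goods"])
        profile["destination"].add(tx["destination"])
    return profiles


def unusual_fields(profiles, tx):
    """List the fields of a new transaction not seen before for this client."""
    profile = profiles.get(tx["client"])
    if profile is None:
        return ["client"]  # no history at all for this client
    return [field for field in ("goods", "destination")
            if tx[field] not in profile[field]]


# Invented history for a hypothetical domestic leasing company.
history = [
    {"client": "LeaseCo", "goods": "office equipment", "destination": "US"},
    {"client": "LeaseCo", "goods": "forklifts", "destination": "US"},
]
profiles = build_profiles(history)

new_tx = {"client": "LeaseCo", "goods": "luxury cars", "destination": "CN"}
flags = unusual_fields(profiles, new_tx)
print(flags)
```

Viewed on its own, the new transaction could look perfectly valid. It only stands out once it is set against what the client has done before.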

The ability to draw patterns among different transactions, compliance and trade policies, and entities found via the synthesis of OCR and NLP revolutionized the document imaging process for Citi. It can produce the same benefits for other companies as well. This approach is one of the more tangible demonstrations that true digital transformation is worth the current industry hype. Furthermore, it attests to the wide-ranging utility of the various forms of AI, particularly when they coalesce into a single use case at the center of fundamental business processes.

Jelani Harper is an editorial consultant servicing the information technology market, specializing in data-driven applications focused on semantic technologies, data governance and analytics.
