by Jelani Harper


SAN FRANCISCO – The General Data Protection Regulation (GDPR) taking effect was, in some ways, the most significant development for AI in Europe over the past year. It elevated regulatory compliance to the foremost concern for organizations using data-driven processes.

In its wake, organizations holding the personally identifiable information (PII) of European Union citizens faced an expansive mandate: identify consumers’ personal information, locate it, classify it, safeguard it, redact it, and even make it available to consumers on request.

Organizations that apply these measures to unstructured data with traditional rules-based approaches or Robotic Process Automation (RPA) are hard-pressed to meet these demands at scale. According to Indico CEO Tom Wilde, however, the mandates of GDPR, and of almost any other regulation, become far simpler to satisfy with the timely application of deep learning.

Rather than requiring users to configure messy vocabularies, taxonomies, and business glossaries, deep learning “flips this over on its head,” Wilde said. “Instead of the computer requiring that we figure out how to talk to it the way it understands, [deep learning] inverts it and says show me examples of what you’re trying to achieve and the inputs you’re using, and I’ll figure it out myself.”

By applying deep learning to regulatory compliance and legal matters, organizations can become more effective and efficient at satisfying regulators, mitigating risk, and preserving their reputations as ethical stewards of data.


Model training and scoring

Deploying deep learning for contract analytics or other document analytics within the scope of GDPR can yield immediate business value. In legal departments, for example, professionals must determine which documents (out of potentially thousands) contain information subject to regulation and need to be modified accordingly.

Manual approaches to this task involve cutting and pasting information from each document into spreadsheets for further time-consuming, largely human analysis. With Intelligent Process Automation (IPA) powered by deep learning, however, “you would train a model just showing it examples of GDPR or PII clauses that are in compliance and ones that are not in compliance, and then score all those contracts on a continuum between complying and not, and then only focus on the ones that are out of compliance,” Wilde explained.
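
Wilde is describing supervised binary classification that outputs a score rather than a hard yes/no verdict. The sketch below mimics that workflow under heavy simplifying assumptions: it swaps the deep learning model (and transfer learning) for a bag-of-words logistic regression written in pure Python, and the training clauses, the `train` and `score` helpers, and the contract texts are all hypothetical, invented for illustration. It shows only the shape of the process: label examples, fit a model, then score each clause on a 0-to-1 continuum so reviewers can start with the lowest scores.

```python
import math
import re
from collections import Counter

# Hypothetical labeled clauses: 1 = complying, 0 = not complying.
TRAINING_CLAUSES = [
    ("The processor shall delete personal data on request of the data subject.", 1),
    ("Personal data will be encrypted and retained only as long as necessary.", 1),
    ("The data subject may request access to and erasure of personal data.", 1),
    ("The vendor may retain personal data indefinitely and share it with third parties.", 0),
    ("Customer data may be sold to partners without notice.", 0),
    ("Personal data can be transferred to any affiliate without consent.", 0),
]


def featurize(text: str) -> Counter:
    """Bag-of-words counts; a crude stand-in for learned deep representations."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: dict[str, float], feats: Counter) -> float:
    """Probability of 'complying' under the current weights."""
    z = weights.get("<bias>", 0.0) + sum(weights.get(t, 0.0) * c for t, c in feats.items())
    return sigmoid(z)


def train(examples, epochs: int = 300, lr: float = 0.1) -> dict[str, float]:
    """Logistic regression fitted by stochastic gradient ascent on the log-likelihood."""
    weights: dict[str, float] = {"<bias>": 0.0}  # "<bias>" cannot collide with [a-z]+ tokens
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            error = label - predict_proba(weights, feats)
            weights["<bias>"] += lr * error
            for tok, count in feats.items():
                weights[tok] = weights.get(tok, 0.0) + lr * error * count
    return weights


def score(weights: dict[str, float], text: str) -> float:
    """Value between 0 and 1: closer to 1 means more likely complying."""
    return predict_proba(weights, featurize(text))


if __name__ == "__main__":
    weights = train(TRAINING_CLAUSES)
    contracts = {
        "contract-A": "The processor shall delete personal data on request.",
        "contract-B": "The vendor may sell customer data to partners without consent.",
    }
    # Least-compliant first, so reviewers see the riskiest contracts at the top.
    for name, text in sorted(contracts.items(), key=lambda kv: score(weights, kv[1])):
        print(f"{name}: {score(weights, text):.2f}")
```

With six examples this toy model is far too small to trust; the point is that a continuous score, not a binary verdict, is what lets a team sort thousands of contracts and begin with the worst.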

Transfer learning substantially shortens the model training period. According to Wilde, it could enable organizations to train a deep learning model for this problem on as few as “100 clauses of complying or not complying.” Without transfer learning, organizations would need far more annotated training data and a much longer training period. Trained models also attach a confidence score to each output, so when operationalizing the model, “if there’s a 90 percent score for complying, you can let it go; if there’s a 90 percent chance they’re not complying, then you know you have to look at those,” Wilde said.
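
The thresholding rule Wilde describes can be expressed as a small routing function. This is a minimal sketch, assuming each document already carries a model-assigned probability of complying; the function name `triage`, the document IDs, and the choice to send the uncertain middle band to a human are assumptions, since the quote covers only the two confident extremes.

```python
from typing import Iterable


def triage(scored: Iterable[tuple[str, float]], threshold: float = 0.9) -> dict[str, list[str]]:
    """Route documents by their model-assigned probability of complying.

    p >= threshold      -> "pass":      confidently complying, let it go
    p <= 1 - threshold  -> "review":    confidently NOT complying, look at it
    otherwise           -> "uncertain": assumption, also sent to a human
    """
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1.0]")
    buckets: dict[str, list[str]] = {"pass": [], "review": [], "uncertain": []}
    for doc_id, p_comply in scored:
        if p_comply >= threshold:
            buckets["pass"].append(doc_id)
        elif p_comply <= 1.0 - threshold:
            buckets["review"].append(doc_id)
        else:
            buckets["uncertain"].append(doc_id)
    return buckets


if __name__ == "__main__":
    scores = [("contract-001", 0.97), ("contract-002", 0.04), ("contract-003", 0.55)]
    print(triage(scores))
    # {'pass': ['contract-001'], 'review': ['contract-002'], 'uncertain': ['contract-003']}
```

Raising `threshold` shrinks the automatic pass lane and grows the human workload, so where to set it is as much a risk decision as a modeling one.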


Intelligent process automation

When analyzing content for regulatory compliance, deep learning-powered IPA is frequently used to classify unstructured data and extract important concepts from it. In an abundance of use cases, IPA has supplanted traditional RPA to accelerate workflows that increasingly depend on unstructured data. Organizations underwriting insurance policies, for example, must do more than pull customer information from policy documents.

They must also readily parse unstructured “health records, applications, affidavits, and that’s all variable in third parties,” Wilde said. “In the banking world, there’s institutional customer onboarding. You’re getting 10 or 20 documents, half of them from the customer’s paper, half of them from the bank’s paper. You’ve got to be able to handle all of those to be able to automate your process.” The ability to analyze this unstructured content quickly and at scale can be an essential first step toward regulatory compliance, especially when PII is involved.

Auditing and redactions

Deep learning is just as valuable when organizations must redact content from IT systems to adhere to regulations. In this case, deep learning helps organizations “build a redaction model,” Wilde explained. “So, actually the inverse of extraction.” Although these models are trained through the same basic process, users train redaction models “to recognize data that would cause problems relative to GDPR,” Wilde noted. “Then you can run content through that redaction model and ensure that it’s able to be stripped out.”
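
Wilde’s point that redaction is “the inverse of extraction” is easiest to see when both operations consume the same detected spans: extraction keeps only the flagged text, while redaction keeps everything else. In this minimal sketch, `detect` is a hypothetical rule-based placeholder (two regular expressions for email addresses and phone numbers) standing in for a trained redaction model; the `Span` type, labels, and sample text are likewise illustrative. The regexes miss the name “Jane,” which is exactly the kind of gap a learned model is meant to close.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


# Hypothetical rule-based stand-in for a trained redaction model. Only the
# span-based interface matters here; a learned model would also return spans.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def detect(text: str) -> list[Span]:
    """Return every flagged span, ordered by position."""
    return sorted(
        Span(m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    )


def extract(text: str, spans: list[Span]) -> list[tuple[str, str]]:
    """Extraction: keep only the flagged text."""
    return [(s.label, text[s.start:s.end]) for s in spans]


def redact(text: str, spans: list[Span]) -> str:
    """Redaction, the inverse: keep everything except the flagged text."""
    pieces, cursor = [], 0
    for s in sorted(spans):
        if s.start < cursor:  # overlaps a span already replaced; skip it
            continue
        pieces.append(text[cursor:s.start])
        pieces.append(f"[{s.label}]")
        cursor = s.end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    doc = "Contact Jane at jane.doe@example.com or +44 20 7946 0958."
    spans = detect(doc)
    print(extract(doc, spans))  # [('EMAIL', 'jane.doe@example.com'), ('PHONE', '+44 20 7946 0958')]
    print(redact(doc, spans))   # Contact Jane at [EMAIL] or [PHONE].
```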

This capability is critical when auditing for regulatory compliance or data governance. It turns deep learning toward catching any oversights that may have slipped through the initial classification and extraction phase.

Pairing these steps (classifying and extracting unstructured content with deep learning according to compliance needs, then auditing as a final safety mechanism) is the blueprint for “operationalizing compliance,” Wilde said. “Because now, if you have this deep learning model set up that’s acting as sort of a final filter before anything goes out the door, you have high confidence that you’re making sure that content is literally stripped out before it goes out.”


A compliance pipeline

By using deep learning models first to classify and extract content, and then to redact content subject to regulations, organizations can implement an effective pipeline for complying with GDPR or any other regulatory standard.
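
The stages can be chained into the kind of pipeline described above. This is a sketch under stated assumptions: `toy_score` and `toy_redact` are hypothetical placeholders for trained classification and redaction models, extraction is omitted for brevity, and the routing rule (only confidently compliant documents are released, and even those pass through redaction as a final filter) is one reasonable reading of Wilde’s description rather than Indico’s actual design.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

Scorer = Callable[[str], float]   # returns P(complying) for a document
Redactor = Callable[[str], str]   # returns the document with regulated data stripped

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def toy_score(text: str) -> float:
    """Hypothetical placeholder for a trained compliance classifier."""
    return 0.2 if "sell" in text.lower() else 0.95


def toy_redact(text: str) -> str:
    """Hypothetical placeholder for a trained redaction model."""
    return EMAIL.sub("[EMAIL]", text)


@dataclass
class PipelineResult:
    released: dict[str, str] = field(default_factory=dict)
    needs_review: list[str] = field(default_factory=list)


def run_pipeline(docs: dict[str, str], score: Scorer, redact: Redactor,
                 threshold: float = 0.9) -> PipelineResult:
    result = PipelineResult()
    for doc_id, text in docs.items():
        # Stage 1: classification. Anything not confidently compliant goes to a human.
        if score(text) < threshold:
            result.needs_review.append(doc_id)
            continue
        # Stage 2: redaction as the final filter before anything leaves the organization.
        result.released[doc_id] = redact(text)
    return result


if __name__ == "__main__":
    docs = {
        "c1": "Questions: dpo@example.com",
        "c2": "We may sell customer data.",
    }
    result = run_pipeline(docs, toy_score, toy_redact)
    print(result.released)      # {'c1': 'Questions: [EMAIL]'}
    print(result.needs_review)  # ['c2']
```

Passing the models in as plain callables keeps the routing logic independent of how either model is built, so a rules-based stand-in can later be swapped for a trained one without touching the pipeline.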

Such applications of AI are perhaps just one of the many ways GDPR has changed how data-driven processes run throughout the enterprise. “I think that the European Union is using this as sort of a fix,” Wilde observed. “If someone really appears to be misbehaving and they find out they’re not in compliance, there’s a very severe penalty for that. So I think this is partly the government trying to induce the right behavior.”

As the paucity of examples of non-compliance with GDPR suggests, it appears to be working.


Jelani Harper is an editorial consultant serving the information technology market, specializing in data-driven applications focused on semantic technologies, data governance, and analytics.