Automating data quality in the Internet of Things era


Max Smolaks

August 21, 2019


How Master Data Management can help create order out of chaos

By Jelani Harper

As enticing as the massive amount of streaming data in the Internet of Things is, it presents a number of pronounced data management challenges for ensuring that information adheres to enterprise conventions of data quality. Consequently, many organizations aren't getting the full value out of their IoT initiatives.

“With the IoT producing massive amounts of data, a lot of enterprises are leaving it out of analytics because it’s just too cost prohibitive with time and resources to try and map all of this into the analytics models,” said Katie Horvath, CEO of data accuracy specialist Naveego.

The timely incorporation of machine learning technologies can help improve data quality, data profiling, and aspects of Master Data Management (MDM) to consistently automate these operations for reliable analytics models.

This process not only reinforces the transformation of the IoT into the Intelligent Internet of Things (IIoT), but also delivers “insight into that [IoT] data such that it becomes part of data quality and part of the golden record,” Horvath said. “Really, what it leads to down the road is auto mapping of IoT devices.”

The golden record in MDM is a single, all-encompassing dataset that captures all the necessary information from enterprise systems of record and is assumed to be 100 percent accurate.
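One way to picture a golden record is as a merge of per-system records in which each field keeps its most recently updated known value. The sketch below is a minimal illustration of that idea; the field names, timestamps, and merge rule are hypothetical, not a description of any vendor's product.

```python
def merge_golden(records):
    """Merge per-system records into a single golden record.

    Records are applied oldest-first, so a more recently updated system
    wins for each field -- but a missing (None) value never overwrites
    a known one.
    """
    golden = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if value is not None:
                golden[field] = value
    return golden

# Two systems each hold part of the truth about the same device
crm = {"device_id": "dev-42", "owner": "Radiology", "firmware": None, "updated": 1}
asset_db = {"device_id": "dev-42", "owner": None, "firmware": "2.1.0", "updated": 2}

golden = merge_golden([crm, asset_db])
```

Here the golden record ends up with the owner from one system and the firmware version from the other, which is the sense in which it is "all-encompassing."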

Intelligent data profiling

Machine learning is directly responsible for the automation of data profiling capabilities that are essential for ensuring good data quality. Data profiling is the rapid statistical assessment of core data attributes to determine structure, format, and other key characteristics. When it’s coupled with machine learning and its advanced pattern identification, it becomes a perfect fit for data quality operations.
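As a rough illustration of the statistical assessment described above, a profiler might scan a column of sensor readings and record its size, null rate, range, and cardinality. The function and field names below are hypothetical and stand in for what a profiling tool would compute, not any particular product's API.

```python
def profile_column(values):
    """Compute a minimal statistical profile of one column of readings."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),                          # total rows seen
        "null_rate": 1 - len(non_null) / len(values),  # share of missing values
        "min": min(non_null),                          # observed lower bound
        "max": max(non_null),                          # observed upper bound
        "distinct": len(set(non_null)),                # cardinality
    }

# Example: temperature readings from one IoT sensor, with one dropped sample
readings = [21.4, 21.9, None, 22.1, 21.7, 22.0]
stats = profile_column(readings)
```

The observed min and max can then seed an automated range check of the kind discussed in the next section.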

This facet of artificial intelligence allows organizations to “use profiling to learn about what the data is inside of a data source, as well as build automated data quality checks,” Naveego CTO Derek Smith explained. “We use the profiling and pair it with machine learning to build quality checks off of the profiled data.”

Machine learning techniques are effective for pinpointing patterns in sensor data related to sensitive or personally identifiable information, and other characteristics of interest in specific use cases. They enable organizations to “make sure that the data is what they expected,” Smith said. Competitive options in this space can profile data at the cloud’s fringe to support edge computing and additional IoT deployments.

Data quality

Machine learning is instrumental in automating data quality measures based on data profiling, which is essential for rapid deployment of IoT. “There’s a human workflow side of it,” Smith said. “We are making data quality suggestions and then allowing the user to see those and put those in place.”

Data quality suggestions might include advice to protect sensitive information (with mechanisms like masking) or to simply issue notifications that pressure readings in oil and gas equipment, for example, are outside of specified ranges, possibly warranting user action. This approach automates four dimensions of data quality:

  • Accuracy: Automated data quality checks confirm that data conforms to ranges and characteristics outlined by users, informing them of any variation.

  • Consistency: Implicit to the accuracy of these data quality measures is the consistency of the data profiled.

  • Recentness: Connecting data profiling capabilities with those typical of MDM (such as the golden record) provides visibility into whether users have the most recent data—which is an ongoing issue in the IoT. Moreover, golden records allow users to “look across all the values you have in your system and choose the most common one, for example, if several sources agree upon what that value is,” Smith said.

  • Completeness: The timely usage of golden records also provides insight into “how complete the information you have really is, because you can see whether the systems that should have all this information have it, and whether different systems that make up the whole have the information that they should as well,” Smith added.
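Two of the checks above can be sketched in a few lines: an accuracy check that flags readings outside a range learned during profiling, and the "choose the most common value across sources" rule Smith describes. The bounds and values are made-up examples, not real equipment thresholds.

```python
from collections import Counter

def range_check(reading, lo, hi):
    """Accuracy: True if a reading falls inside the profiled range."""
    return lo <= reading <= hi

def consensus_value(source_values):
    """Survivorship rule: pick the value the most sources agree on."""
    return Counter(source_values).most_common(1)[0][0]

# Pressure bounds learned from profiled historical data (hypothetical numbers)
in_spec = range_check(187.0, lo=50.0, hi=150.0)   # False -> notify the user

# Three systems report the same device's firmware version; two agree
agreed = consensus_value(["2.1.0", "2.1.0", "2.0.9"])
```

An out-of-range result would trigger the kind of notification described above rather than an automatic correction, matching the human-workflow step Smith mentions.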

Master Data Management

Producing the golden record of data collected in the IIoT is useful in a variety of ways. A golden record helps form the basis of the data quality measures for comparing current results of machine-generated data to the desired ones. In healthcare, for example, “hospital systems have a whole bunch of different devices plugging into their network and they want to make sure that for security purposes they have a golden record of allowed devices,” Horvath said.

In manufacturing, this approach can create even more tangible business value. “You think about all of the different assembly line machines, and even two different lines having the same machine or having multiple different sensors in the machine,” Horvath said. “That becomes an exponential headache for IoT devices.”

Advanced analytics

Most importantly, the golden record of MDM serves as an optimal source for training datasets for machine learning models. Such a golden record offers an “analytics-ready stream,” Horvath said. The advantages of training machine learning models on streaming data are well documented. “When we think about a training dataset being a snapshot of static [data], well, a leap forward is to make training data in motion,” she added.

These are just some of the benefits a golden record can provide for the IIoT, in addition to ensuring data quality at scale, to help operationalize enterprise information with cognitive computing technologies.
