Automating data quality in the Internet of Things era
How Master Data Management can help create order out of chaos
by Jelani Harper, 21 August 2019
As enticing as the massive amounts of streaming data in the Internet of Things are, they present a number of pronounced data management challenges for ensuring that information adheres to enterprise data quality conventions. Consequently, many organizations aren't getting full value out of their IoT initiatives.
“With the IoT producing massive amounts of data, a lot of enterprises are leaving it out of analytics because it’s just too cost prohibitive with time and resources to try and map all of this into the analytics models,” said Katie Horvath, CEO of data accuracy specialist Naveego.
The timely incorporation of machine learning
technologies can help improve data quality, data profiling, and aspects of
Master Data Management (MDM) to consistently automate these operations for
reliable analytics models.
This process not only reinforces the
transformation of the IoT into the Intelligent Internet of Things (IIoT), but
also delivers “insight into that [IoT] data such that it becomes part of data
quality and part of the golden record,” Horvath said. “Really, what it leads to
down the road is auto mapping of IoT devices.”
The golden record in MDM is a single, all-encompassing dataset that captures
all the necessary information from enterprise systems of record and is assumed
to be 100 percent accurate.
Intelligent data profiling
Machine learning is directly responsible for the automation of data profiling capabilities that are essential for ensuring good data quality. Data profiling is the rapid statistical assessment of core data attributes to determine structure, format, and other key characteristics. When it’s coupled with machine learning and its advanced pattern identification, it becomes a perfect fit for data quality operations.
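As an illustration, a minimal profiling routine might compute the kind of attribute-level statistics described above (type, null rate, distinct values, min/max). This is a generic sketch; the function and field names are invented for illustration and do not describe Naveego's implementation:

```python
from collections import Counter

def profile_column(values):
    """Compute a simple statistical profile of one attribute:
    inferred type, null rate, distinct count, and min/max where numeric."""
    non_null = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
    }
    numeric = [v for v in non_null if isinstance(v, (int, float))]
    if numeric and len(numeric) == len(non_null):
        profile["type"] = "numeric"
        profile["min"], profile["max"] = min(numeric), max(numeric)
    else:
        profile["type"] = "text"
        # Most frequent values, e.g. to spot repeated IDs or formats
        profile["top_values"] = Counter(map(str, non_null)).most_common(3)
    return profile

# Hypothetical sensor readings with one missing value
readings = [101.2, 98.7, None, 104.9, 99.1]
print(profile_column(readings))
```

A profile like this is the raw material for the automated checks discussed next: once the structure and ranges of a source are known, deviations from them can be flagged automatically.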
This facet of artificial intelligence allows
organizations to “use profiling to learn about what the data is inside of a
data source, as well as build automated data quality checks,” Naveego CTO Derek
Smith explained. “We use the profiling and pair it with machine learning to
build quality checks off of the profiled data.”
Machine learning techniques are effective for pinpointing patterns in sensor data related to sensitive or personally identifiable information, along with other characteristics of interest in specific use cases. They enable organizations to "make sure that the data is what they expected," Smith said. Competitive options in this space can profile data at the network edge to support edge computing and additional IoT deployments.
Machine learning is instrumental in automating data quality measures based on data profiling, which is essential for rapid deployment of IoT. “There’s a human workflow side of it,” Smith said. “We are making data quality suggestions and then allowing the user to see those and put those in place.”
Data quality suggestions might include advice to protect sensitive information (with mechanisms like masking), or simply notifications that pressure readings in oil and gas equipment, for example, are outside specified ranges and may warrant user action. This approach automates four dimensions of data quality:
Accuracy: Automated data quality checks confirm that data conforms to ranges and characteristics outlined by users, informing them of any variation.
Consistency: Implicit to the accuracy of these data quality measures is the consistency of the data profiled.
Recentness: Connecting data profiling capabilities with those typical of MDM (such as the golden record) provides visibility into whether users have the most recent data—which is an ongoing issue in the IoT. Moreover, golden records allow users to “look across all the values you have in your system and choose the most common one, for example, if several sources agree upon what that value is,” Smith said.
Completeness: The timely usage of golden records also provides insight into “how complete the information you have really is, because you can see whether the systems that should have all this information have it, and whether different systems that make up the whole have the information that they should as well,” Smith added.
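The masking and range-notification suggestions described above could be sketched roughly as follows. The spec format, field names, and thresholds are invented for illustration; in the workflow Smith describes, a human reviewer would accept or reject each suggestion:

```python
def suggest_quality_actions(record, spec):
    """Emit data quality suggestions for one sensor record against a
    user-defined spec; a reviewer accepts or rejects each suggestion."""
    suggestions = []
    for field, rules in spec.items():
        value = record.get(field)
        if value is None:
            # Completeness: expected field is absent
            suggestions.append((field, "completeness: value missing"))
            continue
        if "range" in rules:
            lo, hi = rules["range"]
            if not lo <= value <= hi:
                # Accuracy: value deviates from the user-specified range
                suggestions.append((field, f"accuracy: {value} outside [{lo}, {hi}]"))
        if rules.get("sensitive"):
            # Suggest masking before the value leaves the system
            suggestions.append((field, "mask before sharing"))
    return suggestions

# Hypothetical pipeline spec and a reading that breaches it
spec = {
    "pressure_psi": {"range": (200, 800)},
    "operator_id": {"sensitive": True},
}
reading = {"pressure_psi": 950, "operator_id": "E-1042"}
print(suggest_quality_actions(reading, spec))
```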
Master Data Management
Producing the golden record of data collected in the IIoT is useful in a variety of ways. A golden record forms the basis of data quality measures, supplying the desired values against which current machine-generated data are compared. In healthcare, for example, "hospital systems have a whole bunch of different devices plugging into their network and they want to make sure that for security purposes they have a golden record of allowed devices," Horvath said.
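The survivorship rule Smith describes, keeping the value the most source systems agree on, might look like this in outline. The record layout and device fields are assumptions for the sketch, not a description of any vendor's schema:

```python
from collections import Counter

def golden_value(source_values):
    """Survivorship rule: across all source systems, keep the value
    the most sources agree on (simple majority vote)."""
    counts = Counter(v for v in source_values if v is not None)
    return counts.most_common(1)[0][0] if counts else None

def build_golden_record(sources):
    """Merge per-system records for one device into a single golden record."""
    fields = {f for record in sources for f in record}
    return {f: golden_value([r.get(f) for r in sources]) for f in fields}

# Three systems of record describing the same hypothetical device
sources = [
    {"device_id": "pump-7", "firmware": "2.1", "allowed": True},
    {"device_id": "pump-7", "firmware": "2.1", "allowed": True},
    {"device_id": "pump-7", "firmware": "2.0"},
]
print(build_golden_record(sources))
```

In practice, MDM survivorship rules are richer than a majority vote (source trust scores, recency weighting), but the majority case matches the behavior Smith describes.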
In manufacturing, this approach can create even more tangible business value. “You think about all of the different assembly line machines, and even two different lines having the same machine or having multiple different sensors in the machine,” Horvath said. “That becomes an exponential headache for IoT devices.”
Most importantly, the golden record of MDM serves as an optimal source for training datasets for machine learning models. Such a golden record offers an “analytics-ready stream,” Horvath said. The advantages of training machine learning models on streaming data are well documented. “When we think about a training dataset being a snapshot of static [data], well, a leap forward is to make training data in motion,” she added.
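One way to sketch "training data in motion" is an online statistics update that folds each streamed event into the model incrementally, rather than retraining on a static snapshot. Welford's algorithm is used here as a generic stand-in, not as Naveego's method:

```python
class StreamingStats:
    """Welford's online algorithm: update mean and variance one event
    at a time, so the training set is the stream itself rather than
    a static snapshot."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n else 0.0

# Each arriving reading updates the model immediately
stats = StreamingStats()
for reading in [100.0, 101.0, 99.0, 100.5]:
    stats.update(reading)
print(round(stats.mean, 3), round(stats.variance, 3))
```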
These are just some of the benefits a golden record can provide for the IIoT, in addition to ensuring data quality at scale, to help operationalize enterprise information with cognitive computing technologies.