AI Business is part of the Informa Tech Division of Informa PLC
How Master Data Management can help create order out of chaos
by Jelani Harper 21 August 2019
As enticing as the massive volume of streaming data in the Internet of Things is, it makes it hard to ensure that information adheres to enterprise conventions of data quality. Consequently, many organizations aren’t getting the full value out of their IoT initiatives.
“With the IoT producing massive amounts of data, a lot of enterprises are leaving it out of analytics because it’s just too cost prohibitive with time and resources to try and map all of this into the analytics models,” said Katie Horvath, CEO of data accuracy specialist Naveego.
Machine learning can help on this front. Applied to data quality, data profiling, and aspects of Master Data Management (MDM), it can automate these operations consistently enough to support reliable analytics models.
This process not only reinforces the transformation of the IoT into the Intelligent Internet of Things (IIoT), but also delivers “insight into that [IoT] data such that it becomes part of data quality and part of the golden record,” Horvath said. “Really, what it leads to down the road is auto mapping of IoT devices.”
The golden record in MDM is a single, all-encompassing dataset that captures all the necessary information from enterprise systems of record and is assumed to be 100 percent accurate.
Machine learning is directly responsible for the automation of data profiling capabilities that are essential for ensuring good data quality. Data profiling is the rapid statistical assessment of core data attributes to determine structure, format, and other key characteristics. When it’s coupled with machine learning and its advanced pattern identification, it becomes a perfect fit for data quality operations.
This facet of artificial intelligence allows organizations to “use profiling to learn about what the data is inside of a data source, as well as build automated data quality checks,” Naveego CTO Derek Smith explained. “We use the profiling and pair it with machine learning to build quality checks off of the profiled data.”
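To make the idea of building checks off profiled data concrete, here is a minimal Python sketch. It is not Naveego's method: it swaps in plain summary statistics where a product would use a learned model, and the function names, the sample pressure readings, and the three-standard-deviation threshold are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def profile_readings(values):
    """Summarize a column of raw readings: counts plus basic statistics.

    Assumes at least one numeric reading is present.
    """
    present = [v for v in values if v is not None]
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "non_numeric": len(present) - len(numeric),
        "min": min(numeric),
        "max": max(numeric),
        "mean": mean(numeric),
        "stdev": pstdev(numeric),
    }

def build_range_check(profile, k=3.0):
    """Derive a quality check from a profile: accept values within
    k standard deviations of the profiled mean (k is an assumed threshold)."""
    low = profile["mean"] - k * profile["stdev"]
    high = profile["mean"] + k * profile["stdev"]

    def check(value):
        return isinstance(value, (int, float)) and low <= value <= high

    return check

# Hypothetical historical readings from one sensor, with one gap.
history = [101.2, 99.8, 100.5, None, 100.1, 98.9, 100.9]
profile = profile_readings(history)
is_valid = build_range_check(profile)

print(profile["missing"], is_valid(100.3), is_valid(250.0))
```

The profile is computed once from historical data and the check is then applied to each new reading, which is the general shape of "profile first, derive checks second" that Smith describes.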
Machine learning techniques are effective for pinpointing patterns in sensor data related to sensitive or personally identifiable information, and other characteristics of interest in specific use cases. They enable organizations to “make sure that the data is what they expected,” Smith said. Competitive options in this space can profile data at the edge of the cloud to support edge computing and additional IoT deployments.
Machine learning is instrumental in automating data quality measures based on data profiling, which is essential for rapid deployment of IoT. “There’s a human workflow side of it,” Smith said. “We are making data quality suggestions and then allowing the user to see those and put those in place.”
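The human workflow Smith describes can be pictured as a list of suggestions that do nothing until a user approves them. The Python sketch below is only an illustration under stated assumptions: the `Suggestion` structure, the two action names, the field names, and the pressure limits are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field

@dataclass
class Suggestion:
    field_name: str
    action: str                      # "mask" or "range_alert" (assumed action names)
    params: dict = field(default_factory=dict)
    approved: bool = False           # stays False until a user signs off

def mask(value, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    text = str(value)
    keep = min(visible, len(text))
    return "*" * (len(text) - keep) + text[len(text) - keep:]

def apply_suggestions(record, suggestions):
    """Apply only approved suggestions; return the cleaned record and any alerts."""
    cleaned, alerts = dict(record), []
    for s in suggestions:
        if not s.approved or s.field_name not in cleaned:
            continue
        value = cleaned[s.field_name]
        if s.action == "mask":
            cleaned[s.field_name] = mask(value)
        elif s.action == "range_alert":
            low, high = s.params["low"], s.params["high"]
            if not (low <= value <= high):
                alerts.append(f"{s.field_name}={value} outside [{low}, {high}]")
    return cleaned, alerts

# Hypothetical suggestions generated by the system...
suggestions = [
    Suggestion("operator_id", "mask"),
    Suggestion("pressure_psi", "range_alert", {"low": 50, "high": 150}),
]
# ...and a user reviewing and accepting both of them.
for s in suggestions:
    s.approved = True

record = {"operator_id": "EMP-20481", "pressure_psi": 212.0}
cleaned, alerts = apply_suggestions(record, suggestions)
print(cleaned["operator_id"], alerts)
```

Keeping approval as an explicit flag means an unreviewed suggestion never alters data, which matches the "see those and put those in place" step in the quote.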
Data quality suggestions might include advice to protect sensitive information (with mechanisms like masking) or to simply issue notifications that pressure readings in oil and gas equipment, for example, are outside of specified ranges, possibly warranting user action. This approach automates four dimensions of data quality:
Producing the golden record of data collected in the IIoT is useful in a variety of ways. A golden record provides the baseline for data quality measures: current machine-generated data can be compared against the values it is expected to have. In healthcare, for example, “hospital systems have a whole bunch of different devices plugging into their network and they want to make sure that for security purposes they have a golden record of allowed devices,” Horvath said.
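A golden record of allowed devices can be thought of as one canonical entry per device, keyed by an identifier that has been normalized so that different spellings of the same device match. The Python sketch below assumes MAC addresses as that identifier; the device entries, ward names, and function names are hypothetical.

```python
# Hypothetical golden record: one canonical entry per approved device,
# keyed by a normalized MAC address.
GOLDEN_DEVICES = {
    "00:1a:2b:3c:4d:5e": {"type": "infusion_pump", "ward": "ICU"},
    "00:1a:2b:3c:4d:5f": {"type": "patient_monitor", "ward": "ER"},
}

def normalize_mac(raw):
    """Reduce common MAC spellings (dashes, upper case, no separators)
    to the form aa:bb:cc:dd:ee:ff."""
    digits = "".join(ch for ch in raw.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

def is_allowed(raw_mac, golden=GOLDEN_DEVICES):
    """True if the device appears in the golden record of allowed devices."""
    return normalize_mac(raw_mac) in golden

print(is_allowed("00-1A-2B-3C-4D-5E"))   # same device, different spelling
print(is_allowed("001a2b3c4dff"))        # unknown device
```

The normalization step is what makes the record "golden" in practice: without it, the same physical device reported in two formats would look like two devices.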
In manufacturing, this approach can create even more tangible business value. “You think about all of the different assembly line machines, and even two different lines having the same machine or having multiple different sensors in the machine,” Horvath said. “That becomes an exponential headache for IoT devices.”
Most importantly, the golden record of MDM serves as an optimal source for training datasets for machine learning models. Such a golden record offers an “analytics-ready stream,” Horvath said. The advantages of training machine learning models on streaming data are well documented. “When we think about a training dataset being a snapshot of static [data], well, a leap forward is to make training data in motion,” she added.
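One way to picture "training data in motion" is a model that is updated by each reading as it arrives rather than fitted once to a stored snapshot. The Python sketch below uses a deliberately simple stand-in for a machine learning model, a running mean and variance maintained with Welford's algorithm; the class name, the warm-up of five readings, the threshold, and the sample stream are assumptions for illustration.

```python
import math

class RunningBaseline:
    """Running mean and variance updated one reading at a time
    (Welford's algorithm), so no history needs to be stored."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stdev(self):
        return math.sqrt(self._m2 / self.n) if self.n else 0.0

    def is_anomalous(self, x, k=3.0, warmup=5):
        """Flag x if it lies more than k standard deviations from the mean,
        once at least `warmup` readings have been seen."""
        return (self.n >= warmup and self.stdev > 0
                and abs(x - self.mean) > k * self.stdev)

def process(stream):
    """Score each reading against the current baseline, then learn from it
    only if it was not flagged."""
    model, flagged = RunningBaseline(), []
    for reading in stream:
        if model.is_anomalous(reading):
            flagged.append(reading)
        else:
            model.update(reading)
    return model, flagged

# Hypothetical stream of cleaned sensor readings.
stream = [100.0, 101.0, 99.0, 100.5, 99.5, 100.2, 180.0, 100.1]
model, flagged = process(stream)
print(model.n, flagged)
```

A production system would more likely use an incremental learner from a machine learning library, but the control flow (score, then update) would look much the same.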
These are just some of the benefits a golden record can provide for the IIoT, in addition to ensuring data quality at scale, to help operationalize enterprise information with cognitive computing technologies.
Jelani Harper is an editorial consultant servicing the information technology market, specializing in data-driven applications focused on semantic technologies, data governance and analytics.