Data modeling mastery of the Internet of Things

Dr. Lewis McGibbney, data scientist at NASA’s Jet Propulsion Laboratory, talks IoT data and semantic standards

by Jelani Harper 30 August 2019

The IoT means a multitude of things to many different people. Depending on the vertical in which they operate, users perceive the IoT as a herald of new opportunities for increasing revenues and, perhaps, decreasing risk.

Regardless of the use case or industry, “in terms of dimensionality or understanding of how things are connected— clearly, intrinsically interconnected—Internet of Things applications are really driving us to achieve higher dimensionality, higher resolution understanding of the fields that we’re working in,” acknowledged Dr. Lewis McGibbney, data scientist at NASA’s Jet Propulsion Laboratory, located at the California Institute of Technology. “The emergence of big data through the Internet of Things applications has opened up new capabilities.”

The granular understanding of the world that the IoT provides through continuously streaming sensor data can be improved in two fundamental ways. The first is automated cognitive analytics. The second is the use of standard data models to contextualize sensor information, creating a richer, more informed composite.

Combining these approaches not only substantially increases the value obtained from IoT initiatives, but also makes large volumes of data much more manageable.

Standardized models

Although such benefits have very broad appeal, they’re most easily demonstrable in scientific computing. Climate change—as a result of global warming—is one of the most pressing ecological and social issues of our time. It’s also one of the foremost concerns for the Earth Science Information Partners (ESIP), at which McGibbney serves as chair of the Semantic Technologies Committee.

ESIP has adopted AllegroGraph to implement Findable, Accessible, Interoperable and Reusable (FAIR) data methods, so that information can be rapidly accessed and exchanged among a hodgepodge of global governmental, non-governmental, non-profit, and professional research organizations. To this end, ESIP has incorporated semantic standards, which are particularly valuable in counteracting the low-latency data modeling challenges characteristic of the IoT.

Ontological value

The ESIP federation has created a Community Ontology Repository to manage the resources for describing, sharing, and harmonizing the numerous data models populating its several branches of research. Ontologies are semantic data models that are predicated on universal standards and enable data of any variation (including structure, format, schema or origin source) to be aligned and interlinked. This modeling approach is well suited to the real-time IoT use cases in which ESIP is involved, many of which require integrating data from different systems with disparate data models.

McGibbney described an elaborate example in which scientists would need to combine the migratory patterns of an animal species in a complex mashup of datasets involving sea salinity, obtained from IoT devices and NASA satellites. When factoring in additional parameters like wind vectors, the ability to harmonize such datasets and models with semantic technologies is indispensable. “As a basic practice of rectifying the data discrepancies and vocabulary discrepancies, we do that through the process of semantic harmonization,” he said.
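To make the idea of semantic harmonization concrete, here is a minimal sketch in Python. It is not ESIP's implementation: the source names, field names, and example.org URIs are all hypothetical, chosen only to show how two sources that label the same quantities differently can be mapped onto one shared vocabulary.

```python
# Hypothetical shared vocabulary: these URIs are illustrative, not real ESIP terms.
SALINITY = "http://example.org/vocab/seaSurfaceSalinity"
WIND_SPEED = "http://example.org/vocab/windSpeed"

# Each source labels the same quantities differently (assumed field names).
SOURCE_MAPPINGS = {
    "buoy": {"sal_psu": SALINITY, "wspd": WIND_SPEED},
    "satellite": {"sss": SALINITY, "wind_speed_ms": WIND_SPEED},
}


def harmonize(source, record):
    """Rename a record's source-specific fields to shared vocabulary terms.

    Fields with no known mapping are dropped rather than guessed at.
    """
    mapping = SOURCE_MAPPINGS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


# Made-up readings, for illustration only.
buoy_reading = harmonize("buoy", {"sal_psu": 35.1, "wspd": 6.2})
satellite_reading = harmonize("satellite", {"sss": 34.8, "wind_speed_ms": 5.9})

# After harmonization both records use the same keys and can be compared directly.
shared_terms = sorted(set(buoy_reading) & set(satellite_reading))
```

In practice the mappings would live in an ontology rather than a Python dictionary, but the principle is the same: once both records are keyed by the same terms, they can be joined without source-specific logic.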

Modeling terminology

Just as important as reconciling schema differences among the assorted data models is the ability to describe those models in the same terms. Linked data approaches enable standardized vocabularies and taxonomies to describe almost any data attribute, which is crucial for the interoperability of various IoT systems as well as for mapping data within and between ontologies.

With this method, each datapoint is given a unique, machine-readable identifier. “You want to add the concept of taxonomies so even abstract concepts have a unique name,” said Jans Aasman, CEO of Franz, a semantic knowledge graph solution provider. The impact of this methodology is profound when modeling different data types and understanding their relationships. “It doesn’t only get you the thing you’re trying to measure and the measurements you need, but also a larger contextual element to that,” McGibbney said.
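A rough sketch of what unique, machine-readable identifiers and their surrounding context might look like follows. The namespace, predicate names, and sensor ID are assumptions made for illustration and do not come from any real vocabulary; the triples are plain Python tuples rather than a graph database.

```python
import uuid

BASE = "http://example.org/"  # hypothetical namespace


def mint_observation_uri(sensor_id, timestamp):
    """Derive a stable, unique identifier from the sensor ID and timestamp."""
    name = f"{sensor_id}/{timestamp}"
    # uuid5 is deterministic: the same inputs always yield the same identifier.
    return BASE + "observation/" + str(uuid.uuid5(uuid.NAMESPACE_URL, BASE + name))


def describe(sensor_id, timestamp, concept, value):
    """Return (subject, predicate, object) triples for one measurement."""
    obs = mint_observation_uri(sensor_id, timestamp)
    return [
        # The abstract concept being measured also has its own unique name.
        (obs, BASE + "vocab/observedProperty", BASE + "taxonomy/" + concept),
        (obs, BASE + "vocab/hasValue", value),
        # Context: which sensor produced the value, and when.
        (obs, BASE + "vocab/madeBySensor", BASE + "sensor/" + sensor_id),
        (obs, BASE + "vocab/resultTime", timestamp),
    ]


# Made-up measurement, for illustration only.
triples = describe("buoy-17", "2019-08-30T12:00:00Z", "seaSurfaceSalinity", 35.1)
```

The measurement itself is only one of the four statements; the others carry the contextual element McGibbney refers to.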

Intelligent Internet of Things

When machine learning applications are used with these semantic data models in the IoT, the amount of potential insight expands. Across industries, the changes in datasets (the delta) are of foremost importance in IoT deployments, regardless of the specific use case. The mashup calculations McGibbney referenced in the preceding NASA example have certain temporal benefits when dealing with constantly generated data. “We can then compare what we have at some point in time with what was maybe a historical trend,” he said. “We would utilize maybe machine learning in that instance to enable us to crunch through, backwards, the historical datasets to identify the anomalies.”
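The comparison of current readings against a historical trend can be illustrated with a simple statistical stand-in. The sketch below is not a machine learning model and not the method McGibbney's team uses; it flags values that sit far from a historical baseline using a z-score, and the salinity figures and the threshold of 3.0 are invented for the example.

```python
from statistics import mean, stdev


def find_anomalies(history, recent, threshold=3.0):
    """Return (index, value) pairs in `recent` that deviate from the historical baseline.

    A value is flagged when it lies more than `threshold` sample standard
    deviations away from the historical mean.
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any different value counts as a deviation.
        return [(i, v) for i, v in enumerate(recent) if v != baseline]
    return [(i, v) for i, v in enumerate(recent) if abs(v - baseline) / spread > threshold]


# Made-up salinity readings, for illustration only.
historical_salinity = [35.0, 35.1, 34.9, 35.2, 35.0, 34.8, 35.1]
recent_salinity = [35.0, 35.1, 33.2, 34.9]

anomalies = find_anomalies(historical_salinity, recent_salinity)
print(anomalies)  # [(2, 33.2)]
```

A production system would likely replace the fixed threshold with a learned model and account for seasonality, but the delta against a historical baseline is the core of the idea.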

In this respect, machine learning is the enabler of an Intelligent Internet of Things since its massive pattern detection capabilities can discern even the slightest trends over long periods of time—which may very well prove germane to the discussion about global warming and climate change. As McGibbney indicated, “It’s absolutely fundamental in order to produce results of a caliber and temporal fashion which is acceptable for ongoing scientific [research].”

Automation vs. Intelligent Automation

Naturally evolving, highly descriptive data models with standard vocabularies and taxonomies can maximize the value of the IoT deployments McGibbney described. Proprietary or non-semantic data models can show what occurred in IoT sensor data. The ontological approach, with its ad hoc mapping and integration, can also explain why patterns changed, because it captures the context in which the datasets were generated.

“The point is that with finely curated data in a standardized format, we can understand the contextual component of what’s been observed from each of those platforms,” McGibbney said. “This is what possessional data systems and data representation techniques don’t get you towards, whereas use of semantic technologies gets you this contextual dimension to data where rich knowledge-based activities can then take place.”


Jelani Harper is an editorial consultant servicing the information technology market, specializing in data-driven applications focused on semantic technologies, data governance and analytics.