AI Business is part of the Informa Tech Division of Informa PLC
The data modeling process is arguably the nucleus of deploying cognitive computing technologies and applications across the enterprise today, largely because it’s a prerequisite for integrating those applications and their data for horizontal use cases.
Consequently, data modeling techniques have kept pace with contemporary data sources and deployments whose value, in most instances, hinges on low-latency responses. In 2019, conventional relational data modeling approaches will all but lapse into obsolescence as numerous alternatives, in most cases purpose-built for real-time use cases integrating heterogeneous data sources, surge to the fore of this facet of data management.
With modern data modeling practices including codeless models, visual model building, representative data shaping, graph techniques, programmatic approaches and more, 2019 will herald the era in which, as Naveego CEO Katie Horvath put it, “when we’re managing data, we’ve already put it in a form such that we can manage all of it together, even though it might be coming from wildly different sources.”
Traditional schema-based restrictions slowed relational data modeling and frequently lengthened the time required to engineer features when building machine learning models. Today, there are plentiful approaches for overcoming those limits so organizations can “incorporate schema from multiple source systems and be able to bridge those in a single solution,” TigerGraph VP of Marketing Gaurav Deshpande said.
Competing tools extend these advantages beyond schema to disparate types of metadata, overcoming this obstacle to data modeling. The resulting reconciliation of schema and metadata across different sources yields an improved understanding of the relationships between data, which is critical for identifying features for statistical Artificial Intelligence models.
“Typically, the input data into an AI or machine learning project tends to be heterogeneous: it comes from multiple systems,” Deshpande noted. However, by bridging differences in schema and metadata for such sundry data sources, organizations can “actually build a real-time customer 360 that’s based on federated metadata across all these systems…and then create new training data with graph computed features so that you actually have explainable AI as a part of your machine learning,” Deshpande explained.
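The idea of graph-computed features over bridged sources can be sketched in plain Python. Everything below is hypothetical and invented for illustration (the record layouts, the email-based linkage, and the shared-device feature); it is not TigerGraph's actual approach, only a minimal example of deriving a relationship-based training feature from two systems with different schemas.

```python
from collections import defaultdict

# Hypothetical records from two source systems, each with its own schema.
crm_records = [
    {"cust_id": "C1", "email": "ana@example.com"},
    {"cust_id": "C2", "email": "bo@example.com"},
]
web_events = [
    {"visitor": "V9", "email": "ana@example.com", "device": "D1"},
    {"visitor": "V8", "email": "bo@example.com", "device": "D1"},
]

# Bridge the two schemas into one graph keyed on a shared attribute (email).
graph = defaultdict(set)
for rec in crm_records:
    graph[rec["email"]].add(("customer", rec["cust_id"]))
for ev in web_events:
    graph[ev["email"]].add(("device", ev["device"]))

# Index device nodes by their owners so relationships are easy to query.
device_owners = defaultdict(set)
for email, nodes in graph.items():
    for kind, value in nodes:
        if kind == "device":
            device_owners[value].add(email)

def shared_device_count(email):
    """A graph-computed feature: devices this customer shares with others."""
    return sum(
        1
        for kind, value in graph[email]
        if kind == "device" and len(device_owners[value]) > 1
    )

print(shared_device_count("ana@example.com"))  # both customers share device D1
```

A feature like this is explainable by construction: the model input can be traced back to concrete relationships (here, a shared device) in the federated data.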
Modern data modeling techniques are essential to building MDM models, which by definition should rapidly incorporate various schema for uniform models. Horvath mentioned the utility of “smart connectors” in this regard, which connect to any variety of data sources and “tell us automatically what the data looks like, what it contains and what the schema is.” Those initial reads are then used to translate that schema into holistic data representations applicable to multiple types and origins of data used to form a golden record for MDM.
One of the strengths of this approach is it’s architected “in a very distributed way, which allows us to handle massive big datasets in real time as we’re managing the data,” Horvath observed. “The schema is translated into a shape; a shape would include perhaps everything from one data source but also [aspects of] data from other sources.”
The output of this data modeling method is a golden record of the desired domain (which typically includes customers or products) based on the schema of the particular sources—instead of forcing data sources to fit into the schema of an MDM hub. TigerGraph COO Todd Blaschka commented on the inclusiveness of MDM models that enables organizations “to overlay integrated data that they’ve mined and cultivated, plus bring in data from outside sources.”
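The shape-based golden-record idea can be illustrated with a small sketch. The per-source field mappings below are assumptions written by hand for the example; in the approach Horvath describes, smart connectors would infer the source schema automatically rather than relying on hard-coded mappings.

```python
# Hypothetical "shape": a per-source mapping from native field names to a
# shared domain vocabulary. Field names and values are invented.
shapes = {
    "crm":     {"cust_name": "name", "phone_no": "phone"},
    "billing": {"customer":  "name", "postal":   "zip"},
}

sources = {
    "crm":     {"cust_name": "Ana Diaz", "phone_no": "555-0100"},
    "billing": {"customer":  "Ana Diaz", "postal":   "49684"},
}

def golden_record(sources, shapes):
    """Fold every source record into one record shaped by the domain,
    rather than forcing sources into a hub's fixed schema."""
    record = {}
    for source, data in sources.items():
        for native_field, domain_field in shapes[source].items():
            # First non-empty value wins; later sources fill remaining gaps.
            record.setdefault(domain_field, data[native_field])
    return record

print(golden_record(sources, shapes))
```

The result combines fields from both sources into a single customer record, which is the essence of a golden record for a domain like customers or products.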
According to Horvath, one of the most urgent use cases for MDM data models is feeding machine learning models with training data. In other instances, data modeling reinforces data governance as a fundamental layer for incorporating the numerous sources ideal for deep learning or machine learning. Looker CTO Lloyd Tabb described a governance model that’s “data prep for people. So when you build this model, people will always get the right calculations; things are always exactly the same when you pull data.”

A growing alternative to relational data models, leveraged by some of the most well-known AI companies in the market today, is to implement the data modeling process via a programming language that offers a granular means of describing the underlying data.
Tabb said the core of this approach is describing four elements of data use: dimensions, measures, relations and transformations. Dimensions involve scalar calculations, such as combining first names and last names into a name field. Measures are aggregate calculations, like deriving total revenue from the sum of sales amounts. Relations describe how data are related, such as relating tables for orders to those for users. Transformations pertain to other forms of calculations, such as determining the lifetime value of every user. These four core principles are described in a programming language so that “it becomes very simple to ask questions of the data,” Tabb said, accelerating data preparation for machine learning accordingly.
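The four elements can be sketched as plain Python functions over row dictionaries. This is a hedged illustration, not any vendor's actual modeling language: the tables, field names and figures are invented, and each function stands in for a declaration a modeling language would express more concisely.

```python
# Hypothetical tables, invented for illustration.
orders = [
    {"user_id": 1, "amount": 120.0},
    {"user_id": 1, "amount": 80.0},
    {"user_id": 2, "amount": 50.0},
]
users = {
    1: {"first": "Ana", "last": "Diaz"},
    2: {"first": "Bo",  "last": "Chen"},
}

# Dimension: a scalar calculation on a single row.
def full_name(user):
    return f"{user['first']} {user['last']}"

# Measure: an aggregate calculation over many rows.
def total_revenue(rows):
    return sum(r["amount"] for r in rows)

# Relation: how one table's rows join to another's.
def orders_for(user_id):
    return [r for r in orders if r["user_id"] == user_id]

# Transformation: a derived metric built from the pieces above,
# e.g. the lifetime value of a user.
def lifetime_value(user_id):
    return total_revenue(orders_for(user_id))

print(full_name(users[1]), lifetime_value(1))  # Ana Diaz 200.0
```

Because every calculation is declared once, any question asked of the data reuses the same definitions, which is what makes results consistent for everyone pulling data.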
In other instances, graphical means are available to the enterprise so that “you actually don’t have to code these models anymore,” SAS Global Financial Services Marketing Manager David Wallace indicated. Such visual approaches to data modeling may be useful in situations in which there are myriad data sources involved. Blaschka referenced a financial services use case for real-time credit scoring in which organizations take a customer’s “name, social security number, a phone number and run it against their AI, which can also include social network and other data sources.”
For these use cases and others, organizations can rely on approaches in which there’s “this visual way of pulling data in and have the system itself compare different modeling techniques,” Wallace said. This visual approach to modeling and comparing models helps in multiple ways. First, visual comparisons between models can demonstrate “which one has the greatest lift,” Wallace maintained. “There’s some technical measurements that provide the answer to which one has the highest predictive power.” Visual approaches also support modern methods for making machine learning models explainable.
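Lift is one of the technical measurements a visual tool might chart side by side for competing models. The sketch below is a hedged, stdlib-only illustration with invented scores and labels: it compares two hypothetical scoring models by "lift at the top half", the positive rate among the highest-scored records divided by the overall positive rate.

```python
# Invented outcomes and model scores, for illustration only.
labels  = [1, 0, 1, 1, 0, 0, 1, 0]                 # actual outcomes
model_a = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4]
model_b = [0.5, 0.6, 0.4, 0.9, 0.2, 0.3, 0.1, 0.8]

def lift_at_half(scores, labels):
    """Positive rate among the top-scored half, divided by the base rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    top = ranked[: len(ranked) // 2]
    top_rate = sum(label for _, label in top) / len(top)
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

print(lift_at_half(model_a, labels))  # 2.0: model A ranks every positive first
print(lift_at_half(model_b, labels))  # 1.0: model B is no better than random
```

A visual tool would plot such measurements for each candidate model, making the one with the highest predictive power immediately apparent.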
The data modeling process will remain the foundation of cognitive computing technologies and other low-latency applications (like the Internet of Things) for some time. Whether models are created with visual, codeless, programmable, representative, or graph techniques, all of these methods are designed to accelerate the modeling process while including diverse sources. Semantic ontologies, self-describing data formats such as JSON, and advancements in JSON-LD provide many of these same benefits.
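The self-describing quality of such formats can be shown with a small sketch. The record below is a hypothetical JSON-LD-style document (the fields and vocabulary URLs are illustrative): its "@context" maps local field names onto a shared vocabulary, so a consumer can interpret the data without an external schema.

```python
import json

# A hypothetical self-describing record in the style of JSON-LD.
doc = """
{
  "@context": {"name": "http://schema.org/name",
               "tel":  "http://schema.org/telephone"},
  "name": "Ana Diaz",
  "tel": "555-0100"
}
"""

record = json.loads(doc)
context = record.pop("@context")

# Expand local keys to their globally defined meanings.
expanded = {context[key]: value for key, value in record.items()}
print(expanded)
```

Because the mapping travels with the data, records from wildly different sources can be aligned to one vocabulary at read time, much like the schema bridging discussed above.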
Still, the data governance capabilities of data modeling are another important dimension of this data management discipline, particularly when ensuring the data quality upon which the overarching value of data is based. The first step towards “data accuracy and data quality is ensuring the data’s in a format that we would expect,” Horvath remarked, which is one of the essential functions of data modeling. Moreover, that need only grows as adoption of cognitive computing technologies such as machine learning broadens.
Jelani Harper is an editorial consultant serving the information technology market, specializing in data-driven applications focused on semantic technologies, data governance and analytics.