by Jelani Harper
Historically, metadata management was regarded as a vital element of data management, a necessary component of data governance, and the means of facilitating data provenance. Although it will continue to buttress these pillars of data-driven processes, there’s an important shift occurring in metadata management directly related to the relatively newfound necessities of edge computing, the Internet of Things, and statistical expressions of Artificial Intelligence.
What will continue to flourish in 2019 is an evolution of metadata management’s fundamental value proposition. Over the coming 12 months, metadata management will shift from being a back-office concern of IT departments and data stewards to a tactical means of furthering business objectives and competitive advantage.
According to Dell Boomi VP, CTO and Dell Fellow Michael Morton, this trend evinces “the criticality of using that metadata to mine for additional value. With intelligence and analytics, we can…give more value to the customer using that pile of metadata.”
Metadata’s influence on production environments (and productivity) will increasingly hinge on cataloging its various types, mapping, data modeling, machine learning, and edge computing. Organizations that successfully operationalize metadata in these areas will profit from it most.
Cataloging Operational Metadata
The specific metadata organizations collect will always depend on the particular use case and types of data involved. Still, when relying on metadata for repeated operations like data integration, analytics, and other jobs, organizations will continually produce three varieties of metadata:
- Application Composition: This type of metadata reveals particulars of the specific application involved in operations. When integrating e-commerce data with Salesforce via a cloud platform, for instance, there’s metadata about the connectors deployed (for each application), the appropriate mapping construct, and even basics like the names of the applications.
- Execution Results: Execution metadata directly stems from the particular job performed. When referencing the aforementioned use case for integrating e-commerce data with Salesforce, Morton indicated execution metadata would involve, “Time stamps: when did it start? Is it scheduled to run every hour? How much data did it move? How long did it run, how much data got pulled out of the database, got transformed, and then ultimately, how much data landed in Salesforce?”
- Run Environment: Metadata about run environments typically focuses on requirements for running a particular job. Morton explained examples of this variety of metadata include facets of memory, file systems, and performance, so that, “Now I have metadata about the environment for which what was built in the cloud ran, wherever it runs.”
These categories are the foundation for accurately cataloging metadata, which is indispensable for mining metadata for competitive advantage.
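To make the catalogue concrete, here is a sketch of how these three varieties might be structured as records; the class and field names below are hypothetical illustrations, not any vendor’s actual schema.

```python
from dataclasses import dataclass

# Hypothetical structures for cataloguing the three varieties of
# operational metadata described above; all names are illustrative.

@dataclass
class ApplicationComposition:
    source_app: str         # e.g. an e-commerce platform
    target_app: str         # e.g. "Salesforce"
    connectors: list        # connectors deployed for each application
    mapping_construct: str  # name of the mapping used

@dataclass
class ExecutionResults:
    started_at: str          # time stamp: when did the job start?
    schedule: str            # e.g. "hourly"
    rows_extracted: int      # pulled out of the source database
    rows_loaded: int         # ultimately landed in the target
    duration_seconds: float  # how long did it run?

@dataclass
class RunEnvironment:
    memory_mb: int          # memory available to the job
    file_system: str        # file system the job ran against
    location: str           # "cloud", "on-premises", or "edge"

@dataclass
class JobMetadata:
    composition: ApplicationComposition
    execution: ExecutionResults
    environment: RunEnvironment

# The e-commerce-to-Salesforce job from the example above:
job = JobMetadata(
    ApplicationComposition("e-commerce", "Salesforce",
                           ["ecom-connector", "sfdc-connector"], "order-map"),
    ExecutionResults("2019-01-01T00:00:00Z", "hourly", 1000, 990, 42.5),
    RunEnvironment(512, "ext4", "cloud"),
)
```

Catalogued this way, every run of every job leaves behind queryable records spanning all three categories.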
Mapping
Mapping is likely the most utilitarian way metadata management affords competitive advantage when deployed with probabilistic AI or IoT use cases. It’s an integral means of rectifying differences in the data models and schemas required to load applications for operational use. Each time these dissimilarities in data modeling are addressed with successful mapping efforts, metadata is generated about those details. When produced, collected, and catalogued at scale, metadata about these different mapping jobs is reusable across various use cases, departments, and organizational objectives.
Repeating the mapping for specific fields—perhaps for the bi-directional movement of data from a customer profile in a retail database to Salesforce—is much more efficient than customizing that mapping for each integration or job. According to Morton, handcrafting mapping in, say, Java, normally “wouldn’t take less than a week or two weeks just to build the mapping.” However, when leveraging the residual metadata from previous mapping efforts for current ones, “You’re basically 90 percent of the way there in a split second,” Morton said. This application of metadata management not only reduces time to value, but also enables organizations to fully exploit the metadata to which they have access.
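The reuse Morton describes can be sketched as a minimal mapping catalogue; the class, method, and field names here are illustrative assumptions, not any product’s API.

```python
# Minimal sketch of a mapping catalogue: field mappings produced by
# earlier integration jobs are stored and reused, so a new job starts
# "90 percent of the way there" instead of being handcrafted.

class MappingCatalog:
    def __init__(self):
        # (source_app, target_app) -> {source_field: target_field}
        self._mappings = {}

    def record(self, source_app, target_app, field_map):
        """Catalogue the field-mapping metadata a job leaves behind."""
        self._mappings.setdefault((source_app, target_app), {}).update(field_map)

    def reuse(self, source_app, target_app):
        """Return a copy of a prior mapping as the starting point for a
        new integration; callers then adjust only the fields that differ."""
        return dict(self._mappings.get((source_app, target_app), {}))

catalog = MappingCatalog()
catalog.record("retail_db", "Salesforce",
               {"cust_last_name": "LastName", "cust_email": "Email"})

# A later job between the same pair of systems starts from the
# catalogued mapping instead of a blank page.
draft = catalog.reuse("retail_db", "Salesforce")
draft["cust_phone"] = "Phone"  # only the new field needs handcrafting
```

The catalogued mapping does most of the work; only the delta is customized per job.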
Machine Learning
Although there are several ways machine learning assists with metadata management, it optimizes the mapping process by suggesting how metadata from previous jobs can streamline efforts for similar ones. When working with data at scale for broad integrations between CRM and e-commerce applications, for example, mappings suggested by machine learning “accelerate your mapping,” Morton mentioned. “It’s very rare that one [mapping construct from a previous integration] will match exactly [with a current one], because you’re going to have to modify it. But if you can reduce your development time to a minute, compared to an hour, that’s the beauty.” Machine learning suggestions can decrease this development time by accurately suggesting how to map 40 out of 45 fields, for instance.
Morton described some of the most advantageous algorithms for this dimension of metadata management as “very shallow learning. It does have to learn. Again, I want to give credit that it’s definitely not deep learning. But, you have to do shallow learning.” Shallow machine learning and other forms of rudimentary, algorithmic AI are also useful for recognizing similarities in fields within a system or across multiple ones. When categorizing different fields for last names of customers, for example, across various retail systems, there may be slight alterations in the spellings, characters, and abbreviations for this field. “We have to have at least some intelligence that can recognize that the field name for that metadata is going to mean the same thing,” Morton said. “That’s where it gets applied.”
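A deliberately shallow sketch of that field-name recognition, using Python’s standard-library SequenceMatcher as a stand-in for the learning a real product would also do from the mappings users accept or reject:

```python
from difflib import SequenceMatcher

# Shallow field-name matching, in the spirit of the "shallow learning"
# described above: it recognises that slightly different spellings and
# abbreviations of a field likely mean the same thing. The function
# name and 0.6 threshold are illustrative assumptions.

def suggest_mapping(source_fields, target_fields, threshold=0.6):
    suggestions = {}
    for src in source_fields:
        # Pick the target field whose (lower-cased) name is most similar.
        best = max(target_fields,
                   key=lambda tgt: SequenceMatcher(
                       None, src.lower(), tgt.lower()).ratio())
        score = SequenceMatcher(None, src.lower(), best.lower()).ratio()
        if score >= threshold:
            suggestions[src] = best  # confident enough to auto-suggest
    return suggestions

# Slightly different spellings of "last name" across retail systems:
suggested = suggest_mapping(["cust_lastname", "LAST_NM", "email_addr"],
                            ["LastName", "Email", "Phone"])
```

Fields scoring below the threshold are left for a human to map, mirroring the 40-of-45 scenario above.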
Organizations can also deploy more advanced forms of machine learning, such as anomaly detection, to derive timely action from metadata, which may prove pivotal on anything from streaming datasets in the IoT to hourly jobs for e-commerce. Several cloud platforms have options for utilizing supervised machine learning “to train a model for what’s normal for running [your application],” Morton remarked. “Once the results of running that training model [in production] are different than the model that was trained prior to execution, then we know what’s different.” Alternative approaches to anomaly detection, such as monitoring log data, are frequently not sustainable for recurring big datasets. Organizations can more effectively monitor peaks and valleys in data transmissions “from the metadata,” Morton offered. “It’s just bytes.” Alerts issued from this application of metadata management can “detect things, so if there’s a change in your business, [you can know] was the change meant to be,” Morton said.
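As a simplified stand-in for the supervised approach Morton describes, a baseline of “what’s normal” can be learned from the bytes moved by past runs, with sharp departures flagged; the function names and three-sigma threshold are illustrative assumptions.

```python
import statistics

# Learn "what's normal" from the execution metadata of past runs of a
# job -- here, just the bytes moved per run -- then flag runs that
# depart sharply from that baseline. "It's just bytes."

def train_baseline(byte_counts):
    """Summarize historical volumes as a mean and standard deviation."""
    return statistics.mean(byte_counts), statistics.stdev(byte_counts)

def is_anomalous(bytes_moved, baseline, sigmas=3.0):
    """Flag a run whose volume is a peak or valley worth alerting on."""
    mean, stdev = baseline
    return abs(bytes_moved - mean) > sigmas * stdev

history = [10_200, 9_800, 10_050, 10_400, 9_900]  # bytes per hourly run
baseline = train_baseline(history)

normal_run = is_anomalous(10_100, baseline)  # within the normal range
valley_run = is_anomalous(2_000, baseline)   # was the change meant to be?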
Low Code Data Models
Effective metadata management is particularly meritorious for reducing the amount of code involved in data modeling, which is required for integrating data. The most insightful machine learning models involve data from multiple sources; the long-term value of data emitted from the IoT stems from integrating and aggregating it with traditional data sources. “Whenever you’re moving data from a source to a target, you have to model from one data model to another,” Morton revealed. Using metadata from previous mappings decreases the logic required for the modeling process. However, there are use cases, such as converting currency for international e-commerce applications, that require unique transformation logic in which additional coding is necessary. “The good thing about low code is, if you need to do code, you can inject code,” Morton ventured. “Low code doesn’t mean that you’re forced to do code; it just means that you could inject code if you need to.”
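The inject-code-if-you-need-to idea can be sketched as a declarative field mapping with an optional transform hook; the pipeline shape and the 0.92 exchange rate are assumptions for illustration only.

```python
# Sketch of the low-code idea: a declarative field mapping covers the
# routine moves from one data model to another, while a use case that
# needs unique transformation logic -- currency conversion here --
# injects a small function. All names and values are illustrative.

def run_mapping(record, field_map, transforms=None):
    """Move fields from a source record to a target data model,
    applying injected code only where the job supplies it."""
    transforms = transforms or {}
    out = {}
    for src_field, tgt_field in field_map.items():
        value = record[src_field]
        if src_field in transforms:  # "inject code if you need to"
            value = transforms[src_field](value)
        out[tgt_field] = value
    return out

order = {"order_id": "A-1001", "total_usd": 25.00}
target = run_mapping(
    order,
    {"order_id": "OrderId", "total_usd": "TotalEur"},
    transforms={"total_usd": lambda usd: round(usd * 0.92, 2)},  # assumed rate
)
```

Most fields flow through with no code at all; only the currency field needed the injected logic.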
Edge Computing
Many of the boons of metadata management pertaining to mapping, data modeling, and machine learning are realized in edge deployments in the IIoT. The automation and mapping involved in these processes are fast enough for edge computing, which is largely predicated on the speed and light weight of smaller devices at the edge of the cloud. “Customers, especially in IoT environments, want to do lightweight data transformations, which goes back to the mapping again,” Morton said. “They want to do lightweight transformations at the edge before the data goes from the edge to some central server.” The main benefit of these rapid transformations (expedited by metadata from previous mappings) at the edge is that they help filter the transmissions of device data, so only relevant results are sent to centralized locations. “We’re finding that all of a sudden, customers want to do more data manipulations at the edge before it goes to the cloud, and this is what we’re seeing in the IoT space,” Morton added. This approach makes edge deployments more practical, less resource intensive, and more useful to the enterprise.
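A minimal sketch of such a lightweight edge transformation, assuming a hypothetical sensor payload and temperature threshold:

```python
# Lightweight edge filtering, as described above: transform and filter
# device readings at the edge so only relevant results travel to the
# central server. The payload shape and 75-degree threshold are
# hypothetical illustrations.

def filter_at_edge(readings, threshold=75.0):
    """Keep only readings worth transmitting; the rest stay local,
    cutting the volume of device data sent to the cloud."""
    return [r for r in readings if r["temperature_f"] > threshold]

readings = [
    {"device": "sensor-1", "temperature_f": 70.2},
    {"device": "sensor-2", "temperature_f": 82.5},  # exceeds threshold
    {"device": "sensor-3", "temperature_f": 68.9},
]
to_cloud = filter_at_edge(readings)  # only the relevant reading is sent
```

Two of the three transmissions never leave the edge, which is precisely what makes such deployments less resource intensive.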
Valuing the Human
The fundamental point of commonality for managing metadata to improve aspects of mapping, data modeling, edge computing, analytics, and data integration is “you’re using metadata to value the customer, the human,” Morton emphasized. Incorporating machine learning in that management process accelerates various data preparation factors for extracting value from the IoT, or for integrating sources for more effective analytics.
Nonetheless, the more important development for managing metadata in this fashion is the action derived from it and the tactical advantage of the insight it yields. Monitoring metadata for anomaly detection has a host of applications, from maximizing the output of various data-driven jobs to strengthening cyber security. Reducing the time and organizational resources required for edge deployments by accelerating transformations on endpoint devices or edge gateways can broaden IoT adoption rates, making everything from smart cities to real-time, micro-targeted marketing a reality. In this regard, metadata management supports some of the more compelling instances of data-driven processes today.
Jelani Harper is an editorial consultant servicing the information technology market, specializing in data-driven applications focused on semantic technologies, data governance and analytics.