Data preparation for cognitive computing models, made easy
by Jelani Harper | February 27, 2020
A disproportionate amount of data science still involves simply readying data for cognitive computing models. Within the tedious realm of data preparation, data scientists spend tremendous effort to ingest, cleanse, and model data in a uniform way to understand its applicability for constructing algorithms.
Leveraging data models with universal standards is a sure means of accelerating this dimension of data preparation. Semantic data models—known as ontologies—are tremendously useful for rectifying differences in schema that can otherwise decelerate data preparation processes.
These models naturally evolve to accommodate new data sources or end-user requirements. When leveraged within an environment of standardized graph technologies, they empower organizations by “applying a business layer of meaning across data, being able to harmonize across different source types, either structured or unstructured, to blend datasets that then can be queried without writing any code,” explained Marty Loughlin, SVP and head of global sales at Cambridge Semantics.
As such, these adaptive models and their surrounding environments make for ideal settings for data scientists to understand and prepare training data for machine learning models.
Uniform schema
Disparities in schema across diverse datasets are perhaps the chief data modeling obstacle to blending data for machine learning models, or for any other purpose. The universal standards of semantic data models, however, ensure that data of any type or structure is aligned to the same model. Unifying the schema of respective data models in this manner presents many advantages for data modelers, including:
Easily involving structured and unstructured data: Unstructured data is notoriously difficult to interpret, particularly alongside structured content. Because ontologies describe both source types with a single conceptual model, they remove this source of delay when incorporating unstructured data into cognitive computing models.
Adding sources: With other approaches to data modeling “there’s a lot of difficulty when it comes to adding in new data sources which, of course, happens,” Loughlin admitted. In relational environments, for instance, this requires users to re-configure the schema of the entire repository. Standardized semantic data models create an ideal situation for organizations so “if they want to add a new data source that has dimensions that are not already in the model, they can easily add them,” he said.
Querying: More rigid approaches to data modeling require users to define their questions upfront. By contrast, the combination of a graph environment and semantic data models enables anyone to “ask questions that you didn’t anticipate,” Loughlin said. Competitive solutions in this space facilitate this advantage “without writing code,” he added. A minimal sketch of this blend-then-query pattern appears below.
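To make the pattern concrete, here is a minimal sketch using the open-source rdflib library in Python, not any particular vendor's platform. The ontology namespace, class names, and source field names are hypothetical. Two sources describe the same business concept with different field names, yet one SPARQL query spans both once they are mapped to the shared model.

```python
# Hypothetical sketch: blend two differently shaped sources under one
# semantic model and query them together, using the open-source rdflib library.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/ontology/")  # hypothetical ontology namespace

g = Graph()
g.bind("ex", EX)

# Source A: a relational extract that calls the concept "cust_name"
source_a = [{"cust_name": "Acme Corp", "region": "EMEA"}]
for i, row in enumerate(source_a):
    node = EX[f"customerA{i}"]
    g.add((node, RDF.type, EX.Customer))
    g.add((node, EX.name, Literal(row["cust_name"])))
    g.add((node, EX.region, Literal(row["region"])))

# Source B: a semi-structured feed that calls the same concept "client"
source_b = [{"client": "Globex", "region": "APAC"}]
for i, row in enumerate(source_b):
    node = EX[f"customerB{i}"]
    g.add((node, RDF.type, EX.Customer))
    g.add((node, EX.name, Literal(row["client"])))
    g.add((node, EX.region, Literal(row["region"])))

# One query over the blended graph; no join logic was defined upfront
results = g.query(
    """
    PREFIX ex: <http://example.org/ontology/>
    SELECT ?name ?region
    WHERE { ?c a ex:Customer ; ex:name ?name ; ex:region ?region . }
    """
)
for name, region in results:
    print(name, region)
```

Adding a third source later only requires mapping its fields to the same ontology terms; the existing data, model, and queries are left untouched, which is the adaptability Loughlin describes.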
Business concepts
Partly because of their standardized approaches to schema, semantic data models are extremely expressive. They are designed to convey the meaning of data as facts that leverage business end-user terminology. What these models express is not just the schema of the data and the relevant metadata; they also enable users to “tie that into a higher level business concept,” Loughlin said.
This synthesis of what Loughlin described as “business concepts alongside metadata” enables data scientists to better understand how their various training datasets can inform the creation of predictive models. According to Loughlin, pairing business concepts with the metadata is possible with other approaches—but not as readily as with semantic data models. “Anyone in the relational world will tell you anything is possible,” Loughlin said. “It’s just a matter of the complexity and the scalability to maintain it over time. The standards that define semantic models just give you so much flexibility and agility, but you can do it in a relational world—it’s just very complex.”
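To illustrate what tying technical metadata to a business concept might look like, here is another small, hypothetical rdflib sketch. The table, column, and property names are invented for the example and do not come from Loughlin or Cambridge Semantics; the point is simply that a cryptic physical column can be annotated with the business concept it realizes.

```python
# Hypothetical sketch: annotate a physical column with the business concept
# it represents, so the model is readable in business terms.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/ontology/")  # hypothetical namespace
g = Graph()

column = EX["CRM_T01.cust_fn"]     # technical metadata: table and column name
concept = EX.CustomerFirstName     # higher-level business concept

g.add((concept, RDF.type, RDFS.Class))
g.add((concept, RDFS.label, Literal("Customer First Name")))
g.add((column, EX.realizesConcept, concept))  # tie the metadata to the concept

# Look up the business meaning of the physical column
for _, _, c in g.triples((column, EX.realizesConcept, None)):
    print(g.value(c, RDFS.label))  # -> "Customer First Name"
```

A data scientist browsing such a model sees “Customer First Name” rather than a cryptic column identifier, which is what makes training datasets easier to evaluate.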
Core implementations
There’s a vast difference between leveraging cognitive computing for novel, fringe use cases and ingraining it within core business processes in verticals like finance, retail, or healthcare. “I think it’s still a science experiment in many use cases,” Loughlin noted about the prevalence of predictive cognitive computing models in some of these industries. “I mean, there are some solutions out there. I posted a comment on LinkedIn a couple of months ago about how so many people are chasing the AI brass ring while the fundamental problems of data management and data integration are not really solved.”
The most notable facet of Loughlin’s comment is that there is an almost causal relationship between these pursuits. If the enterprise mastered the rigors of data integration and data preparation, there would be far more credible applications of cognitive computing deployed in core business processes. The adaptability of semantic data models can help make data science less time-consuming and more viable to organizations today.
Jelani Harper is an editorial consultant servicing the information technology market, specializing in data-driven applications focused on semantic technologies, data governance and analytics.