What are the most common problems when it comes to data for AI?
AI works only when it’s built on good information management practices. It cannot fix the absence of them
February 22, 2021
The most common challenge is simple: companies realize too late not only that they do not have enough data to implement the kind of AI program they want, but also that the data they do have is not of good enough quality to achieve their goals. It’s at this point that businesses tend to finally acknowledge that they have neglected their information management to their great detriment, and that their data, their core asset, lacks the structure and qualities it needs.
AI works only when it’s built on good information management practices. It cannot fix the absence of them. If you have not prioritized creating the structure and information management practices that give your data context and meaning wherever it is needed, your organization will have created data silos. These silos, effectively “data disconnections”, make it incredibly challenging to mobilize your data and information for new business use cases, including AI technologies.
Physical separation of data is not a problem; in fact, there are arguably numerous benefits to distributed data solutions. But regardless of where the 0s and 1s are stored, they need to live together on a conceptual plane, and this is where data governance and coordinated management are critical. Data that is well structured and well described is the only kind of data that can carry its context and meaning across a business. This ‘data portability’ breaks down existing information silos and prevents new ones from being created.
Establishing a shared and living ‘map’ of your organization's data assets is the most effective way to understand how the company perceives its current data assets. As a starting point for this, I recommend Domain Modeling, which creates a strong foundation for all projects and supports creating portable structured data – essential for AI implementations. Domain Modeling is part of a broader practice that focuses on the structure and language of the business domain and helps organizations understand the variety and shape of their data.
I’m often surprised at how much team members’ conceptions of the business’s domain model vary, so involve your employees in this. Invest some time and tease the information out. Believe me, the value locked in their heads is a large part of your core business advantage. Fundamentally, this is a de-risking exercise that helps a business understand the gaps in its shared understanding. Identify together where your data could be better structured to provide value to new services, new experiments and new product ranges.
Companies must challenge their teams on how existing processes should change to support more efficient, and ongoing, innovation. ‘Ongoing’ is the crucial word here. It is easy to focus on the immediate changes that need to be made, and on the capabilities a company wants to deliver first, but what adds real value when implementing any kind of AI program is planning for the future.
The output of this exercise is a better understanding of your data landscape: where perspectives on the domain differ, and which areas are poorly realized or would benefit from improvement. This view can then be aligned with an understanding of what drives value and used to prioritize data improvements, whether that means sourcing missing data, aligning existing data to an agreed structure, or simply improving data quality and robustness.
By adopting a domain-centric approach and using automation where possible, companies can look inward to their own data and domain and critically evaluate it. The better you know your data and how it is understood in the company, the better the ROI will be on AI implementations, particularly if you have elected to build a solution rather than buy one off the shelf.
As we start to explore these principles further, we develop a better understanding of the value of being able to move and access our data in a way that carries with it context and meaning. This supports the creation of an information management strategy that covers how data is authored, described and maintained through its lifecycle.
As they act on this information strategy, many organizations are now considering adding a Knowledge Graph to their data capabilities: effectively an information backbone, structured according to their Domain Model, that glues their business information together with context and meaning.
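To make the idea of a backbone "with the structure of the Domain Model" a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a production design and not any particular product's API: the entity types (`Product`, `Store`, `Region`), the relation names and the `KnowledgeGraph` class are all hypothetical. Facts are stored as subject-predicate-object triples, and a fact is only accepted if the domain model allows that relationship between those entity types, so every stored fact carries an agreed meaning.

```python
from dataclasses import dataclass, field

# Hypothetical domain model (illustrative only): the entity types the business
# recognizes, and which (subject type, predicate, object type) links are allowed.
DOMAIN_MODEL = {
    "types": {"Product", "Store", "Region"},
    "relations": {
        ("Product", "soldAt", "Store"),
        ("Store", "locatedIn", "Region"),
    },
}


@dataclass
class KnowledgeGraph:
    """Hypothetical in-memory graph whose shape is governed by a domain model."""

    model: dict
    entity_types: dict = field(default_factory=dict)  # entity id -> entity type
    triples: set = field(default_factory=set)  # (subject, predicate, object)

    def add_entity(self, entity_id: str, entity_type: str) -> None:
        # Only types defined in the shared domain model may be used.
        if entity_type not in self.model["types"]:
            raise ValueError(f"Unknown entity type {entity_type!r}")
        self.entity_types[entity_id] = entity_type

    def add_fact(self, subject: str, predicate: str, obj: str) -> None:
        # Reject any fact whose type signature the domain model does not allow.
        signature = (
            self.entity_types.get(subject),
            predicate,
            self.entity_types.get(obj),
        )
        if signature not in self.model["relations"]:
            raise ValueError(f"{signature} is not allowed by the domain model")
        self.triples.add((subject, predicate, obj))

    def follow(self, start: set, predicate: str) -> set:
        # Return every object reachable from the start entities via one predicate.
        return {o for s, p, o in self.triples if s in start and p == predicate}


# Usage: data from two separate systems (product catalog, store estate)
# joined through the shared model.
kg = KnowledgeGraph(DOMAIN_MODEL)
kg.add_entity("trail-shoe", "Product")
kg.add_entity("store-42", "Store")
kg.add_entity("north-west", "Region")
kg.add_fact("trail-shoe", "soldAt", "store-42")
kg.add_fact("store-42", "locatedIn", "north-west")

stores = kg.follow({"trail-shoe"}, "soldAt")
print(kg.follow(stores, "locatedIn"))  # {'north-west'}
```

Real knowledge graph platforms typically rely on standards such as RDF and OWL, or on a property-graph database, rather than in-memory sets. The design point the sketch is meant to show is only that the domain model, rather than each individual system, decides what a fact is allowed to mean.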
This is reflected in the renewed popularity (after the initial boom in the 2010s) of Knowledge Graphs, and in how they are playing a growing role in the AI product ecosystem.
Growing awareness of these facts is prompting organizations such as eBay, Apple and Nike, whose large data science teams are tasked with building AI and machine learning models, to embark on a hiring spree focused on data and information management professionals.
A company needs a combination of experienced people, tools and good information management practices to implement AI programs that add real commercial value. Get it right and the benefits are clear: new revenue and value-creating opportunities, and the best possible chance of success in the marketplace.
Matt Shearer is product director at Data Language, a technical consultancy specializing in machine learning and knowledge graphs, based in the UK.