Vice President & Distinguished Engineer at IBM Expert Labs, John J Thomas, discusses the importance of trustworthy data in building trusted AI systems. John leads elite technical teams that work with IBM clients to implement Data Fabric, Data Science/AI/ML, and enterprise AI solutions.
Companies around the world are looking to AI and automation to help them solve all sorts of problems, from addressing skills gaps and labor shortages to modeling the impact of external threats like natural disasters or pandemics.
These capabilities are helping organizations become more dynamic, innovative, and resilient, but AI raises new challenges of its own: namely, ensuring your use of it doesn’t introduce adverse consequences for your users or society at large. The practice of AI ethics, which embeds guiding principles and safeguards against misuse into how AI is built and deployed, is becoming a basic requirement for organizations. As governments in the E.U. and around the world work to enshrine responsible AI practices into law, companies must go beyond internal policies and put systems in place that allow them to demonstrate compliance. And for most organizations, the process of future-proofing their AI solutions starts with taking a hard look at their data.
The evidence also shows that ensuring the trustworthy use of AI is an increasingly urgent priority. Today, 85% of IT professionals agree that consumers are more likely to choose companies that are transparent about how their AI models are built, managed and used, according to research from IBM. At the same time, most IT professionals also admit they are not taking important steps to safeguard AI’s trustworthiness, such as monitoring for bias and model drift or developing an AI ethics policy. Alarmingly, most organizations wouldn’t be able to act on a strategy even if they had one: Only 40% of companies have a data provenance strategy, meaning the majority lack the ability to track their data and figure out where it came from in the first place.
Data Strategy Built for the Future
It’s intuitive that an AI strategy must be preceded by a data strategy. AI models are trained using data, and while workarounds like synthetic data offer promise, most of the data used to power AI is still generated by real applications. If an organization can’t determine where that data came from, it can’t assess whether the data is any good, who owns it, or whether it is protected. To consider the inverse, organizations that are realizing the benefits of AI are vastly more likely than anyone else to have a data management and governance system in place. For example, companies that have deployed AI are nearly 300% more likely to be using a data fabric strategy that can automate data discovery and governance compared to those that have not.
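As a concrete illustration of the provenance tracking described above, here is a minimal sketch in Python. The record fields (`source_system`, `owner`, `transformations`) are illustrative assumptions, not a standard lineage schema, and real provenance tooling would persist these records in a catalog rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """Minimal lineage record for a dataset used in model training (illustrative)."""
    dataset_id: str
    source_system: str          # where the data originated
    owner: str                  # party accountable for the data
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    transformations: list = field(default_factory=list)

    def add_transformation(self, step: str) -> None:
        # Append each processing step so lineage can be reconstructed later.
        self.transformations.append(step)

# Register a dataset and record what was done to it before training.
record = ProvenanceRecord("claims_2023", source_system="claims_db", owner="finance")
record.add_transformation("dropped rows with missing policy_id")
record.add_transformation("anonymized customer names")
print(record.transformations)
```

With records like this attached to every training dataset, the questions above (where did the data come from, who owns it, what happened to it) have answers that can be audited.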
Today’s data leaders have a mandate to strive for real-time decisioning and predictive models that help keep the organization ahead. But to get there, they must be able to help their organizations design a data strategy that defines the right approach to making sense of vast amounts of data, aligns their data and business strategies, and identifies the right solutions that span the entire organization. Particularly if those solutions include AI, they must take responsibility for how their data is used to build AI, and how it is maintained and governed over time.
Outlining a Data Governance Policy
A holistic data strategy needs to account for three main factors: It must ensure data is high quality; it must account for privacy protections and help ensure people remain the owners of their own data; and finally, it must secure data against inevitable intrusions and security threats.
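The three factors above can be sketched as a simple automated policy check. This is a toy example, not a real governance product: the field names (`completeness`, `pii_masked`, `encrypted_at_rest`) and the 95% threshold are hypothetical placeholders for whatever an organization's own policy defines.

```python
def governance_violations(dataset: dict) -> list:
    """Return a list of policy violations; an empty list means the dataset passes.

    The keys checked here are illustrative assumptions, not a standard schema.
    """
    violations = []
    # 1. Quality: require a minimum share of complete, non-null records.
    if dataset.get("completeness", 0.0) < 0.95:
        violations.append("quality: completeness below 95%")
    # 2. Privacy: personal data must be masked and consent recorded.
    if not dataset.get("pii_masked", False):
        violations.append("privacy: PII is not masked")
    if not dataset.get("consent_recorded", False):
        violations.append("privacy: no record of user consent")
    # 3. Security: data must be encrypted at rest.
    if not dataset.get("encrypted_at_rest", False):
        violations.append("security: not encrypted at rest")
    return violations

clean = {"completeness": 0.99, "pii_masked": True,
         "consent_recorded": True, "encrypted_at_rest": True}
print(governance_violations(clean))  # []
```

Encoding the policy as code, rather than as a document, is what makes it enforceable at scale: every dataset can be screened before it is allowed to train a model.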
A data fabric architecture simplifies data management by allowing businesses to weave together disparate data sources and storage repositories like databases, data lakes and data warehouses. Done right, a data fabric will connect the right people with the right data at the right time — eliminating the technological complexities involved in data movement, transformation, and integration. In short, a data fabric combines four critical priorities: Intelligent data integration, the democratization of data access, stronger data protection and the ability to govern data use to promote its trustworthiness.
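To make the idea concrete, the following sketch shows a toy catalog that registers disparate sources behind one access layer and enforces a governance check at query time. Real data fabric products expose far richer APIs; the class and method names here are invented purely for illustration.

```python
class DataFabricCatalog:
    """Toy illustration: a single access layer over heterogeneous sources."""

    def __init__(self):
        self._sources = {}  # name -> (connector, allowed_roles)

    def register(self, name, connector, allowed_roles):
        # A connector is any callable returning rows, so databases, data
        # lakes and warehouses all sit behind the same interface.
        self._sources[name] = (connector, set(allowed_roles))

    def query(self, name, role):
        # Governance is applied centrally at the fabric layer,
        # not reimplemented per source.
        connector, allowed = self._sources[name]
        if role not in allowed:
            raise PermissionError(f"role '{role}' may not read '{name}'")
        return connector()

catalog = DataFabricCatalog()
catalog.register("sales_warehouse", lambda: [("2023-01", 120)], {"analyst"})
catalog.register("clickstream_lake", lambda: [{"page": "/home"}], {"data_scientist"})

print(catalog.query("sales_warehouse", role="analyst"))  # [('2023-01', 120)]
```

The design choice worth noting is that integration, access control and governance live in one place, which is the essence of the "right people, right data, right time" claim above.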
Governance issues and data misuse can arise from a lack of knowledge rather than any nefarious purpose. That’s why companies that take steps to standardize their nomenclature, for example by using active metadata effectively, stand to improve the productivity of data teams by up to 20% by 2024. Engaging with external business partners, academics, and AI-focused organizations can help the industry as a whole work to establish “ethical interoperability”. For example, at IBM we’re working with organizations like the Responsible AI Institute so that businesses can use IBM technology like AI Factsheets in IBM Cloud Pak for Data alongside RAII’s assessment framework, giving them a tangible way to demonstrate the trustworthiness of their AI systems against industry standards.
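Standardizing nomenclature can be as simple as mapping ad-hoc field names onto a shared business glossary so that every team refers to the same concept by the same name. The glossary entries below are made-up examples, not IBM terminology.

```python
# Hypothetical business glossary mapping ad-hoc column names to standard terms.
GLOSSARY = {
    "cust_id": "customer_id",
    "custno": "customer_id",
    "rev": "revenue",
    "turnover": "revenue",
}

def standardize_columns(columns: list) -> list:
    """Map each incoming column name to its glossary term when one exists."""
    return [GLOSSARY.get(c.lower(), c.lower()) for c in columns]

print(standardize_columns(["CustNo", "Rev", "region"]))
# ['customer_id', 'revenue', 'region']
```

In practice this mapping would live in an active metadata catalog rather than a dictionary, but the principle is the same: misuse born of confusion drops when everyone sees the same names.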
The ability to manage data complexity and the ability to govern the responsible use of data and AI go hand in hand. Today, one in five companies doubt their ability to access the data they need when they need to, citing challenges in security, governance and more. Without an architecture that can accommodate access to data across an organization and ensure the quality of the data being used to train AI, these organizations are likely to fall even further behind their peers.
Organizations that have embraced AI are vastly more likely to have adopted a data fabric strategy than their peers for a reason. Only by automating the discovery, use and maintenance of their data can organizations hope to truly unlock its potential.