by Jelani Harper
SAN FRANCISCO – The myriad dimensions of artificial intelligence are almost all predicated on amassing as much data as possible for training, learning, language processing, and more. The forces of regulatory adherence are equally prominent, dictating that organizations pare down their data, account for all of it, and eliminate it when required.
With regulatory measures such as Europe’s GDPR and other similar ones populating individual states in the U.S., AI’s future in the enterprise seemingly depends on successful governance measures. Due in no small part to AI, “data is just getting bigger and bigger,” reflects Rod Harrison, StorCentric Chief Technology Officer and Vice President of Engineering. “The cost of disks is decreasing, so it’s becoming less and less of a requirement to erase things—unless you’ve got a legal requirement to do it.”
Today, a mounting number of legal and regulatory requirements are demanding organizations get rid of data upon request, protect personally identifiable information, and safeguard the privacy of consumers. Ergo, the cardinal task for many data governance programs is ensuring standards such as data lineage, lifecycle management, and security decrease the risk of leveraging data for AI.
Blockchain-enhanced data provenance
Some of the core elements of blockchain—the unalterable nature of its chain, its flawless preservation of data’s sequence, and its permissions for validation—make it invaluable for determining data lineage.
Unsurprisingly, some of the more advanced data governance tools for archiving data rely on this technology (or are strikingly similar to it) to facilitate an aspect of data governance foundational to regulatory compliance.
“Everything that touches that file: who did it, when, anything you do to it whether they access it, whether they change it, is all stored,” Harrison explains. “That’s all cryptographically stored and linked together. So even if you tried to nefariously modify that, you can’t re-link that chain.”
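The linking Harrison describes can be illustrated with a hash chain, in which each audit entry stores a hash of the entry before it. What follows is a minimal sketch under stated assumptions, not a description of any vendor's product: the field names (`who`, `when`, `action`) and the helpers `append_event` and `verify_chain` are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(entry: dict) -> str:
    """Hash an entry's content together with the previous entry's hash."""
    payload = json.dumps(
        {key: entry[key] for key in ("who", "when", "action", "prev_hash")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log: list, who: str, when: str, action: str) -> None:
    """Append an audit event that is linked to the event before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"who": who, "when": when, "action": action, "prev_hash": prev_hash}
    entry["hash"] = entry_hash(entry)
    log.append(entry)


def verify_chain(log: list) -> bool:
    """Return True only if every entry matches its own hash and its link."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != entry_hash(entry):
            return False
        prev_hash = entry["hash"]
    return True
```

Editing an early entry changes its hash, so the next entry's stored link no longer matches and verification fails. A chain like this only resists a full rewrite if the most recent hash is also recorded somewhere the attacker cannot change.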
Use cases for this facet of archiving systems are critical for almost any element of regulatory compliance, from conventional concerns such as HIPAA and payment card information to aspects of data privacy.
The internal use of blockchain and similar technologies is also valuable for discovery and e-discovery use cases. “If you want to present evidence in a court of law, they would like you to prove that it hasn’t been tampered with,” Harrison says. Such archiving systems employ robust auditing capabilities to demonstrate their immutability to courts of law and regulators alike.
The current focus on data privacy parallel to AI’s resurgence has strongly renewed interest in lifecycle management. Organizations must master this domain of data governance to deploy the data volumes required for many AI applications, yet still eliminate data from their systems for compliance.
“Oftentimes you’ll find if you keep [data] longer than you really have to, you then are actually exposing yourself to potentially other legal requirements,” Harrison mentions. Archiving systems based on object stores give organizations an immense amount of flexibility for enforcing lifecycle management.
Users can mandate expiration dates for data in accordance with governance policies, although they’re ultimately responsible for eradicating these records from their systems. Once the data has expired, “it doesn’t go away until the admin goes in and says I want to get rid of everything that’s expired,” Harrison reveals. “But, [until then] no one can see it and no one can access it. So it covers you from that point of view of saying I retained this for this time required, I can prove it and show you it was available for this period of time and show you on this date it was no longer available.”
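The two-step behavior described here, in which expired data first becomes inaccessible and is physically removed only when an administrator purges it, might be sketched as follows. This is an illustration under assumptions: the `Archive` class and its method names are hypothetical, and the current date is passed in explicitly so the example stays deterministic.

```python
from datetime import date


class Archive:
    """Toy archive: expired records are hidden first and purged later."""

    def __init__(self):
        self._records = {}  # name -> (content, expires_on)

    def put(self, name: str, content: bytes, expires_on: date) -> None:
        self._records[name] = (content, expires_on)

    def is_stored(self, name: str) -> bool:
        """True while the record still physically exists in the archive."""
        return name in self._records

    def read(self, name: str, today: date) -> bytes:
        content, expires_on = self._records[name]
        if today >= expires_on:
            # Expired data stays on disk but can no longer be accessed.
            raise PermissionError(f"{name} expired on {expires_on}")
        return content

    def purge_expired(self, today: date) -> list:
        """Admin action: physically delete everything that has expired."""
        expired = sorted(
            name for name, (_, exp) in self._records.items() if today >= exp
        )
        for name in expired:
            del self._records[name]
        return expired
```

The list returned by `purge_expired` could feed an audit record showing which items were removed and when.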
Security and disaster recovery
The confluence of data governance and security is a pivotal concern readily addressed by contemporary archiving systems. The immutability of technologies reminiscent of blockchain makes them well suited to many security use cases because they’re “very resistant against things like viruses, malware, and ransomware,” Harrison observes.
Truly competitive solutions in this space give organizations the option of tiering archived data, so that files can be replaced with a shortcut after a set period to make way for more recent data. In this instance, users can implement data governance policies to “offload the burden on your primary tier one storage, and only things that are actively being hit would stay local,” Harrison remarks.
“If it’s not touched for 30 days or any amount of [specified] time [the system] will just replace it with a shortcut.” Thus, in the event of catastrophes or modern security threats like ransomware, in which organizations are looking to restore their systems, they get a host of benefits.
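A tiering policy of this kind can be sketched in a few lines. The example below is a simplified illustration, assuming in-memory dictionaries stand in for primary and archive storage; the function `tier_idle_files` and the record layout are hypothetical, and the 30-day default simply mirrors the figure in the quote.

```python
from datetime import datetime, timedelta


def tier_idle_files(
    primary: dict,
    archive: dict,
    now: datetime,
    max_idle: timedelta = timedelta(days=30),
) -> list:
    """Move idle files to the archive and leave a shortcut on primary storage."""
    moved = []
    for name, item in list(primary.items()):
        if item.get("shortcut"):
            continue  # already tiered on an earlier pass
        if now - item["last_access"] >= max_idle:
            archive[name] = item["data"]
            # The shortcut keeps the name visible but holds no file data.
            primary[name] = {
                "shortcut": True,
                "archive_key": name,
                "last_access": item["last_access"],
            }
            moved.append(name)
    return moved
```

Files accessed within the window keep their data on primary storage, which matches the idea that only actively used items stay local.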
Partly because these archiving systems come in pairs, they offer backups for restoration, restore quickly, and, most of all, restore intelligently because of the shortcuts. The overall effect is “you can restore petabytes of data very quickly because you only restore the shortcut,” Harrison says. “It will look like your systems are all restored very quickly, and then you can fire up your business and production applications.”
Those systems will see the shortcuts and function as though all the data were actually restored. The difference is the data get restored once the shortcuts are accessed, so “you can essentially bootstrap your business back to life after a catastrophic event in exactly the perfect order that you need, rather than trying to guess what to restore [first],” Harrison maintains.
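The restore-on-access behavior can be illustrated with lazy loading: every shortcut is put back immediately, while file contents are copied from the archive only when something reads them. This is a minimal sketch under assumptions; the `RestoredVolume` class is hypothetical and a dictionary stands in for the archive.

```python
class RestoredVolume:
    """Toy volume that restores shortcuts up front and data on first read."""

    def __init__(self, archive: dict):
        self._archive = archive
        self._shortcuts = set(archive)  # restored immediately, no data copied
        self._local = {}                # filled only when a file is read
        self.hydration_order = []       # records the order data came back

    def listing(self) -> list:
        """Every file appears present as soon as its shortcut is restored."""
        return sorted(self._shortcuts)

    def read(self, name: str) -> bytes:
        if name not in self._shortcuts:
            raise FileNotFoundError(name)
        if name not in self._local:
            # First access: pull the real content back from the archive.
            self._local[name] = self._archive[name]
            self.hydration_order.append(name)
        return self._local[name]
```

Because data returns in the order applications ask for it, the restore sequence follows actual demand rather than a guessed priority list.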
Ultimately, the long-term viability of AI depends on the enterprise’s ability to successfully govern these technologies and the copious amounts of data required to leverage them. Issues of data provenance, lifecycle management, and security represent some of the fundamental data governance mainstays to which organizations must hold AI data accountable. Legal requirements and regulatory adherence may provide the initial impetus, but organizations must implement these facets of data governance for AI in order to realize the promise of its transformative technologies.
Jelani Harper is an editorial consultant servicing the information technology market, specializing in data-driven applications focused on semantic technologies, data governance and analytics.