AI Business is part of the Informa Tech Division of Informa PLC
On the offensive and defensive elements of data strategy
by Jelani Harper 20 December 2019
As the coming decade nears, the initial excitement attendant to technologies deemed “new” in the present decade is considerably waning. Organizations are devoting much greater focus to the business value derived from various applications of cognitive computing, the Internet of Things, and even Blockchain, than to the hype surrounding them.
Realizing the promise of this assortment of technologies and applications, however, necessitates overcoming the inherent obstacles of dealing with data at scale, external to the enterprise, and with low latency. Each of these factors, like each of the aforementioned technologies and their applications, only increases the importance of data strategy for effectively dealing with what simply equates to more data, arriving faster and more distributed than ever before.
The basic paradigm of utilizing both offensive and defensive dimensions of data strategy remains as relevant as it ever was. “To give this analogy, if you are only doing offense when it comes to data strategy, or only doing defense, you are losing the game,” cautioned EWSolutions president and CEO David Marco. “Your data strategy must include a strong offense coupled with a strong defense.”
The defensive aspect of data strategy includes reducing enterprise risk by dealing with regulatory compliance, legal issues, discovery, and cost. The offensive side is predicated on the reusability of data assets to increase monetization, personalization, and optimization so that, according to David Schweer, product marketing director at Aprimo, organizations can ideally “reuse this [data] multiple times” for competitive advantage—even across use cases.
Metadata management will likely always remain the core of data strategy, data management, and data governance. Metadata management is foundational to accounting for the massive quantities of training data for machine learning models. When pairing facets of the Internet of Things with applications of Artificial Intelligence (the AIoT), organizations must standardize their metadata management. Whether utilized for offensive or defensive purposes, astute metadata management involves:
Marco explained that while metadata management is the part of data management pertaining to the technical applications of data, data governance is “your people processes. That’s how we’re going to build a structure and an organizational framework that allows us to make enterprise decisions about our data.” Organizations must have a formal data governance construct in place as part of their overall data strategy, especially for supervising the massive amounts of data associated with contemporary cognitive computing applications. Both the defensive and offensive sides of data strategy involve the data governance hallmarks of lifecycle management, data quality, and data provenance.
Lifecycle management is closely associated with the defensive concerns of data management. Organizations must become increasingly aware of how long they retain data in relation to stringent regulatory compliance for PII and other data types, as well as legal concerns. Marco predicted that a federal version of the General Data Protection Regulation—which focuses on data privacy and has counterparts in several states, most notably California and New York—will eventually be adopted within the U.S. Organizations can substantially decrease the risk of data assets by focusing on retention policies fundamental to their defensive data strategy which not only apply to regulations, but also to what Marco termed, “general legal defense. A lot of companies do a very poor job of purging emails and getting rid of old data sources.”
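A retention policy of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, not a compliance tool: the categories, the retention periods in `RETENTION_DAYS`, and the record layout are all hypothetical, and real retention periods depend on the regulations and legal advice that apply to a given organization.

```python
from datetime import date, timedelta

# Hypothetical retention periods per record category, in days (illustrative values only).
RETENTION_DAYS = {"email": 365, "pii": 730}

def is_expired(category, created, today):
    """Return True when a record has outlived its category's retention period."""
    limit = RETENTION_DAYS.get(category)
    if limit is None:
        return False  # no policy defined for this category: keep the record
    return today - created > timedelta(days=limit)

def purge(records, today):
    """Split records into (kept, purged) lists according to the retention policy."""
    kept, purged = [], []
    for rec in records:
        if is_expired(rec["category"], rec["created"], today):
            purged.append(rec)
        else:
            kept.append(rec)
    return kept, purged

# Made-up sample records for illustration.
records = [
    {"id": "a1", "category": "email", "created": date(2017, 1, 15)},
    {"id": "b2", "category": "email", "created": date(2019, 11, 1)},
    {"id": "c3", "category": "pii", "created": date(2018, 6, 1)},
]
kept, purged = purge(records, today=date(2019, 12, 20))
```

Passing `today` explicitly, rather than reading the system clock inside the function, keeps the policy check repeatable, which helps when the same rule has to be demonstrated to an auditor.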
Lifecycle management is a critical aspect of Digital Asset Management and of managing other types of data that have clear expiration dates. The complexity of the process is significantly ameliorated by modern centralized repositories. According to Schweer, centralized tools “help inform, hey, should I renew something? So, we can kind of turn risk around and say okay, maybe I should renew this, because this is used 500 different places.” The data provenance capabilities of these options are instrumental for mitigating risk, so when data or content have expired, you know every instance in which they have been used, Schweer noted. “And if I need to pull it out immediately, I can pull everything down in an instant.”
The traceability of data provenance is useful for both the offensive and defensive aspects of data strategy. In terms of defense, organizations must prioritize data lineage to understand where data was used, by whom, and how, to ensure, as well as demonstrate to regulators, that regulatory compliance is being met. The same sentiment applies to any litigation issues involving risk. Provenance is also one of the best tools for reusing data across multiple use cases to enhance its value. In this respect, a data strategy best practice is to equip data assets with what Schweer called a “unique identifier”, which is immensely helpful in facilitating provenance and some of its advantages for offense.
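One way to picture the “unique identifier” idea is a registry that assigns each asset an ID and logs every place it is used. The sketch below is an assumption-laden illustration in Python: `ProvenanceRegistry`, its methods, and the sample asset and locations are hypothetical names invented here, not the API of any product mentioned in this article.

```python
import uuid

class ProvenanceRegistry:
    """Hypothetical in-memory registry mapping an asset ID to its recorded usages."""

    def __init__(self):
        self._names = {}    # asset ID -> human-readable name
        self._usages = {}   # asset ID -> list of usage records

    def register(self, name):
        """Assign a new unique identifier to an asset and return it."""
        asset_id = uuid.uuid4().hex
        self._names[asset_id] = name
        self._usages[asset_id] = []
        return asset_id

    def record_use(self, asset_id, location, user):
        """Log where the asset was used and by whom."""
        self._usages[asset_id].append({"location": location, "user": user})

    def usages(self, asset_id):
        """Return a copy of every recorded usage of the asset."""
        return list(self._usages[asset_id])

    def usage_count(self, asset_id):
        """Return how many times the asset has been used."""
        return len(self._usages[asset_id])

# Made-up asset and usage locations for illustration.
registry = ProvenanceRegistry()
image_id = registry.register("hero-image.png")
registry.record_use(image_id, "spring-campaign/landing-page", "alice")
registry.record_use(image_id, "newsletter/2019-12", "bob")
```

The same usage log serves both sides of the strategy: `usages` answers the defensive question of where an asset must be pulled down, while `usage_count` gives the offensive signal of which assets are worth reusing.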
Understanding the journey of data throughout the enterprise, and as deployed outside the enterprise, is not only useful for reusing it across use cases, but also enables organizations to “track how many times I’m using it so I can really get a good sense of, hey, that’s a really good image there,” Schweer observed. In terms of content, metrics based on the multiple elements of provenance are critical for reusing assets for targeted campaigns, specific audiences, and increased personalization so organizations can “start to really optimize my spend and the pieces of content that I have,” Schweer added. Provenance for data assets delivers the same advantages for reusing data across use cases. It’s also essential for ensuring machine learning models function in production as they did during training. In almost all instances, provenance is revealed by metadata.
Data quality has a reciprocal relationship with data strategy. The boons of trustworthy, recent, de-duplicated data are a verifiable output of successful data strategy. The reliability of any AI or IoT deployment hinges on whether or not data quality levels are met. From a defensive perspective, data quality typifies cost concerns, since redundancies and inaccuracies escalate operational expenses of maintaining IT systems. “People are using that data today to make decisions on their companies, and it’s leading them to the wrong decisions,” Marco suggested.
However, when organizations are able to act on the basic precepts of data strategy, data quality becomes immeasurably useful for offense, reusing data assets, and generating value from data. Simply standardizing all the different fields and terminology of data elements (such as dates, names, etc.), and employing this standardization for classifications and taxonomies, substantially diminishes duplications and inaccuracies. By implementing the proper data governance processes for delineating how people use data, monitoring lifecycle concerns and tracking data provenance, organizations can leverage quality data for the determinations necessary to exploit machine learning and the IoT.
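To make the standardization point concrete, the following Python sketch normalizes names and dates to a single form and then drops records that turn out to be duplicates. It is a simplified illustration: the accepted date formats, the field names `name` and `signup`, and the sample records are assumptions, and production-grade matching would need far more careful rules.

```python
from datetime import datetime

# Assumed set of input date formats; extend to match the formats actually in use.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y")

def normalize_date(text):
    """Convert a date string in any accepted format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def normalize_name(text):
    """Collapse repeated whitespace and apply consistent capitalization."""
    return " ".join(text.split()).title()

def deduplicate(records):
    """Standardize each record, then keep only the first occurrence of each one."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize_name(rec["name"]), normalize_date(rec["signup"]))
        if key not in seen:
            seen.add(key)
            unique.append({"name": key[0], "signup": key[1]})
    return unique

# Made-up records: the first two describe the same person in different formats.
raw = [
    {"name": "jane  DOE", "signup": "2019-12-20"},
    {"name": "Jane Doe", "signup": "12/20/2019"},
    {"name": "John Smith", "signup": "20 December 2019"},
]
clean = deduplicate(raw)
```

Without the normalization step, the first two records would look distinct and both would survive, which is how inconsistent field formats inflate duplicate counts.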
Ultimately, the test for success in the data strategy domain is how well organizations deploy data assets and reuse them. As Schweer observed, “You want to make the minimum amount of content needed to deliver the most valuable experience.”