AI Projects Need Intelligent Data Workflows

Data storage and management leaders can connect the dots between the unstructured data they manage and the best AI tools for the business

Krishna Subramanian, COO, president and co-founder, Komprise

June 24, 2024


More than any other technology innovation of the past decade, AI and its powerful little sister generative AI have made understanding and leveraging an organization’s vast data estate an overarching priority. Today, much of the petabyte-scale enterprise data store is not reused or even understood well enough to take advantage of the expanding array of free and low-cost AI tools available.

This is unfortunate, as many use cases for AI are urgent. Consider the impact of an AI tool that quickly identifies sensitive data and verifies that it is being managed in line with data compliance requirements. Over half (51%) of IT managers surveyed by IDC reported non-compliance with data regulations in the past 12 months, with an average total cost of $1.03 million. Health care is another industry with vast potential for AI to improve collaboration, accuracy and, importantly, outcomes. The Journal of the American Medical Association recently reported on an AI-based model in use at Stanford Hospital that predicts when a patient is declining and alerts the patient’s care team.

Something’s got to give. There are two significant intersections between unstructured data – file and object data, or anything not in a database – and AI. First, we need to feed the AI beast with a steady flow of unstructured data, even though it’s hard to corral this data across hybrid cloud infrastructure. Preparing data for AI was identified as the top priority for IT in the Komprise 2023 State of Unstructured Data Management.


Secondly, unstructured data needs organization and classification so that sensitive data is handled correctly and users can search for what they need faster. AI can help do this. This second area is a significant pain point, according to a survey conducted in the second half of 2023 of 334 chief data officers and data leaders. The research, sponsored by AWS and the MIT Chief Data Officer and Information Quality Symposium, found that 46% of these data leaders identified data quality as the greatest challenge to realizing generative AI’s potential in their organizations.

Storage IT professionals have a prominent role to play in facilitating AI and big data analytics initiatives. They must deliver highly performant, secure and scalable storage infrastructure to support AI data workloads. Equally, they need to classify and deliver the right data to these tools to support the work of data scientists and other data stakeholders across the enterprise. Let’s consider the emerging concept of automated data workflows for AI. 


Automating and Streamlining the Connections Between AI and Unstructured Data 

The two use cases of feeding the right data to AI and enriching metadata classification using AI both involve data workflows. There’s much ado about all the ways that AI can improve business operations, such as with AI-enhanced customer service processes or AI platforms that identify defects in manufacturing lines. Yet, these AI data workflows are difficult to manually execute and can benefit from systematic automation.

To automate AI workflows, you need to:

  • Search and curate the right data: To create an AI data workflow, you first need a way to search across your entire data estate, which can range from terabytes to petabytes, and find the relevant data of interest.

  • Manage data governance: When executing AI data workflows, it is important to keep track of what corporate data was fed to which AI process so there is an audit trail. Similarly, it is important to enforce guardrails such as not sharing sensitive data with external processes. Organizations must develop clear corporate policies for data governance for AI and look for automation solutions that help audit and manage data governance.

  • Cut AI costs by persisting results: Most AI solutions have a pay-per-use billing model, so processing the same data repeatedly can lead to nasty surprises in the form of high AI costs. A global index that keeps track of the labels and tags generated by AI is valuable because users can search it without having to run the AI process again on the same data.

  • Leverage automation: The ability to automatically run the AI workflow on new data ensures that the AI is trained on the latest data without requiring any cumbersome manual effort.
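The requirements above can be illustrated with a minimal sketch. It is not any vendor's implementation: `TagIndex`, `run_workflow`, the `is_sensitive` guardrail and the `ai_tagger` callback are all hypothetical names, and the index is an in-memory stand-in for a real global index. The idea is to key AI results by a hash of the file content so that identical data is sent to the paid AI service only once, while every decision is written to an audit log.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TagIndex:
    """Hypothetical in-memory stand-in for a global index of AI-generated tags."""
    tags: dict = field(default_factory=dict)       # content hash -> list of tags
    audit_log: list = field(default_factory=list)  # one entry per file decision


def run_workflow(files, is_sensitive, ai_tagger, index, process_name="example-tagger"):
    """Tag files with an AI callback, skipping sensitive data and repeated content.

    files:        mapping of path -> bytes (assumed already curated by a search step)
    is_sensitive: guardrail callback; True means the file must not leave the organization
    ai_tagger:    callback standing in for a pay-per-use AI service
    """
    results = {}
    for path, content in files.items():
        entry = {
            "path": path,
            "process": process_name,
            "time": datetime.now(timezone.utc).isoformat(),
        }
        if is_sensitive(path, content):
            # Governance guardrail: record the decision, never call the AI service.
            entry["action"] = "blocked"
            index.audit_log.append(entry)
            continue

        digest = hashlib.sha256(content).hexdigest()
        if digest in index.tags:
            # Persisted result: reuse earlier tags instead of paying again.
            entry["action"] = "cache_hit"
        else:
            index.tags[digest] = ai_tagger(content)
            entry["action"] = "sent_to_ai"
        index.audit_log.append(entry)
        results[path] = index.tags[digest]
    return results
```

Because the index persists between runs, calling `run_workflow` again on a schedule only incurs AI cost for content it has not seen before, which is one simple way to automate the workflow for new data.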

Sample Use Cases for AI Data Workflows

A workflow from the pharmaceutical industry could entail running a custom query across data silos to find all data for Project X using a data management solution. Next, the process could execute an external function on Project X data to look for a specific DNA sequence for a mutation. The data management software is configured to tag such data as “Mutation XYZ” and then moves only that new data set to a cloud AI service for analysis. Once the mutation data is no longer needed, the workflow finishes by moving it to a low-cost archival storage tier. The workflow could repeat with new data sets as often as needed.
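As a rough sketch of how such a workflow might be wired together, the example below models the steps on an in-memory catalog. Everything here is an assumption made for illustration: the `FileRecord` structure, the location names (`primary`, `cloud-ai`, `archive`) and the plain substring match that stands in for the external DNA-sequence function. A real data management product would expose its own query and data movement interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical catalog entry for one file in the data estate."""
    path: str
    project: str
    content: str
    tags: set = field(default_factory=set)
    location: str = "primary"


def find_project_files(catalog, project):
    """Step 1: query across silos (here, one list) for a project's data."""
    return [record for record in catalog if record.project == project]


def tag_mutations(records, sequence, tag):
    """Step 2: stand-in for the external function; tag files containing the sequence.

    Returns only the records tagged on this run, so repeated runs pick up new data only.
    """
    newly_tagged = []
    for record in records:
        if sequence in record.content and tag not in record.tags:
            record.tags.add(tag)
            newly_tagged.append(record)
    return newly_tagged


def move(records, destination):
    """Steps 3 and 4: simulate moving data by updating its recorded location."""
    for record in records:
        record.location = destination


def run_mutation_workflow(catalog, project, sequence, tag):
    """Find project data, tag matches, and send only the new matches for analysis."""
    new_hits = tag_mutations(find_project_files(catalog, project), sequence, tag)
    move(new_hits, "cloud-ai")
    return new_hits


def archive(records):
    """Final step: once analysis is done, move the data to low-cost storage."""
    move(records, "archive")
```

Running `run_mutation_workflow` a second time on the same catalog returns nothing new, while files added since the last run are picked up, which mirrors the repeat-as-needed behavior described above.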

Taking this one step further, what if you could apply an AI tool to your data to rapidly segment and enrich the metadata with new tags? A data scientist may not know where all the data from a certain project resides and therefore cannot automate the process of tagging it. Searching manually through files is usually not viable and, with AI, it’s no longer necessary either.

A marketing organization could create a workflow in a data management system to search across billions of images to find specific people or objects needed for a campaign by connecting to a tool like Amazon Rekognition. The resulting data set can then be automatically tagged for future use and save hundreds of hours of manual effort.
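A minimal sketch of the object-finding half of this idea follows, assuming the images sit in an S3 bucket. The helper `tag_images` and its parameters are hypothetical; the only service call used is Rekognition's `detect_labels` operation, passed in as a client object so the function can also be exercised with a stub. Label detection covers objects and scenes; finding specific people would rely on Rekognition's face features instead, which are not shown here.

```python
def tag_images(rekognition, bucket, keys, wanted, min_confidence=90.0):
    """Return {image key: sorted matching labels} for images containing wanted objects.

    rekognition: a client exposing detect_labels, e.g. boto3.client("rekognition")
    bucket:      S3 bucket holding the images (assumed name supplied by the caller)
    keys:        iterable of object keys to inspect
    wanted:      set of label names the campaign is looking for, e.g. {"Dog"}
    """
    matches = {}
    for key in keys:
        response = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=20,
            MinConfidence=min_confidence,
        )
        found = {label["Name"] for label in response["Labels"]}
        hits = sorted(found & set(wanted))
        if hits:
            # These tags are what a data management system would persist
            # so the images can be found later without another AI call.
            matches[key] = hits
    return matches
```

With AWS credentials configured, this could be called as `tag_images(boto3.client("rekognition"), "my-bucket", keys, {"Bicycle"})`, where the bucket name and key list are placeholders. Each call is billed, so storing the returned tags in an index is what delivers the time and cost savings on later searches.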

Or consider the application of Azure Bot Service, which allows developers to build and deploy intelligent chatbots and virtual assistants for customer service. An AI data workflow could analyze data from customer responses and then tag that data based on sentiment or customer issues and move it to a cloud data lake for future analysis.

Today though, these use cases are not easy to implement because there is still a great deal of complexity in preparing data and in understanding how to use AI tools. A 2024 study by IBM revealed that nearly half (45%) of companies report that advances in AI tools that make them more accessible are driving AI adoption. The research also found that only 34% are currently training or reskilling employees to work together with new automation and AI tools.

With many organizations lacking specialized skills in coding and AI tools, there will be ample opportunity for developers and software companies to create streamlined, point-and-click solutions for these workflows. We’ll also see the development of open ecosystems of complementary technologies from which non-IT users can select and build AI projects end to end: collating and classifying the right data sets, applying security and governance, feeding data to AI and monitoring the outcomes, and then moving the data sets to an archive location once the project is complete.

As the AI industry evolves and matures, we’re seeing a potential complexity barrier that could slow down the positive developments AI can bring to people, businesses and governments. Rising above these challenges requires extreme coordination between individuals across the organization – think chief experience officers, data scientists, security professionals, storage and data management experts, and IT infrastructure people, along with HR and legal – to avoid bad outcomes and ensure that goals are aligned.

Data storage and data management leaders can contribute to this new age by connecting the dots between the unstructured data gold they manage and the best AI tools for the business. Developing and nurturing secure, intelligent AI data workflows is a sensible first step.

About the Author(s)

Krishna Subramanian

COO, president and co-founder, Komprise

Krishna Subramanian is the COO, president and co-founder of Komprise. In her career, Subramanian has built three successful venture-backed IT businesses and was named a “2021 Top 100 Women of Influence” by Silicon Valley Business Journal.
