Sharing corporate data with third parties carries obvious risks

by Jelani Harper 12 August 2019

The techniques of artificial intelligence have become all but commonplace in the contemporary enterprise. Facets of machine learning, natural language processing, chatbots, and virtual assistants dominate numerous backend processes, providing highly beneficial automation.

What’s far less pervasive today is embedding the same capabilities in core business processes, where the acceleration and automation of AI directly result in greater revenues. Transforming the way individual business units function with AI usually involves a paradigm in which organizations deliver several years of current and historical data to vendors, who then create individual models for specific tasks, like reducing customer churn.

According to Ramesh Mahalingam, Chief Executive Officer at business automation firm Vizru, there’s a fundamental problem when organizations “send data out, and you basically start leaking so much information that nobody is in control of the system.”

The two recurring dangers of this method of implementing AI are data harvesting (when vendors replicate, segment, and sell organizations’ proprietary data for their own benefit) and malware distribution. Both squander valuable enterprise resources and take place with alarming frequency, but both can usually be kept at bay by user-friendly platforms that give organizations the tools to develop AI on their own.

Data harvesting

There are many dimensions to the harm that third-party data harvesting can cause during enterprise-scale AI projects. This practice not only exploits organizations’ proprietary data, as conventional data breaches do, but also gives away the competitive advantage such data affords.

“Once 15 years of data is handed down, what does the vendor do with that data?” Mahalingam asked. “That’s what we mean by harvesting. There is so much information that you can slice and dice, you can send it to different models for yourself, anonymize it, or otherwise. You can actually sell that data to competitors in so many different ways.”

In most instances, it’s almost impossible for organizations to establish whether their data has been harvested and leveraged by vendors. For example, data in the financial services industry can be sold to an organization’s competitors, to analysts following certain trends, or to manufacturers, who can gain unparalleled insight into market trends based on this information.

Distributing malware

Data harvesting implies organizations don’t know who else is capitalizing on their data. Distributing malware implies organizations don’t know exactly what they’re getting when their data is returned—or when they implement solutions devised by vendors based on that data. This is one of the fundamental reasons organizations remain skeptical about handing over their data to third-party AI vendors. Once an organization’s data is outside the corporate firewall, there are no guarantees those datasets will remain protected or follow data governance protocols.

“Some of the largest banks, some of the largest insurance companies, they all worry about companies harvesting data, or becoming a malware [distributor] and them not knowing about it,” Mahalingam said. “Because IT is not making decisions on its own anymore, line of business runs it, and line of business just thinks it’s just some fast point solution to do something small.”

Organizations can have their data returned accompanied by malware. In this instance, the AI vendor is the initial malware distributor, but whoever interacts with that data going forward (partners, contractors, different business units) can potentially be exposed to risk as well. “They send you back a file,” Mahalingam said. “That information that comes back to your system can turn into malware. It can infect the rest of your environment.”

Taking precautions

When accessing enterprise AI solutions through third-party vendors, organizations run the risk of encountering various aspects of data harvesting and malware distribution. The former enables others to capitalize on the organization’s data; the latter can severely compromise productivity for organizations and their partners by causing security and compliance issues. “When you have 170,000 companies providing services to the market, it is impossible for you to go and do due diligence on all of these companies,” Mahalingam said. “Rather, you need to bring control within your environment.”

Organizations can accomplish this objective by accessing AI services through platforms designed for non-technical, citizen data scientists. Competitive solutions in this space utilize a stateful network for processing AI that serves as a guardrail for accessing third-party services. With this approach, data harvesting and malware risks are mitigated, giving organizations more control over their data and AI resources.


Jelani Harper is an editorial consultant servicing the information technology market, specializing in data-driven applications focused on semantic technologies, data governance and analytics.