Sharing corporate data with third parties carries obvious risks
by Jelani Harper 12 August 2019
Techniques of artificial intelligence have become all but commonplace in the
contemporary enterprise. Facets of machine learning, natural language processing,
chatbots, and virtual assistants dominate numerous backend processes, providing
highly beneficial automation.
Far less pervasive today is embedding the same capabilities in core business
processes, where the acceleration and automation of AI directly results in
greater revenues. Transforming the way individual business units function with
AI usually involves a paradigm in which organizations deliver several years of
current and historical data to vendors, who then create individual models for
specific tasks, like reducing customer churn.
According to Ramesh Mahalingam, Chief Executive Officer at business automation firm Vizru, there’s a fundamental problem when organizations “send data out, and you basically start leaking so much information that nobody is in control of the system.”
Two recurring dangers of this method of implementing AI are data harvesting, in which
vendors replicate, segment, and sell organizations' proprietary data for their
own benefit, and malware distribution. Both squander valuable enterprise
resources and occur with alarming frequency, but they can be kept at bay by
user-friendly platforms that give organizations the tools to develop AI on their
own. There are many dimensions to the harm that third-party data
harvesting can cause during enterprise-scale AI projects. The practice not only exploits
organizations' proprietary data, as conventional data breaches do, but also
gives away the competitive advantage such data affords.
"When 15 years of data is handed down, what does the vendor do with that data?" Mahalingam
asked. “That’s what we mean by harvesting. There is so much information that
you can slice and dice, you can send it to different models for yourself,
anonymize it, or otherwise. You can actually sell that data to competitors in
so many different ways.”
In most instances, it's almost impossible for organizations to determine whether their data has been harvested and leveraged by vendors. For example, data in the financial services industry can be sold to an organization's competitors, to analysts following certain trends, or to manufacturers, who can gain unparalleled insight into market trends based on this information.
Data harvesting implies organizations don’t know who else is capitalizing on their data. Distributing malware implies organizations don’t know exactly what they’re getting when their data is returned—or when they implement solutions devised by vendors based on that data. This is one of the fundamental reasons organizations remain skeptical about handing over their data to third-party AI vendors. Once an organization’s data is outside the corporate firewall, there are no guarantees those datasets will remain protected or follow data governance protocols.
"Some of the largest banks, some of the largest insurance companies, they all worry
about companies harvesting data, or becoming a malware [distributor] and them
not knowing about it,” Mahalingam said. “Because IT is not making decisions on
its own anymore, line of business runs it, and line of business just thinks
it’s just some fast point solution to do something small.”
Organizations can have their data returned accompanied by malware. In this instance,
the AI vendor is the initial malware distributor, but whoever interacts with
that data going forward—partners, contractors, different business units—can be
potentially exposed to risk as well. “They send you back a file,” Mahalingam
said. “That information that comes back to your system can turn into malware.
It can infect the rest of your environment.”
By accessing enterprise AI solutions through third-party vendors, organizations
run the risk of encountering various aspects of data harvesting and malware
distribution. The former enables others to capitalize on the organization’s
data; the latter can severely compromise productivity for organizations and
their partners by causing security and compliance issues. “When you have
170,000 companies providing services to the market, it is impossible for you to
go and do due diligence on all of these companies,” Mahalingam said. “Rather,
you need to bring control within your environment.”
Organizations can accomplish this objective by accessing AI services through platforms designed for non-technical, citizen data scientists. Competitive solutions in this space utilize a stateful network for processing AI that serves as a guardrail for accessing third-party services. This approach mitigates data harvesting and malware risks, giving organizations more control over their data and AI resources.
Jelani Harper is an editorial consultant servicing the information technology market, specializing in data-driven applications focused on semantic technologies, data governance and analytics.