It's time to unlock dark data

Ciarán Daly

September 26, 2017


Dr. Daniel Tapias is Founder and CEO of Sigma Technologies Global. Sigma was founded in 2008 and is headquartered in San Francisco, with a large footprint in Madrid. With extensive experience in research, design, and product implementation, Dr. Tapias has spent the last ten years introducing customers to state-of-the-art artificial intelligence advances, concentrating on their deployment and monetization. Dr. Tapias has co-founded other companies in the field, is an Associate Professor of Telecommunication Engineering at the Autonomous University of Madrid, and received his PhD in Telecommunication from the Polytechnic University of Madrid. He has co-authored a number of books, published over 50 papers, and taught over 30 courses.

Dr. Tapias founded Sigma to democratize machine learning solutions. Sigma does this by providing companies with customizable processes and tools that extract relevant information from unstructured, dark data sources, equipping them to make better decisions faster. The continuous development of Sigma’s API and Product Platform, AiCurate, has allowed Sigma to process any type of unstructured data (text, audio, image, video, and/or biometric) from any source, and to adapt a bespoke product to each use case efficiently and quickly. These efforts are supported by Sigma’s extensive experience in preparing rigorous, high-quality data to train algorithms in over 60 languages.

The opportunity of dark data

One of Sigma's central missions is to 'unlock dark data' for enterprises. Dr. Tapias argues that dark data is one of the key opportunities businesses in the digital age must master, but what does this mean? “Dark data is simply data in an unstructured form. It is not easily used because it’s not organized in such a way machines or people can easily recognize, manipulate, and draw conclusions from.”

Analysts at Gartner estimate that 80% of data generated today is dark and unstructured. “Most sources agree that there will be more data created in 2017 than in 2015 and 2016 combined—which was more data than was created in the previous 5000 years of human knowledge. By 2020, 1.7MB of data will be created every second for every single person on the planet.”

That’s a challenge to enterprises—but for those who can leverage it, it’s also a huge opportunity. Dr. Tapias argues that considerable opportunity can only be created by automating the data processing and giving it structure. “The volume of data is so large that it is impossible for businesses operating in the current paradigm to process it, much less understand it and make an effective business decision based off of it. Most of it is dark data—so, there is an urgent need to automate these processes.”

From there, data can be used to develop new neural networks and AI tools which can finally generate extremely accurate insights for businesses. “Once dark data has been transformed into structured data, we can apply analysis and machine learning techniques to detect useful correlations and patterns,” he argues. The problem, of course, comes from extracting that data—and this is where machine learning becomes pertinent. “This data is embedded in unstructured text, audio, speech, image, and video inside a myriad of sources and file formats. Extracting relevant information from these data sources requires tools such as natural language processing and understanding, voice recognition, image and video recognition, and biometric identification.”
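To make the idea of giving dark data structure concrete, here is a minimal sketch in Python. It is not Sigma's method or the AiCurate API: simple regular expressions stand in for the natural language processing tools described above, and the sample notes, field names, and the `structure` helper are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical free-text support notes standing in for "dark data".
NOTES = [
    "2017-09-12: Customer called about a late delivery, order #4471. Very unhappy.",
    "2017-09-13: Refund issued for order #4471 after second complaint.",
    "2017-09-14: Order #5120 arrived damaged; customer asked for a replacement.",
]

# Illustrative patterns only; a real system would use NLP models instead.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
ORDER_RE = re.compile(r"order #(\d+)", re.IGNORECASE)


def structure(note):
    """Turn one free-text note into a structured record (a dict)."""
    date = DATE_RE.search(note)
    order = ORDER_RE.search(note)
    return {
        "date": date.group(1) if date else None,
        "order_id": int(order.group(1)) if order else None,
        "text": note,
    }


records = [structure(note) for note in NOTES]

# Once the notes are structured, simple analysis becomes possible,
# for example counting how many contacts each order generated.
contacts_per_order = Counter(record["order_id"] for record in records)
```

With the records in this shape, a pattern such as repeated contacts about the same order becomes a one-line query rather than something buried in free text.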

AiCurate: Streamlining Access To AI

This is precisely what Sigma's AiCurate API is designed to achieve. AiCurate is capable of performing data processing and recognition on any data source, independent of the format—whether it is text, audio, image, video, or biometric data. As a bolt-on solution with a minimal deployment timeline and cost, AiCurate is able to extract data, prepare it for deep learning techniques, and in turn create a sustainable and automated product which can augment human performance—or, under the appropriate and supervised circumstances, “replace it entirely”, Dr. Tapias explains.

It is solutions like those offered by Sigma which, today and in the future, will enable enterprises to bypass the costly and often slow need to create dedicated data and AI departments in order to leverage unstructured data. “Outsourcing may be the right solution to demonstrate the capabilities of these technologies without incurring the considerable risks associated with technology capital investment and altering business process,” he argues. “Instead, the implementation can be achieved progressively with an expert in the use and application of the technology, while your company remains the domain expert.”

Right now, they’re working with companies engaged in security and threat detection; fraud; call centers; customer service and sales; medical imaging; media content monetization; advertising; insurance; e-commerce; and automobile augmentation. “We focus our efforts on expanding our state-of-the-art expertise in AI,” Dr. Tapias says. “Through close collaboration, we combine our highly specialized knowledge with our client’s experiences and corporate knowledge to produce the best accuracies and results possible.”

Natural Language Processing: The Key To Dark Data

Natural language processing holds the key to organizations’ abilities to sift through dark data—after all, natural language is what underpins every query and request an organization could encounter. Indeed, Sigma deploy the practice throughout their solutions. Dr. Tapias believes natural language processing (NLP) is transforming organizations in three key ways:

  • By enabling the automation of information extraction and classification from large amounts of text and/or audio transcriptions.

  • By dramatically expanding text analytics to obtain key attributes about the person who wrote the text, such as profile information, gender and age, education level, language proficiency, intent, and sentiment.

  • By changing, along with other technologies, the interaction between business and customer through intelligent assistants and virtual agents. He explains that intelligent assistants can not only facilitate communication between humans and machines; they can also predict emotions, intent, and other parameters in real time. This contextual information improves interactions with humans by decreasing errors and delivery time and by increasing customer retention.

The Industry Must Respond To New Ethical Questions

The effects of AI go far beyond those of natural language processing. AI is set to impact every single industry, and Dr. Tapias believes that in the next decade, many companies and most industries will have an AI component to their business strategies. He argues that adopting AI will not only improve processes and decision-making, but will be “absolutely essential” to remaining competitive and ensuring businesses’ long-term survival. “Companies will need to adapt to the new paradigm to grow market share, improve products, and lower costs. AI will play a key role in increasing a company's competitiveness and provide the customer a better product or experience. It is not a matter of if AI will disrupt, but when.”

As the deployment of AI expands, ethical questions will become increasingly relevant, particularly when it comes to facial recognition and other biometric identification techniques. Another major area of ethical concern is profiling by detecting characteristics such as age, emotion, intent, and education level. Behavioural marketing and targeted sales will experience dramatic changes, and the power of this information should give us all pause.

Dr. Tapias believes that AI companies have a critical role to play in terms of balancing the benefits and profits of facial recognition with business ethics, privacy concerns, and the right to avoid exploitation. Where that line actually sits remains to be seen, but he argues that, much like e-commerce in the late 90s, it may take a generation for consumers to become accustomed to the intrusiveness these technologies bring along with their benefits. He implores consumers to trust tech developers, but also believes tech firms need to accommodate these issues before it's too late.

“If tech companies do not apply these technologies responsibly, and if the public does not learn to trust technology providers, privacy protection laws will become more extensive—in turn likely limiting many of the benefits of facial recognition before they are borne out,” he argues. “It is in the best interest of our industry to get ahead of this and police ourselves to ensure the protection of privacy. Compliance with existing law is of paramount importance, and the analysis of the new use cases from the ethics standpoint is something that must be monitored continuously.”

Looking Forward: Next Week's AI Summit in San Francisco

“This is Sigma’s third time partnering with The AI Summit, and frankly, we find it to be the best forum to understand our future customers. Our hope is that a few of these customers also walk away after speaking with us with the understanding that working in AI does not require a large investment; that it can be done in a very deliberate and calculated way in order to show return on capital investment before moving onto a new product; that a customized and bespoke solution can be done with an industry partner like Sigma who are experts in the technology; and finally that our customers hold the key to their own industry expertise in order to make AI work for their needs.”

Dr. Daniel Tapias is CEO and Founder of Sigma Technologies. We look forward to seeing him participate in a panel discussion at next week's Summit entitled 'What makes an AI business project successful?'

