Talking about how crowdsourcing is powering today’s enterprise AI solutions, with Appen CEO Mark Brayan
by AI Business 20 September 2019
With machine learning increasingly deployed in real-world business settings, we are beginning to see early iterations of AI technologies mature in the enterprise. Last week at the AI Summit San Francisco, businesses were looking to understand how to grow their AI deployments into something transformational across their organizations.
One of the companies looking to help others on this journey is Appen, which attended the Summit last week. Since its technology integration with the recently acquired Figure Eight, things have been changing fast for the AI-powered data annotation platform.
We sat down with Appen CEO Mark Brayan to explore how the integration is transforming the platform, as well as the key strategic considerations for the enterprise and the challenges ahead for AI in business. Mark has more than twenty-five years of experience in technology and services, and oversees Appen’s leadership, strategy, and culture.
Q: Appen utilizes a ‘global crowd’ of one million people to provide its annotated machine learning datasets. Why crowdsourcing? And what are the limitations of this approach, if any?
A: Appen uses crowdsourcing to assist our clients with a variety of use cases including onsite search, speech recognition systems, computer vision and more. AI is becoming a larger part of our lives, appearing in everything from cars to phones to in-home personal assistants. Still in a relatively early stage, AI is constantly improving but relies on huge amounts of training data to get better. The arguments for outsourcing data collection and annotation are similar to those for outsourcing any aspect of your business. You can scale up faster, test and tune with expert guidance, and keep your own teams focused on building and improving your core business and intellectual property (IP).
Using a crowdsourcing approach for AI training data has many benefits, and it’s important to choose a partner that can be trusted to deliver high-quality datasets. Especially with outsourced data projects, you need confidence that your data is secure. Only with the right combination of technology and crowd can organizations create high-quality training data.
Q: Earlier this year, Appen announced a technology integration with Figure Eight – what stage is the integration at now, and what can observers expect to see in the near future?
A: We’re very excited about the combination of Appen and Figure Eight. Earlier this year we announced a technical milestone: Figure Eight customers now have access to Appen’s extensive crowd resources directly through the Figure Eight platform. This expansion in skilled annotators, combined with the automated quality controls and machine learning-assisted annotation tools of Figure Eight, gives both Appen and Figure Eight customers the ability to rapidly scale their AI initiatives.
The Appen platform – already the most comprehensive solution for collecting and labeling images, text, speech, audio, and video – combines Figure Eight’s machine learning-assisted annotation tools and self-serve Client Workspace with Appen Connect to oversee Appen’s global multilingual crowd. Consequently, Appen recently announced new feature updates focused on text and speech data. The updates include:
- Machine Learning-Assisted Text Annotation: New text annotation capabilities locate and classify named entity mentions in unstructured text into predefined categories to support entity extraction and span labeling use cases. Users can also now leverage ‘bring your own model’ outputs to accelerate contributor annotations. Machine Learning-Assisted Text Annotation helps natural language processing (NLP) teams scale quality human annotations.
- Machine Learning-Assisted Text Utterance Collection: Conversational AI customers can now leverage machine learning validators to collect unique and high-quality text utterances in their domain of choice. Text utterance collection jobs allow customers to leverage the power of a distributed workforce of fluent annotators to collect text strings based on prompts or scenarios to power their conversational agents.
- Enterprise Analytics: Enterprise Analytics provides in-platform reporting on your organization’s usage. For larger enterprises, Enterprise Analytics supports the management of multiple teams across the platform. Organization and team administrators can view in-depth analytics and are then empowered to make data-driven decisions about allocation, resourcing, and ROI.
Available for multiple use cases, these feature releases—together with the platform’s ML-Assisted Video Object Tracking using Dots, Lines, and Polygons capability—further cement Appen’s unique ability to deliver on the increasing volume, quality, and speed requirements for training data to support the world’s most innovative AI systems.
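To make the span labeling idea concrete, here is a minimal Python sketch of model-assisted text annotation: a model proposes labeled spans, and a human contributor rejects or adds to them. It is an illustration only, not Appen's implementation. The `Span` type, the `preannotate` and `review` functions, and the dictionary lookup standing in for a "bring your own model" output are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # one of the predefined categories, e.g. "ORG"


def preannotate(text: str, known_terms: dict[str, str]) -> list[Span]:
    """Hypothetical stand-in for a customer's model: tag every exact
    occurrence of a known term with its category."""
    spans = []
    for term, label in known_terms.items():
        pos = text.find(term)
        while pos != -1:
            spans.append(Span(pos, pos + len(term), label))
            pos = text.find(term, pos + 1)
    return sorted(spans, key=lambda s: s.start)


def review(suggested: list[Span], rejected: set[Span], added: list[Span]) -> list[Span]:
    """Apply a contributor's decisions: drop rejected suggestions,
    keep the rest, and add any spans the model missed."""
    kept = [s for s in suggested if s not in rejected]
    return sorted(kept + added, key=lambda s: s.start)


text = "Mark Brayan is the CEO of Appen."
suggested = preannotate(text, {"Appen": "ORG"})

# The contributor accepts the model's suggestion and adds the person it missed.
final = review(suggested, rejected=set(), added=[Span(0, 11, "PERSON")])
for s in final:
    print(text[s.start:s.end], s.label)
```

In this sketch the contributor only corrects the model's suggestions instead of labeling from scratch, which is where the time saving of assisted annotation would come from.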
Q: Appen works across different verticals, from tech to financial services and the automotive industry, to name but a few. Which sector is seeing the biggest growth in AI right now? Conversely, where are the challenges today?
A: The technology sector has typically been an early adopter, but we’re seeing growth in the healthcare, marketing and finance industries. These often appear on lists of sectors most likely to be impacted by AI in the near future. That’s not surprising, as all of these industries have plenty of core business processes that can be automated or made more efficient by aiding humans with intelligent technology.
One major challenge is that everyone seemingly wants to invest in AI-enabled technology, but many organizations aren’t setting up a roadmap for their approach first. Companies must identify business problems AI might solve, put together a pilot project proposal, and tackle one project first to determine if larger-scale AI investments will provide ample returns.
Many organizations want to climb the AI mountain in one step, but breaking their approach down into manageable pieces is a better way to understand how AI might impact their organization.
Q: How does a company successfully implement Appen’s data solutions?
A: It is critical for today’s AI projects to begin with the end in mind and to understand what is being asked of the data before embarking upon the AI journey. The first, and arguably most critical, step is labeling the training data for the AI initiative. Appen offers flexible options to meet our clients’ varied security and budgetary needs for training data annotation, including:
- Managed services for clients in need of experienced project management;
- Self-service through Appen’s SaaS platform for clients wanting to manage their annotation tasks;
- A variety of secure options for clients with sensitive data, including secure facilities, on-premise solutions, onsite specialists and more.
In addition, we offer clients best practices to aid in creating a solid training data strategy.
Without a well-defined strategy for collecting and structuring the data needed to train, test, and tune AI systems, clients run the risk of delayed projects, not being able to scale appropriately, and ultimately, being outpaced by competitors. Creating a solid strategy is the first step to staying competitive. That includes setting a budget, identifying existing data sources, evaluating data labeling platforms, and ensuring data quality and security. Developing a clear data strategy can also help provide the steady pipeline of data required by many machine learning models.
Q: What have been the most significant developments in the AI space this year?
A: As AI usage has increased, we have begun to hear much more about the ethical implications of AI. In many ways, algorithm development takes place in a black box; we hear about AI-enabled solutions, but we rarely hear about the specifics of what goes into them. Earlier this year, legislative bodies began to take action. The European Union issued “Ethics Guidelines for Trustworthy Artificial Intelligence,” in the hope that companies will develop AI with designs more firmly rooted in morality.
One way we’ve seen organizations approach the development of ethical AI is by attempting to remove bias from their data. One of the EU’s guidelines is “Diversity, non-discrimination, and fairness,” a tenet that states “Services provided by AI should be available to all, regardless of age, gender, race, or other characteristics.” In order to build AI systems that don’t discriminate, companies must first create training data that isn’t biased. To do so, some are turning to crowdsourced data annotation, with the goal of mitigating some of the biases inherent in human labeling by leveraging a crowd that is specifically curated to cover the gender, race and other demographic characteristics required to remove bias.
Finally, with data formats such as video growing in volume, we’re seeing more requests for collection and annotation of this data type. Whether it is generated through autonomous vehicles, security surveillance or media and entertainment, video is a continuously growing data format, with over 500,000 hours of video uploaded and 1 billion hours of video consumed on YouTube every day.
We have seen video object tracking develop into an essential element of AI, as it allows video content to be annotated at scale to meet the demands of modern applications. Without object tracking capability, the cost and time required to annotate individual frames in video would be prohibitive, making AI applications that need to understand objects moving through time and space untenable.
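The cost argument can be shown with a small Python sketch. It assumes the simplest possible tracker: an annotator draws a bounding box on a few keyframes, and the frames in between are filled by linear interpolation. This is a deliberately simple stand-in, not the machine learning-assisted tracking described in the interview. The `Box` type and the `fill_frames` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels
    w: float  # width
    h: float  # height


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def fill_frames(keyframes: dict[int, Box]) -> dict[int, Box]:
    """Return a box for every frame between the first and last keyframe,
    interpolating linearly between each pair of neighbouring keyframes."""
    if not keyframes:
        return {}
    frames = sorted(keyframes)
    filled = {}
    for start, end in zip(frames, frames[1:]):
        a, b = keyframes[start], keyframes[end]
        for f in range(start, end):
            t = (f - start) / (end - start)
            filled[f] = Box(lerp(a.x, b.x, t), lerp(a.y, b.y, t),
                            lerp(a.w, b.w, t), lerp(a.h, b.h, t))
    # The last keyframe is not covered by the loop above.
    filled[frames[-1]] = keyframes[frames[-1]]
    return filled


# Two hand-drawn boxes produce annotations for five frames.
keyframes = {0: Box(0, 0, 10, 10), 4: Box(8, 4, 10, 10)}
track = fill_frames(keyframes)
print(len(track), track[2])
```

Even this naive approach cuts the number of hand-drawn boxes from one per frame to one per keyframe. A learned tracker would aim to follow non-linear motion with fewer corrections still.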