AI 2020: What lies ahead for natural language data
by AI Business
17 December 2019
Natural language technology has fueled a boom in AI adoption, as everyone from small businesses to large corporations seeks to introduce streamlined, automated language functions into their customer service and back-end systems. But it’s also an area of confusion, owing to plenty of hype, and industries need to get through this confusion in order to bring the sophisticated natural language solutions of tomorrow to fruition.
To gain a better understanding of what natural language AI will look like in 2020, we sat down with Alex Poulis. Alex is the senior director of AI at Transperfect, where he founded their Dataforce division, which focuses on training data for machine learning.
He’s been involved in language technologies since 2002—long before the world entered its current AI hype cycle—and previously worked with Lionbridge on their data collection efforts.
AIB: What have been the major changes since you first started out in natural language AI?
AP: The major changes have been around the availability of data, as well as the computing power which has enabled the use of different techniques in AI, like deep learning.
Machine learning was something that was already available back in the day, but it wasn’t being used mainly because we didn’t have as much data. The big data era started a little bit later, but we didn’t have the processing power or the cloud computing capabilities that we have nowadays.
AIB: Are enterprises losing faith in the hype around AI?
AP: There have been a couple of hype cycles in the history of artificial intelligence already, but I think we are now probably witnessing the biggest. But we are at a point where AI is really usable for many things—not just customer service chatbots—and I don’t think people are going to get tired of that. Instead, we’re going to see more widespread usage of all of these services.
Our contribution to that world is actually getting companies that develop chatbots and other language technologies to work with more languages, so that they can penetrate new markets.
Most companies that develop AI language capabilities start with English, even across regions. We help them penetrate new markets by providing them with high quality training data for machine learning and by helping them localize the content and the rules that their chatbots use, so that they can approach customers in many different markets. We offer this in over 200 languages.
AIB: What are the challenges of working consistently with that many languages?
AP: One of the challenges that our customers face is that they don’t have enough training data in all these languages.
Our task there is to collect this data for them in different markets. We have to find adequate resources and people to do the data collection and data labeling, and we have to implement processes and technologies that help us make sure all the data we collect is high quality.
We are talking about massive projects that may include thousands of participants across the globe contributing data, and making sure that all this data is of adequate quality is not an easy task. It requires very carefully designed processes and very effective technologies to support these processes.
AIB: How do you manage that with clients?
AP: A common misconception among clients is that this is a very easy and cheap thing to do, because their expectations have been set by services that have been out there for some years.
These services are very cheap to use, they’re easy to use, but in the end, they are only suitable for simpler tasks. When tasks get a little bit more complicated and more sophisticated, then we see those solutions start to fail. Still, clients have the expectation.
Our challenge is then to work with clients to educate them on what it takes to get high quality data, talk through the risks of getting low-quality data, and then make sure that our partnership is useful to them from a data perspective.
AIB: What will be the big developments and roadblocks for natural language technology next year?
AP: Looking forward, our clients’ AI implementations are growing into more industries and increasingly narrow domains, so we will need to be ready to specialize and provide specialized data in those domains.
Our clients will also have ever-stricter security requirements because a lot of the data that we process includes confidential, client-specific, or private data. As the regulatory frameworks across the globe become more and more strict, we will have to adapt to that, and we will have to make sure that we offer increasingly secure facilities as a result.