Robert, can you tell us what Arria NLG does?
The NLG in our name stands for Natural Language Generation, and that’s an apt description of what we do. In short, we develop applications that take data and turn it into meaningful text. But that simple description hides a lot of complexity. To explain what’s going on in data you have to understand and interpret that data to turn it into what you might think of as information. Then you have to work out how to best express that information for the audience you’re talking to. That means selecting the content that’s most appropriate for the audience, and expressing it in language that makes sense for that audience. So really what we’re doing is using cutting edge artificial intelligence and computational linguistics techniques to deliver the stories that are hidden in data.
So where did Arria come from?
Arria NLG has been around since 2012, but we have a history that goes back much further. Arria NLG started out as a spinout from the University of Aberdeen called Data2Text that was founded around 2009; and way back before that, Arria’s chief scientist, Ehud Reiter, and I were active in the academic NLG research community. There are ideas embodied in our technology that go right back to when we started doing research in the area in the early 1980s.
Can you explain a little more about the technology and what differentiates Arria in the market?
As far as we can tell, most of the other players in the market are providing what you might think of as smart template systems; think mail merge on steroids. That’s a perfectly respectable technology, and there are a lot of use cases you can cover with it. It’s also where we started in the early days, but over the last 30 years we’ve gradually made our technology more and more sophisticated to deal with the wide variety of data situations we have come across. One way of summing up the difference is to use the concept of data regularity. Simpler template-based approaches are appropriate if your input data is very regular: for every new story you generate, you know you’ll be talking about the same basic set of variables, and it’s just their values that change. But simpler approaches break down when you have what you might think of as irregular or sparse data, where each story ends up talking about a different configuration of the data variables – in a sense, many of the situations you need to narrate are effectively outliers. When that’s the case, you need a more linguistically informed process that is able to put together fragments of language to build a coherent picture, much as you would assemble a jigsaw puzzle. It’s those sophisticated linguistic processes that we’ve spent 30 years perfecting.
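To make the data-regularity point concrete, here is a minimal Python sketch contrasting the two approaches. It is only an illustration and not Arria's technology: the field names (`region`, `revenue`, `period`, `change`), the template wording, and both function names are hypothetical. The first function is a mail-merge style template that needs every field to be present; the second builds a sentence from whichever fragments the data supports.

```python
# Hypothetical template for "regular" data: every record has the same fields.
TEMPLATE = (
    "{region} revenue was {revenue} in {period}, "
    "a change of {change}% on the prior period."
)


def render_template(record: dict) -> str:
    """Mail-merge style: raises KeyError if any expected field is missing."""
    return TEMPLATE.format(**record)


def render_fragments(record: dict) -> str:
    """Assemble a sentence from whichever fields are actually present."""
    subject = f"{record['region']} revenue" if "region" in record else "Revenue"

    clauses = []
    if "revenue" in record:
        clause = f"was {record['revenue']}"
        if "period" in record:
            clause += f" in {record['period']}"
        clauses.append(clause)
    if "change" in record:
        # Pick the verb from the sign of the change instead of printing it raw.
        direction = "rose" if record["change"] >= 0 else "fell"
        clauses.append(f"{direction} {abs(record['change'])}% on the prior period")

    if not clauses:
        return "No revenue figures are available."
    return f"{subject} " + " and ".join(clauses) + "."


regular = {"region": "North", "revenue": "1.2M", "period": "Q2", "change": 5}
sparse = {"region": "South", "change": -3}

print(render_template(regular))
print(render_fragments(sparse))
```

With the regular record, both functions produce a sentence. With the sparse record, the template fails on the missing fields, while the fragment-based version still produces a sentence about the fall in revenue. A real system would need far richer linguistic processing than joining clauses with "and"; the sketch only shows why a fixed template stops working once each story involves a different configuration of variables.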
We hear a lot about NLP – why is Arria focused purely on the generation rather than the understanding part of the chain?
For a long time, NLG has been the neglected sibling. So much so that people often think all NLP is Natural Language Understanding, when in fact the term NLP covers both Natural Language Understanding (NLU) and Natural Language Generation. NLU has been very visible for the last couple of decades because we’ve had lots of documents in machine readable form since the early days of computing, and you need NLU techniques to extract something useful from those documents. But of course what’s happened in the last few years is the explosion of data sources, so now suddenly we need to find ways to make sense of massive data sets. And that’s where NLG plays an important role.
There are obviously many applications for the technology. Can you share some examples of NLG in action?
Our early wins were in the oil and gas sector, and in reporting on the behaviour of rotating machinery. That’s a domain where you have masses of sensor data that can take an expert engineer hours to analyse and explain; but the Arria NLG engine can produce detailed reports along with recommendations for action in a couple of minutes, and those reports are indistinguishable from the reports the expert would write. More recently we’ve been applying the same techniques in management reporting, digital media campaign analytics and financial reporting, where the granularity of the data may not be so extreme as in the sensor world, but you still need to pull together diverse sources of data, integrate all that into a common view of the world, and then reason about it to provide explanations of what is going on: why insurance policy revenue from a particular region is down, how you might reallocate online advertising spend to maximize conversions, or what it is that is driving revenue changes across your product lines.
Are customers requesting on-premise implementations or delivery through the cloud, and which works best for Arria?
Our earliest deployments were all on-premise, and we still have customers who require that for one reason or another – many businesses are wary of letting sensitive data get outside the firewall. But we’re seeing an increasing shift towards cloud-based services, which is the approach used in most of our more recent deployments. That’s better for us, of course, in terms of ease of software updates and the like; and that has parallel benefits for the customer.
What languages are you currently able to offer, and which are next in development?
We’re very much driven by customer demand in this regard, as in other areas. Our technology architecture is organized in a way that makes it relatively straightforward to add new languages; that’s one of the learnings that comes from the decades of research we’ve put into this. We recently completed a project that delivered German weather reports – weather reporting is an area we’ve been active in for a long time, and we expect to deliver another European language soon as part of that project. For commercial reasons, though, I’m not at liberty to say what that is.
What does the future hold for new developments at Arria?
We’re moving forward in a number of directions. Perhaps the two most important of those are a move towards products that are easy for the user to configure, rather than requiring a professional services engagement; and further evolution of our Software Development Kit, which is out in beta. That’s a really exciting direction, because we’ve long realized that there are many more use cases for NLG than we’ll ever be able to satisfy, so we want to make it possible for others to develop those applications themselves. Our SDK gives customers and third-party developers the ability to build NLG applications of the same sophistication as the apps we deliver today.
That concludes our interview, and if you’re interested in finding out more about Arria and how NLG could benefit your business, you can take a look at the following website: http://www.arria.com/