By Ciarán Daly
NORWAY – With more than 3,000 STEM papers published every single day, scientific research is moving faster than ever before. As many as 50% of research papers are said to never be read by anyone other than their authors, referees, and journal editors, and an estimated 90% of published papers are never cited at all. That could all be about to change thanks to Iris.ai, a knowledge automation start-up whose eponymous e-learning tool promises to transform the way companies and academics conduct R&D – with the goal of democratizing science.
“Scientific research is vital for the survival of the human race but is currently very inaccessible,” says Anita Schjøll Brede, CEO of Iris.ai and a Faculty Member at SingularityU in Denmark. Anita is not your conventional tech CEO. She’s a self-described nomad, splitting her time between Norway, the US, and the rest of the globe, pink mohawk and self-built racecar in tow.
Founded two and a half years ago by a group of students at Singularity University, a Silicon Valley-based startup incubator and think tank, Iris.ai was ambitious from the start. As part of the center’s Global Solutions Program, Anita and her team were tasked with developing an exponential technology idea that could positively impact the lives of a billion people within a decade. “We started with a team and a challenge to make the world a better place – which, as cliché as it sounds, is true – and we quickly found the problem we wanted to work on,” Anita says.
“It’s very tricky for someone who is not a domain expert to know what is going on in a particular field of research. Research papers are hidden behind paywalls and indexed by a terrible citation system that is full of bias – it’s essentially a popularity index – making it very difficult for research departments and non-domain experts to work out what we do not yet know and what we need to know. And there’s just too much content to deal with,” explains Anita. “What this means is that even deep domain experts aren’t able to stay up to date with everything they want. We’re in a day and age where we need tools to manage all this information in a better way.”
A supercharged digital research assistant
Using natural language processing and an open access database of about 5 million articles, Iris scans for relevant papers across all disciplines, then narrows them down to a specific reading list of 20–30 research papers. Instead of taking the 3–4 weeks it does in an industry setting, Iris can complete this process in a couple of days.
Research is an area where both the advantages and pitfalls of AI are abundantly clear. On the one hand, Iris uses AI because, in Anita’s words, it’s not possible to use anything else. “Maybe the alternative would be to have a massive amount of people sitting down and indexing this manually, but even that would be stale and outdated within a few years.”
However, neural networks will always run into problems when encountering the complexities of natural human language. That’s why, so far, the Iris tool has what Anita calls a ‘clearer’ use case in the hard sciences; whereas the meaning and context of words in, say, a philosophy paper can be incredibly nuanced, something like a chemistry paper might be far more specific in its use of words, making it easier for a tool like Iris to index efficiently.
By analysing a dataset of 18 million other articles, Iris extracts the most meaning-bearing words in a document by identifying contextual synonyms and topic hypernyms in order to build a ‘fingerprint’. A neural network algorithm then conducts ‘fingerprint matching’ of document similarity to narrow down the results to a few thousand documents.
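Iris.ai’s actual pipeline is proprietary, but the fingerprint-matching idea described above can be illustrated with a toy sketch. The snippet below is an assumption-laden simplification: it stands in a frequency-based term weighting for the real synonym/hypernym extraction, and plain cosine similarity for the neural matching step. All function names and the sample texts are invented for illustration.

```python
from collections import Counter
import math

def fingerprint(text, top_k=5):
    """Toy 'fingerprint': the top_k most frequent content words with their
    counts as weights. (The real pipeline adds contextual synonyms and
    topic hypernyms; this sketch only keeps surface terms.)"""
    stopwords = {"the", "a", "of", "in", "and", "to", "is", "for"}
    words = [w.strip(".,").lower() for w in text.split()]
    counts = Counter(w for w in words if w not in stopwords)
    return dict(counts.most_common(top_k))

def similarity(fp_a, fp_b):
    """Cosine similarity between two term-weight fingerprints."""
    shared = set(fp_a) & set(fp_b)
    dot = sum(fp_a[t] * fp_b[t] for t in shared)
    norm = (math.sqrt(sum(v * v for v in fp_a.values()))
            * math.sqrt(sum(v * v for v in fp_b.values())))
    return dot / norm if norm else 0.0

# Rank two candidate papers against a query document's fingerprint.
query = fingerprint("battery chemistry for electric vehicle batteries")
papers = {
    "paper_1": fingerprint("novel lithium battery chemistry improves density"),
    "paper_2": fingerprint("medieval poetry and the history of rhyme"),
}
ranked = sorted(papers, key=lambda p: similarity(query, papers[p]),
                reverse=True)  # paper_1 ranks first
```

In practice the matching runs over millions of documents, so a real system would precompute fingerprints and use an approximate nearest-neighbour index rather than pairwise comparison.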
What follows is an ‘exploration phase’. Iris isn’t a fully automated tool that simply hands users twenty papers to read based on arbitrary editorial decisions. Instead, users participate in an iterative, semi-automated process of excluding and including different topics in order to reach that reading list of 20–30 papers.
“Let’s take a real-life example. We had a research group at Chalmers that works on autonomous vehicles. They knew in their exploration phase that the corpus they were dealing with had a number of papers about in-air and underwater drone technology that simply weren’t relevant to them. They told the tool to exclude any paper that had anything to do with underwater or air, and the tool would exclude that. Then, they could look at a specific control mechanism, and no matter which context it was written in, the tool could include that as well – even if it did feature in some papers about underwater or air drone technology.”
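The Chalmers example above boils down to a filtering rule where explicit inclusions override topic exclusions. The sketch below is a hypothetical illustration of that rule, not Iris.ai’s implementation; the `refine` function, the topic sets, and the sample corpus are all invented for the example.

```python
def refine(papers, exclude_topics=(), include_terms=()):
    """One refinement pass: drop papers touching an excluded topic,
    but keep any paper that mentions an explicitly included term."""
    kept = []
    for title, topics in papers.items():
        if any(t in topics for t in include_terms):
            kept.append(title)  # explicit include overrides exclusion
        elif not any(t in topics for t in exclude_topics):
            kept.append(title)
    return kept

corpus = {
    "underwater drone control": {"underwater", "control"},
    "aerial drone survey": {"air", "survey"},
    "road vehicle control": {"road", "control"},
}

# Exclude underwater/air papers, but keep anything on the control mechanism,
# even when it appears in an otherwise excluded context.
shortlist = refine(corpus,
                   exclude_topics={"underwater", "air"},
                   include_terms={"control"})
```

Here the underwater paper survives despite the exclusion because it covers the included control mechanism, while the aerial survey paper is dropped.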
The flipside is that Iris enables users to look both at the papers that have been included and those that have been excluded, and provide feedback. This feedback enables Iris to measure the precision and recall of the whole process in order to self-improve – and to give users a concrete precision figure for their results.
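Precision and recall are standard information-retrieval measures, and they map directly onto the feedback loop described above: precision is the share of returned papers the user judged relevant, recall the share of relevant papers that were actually returned. A minimal sketch, with invented paper IDs and feedback figures:

```python
def precision_recall(returned, relevant):
    """Precision and recall of a returned reading list against the set
    of papers the user's feedback marked as relevant."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical feedback round: 20 papers returned, the user marks 15 of
# them relevant and names 5 relevant papers the tool missed.
returned = [f"p{i}" for i in range(20)]
relevant = [f"p{i}" for i in range(15)] + [f"q{i}" for i in range(5)]
p, r = precision_recall(returned, relevant)  # p = 0.75, r = 0.75
```

Feeding these figures back per refinement round is what lets a retrieval tool both report a concrete quality number and tune itself over time.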
AI tools: building transparency in applied AI
Right now, Iris is finding use across both university research departments and corporate R&D departments. They’re working with corporations with large R&D organisations facing a mountain of internal documentation and reports written up to forty years ago. In many companies, these text documents aren’t indexed, use no citation system, are badly labelled, and are often spread out across several different sources.
“If your organisation has been working with this material for twenty years, you’d expect someone to have found a way to make it lighter, or stronger. That simply hasn’t been possible until today,” argues Anita. In the space of a few hours, Iris can index and sort all of that content, providing users with a corpus of workable documentation.
It’s a testament to the Iris.ai team’s vision of a participatory, open knowledge landscape, one without the gatekeepers or middlemen – such as large publishing houses – that Anita sees as slowing down knowledge-sharing and innovation. Nowhere is this more vital than in the field of AI, where fears rightly abound regarding transparency, ethics, and bias.
“The fact is, we are in an industry which has an enormous incentive misalignment. Researchers have to publish or perish. One of the gravest threats we’re facing when it comes to AI is all these black box solutions, where we don’t know how the algorithms produced the answers they do. 95% of all AI research is being prepublished directly to arXiv, both because researchers cannot wait for the publishing process and because it should be openly available.”
Ultimately, Anita hopes to see more and more figures working in applied AI adopt this open approach. “I want to see them show their cards, so to speak, because the datasets and the algorithms are so vital to understand for the broader community.”
Based in London, Ciarán Daly is the Editor-in-Chief of AIBusiness.com, covering the critical issues, debates, and real-world use cases surrounding artificial intelligence – for executives, technologists, and enthusiasts alike.