Enterprise AI startup Sorcero has released Ingestum, a free and open source (FOSS) content ingestion framework that sources a wide variety of data types and transforms them into a uniform format.
“We want organizations to benefit from AI and ingestion is a significant barrier,” said Dipanwita Das, CEO at Sorcero. “Open-sourcing Ingestum will democratise ingestion.”
Information may be the lifeblood of companies, but their AI machines are very fussy eaters, according to Sorcero. Many of the most ‘nutritious’ data sources hold information that is locked inside arbitrary, unstructured content formats, where AI systems cannot readily consume it.
Ingestum grew out of a recurring problem Sorcero kept running into when serving large enterprise customers, according to Walter Bender, the company’s CTO and a co-founder of the MIT Media Lab. “We needed a way to manage multiple sources (and document types) in a more consistent and repeatable way,” he told AI Business in an email.
“Every time a customer would show up with new ingestion needs, we needed to find or build a unique solution. With Ingestum, we can reuse our pipelines and get consistent results. Adding a new source or format is a one-time (and light-weight) effort.”
Ingestum is written in Python, built around reusable and programmable pipelines, and is “largely agnostic” of source and output formats. The software is designed to be extended through the use of plug-ins and can be used either as a command-line tool or as a web service.
It also integrates a number of existing FOSS projects, including PDFMiner, Google’s Tesseract-OCR Engine, and Mozilla's DeepSpeech speech-to-text engine.
“Ingestum leverages many existing open source projects, so no one has to reinvent the wheel; it can easily integrate existing workflows, or incorporate existing software as plugins,” Bender said.
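To make the idea of reusable pipelines and plug-ins concrete, here is a minimal Python sketch of the general pattern. It is not Ingestum's actual API: every name in it (`Document`, `Pipeline`, `register_loader`, and the individual loaders and steps) is hypothetical and chosen purely for illustration. It assumes that each source type has a plug-in loader that produces a uniform document, and that the same chain of steps can then be reused for any source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical uniform format: plain text plus free-form metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


# A pipeline step takes a Document and returns a new Document.
Step = Callable[[Document], Document]

# Hypothetical plug-in registry mapping a source type to its loader.
LOADERS: Dict[str, Callable[[str], Document]] = {}


def register_loader(source_type: str):
    """Decorator that registers a loader plug-in for one source type."""
    def decorator(func: Callable[[str], Document]):
        LOADERS[source_type] = func
        return func
    return decorator


@register_loader("text")
def load_text(raw: str) -> Document:
    return Document(text=raw, metadata={"source": "text"})


@register_loader("csv")
def load_csv(raw: str) -> Document:
    # Flatten each comma-separated row into a space-separated line.
    rows = [" ".join(cell.strip() for cell in row.split(","))
            for row in raw.splitlines()]
    return Document(text="\n".join(rows), metadata={"source": "csv"})


def collapse_whitespace(doc: Document) -> Document:
    return Document(text=" ".join(doc.text.split()), metadata=dict(doc.metadata))


def lowercase(doc: Document) -> Document:
    return Document(text=doc.text.lower(), metadata=dict(doc.metadata))


@dataclass
class Pipeline:
    """An ordered list of steps that is reused across source types."""
    steps: List[Step]

    def run(self, source_type: str, raw: str) -> Document:
        doc = LOADERS[source_type](raw)  # source-specific plug-in
        for step in self.steps:          # source-agnostic steps
            doc = step(doc)
        return doc


pipeline = Pipeline(steps=[collapse_whitespace, lowercase])
print(pipeline.run("text", "  Hello   World  ").text)
print(pipeline.run("csv", "Name, Dose\nAspirin, 100mg").text)
```

In this sketch, supporting a new format means registering one more loader, while the pipeline itself stays unchanged, which mirrors the "one-time (and light-weight) effort" Bender describes.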
Sorcero’s long-term objective is to create applications that understand medical and technical language at scale, through its Language Intelligence Platform. The company currently focuses on the life and health insurance and life sciences sectors.
Sorcero has invited IT directors, software engineers, and AI researchers to download and use Ingestum today from GitLab.