Sorcero launches Ingestum, a free and open source data ingestion engine

The project aims to break down unstructured data without destroying its value

Nick Booth

March 23, 2021

Enterprise AI startup Sorcero has revealed Ingestum, a free and open source (FOSS) content ingestion framework that supports sourcing and transformation of a wide variety of data types into a uniform format.

“We want organizations to benefit from AI and ingestion is a significant barrier,” said Dipanwita Das, CEO at Sorcero. “Open-sourcing Ingestum will democratise ingestion.”

Bon appétit

Information may be the lifeblood of companies, but their AI machines are very fussy eaters, according to Sorcero. Many of the most ‘nutritious’ data sources hold information locked inside arbitrary, unstructured content formats that AI systems cannot readily consume.

Ingestum was a reaction to a recurring problem Sorcero was running into when serving large enterprise customers, Walter Bender, the company’s CTO and co-founder of the MIT Media Lab, told AI Business in an email: “We needed a way to manage multiple sources (and document types) in a more consistent and repeatable way.

“Every time a customer would show up with new ingestion needs, we needed to find or build a unique solution. With Ingestum, we can reuse our pipelines and get consistent results. Adding a new source or format is a one-time (and light-weight) effort.”

Ingestum is written in Python, built around reusable and programmable pipelines, and is “largely agnostic” of source and output formats. The software is designed to be extended through the use of plug-ins and can be used either as a command-line tool or as a web service.
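To make the idea of reusable pipelines and plug-ins concrete, here is a minimal sketch in plain Python. It does not use Ingestum's actual API: every name in it (`Document`, `Pipeline`, `register_loader`, `ingest`) is hypothetical and chosen only for illustration. It shows one common way to structure such a tool, in which each source format gets a small loader that emits a uniform document and a shared pipeline of steps is then applied regardless of where the document came from.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List

# Illustrative only: these names are assumptions, not Ingestum's API.


@dataclass(frozen=True)
class Document:
    """Uniform format that every source is converted into."""
    content: str
    origin: str = ""


# A pipeline step takes a document and returns a transformed copy.
Step = Callable[[Document], Document]


@dataclass
class Pipeline:
    """An ordered, reusable list of steps."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, doc: Document) -> Document:
        for step in self.steps:
            doc = step(doc)
        return doc


# Plug-in registry: one loader per source format.
LOADERS: Dict[str, Callable[[str], Document]] = {}


def register_loader(kind: str):
    """Decorator that registers a loader for a source format."""
    def decorator(func: Callable[[str], Document]):
        LOADERS[kind] = func
        return func
    return decorator


@register_loader("text")
def load_text(raw: str) -> Document:
    return Document(content=raw, origin="text")


@register_loader("csv")
def load_csv(raw: str) -> Document:
    # Naive comma splitting, enough for a sketch (no quoted fields).
    rows = [
        ", ".join(cell.strip() for cell in line.split(","))
        for line in raw.splitlines()
        if line.strip()
    ]
    return Document(content="; ".join(rows), origin="csv")


def collapse_whitespace(doc: Document) -> Document:
    """Replace runs of whitespace with single spaces."""
    return replace(doc, content=" ".join(doc.content.split()))


def ingest(kind: str, raw: str, pipeline: Pipeline) -> Document:
    """Load a raw source with its plug-in, then run the shared pipeline."""
    return pipeline.run(LOADERS[kind](raw))


clean = Pipeline("clean", [collapse_whitespace])
text_doc = ingest("text", "  hello   world \n", clean)
csv_doc = ingest("csv", "a, b\nc ,d", clean)
```

In this arrangement, supporting a new format means writing one loader, while the `clean` pipeline is reused unchanged, which mirrors the “one-time (and light-weight) effort” Bender describes.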

It also integrates a number of existing FOSS projects including PDFMiner, Google’s Tesseract OCR engine and Mozilla’s DeepSpeech speech-to-text engine.

“Ingestum leverages many existing open source projects, so no one has to reinvent the wheel; it can easily integrate existing workflows, or incorporate existing software as plugins,” Bender said.

Sorcero’s long-term objective is to create applications that understand medical and technical language at scale, through its Language Intelligence Platform. The company currently focuses on the life and health insurance and life sciences sectors.

Sorcero has invited IT directors, software engineers, and AI researchers to download and use Ingestum today from GitLab.

About the Author(s)

Nick Booth

Reporter
