KNIME removes barriers between model development and production

The new Integrated Deployment functionality saves time and effort

by Max Smolaks, 2 April 2020

KNIME, the open-source data analytics platform, has added functionality that enables data scientists to move models from development to production without having to alter any code.

Integrated Deployment identifies and packages not just the model, but all of its associated data preparation and post-processing steps so they can be automatically reused.

“This solves perhaps one of the biggest problems in data science today by completely eliminating the gap between the art of data science creation and moving the results into production,” said Michael Berthold, co-founder and CEO of KNIME.

Integrated Deployment was launched at the KNIME Spring Summit 2020, taking place this year as an online-only event.


“Productionize”

KNIME Analytics Platform (the name comes from Konstanz Information Miner) is developed by KNIME the company, which is headquartered in Zurich. The platform is used for a variety of purposes, including data mining, business intelligence and machine learning.

KNIME (the platform) started out in 2006 as a proprietary software product, but switched to GPLv3, one of the most ‘hardcore’ free and open-source licenses, with the release of version 2.1 in 2009. This means it can be downloaded, shared and modified freely, as long as any modified versions that are distributed are released under the same license. There’s also a special exception that allows other companies to develop their own proprietary ‘nodes’ for KNIME and sell them.

The Integrated Deployment process aims to simplify the lives of data scientists who build their models in KNIME. Previously, moving a model into production required manually replicating the exact data creation steps and model settings; now these can be maintained automatically.

Here’s how it works, according to the company: “Using open-source KNIME Analytics Platform, a workflow is created to generate an optimal model. Integrated Deployment allows a data scientist to mark the portions of the workflow that would be necessary for running in a production environment, including data creation and preparation as well as the model itself, and save them automatically as workflows with all appropriate settings and transformations saved. There is no limitation in this identification process — it can be simple or as advanced (and complex) as required.

“With KNIME Server in production, these captured workflows are then referenced and reused. There is no need to rewrite or recode any of the process.”