Startup pitches open source alternative to AWS SageMaker and Azure ML Engineer

Working to integrate ML workflows into current software development practices

Max Smolaks

March 4, 2021


A Silicon Valley startup has launched major updates for its Data Version Control (DVC) and Continuous Machine Learning (CML) software projects, as it hopes to create open source alternatives to machine learning toolsets from major cloud providers like AWS and Microsoft.

The company enables ML engineers and data scientists to more easily work with standard development tools like Git and popular CI/CD stacks.

“AI Platforms are siloed and require everything to go into their own systems creating vendor lock-in,” said Dmitry Petrov, founder and CEO. “[The company] allows users to stay within their application development space and effectively extend the familiar dev environments with tools to support Machine Learning Engineers and Data Scientists.”

DVC and CML fit into the emerging MLOps software category, which is concerned with moving machine learning models from development into production, and running them at scale.

GitFlow for data science

The company was founded in 2018 to develop open source tools that streamline the workflow of data scientists. Today, its projects have more than 200 contributors and are used by more than 400 companies.

The startup posits that instead of creating separate AI platforms, the industry should integrate ML workflows into current practices for software development.

DVC, for example, is built to make ML models shareable and reproducible, providing users with a Git-like interface for version control – across models, datasets, and intermediate files. Large files can be kept in remote storage, either in the cloud or on on-premises network storage.
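
To make the idea of versioning large files alongside Git more concrete, here is a minimal Python sketch of content-addressed tracking: the large file goes into a cache keyed by its checksum, and only a small pointer file would be committed to Git. This is a simplified illustration of the general technique, not DVC's actual implementation; the `track` and `restore` helpers, the `.pointer.json` suffix, and the cache layout are all hypothetical names chosen for the example.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets never need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(path: Path, cache_dir: Path) -> Path:
    """Copy `path` into a content-addressed cache and write a small pointer file.

    The pointer file (hypothetical format) is what would be committed to Git;
    the cache directory stands in for cloud or network storage.
    """
    checksum = file_md5(path)
    cached = cache_dir / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(path, cached)
    pointer = path.with_name(path.name + ".pointer.json")
    meta = {"md5": checksum, "size": path.stat().st_size, "path": path.name}
    pointer.write_text(json.dumps(meta, indent=2))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file next to its pointer from the cache."""
    meta = json.loads(pointer.read_text())
    cached = cache_dir / meta["md5"][:2] / meta["md5"][2:]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cached, target)
    return target
```

Because the pointer records a checksum rather than the data itself, checking out an older Git commit and calling `restore` would bring back the matching version of the dataset or model, provided that version is still in the cache.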

“Harness the full power of Git branches to try different ideas instead of sloppy file suffixes and comments in code,” advertises the DVC project website. “Use automatic metric-tracking to navigate instead of paper and pencil.”

The latest release, DVC 2.0, adds the ability to run lightweight ML experiments without committing any code to Git, versioning of ML model checkpoints, and better CPU/GPU resource allocation.

Meanwhile, CML claims to hide the complexity of clouds from data scientists and ML engineers. It offers an open source library for implementing continuous integration and delivery (CI/CD) – the backbone of modern DevOps – in machine learning projects. The project enables users to automate parts of their development workflow, including model training and evaluation, and auto-generate reports with metrics and plots.
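
As a rough illustration of the auto-generated reports described above, the sketch below turns a dictionary of evaluation metrics into a Markdown table that a CI job could attach to a pull request. It is a generic example rather than CML's own API; `render_report` and the metric names are hypothetical, and how the resulting file gets posted is left to the CI tooling.

```python
from typing import Mapping


def render_report(metrics: Mapping[str, float], title: str = "Model evaluation") -> str:
    """Format evaluation metrics as a Markdown table, sorted by metric name."""
    lines = [f"## {title}", "", "| Metric | Value |", "| --- | --- |"]
    for name in sorted(metrics):
        lines.append(f"| {name} | {metrics[name]:.4f} |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metrics produced by a training step earlier in the pipeline.
    print(render_report({"accuracy": 0.9312, "f1": 0.9054}))
```

In a CI pipeline, a step like this would typically run after training and evaluation, so that every proposed change arrives with its metrics attached.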

It’s still early days for CML, which has just reached version 0.3.

