LinkedIn develops centralized data science platform for AI engineers

The platform is expected to become open source soon

Ben Wodecki

February 10, 2022

LinkedIn, which stores exabytes of data about its members’ professional lives, is making it easier for its data scientists and AI engineers to find and use that data to power features such as job recommendations.

The company unveiled a one-stop data platform called DARWIN, short for Data science and Artificial intelligence Workbench at LinkedIn. LinkedIn engineers built it to recover productivity lost to poor developer experience.

Varun Saxena, Harikumar Velayutham and Balamurugan Gangadharan, the minds behind the platform, said that the data tools they had previously used for exploratory data analysis, experimentation and visualization were too fragmented.

“We soon realized a need for building a unified ‘one-stop’ data science platform that would centralize and serve the various needs of data scientists and AI engineers,” they wrote in a company blog post. DARWIN would house “all the knowledge related to working with data, without having to leave the platform, be it accessing data, understanding it, analyzing it, finding references to build context, or generating reports.”

The system supports multiple languages and engines for querying datasets across LinkedIn, including Python, R, Scala and Spark SQL.

DARWIN also provides direct access to data on HDFS, a welcome addition for those using platforms such as TensorFlow.

The platform itself is built atop Kubernetes, which the engineers behind it suggest will give it scalability.

The team has further plans for the platform: future updates are expected to add built-in visualization and exploratory data analysis capabilities.

DARWIN will eventually go open source, according to LinkedIn, “so that other organizations looking for similar capabilities can leverage it.”

“Our eventual vision for DARWIN is to realize all the use cases that support the development lifecycles of various user personas and reach a state where either the functionalities of surrounding tools are supported in DARWIN or we integrate with external apps and frameworks.”

About the Authors

Ben Wodecki

Assistant Editor
