Proprietary monitoring and automation features for the open source database

Max Smolaks

June 17, 2020

DataStax, the American startup responsible for a commercial version of Apache Cassandra, has revealed an AI-based software product that makes the popular database much easier to maintain.

Vector, now entering private beta, adds AI-based features that can monitor Cassandra clusters, spot developing issues and recommend actions necessary to fix them.

It fits into the emerging AIOps software category that brings AI-based automation to IT operations, helping support the dazzling variety of hardware and software necessary to keep modern online services alive.

“Even though the architecture is very resilient, this is software that’s running on nodes that have NICs that have routing between them, that sit behind load balancers, that have memory pressures, that have other things running on these boxes even though they shouldn’t,” Ed Anuff, chief product officer at DataStax, told AI Business.

“That’s why even if you have a very resilient architecture, having automated monitoring and interpretation of those results, with alerting and proper escalation, and break-fix recommendations, is incredibly important. What ends up happening is a site reliability engineer goes in and builds that automation for their specific system. Nobody had really done that for Cassandra.”

Open source Cassandra is used to manage some of the world’s largest datasets by organizations like Apple, eBay, Microsoft, Hulu, Netflix, and many others.

Vector is compatible with both the open source Cassandra distribution available from the Apache Software Foundation, and the proprietary version, known as DataStax Enterprise (DSE).

Running Cassandra is hard

Cassandra is a non-relational database, which means it excels at dealing with semi-structured and unstructured data that can prove too difficult for SQL. Originally developed at Facebook, it was designed for organizations that practice a ‘cattle, not pets’ approach to servers. One of its core selling points is dealing gracefully with failures: the architecture lacks a master node, so everything is distributed.

The open source version saw its first release in 2008, and in 2011, DSE became the first commercial distribution of Cassandra.

Over the past decade, the database emerged as a favourite with hyperscalers and professional data crunchers, but a major obstacle remained: it was still difficult to maintain if you wanted to do it in-house.

“One of the things that we’ve been working on for a long time has been something that does monitoring and automates maintenance around Cassandra, so that keeping it in production is just a lot easier to do,” Anuff said. “That’s what Vector is all about.”

The software can suggest Cassandra and OS configuration, schema design, and performance and query techniques. It can offer multiple ways to fix any developing issues, and includes detailed explanations on the effects of every decision.

Vector also produces advanced visualizations of system usage, helping developers and operators see and understand how the cluster is performing without having to log into Cassandra nodes.

“It uses several different mechanisms – some of them are algorithmic, machine learning-based, for being able to go and look for operational patterns. Other pieces are based on heuristics and matching certain things that we’ve seen from best practices within data model structure,” Anuff explained.
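To make the heuristic side of that description concrete, here is a minimal sketch of a rule-based health check. It is not Vector's implementation, which DataStax has not published; the `NodeMetrics` fields, the thresholds and the `recommend` function are all hypothetical, chosen only to illustrate how fixed rules can turn raw metrics into recommendations.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    """Hypothetical snapshot of one node's health metrics."""
    node: str
    heap_used_pct: float      # JVM heap in use, 0-100
    pending_compactions: int  # compaction tasks waiting to run


# Assumed thresholds, for illustration only; real values depend on the cluster.
HEAP_WARN_PCT = 85.0
COMPACTION_WARN = 100


def recommend(m: NodeMetrics) -> list[str]:
    """Apply each rule in turn and collect a message for every rule that fires."""
    advice = []
    if m.heap_used_pct >= HEAP_WARN_PCT:
        advice.append(
            f"{m.node}: heap at {m.heap_used_pct:.0f}%, check for memory pressure"
        )
    if m.pending_compactions >= COMPACTION_WARN:
        advice.append(
            f"{m.node}: {m.pending_compactions} pending compactions, "
            "review compaction settings"
        )
    return advice


if __name__ == "__main__":
    for message in recommend(NodeMetrics("node-1", 91.0, 140)):
        print(message)
```

A machine learning-based detector would replace the fixed thresholds with patterns learned from historical metrics, which is the distinction Anuff draws between the two kinds of mechanism.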

Some of the automation components within Vector were built using knowledge gained through the recent acquisition of Cassandra consulting and services provider The Last Pickle. But the machine learning models were developed within DataStax, after it trialled ML in a monitoring product called Insights.

According to Anuff, the company currently employs a dozen data scientists developing and refining ML-based features across its product range.

In May, DataStax launched Astra – essentially Cassandra reimagined as a cloud-native service powered by Kubernetes. It aims to considerably simplify the running of the database in public clouds, and ships with several ML-based features on board. Vector is yet another way to make Cassandra easier to use, for customers who insist on getting down to the nitty-gritty of database operation.

Vector is expected to reach general availability this fall.

You can watch our interview with Jonathan Ellis, co-founder and CTO of DataStax, here.
