AI vs. Machine Learning: What do we really mean when we say Artificial Intelligence?

by John Matthews, ExtraHop 28 November 2019

Artificial Intelligence (AI) might be the most popular word in cybersecurity. And why not? A piece of technology that learns and thinks like a human brain, but better. Who wouldn’t want that?

The mere word conjures up images of supercomputers, sentient robots and rogue operating systems. The promises of AI are great and cybersecurity has seized upon the concept with great enthusiasm. However, in an industry where jargon reigns, it can be easy to lose sight of the original meaning of concepts, or for them to morph into something larger than what they are. That lack of precision can lead to poor security decisions and confusion as to where and how to deploy security resources.

A survey by Capgemini earlier this year revealed just how much the cybersecurity industry expects of Artificial Intelligence, with over 60 percent of respondents saying they cannot “identify critical threats” without AI technology.

While the excitement for AI can’t be overstated, the term has suffered from overuse amid the fervor, and its meaning has been stretched rather far.

A report from MMC Ventures showed that 40 percent of the European tech startups that claim to be using Artificial Intelligence in fact are not. The report also found that companies listed as AI companies received 15 to 50 percent more funding than those that weren’t, further incentivizing startups to stretch the meaning of “AI” to impress investors.

Artificial Intelligence has many definitions, but a good working definition came from Andrew Moore, the former dean of Carnegie Mellon University’s School of Computer Science. Simply: “Artificial Intelligence is the science and engineering of making computers behave in ways that, until recently, we thought required human intelligence.”

Machine Learning, on the other hand, offers something more modest. A Machine Learning system is a computer system that can continually learn from new experiences and take on new data autonomously. From there, it can automate a lot of the more mundane tasks in IT and continue to improve upon them without much more input from its operator.

In cybersecurity, this confusion often shows up as vendors telling customers that they’re offering Artificial Intelligence when they’re really offering Machine Learning: a tremendously effective piece of technology, but ultimately only one constituent part of Artificial Intelligence.

The importance of the definition should not be understated: Machine Learning is a subset of Artificial Intelligence. When the two terms are conflated or used interchangeably, the result is confusion in the market and promises that can’t be delivered.

One of the most effective uses of Machine Learning in cybersecurity is in Network Detection and Response (NDR). Machine Learning allows organizations to make better use of their data to monitor their own networks and spot anomalous behavior. In many places, that job is still being done manually by humans who have to pore over thousands of false-positive alerts a day, which dulls their alertness. Machine Learning enables organizations to reduce false positives by learning baseline behavior, so that the system understands what is normal and alerts only on what poses a real risk. This means that security teams can respond to threats faster and with more certainty.
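To make the idea of "learning baseline behavior" concrete, here is a minimal sketch in Python. It is not how any particular NDR product works; the metric (outbound connections per hour), the sample values, the function names and the three-standard-deviation threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def learn_baseline(samples):
    """Summarize normal behavior as a mean and a standard deviation."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation was observed, so any different value stands out.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical outbound connections per hour for one host (illustrative data).
history = [42, 38, 45, 40, 44, 39, 41, 43, 37, 46]
baseline = learn_baseline(history)

print(is_anomalous(44, baseline))   # a value inside the usual range
print(is_anomalous(400, baseline))  # a value far outside the usual range
```

Only the second value would raise an alert here. A real system would track many signals at once and use far richer models, but the principle is the same: alert on deviation from what is normal for this network, not on every event.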

Many NDR solutions are currently rule- or signature-based. Rule-based systems police their environments by matching observed behavior on the network against known signals of suspicious behavior.

However, they typically require constant maintenance and updates. They can only respond to threats that have been seen before and incorporated into a current knowledge base, precluding them from detecting new attacks and tactics. Furthermore, rules are fairly simple for attackers to evade. A minor tweak to an existing attack tactic or piece of malware is often enough to make existing rules ineffective against the attack.
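The evasion problem can be shown with a toy rule matcher. This is a simplified sketch, not a real detection engine: the rule list, the request strings and the `matches_rule` helper are hypothetical, and real rule languages are considerably more expressive.

```python
# Hypothetical rules: substrings a defender has already seen in past attacks.
RULES = [
    "/admin/backup.zip",
    "user-agent: evilscanner/1.0",
]

def matches_rule(observed):
    """Return the first known-bad pattern found in the observed text, or None."""
    lowered = observed.lower()
    for pattern in RULES:
        if pattern in lowered:
            return pattern
    return None

known = "GET /admin/backup.zip HTTP/1.1"
tweaked = "GET /admin/./backup.zip HTTP/1.1"  # same intent, slightly altered path

print(matches_rule(known))    # the rule fires on the request it was written for
print(matches_rule(tweaked))  # the altered request matches no rule
```

The second request asks for the same thing, yet it slips past because the rule describes one exact form of the attack. Someone then has to notice the miss and write another rule, which is the maintenance burden described above.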

Signature-based systems detect malware from a catalogue of known publicly reported signatures using file hashes. But problems still arise when new, unknown malware families arrive on the scene or adversaries merely change signatures. This approach has no hope against modern polymorphic malware.
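A short sketch shows why hash-based signatures are brittle. It uses Python's standard `hashlib` module; the "malicious" bytes are a harmless stand-in string, and the set of known-bad hashes is an assumption made for the example rather than a real threat feed.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Harmless stand-in bytes playing the role of a known malicious file.
known_sample = b"stand-in payload for a known malicious file"

# Hypothetical signature catalogue built from previously reported samples.
KNOWN_BAD_HASHES = {sha256_hex(known_sample)}

def is_known_bad(data: bytes) -> bool:
    """Check whether the file's hash appears in the signature catalogue."""
    return sha256_hex(data) in KNOWN_BAD_HASHES

# The same payload with a single byte appended.
variant = known_sample + b"\x00"

print(is_known_bad(known_sample))  # True
print(is_known_bad(variant))       # False
```

Changing a single byte produces a completely different hash, so the variant is invisible to the catalogue even though it is functionally the same file.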

The problem with both of these approaches lies in an asymmetry between cyber attackers and their victims. Attackers can develop new attack tactics and new malware variants much more quickly than defenders can develop new defenses. Attackers only need one of their attempts to work, whereas defenders need to defend against every existing tactic while always anticipating never-before-seen attacks.

Here’s where Machine Learning’s edge is sharpest. By learning from real-time observed behavior in the environment that needs to be defended, an ML system can improve itself much more effectively than a manually updated signature database, and can refine its results over time without human intervention.

An NDR solution with Machine Learning can learn what suspicious behavior looks like in the context of a specific environment, rather than relying on generic known-bad behavior signals and indicators of compromise. Furthermore, it can adapt to new tactics, and better respond to the ever-changing threat landscape.
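The sketch below illustrates both points in miniature: a baseline that updates itself with every new observation, and the same reading being judged differently in two environments. It uses Welford's online algorithm for a running mean and variance as a stand-in for a real model; the class name, the upload figures and the threshold are assumptions made for illustration.

```python
import math

class RunningBaseline:
    """Running mean and variance via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new observation into the baseline; no retraining step."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def score(self, value):
        """Distance from the learned mean, in standard deviations."""
        if self.count < 2:
            return 0.0  # too little data to judge
        std = math.sqrt(self._m2 / (self.count - 1))
        if std == 0:
            return 0.0 if value == self.mean else math.inf
        return abs(value - self.mean) / std

# Hypothetical megabytes uploaded per hour in two different environments.
office = RunningBaseline()
backup_server = RunningBaseline()
for mb in [5, 7, 6, 8, 5, 6, 7, 6]:
    office.update(mb)
for mb in [480, 520, 510, 495, 505, 515, 490, 500]:
    backup_server.update(mb)

THRESHOLD = 3.0  # assumed alerting threshold, in standard deviations
print(office.score(500) > THRESHOLD)         # unusual for the office laptop
print(backup_server.score(500) > THRESHOLD)  # ordinary for the backup server
```

A 500 MB upload is alarming on the office machine and routine on the backup server. A generic rule such as "alert on uploads over 100 MB" would be wrong for one of the two, whichever way it was tuned.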

AI often means whatever people want it to mean. It can be a powerful marketing term, but as more technologies based on this slippery foundation fail to deliver on their promises, customers’ trust in the term will wane rapidly. Machine Learning can be a powerful arrow in the security operations quiver, but buyers should learn to recognize the telltale signs of AI snake oil, and should verify that any vendor claiming AI or Machine Learning can deliver the value it promises.


John Matthews is Chief Information Officer at ExtraHop, a company specializing in network traffic analysis tools.