Including what it calls the first production-scale malware research dataset available to the general public

Chuck Martin, Editorial Director AI & IoT

December 15, 2020

In an effort to promote the use of AI-based systems in cyber security, British security software vendor Sophos is sharing datasets, tools, and methodologies in four separate areas.

These include research, protection methods, malware detection, and signature generation tools.

“With SophosAI’s new initiative to open its research, we can help influence how AI is positioned and discussed in cyber security moving forward,” said Joe Levy, chief technology officer of the company.

Sharing is caring

For research purposes, Sophos is sharing SOREL-20M (Sophos-ReversingLabs – 20 million), a production-scale dataset with metadata, labels, and features for 20 million Windows Portable Executable files, including 10 million disarmed malware samples. Sophos calls the dataset, developed in a joint project with ReversingLabs, “the first production-scale malware research dataset available to the general public.”

A new AI-powered impersonation protection method, discussed at Defcon, is also being shared for defense against email “spear phishing,” in which attackers mimic trusted colleagues. The AI-based system was trained on a sample of millions of known attack emails.

To address undetected malware, Sophos built a set of publicly available, epidemiology-inspired statistical models that estimate the total prevalence of malware infections, improving the chances of discovery.
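
The article does not say which models Sophos uses. As a rough illustration of the general idea, the sketch below applies capture-recapture, a standard technique from epidemiology, to two hypothetical detectors: the overlap between what each one finds gives an estimate of the total number of infections, including those neither caught. The function names and the sample counts are assumptions made for this example, and the estimate is only meaningful if the two detectors operate roughly independently.

```python
def chapman_estimate(found_by_a: int, found_by_b: int, found_by_both: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimate of the total population.

    found_by_a / found_by_b: infections flagged by each (hypothetical) detector.
    found_by_both: infections flagged by both detectors.
    """
    if min(found_by_a, found_by_b, found_by_both) < 0:
        raise ValueError("counts must be non-negative")
    if found_by_both > min(found_by_a, found_by_b):
        raise ValueError("overlap cannot exceed either detector's count")
    return (found_by_a + 1) * (found_by_b + 1) / (found_by_both + 1) - 1


def estimate_undetected(found_by_a: int, found_by_b: int, found_by_both: int) -> float:
    """Estimated number of infections that neither detector observed."""
    total = chapman_estimate(found_by_a, found_by_b, found_by_both)
    observed = found_by_a + found_by_b - found_by_both  # distinct infections seen
    return max(0.0, total - observed)


if __name__ == "__main__":
    # Illustrative counts only, not data from Sophos.
    a, b, both = 9, 9, 4
    print("estimated total infections:", chapman_estimate(a, b, both))
    print("estimated undetected infections:", estimate_undetected(a, b, both))
```

A small overlap between the two detectors pushes the estimated total up, which is the intuition behind using such models to gauge how much malware is going unseen.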

Finally, the company has developed and shared YaraML, an open-source system for automatic signature generation, which “compiles” machine learning models of the kind used in commercial security products into signature languages.

“Today’s cacophony of opaque or guarded claims about the capabilities or efficacy of AI in solutions makes it difficult to impossible for buyers to understand or validate these claims. This leads to buyer skepticism, creating headwinds to future progress at the very moment we’re starting to see great breakthroughs,” Levy said.

“Correcting this through external mechanisms like standards or regulation won’t happen quickly enough. Instead, it requires a grassroots effort and self-policing within our community to produce a set of practices and language that will advance the industry in a disruptive, open and transparent manner.”
