AI Business is part of the Informa Tech Division of Informa PLC


AI Security

Sophos shares data and tech to advance the state of AI in cyber security

Including what it calls the first production-scale malware research dataset available to the general public

In an effort to promote the use of AI-based systems in cyber security, British security software vendor Sophos is sharing datasets, tools, and methodologies in four separate areas.

These cover research, protection methods, malware detection, and signature generation tools.

“With SophosAI’s new initiative to open its research, we can help influence how AI is positioned and discussed in cyber security moving forward,” said Joe Levy, chief technology officer of the company.

Sharing is caring

For research purposes, Sophos is sharing SOREL-20M (Sophos-ReversingLabs – 20 million), a production-scale dataset with metadata, labels, and features for 20 million Windows Portable Executable files, including 10 million disarmed malware samples. Sophos calls the dataset, developed in a joint project with ReversingLabs, “the first production-scale malware research dataset available to the general public.”
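The published dataset ships its labels and metadata in a queryable database alongside the disarmed binaries. As a minimal sketch of how such metadata might be explored, the snippet below builds a tiny in-memory SQLite stand-in; the table and column names (`samples`, `sha256`, `is_malware`) are hypothetical illustrations, not the dataset's actual published schema.

```python
import sqlite3

# In-memory stand-in for a malware-metadata database. The schema here
# (samples/sha256/is_malware) is an invented example, not SOREL-20M's
# real layout -- consult the dataset's own documentation for that.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sha256 TEXT PRIMARY KEY, is_malware INTEGER)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("a" * 64, 1), ("b" * 64, 0), ("c" * 64, 1)],
)

# Count labeled malware samples vs. the full corpus.
malware = conn.execute(
    "SELECT COUNT(*) FROM samples WHERE is_malware = 1"
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(malware, total)  # 2 3
```

The same pattern scales to a real metadata file: replace `:memory:` with the path to the downloaded database and adjust names to the documented schema.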

A new AI-powered impersonation protection method, presented at the Def Con security conference, is also being shared. It defends against email “spear phishing” attacks that mimic trusted colleagues; the AI-based system was trained on millions of known attack emails.
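Sophos's actual system is a trained model, but the impersonation pattern it targets can be illustrated with a simple rule: flag mail whose display name closely matches a known colleague while the sending address comes from an unfamiliar domain. Everything below (the contact directory, the threshold, the function name) is an invented sketch, not Sophos's method.

```python
from difflib import SequenceMatcher

# Hypothetical contact directory mapping trusted display names to their
# legitimate addresses (assumed data for illustration only).
KNOWN_CONTACTS = {"Jane Doe": "jane.doe@example.com"}

def looks_like_impersonation(display_name: str, sender: str,
                             threshold: float = 0.85) -> bool:
    """Flag a familiar display name paired with an unfamiliar address."""
    for name, address in KNOWN_CONTACTS.items():
        similarity = SequenceMatcher(None, display_name.lower(),
                                     name.lower()).ratio()
        if similarity >= threshold and sender.lower() != address:
            return True  # known name, wrong mailbox: likely spoofed
    return False

print(looks_like_impersonation("Jane Doe", "jane.doe@evil.example"))  # True
print(looks_like_impersonation("Jane Doe", "jane.doe@example.com"))   # False
```

A learned model generalizes far beyond this single heuristic, but the heuristic shows the signal such a model exploits: name similarity decoupled from address legitimacy.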

For undetected malware, Sophos built a set of publicly available, epidemiology-inspired statistical models that estimate the total prevalence of malware infections, including those not yet detected, improving the chance of discovery.
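The article does not detail Sophos's models, but a classic epidemiological technique for estimating a population you cannot fully observe is capture-recapture. The Lincoln-Petersen estimator below is a standard textbook instance of that idea, shown here purely as an assumed analogy: treat two independent scanners' detections as two "captures" and estimate the total, including samples neither scanner saw.

```python
def lincoln_petersen(n1: int, n2: int, overlap: int) -> float:
    """Capture-recapture estimate of total population size.

    n1, n2: counts from two independent captures (e.g. two scanners'
    detections); overlap: samples seen by both. Estimate: N = n1*n2/overlap.
    """
    if overlap == 0:
        raise ValueError("estimator undefined with no overlap")
    return (n1 * n2) / overlap

# Scanner A flags 1,000 samples, scanner B flags 800, and 400 were
# flagged by both -- suggesting roughly 2,000 malware samples in total:
print(lincoln_petersen(1000, 800, 400))  # 2000.0
```

The gap between the estimate (2,000) and the union of detections (1,400) is the estimated mass of malware neither scanner has caught yet.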

And finally, the company has developed and shared YaraML, an open-source system for automatic signature generation that “compiles” machine learning models of the kind used in commercial security products into signature languages.
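To make the "compiling" idea concrete, the sketch below turns a toy linear model (invented feature strings and weights) into a YARA-style rule text, scaling weights to integers so the rule's condition becomes a weighted sum of string-occurrence counts compared against the model's threshold. This is an illustration of the concept only; YaraML's real input formats and emitted rules may differ.

```python
def compile_linear_model(name: str, features: dict, bias: float,
                         scale: int = 100) -> str:
    """Emit a YARA-style rule from a linear model's weights (sketch).

    features: {byte-string feature: learned weight}. Weights are scaled
    to integers; the condition sums occurrence counts times weights and
    fires when the sum exceeds the (scaled, negated) bias.
    """
    strings, terms = [], []
    for i, (feat, weight) in enumerate(features.items()):
        strings.append(f'        $s{i} = "{feat}"')
        terms.append(f"#s{i} * {round(weight * scale)}")
    body = "\n".join(strings)
    condition = " + ".join(terms) + f" > {round(-bias * scale)}"
    return (f"rule {name} {{\n    strings:\n{body}\n"
            f"    condition:\n        {condition}\n}}")

# Invented API-name features with made-up weights, for illustration:
rule = compile_linear_model(
    "ml_compiled_demo",
    {"CreateRemoteThread": 1.7, "VirtualAllocEx": 1.2,
     "IsDebuggerPresent": 0.4},
    bias=-2.0,
)
print(rule)
```

The appeal of the approach is deployment: once compiled, the model's decision boundary runs inside any stock YARA engine, with no ML runtime required.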

“Today’s cacophony of opaque or guarded claims about the capabilities or efficacy of AI in solutions makes it difficult to impossible for buyers to understand or validate these claims. This leads to buyer skepticism, creating headwinds to future progress at the very moment we’re starting to see great breakthroughs,” Levy said.

“Correcting this through external mechanisms like standards or regulation won’t happen quickly enough. Instead, it requires a grassroots effort and self-policing within our community to produce a set of practices and language that will advance the industry in a disruptive, open and transparent manner.”
