ExtraHop® Open Sources Machine Learning Dataset to Help Security Teams Detect Malware and Botnet Operations Faster

Raja Mukerji, Chief Scientist and Co-Founder, ExtraHop

ExtraHop, a leader in cloud-native network detection and response (NDR), today announced it is open sourcing its expansive 16-million-row dataset – one of the most robust available – to help defend against domain generation algorithms (DGAs). The release is intended to level the playing field for defenders and empower businesses of all sizes to better secure their organisations by strengthening defenses against malware and botnet operations.

Amid a widening cybersecurity skills gap (up 26% in the last year) and dwindling resources, the cyber landscape is evolving rapidly. As new threats emerge, open source research and datasets offer one way to overcome the challenges security teams face daily.

“The challenges we face in security are formidable and dynamic, and, with this initiative, we’re democratising the tools needed for threat research detection for security teams of all sizes, backgrounds, and industries,” said Raja Mukerji, Chief Scientist and Co-Founder, ExtraHop. “Collaboration among the cybersecurity community is invaluable – coming together to share our best work is the only way to remain on the offense and put attackers at a disadvantage. Our research will be a gamechanger for the community and we encourage other teams to open source their own insights that will similarly benefit the industry at large.”

Striving for industry collaboration, ExtraHop is releasing its DGA detector dataset, made up of more than 16 million rows of data, on GitHub to help security teams identify malicious activity in their environments before it becomes a business problem.

Threat actors use DGAs, which generate large numbers of domain names that malware can use to reach its command-and-control servers, to maintain control within an organisation’s environment once they have gained entry to a network, making attacks difficult to detect and stop. Originally built for ExtraHop’s award-winning NDR platform, Reveal(x), this research can now be used by any security researcher to construct their own machine learning (ML) classifier model to identify DGAs more quickly and intervene in attacks with greater speed and precision. Since its implementation in Reveal(x), the ExtraHop DGA model has demonstrated more than 98% accuracy.
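
For researchers wondering what building such a classifier might look like, the following is a minimal sketch, not the ExtraHop model. It trains a character-bigram naive Bayes classifier, chosen here only because it fits in a few lines of standard-library Python. The toy domain lists, the label names "benign" and "dga", and the class and function names are all hypothetical. The actual schema of the published dataset is not described in this announcement, so loading it is left out.

```python
import math
from collections import Counter


def bigrams(domain):
    """Split a domain label into character bigrams, with ^ and $ as edge markers."""
    s = f"^{domain.lower()}$"
    return [s[i:i + 2] for i in range(len(s) - 1)]


class BigramNaiveBayes:
    """Illustrative classifier only; not the model used in Reveal(x)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha      # Laplace smoothing constant
        self.counts = {}        # label -> Counter of bigram frequencies
        self.totals = {}        # label -> total number of bigrams seen
        self.priors = {}        # label -> log prior probability
        self.vocab = set()      # every bigram seen during training

    def fit(self, domains, labels):
        label_counts = Counter(labels)
        self.counts = {label: Counter() for label in label_counts}
        for domain, label in zip(domains, labels):
            grams = bigrams(domain)
            self.counts[label].update(grams)
            self.vocab.update(grams)
        for label, n in label_counts.items():
            self.priors[label] = math.log(n / len(labels))
            self.totals[label] = sum(self.counts[label].values())
        return self

    def log_score(self, domain, label):
        # One extra vocabulary slot is reserved for bigrams never seen in training.
        denom = self.totals[label] + self.alpha * (len(self.vocab) + 1)
        return self.priors[label] + sum(
            math.log((self.counts[label][g] + self.alpha) / denom)
            for g in bigrams(domain)
        )

    def predict(self, domain):
        return max(self.priors, key=lambda label: self.log_score(domain, label))


# Hypothetical toy data; a real model would be trained on the full dataset.
benign = ["google", "wikipedia", "github", "weather",
          "mailserver", "extrahop", "shopping", "newsonline"]
dga = ["xkqzjvwpty", "qwzxvbnmkj", "zzkqxjwvpf", "pqxzvkjwhy",
       "vbxqzkjwmt", "jqxkzvwbpn", "wqzxkvjpyh", "kzxqvjwpbt"]

model = BigramNaiveBayes().fit(
    benign + dga,
    ["benign"] * len(benign) + ["dga"] * len(dga),
)

for name in ["onlineshopping", "qzxkvjwpzb"]:
    print(name, model.predict(name))
```

With only sixteen training examples this says nothing about real-world accuracy. It is meant to show the general shape of the task: label domain strings, extract character-level features, and score new domains against each class.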

“Giving threat actors the ability to operate undetected and an uptick in these types of attacks, DGAs are increasingly considered a major threat to businesses today,” said Todd Kemmerling, Director of Data Science, ExtraHop. “As we began developing a model for detecting DGAs, it became apparent there was a lack of public datasets accessible to security teams with a wide-ranging set of resources. With this dataset, we are filling that gap, giving any security team access to the pivotal data needed to detect DGAs swiftly.”

Access the full dataset on GitHub today. For more details, read our blog on DGAs.