
Paper Abstract

Malware clustering method based on fuzzy hash feature representation

A malware variant clustering method based on fuzzy hash

Authors: XIAO Jinqi (College of Computer Science, Sichuan University); WANG Jun-Feng (College of Computer Science, Sichuan University)

Received: 2017-08-11          Year, Volume(Issue): Pages: 2018, 55(3): 469-476

Journal Name: Journal of Sichuan University (Natural Science Edition)

Key words: Malware family; Clustering; Fuzzy hash; Feature extraction

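Funding: National Key Research and Development Program of China (2016YFB0800605); National Natural Science Foundation of China (91338107, 91438119, 91438120); Doctoral Program Foundation of the Ministry of Education of China (20130181110095); Sichuan Province Key Science and Technology Research and Development Program (2016ZR0087)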

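Abstract (translated from Chinese)

At present, the number of new malware variants intercepted each year has reached the tens of millions, and the online malware repository Virus Share holds more than 27 million unclassified malware samples. Clustering malware by behavior pattern not only makes new attacks easier to detect but also helps to track how malware is evolving in a timely manner and to take preventive measures. To this end, this paper proposes an efficient malware clustering method: malicious samples are dynamically analyzed and features are selected, including imported and exported functions, software strings, runtime resource access records, and system API call sequences; these features are then converted into fuzzy hashes, and the DPC clustering algorithm is used to cluster the malware samples. The number of clusters, precision, recall, harmonic mean (F-score), and entropy are used as external evaluation metrics, and intra-cluster compactness and inter-cluster separation as internal evaluation metrics. Experimental results show that, compared with the classification results of Symantec and ESET-NOD32, the number of families clustered by the proposed method is the closest to the manually labeled number, and the harmonic mean improves by 11.632% and 2.41%, respectively.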
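The clustering step named in the abstract can be illustrated with a small sketch. It assumes that "DPC" refers to density peaks clustering (cluster centers are points with high local density that lie far from any denser point); the abstract does not spell this out. The abstract also does not name the fuzzy hash algorithm, so `difflib.SequenceMatcher` is swapped in here purely as a stand-in similarity measure so the example runs with the standard library. The function names, the toy strings, the cutoff `d_c`, and the fixed `n_clusters` are all hypothetical choices for illustration, not the authors' implementation.

```python
from difflib import SequenceMatcher


def distance(a: str, b: str) -> float:
    """Stand-in for a fuzzy hash comparison: 0.0 = identical, 1.0 = unrelated."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def density_peaks(dist, d_c, n_clusters):
    """Cluster points from a symmetric distance matrix (assumed DPC variant)."""
    n = len(dist)
    # Local density: how many other points lie within the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Process points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank_of = {i: r for r, i in enumerate(order)}
    delta = [0.0] * n   # distance to the nearest point that ranks denser
    parent = [-1] * n   # that nearest denser point
    for r, i in enumerate(order):
        if r == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    # Centers: the n_clusters points with the largest rho * delta.
    centers = sorted(range(n),
                     key=lambda i: (-rho[i] * delta[i], rank_of[i]))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Every other point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Toy strings standing in for per-sample fuzzy hash digests (hypothetical data).
samples = ["AAAAAAAAAB", "AAAAAAAABB", "AAAAAAABBB",
           "ZZZZZZZZZY", "ZZZZZZZZYY", "ZZZZZZZYYY"]
dist = [[distance(a, b) for b in samples] for a in samples]
labels = density_peaks(dist, d_c=0.5, n_clusters=2)
print(labels)
```

In this toy input the first three strings and the last three strings share no characters across groups, so they end up in two separate clusters. In practice the number of centers is usually chosen by inspecting the density and distance values rather than fixed in advance.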

English Abstract

Internet security companies collect tens of millions of new malware variants each year, and Virus Share, an online malware repository, stores more than 27 million unlabeled malware samples. Clustering malware variants by behavior pattern not only makes new attacks easier to detect but also helps to track malware trends in time and take the corresponding preventive measures. This paper therefore proposes a malware variant clustering method that uses dynamic analysis to extract malware features, including imported and exported function names, strings, system resource records, and system calls; converts these features into fuzzy hashes; and clusters the malware samples with the DPC clustering algorithm. The number of clusters, precision, recall, F-score, and entropy serve as external criteria, and intra-cluster cohesion and inter-cluster separation serve as internal criteria. The experimental results show that, compared with the classifications by Symantec and ESET-NOD32, the F-score of the proposed method is higher by 11.632% and 2.41%, respectively, and its number of clusters is the closest to the number of manually labeled families.
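The external criteria listed in the abstract can be made concrete with a short sketch. The abstract does not give the formulas, so this assumes definitions that are common in malware clustering work: precision credits each cluster with its largest reference family, recall credits each reference family with its largest cluster, the F-score is their harmonic mean, and entropy is the size-weighted average of the label entropy inside each cluster. The function name `external_scores` and the toy labels are hypothetical.

```python
from collections import Counter
from math import log2


def external_scores(pred, truth):
    """Return (precision, recall, f_score, entropy) for a non-empty clustering.

    pred  : cluster id assigned to each sample
    truth : reference family label of each sample (e.g. manual labels)
    """
    n = len(pred)
    # Count how many samples fall into each (cluster, family) combination.
    pairs = Counter(zip(pred, truth))
    by_cluster, by_family = {}, {}
    for (cluster, family), count in pairs.items():
        by_cluster.setdefault(cluster, []).append(count)
        by_family.setdefault(family, []).append(count)
    # Precision: each cluster is credited with its dominant family.
    precision = sum(max(counts) for counts in by_cluster.values()) / n
    # Recall: each family is credited with the cluster holding most of it.
    recall = sum(max(counts) for counts in by_family.values()) / n
    f_score = 2 * precision * recall / (precision + recall)
    # Entropy: per-cluster label entropy, weighted by cluster size.
    entropy = sum(
        -sum(k * log2(k / sum(counts)) for k in counts)
        for counts in by_cluster.values()
    ) / n
    return precision, recall, f_score, entropy


# Toy example: six samples, two clusters, two reference families.
pred = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "b", "b", "b"]
p, r, f, h = external_scores(pred, truth)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f} entropy={h:.3f}")
```

Under these assumed definitions, a clustering that matches the reference labels exactly scores precision 1, recall 1, and entropy 0; lower entropy means purer clusters.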

