Malware detection based on semantic API dependency graph

Authors: ZHAO CuiRong (College of Cybersecurity, Sichuan University); FANG Yong (College of Cybersecurity, Sichuan University); LIU Liang (College of Cybersecurity, Sichuan University); ZHANG Lei (College of Cybersecurity, Sichuan University)

Received: 2019-11-27; Year, Volume(Issue): Pages: 2020, 57(3): 488-494

Journal Name: Journal of Sichuan University (Natural Science Edition)

Key words: malware detection; API dependency graph; AEP; physical machine analysis; random forest

Funding: National Key Research and Development Program of China (2017YFB0802900)

Abstract

Traditional dynamic malware analysis mostly relies on sequence mining or graph matching. Sequence mining is susceptible to system call injection, while graph matching is limited by the complexity of subgraph matching. Moreover, neither approach accounts for anti-detection behavior in samples, such as anti-virtual-machine checks, so detection performance keeps degrading. This paper designs and proposes a physical-machine dynamic analysis method based on a program-semantic API dependency graph. The API call sequences of malware are extracted in a sandbox running on a physical machine, so extraction is unaffected by anti-virtual-machine detection. The feature construction method is based on the asymptotic equipartition property (AEP), a concept widely used in information theory. AEP is used to extract API sequences rich in semantic information; program behavior is then defined by the typical paths of the dependency graph of key API sequences, and the relevance of a path is defined by the average logarithmic branching factor of the typical paths. The average logarithmic branching factor and histogram binning are used to construct the feature space. Finally, random forest, an ensemble learning algorithm, is used to classify malware. Experimental results show that the proposed method classifies malware effectively, with an accuracy of 97.1%.
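The feature construction outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact definitions, so the sketch assumes that the dependency graph is approximated by call adjacency (an edge from each API call to the call that follows it), that the average logarithmic branching factor of a path is the mean of log2(out-degree) over the nodes the path leaves, and that features are a normalised fixed-width histogram of those values. The function names `build_graph`, `avg_log_branching`, and `histogram_features` are hypothetical, and the API names are only string labels.

```python
import math
from collections import defaultdict


def build_graph(api_sequence):
    """Directed graph with an edge a -> b whenever call b directly follows call a.

    Assumption: adjacency stands in for the paper's semantic dependency relation.
    """
    graph = defaultdict(set)
    for a, b in zip(api_sequence, api_sequence[1:]):
        graph[a].add(b)
    return graph


def avg_log_branching(graph, path):
    """Mean of log2(out-degree) over every node of the path except the last.

    The path must follow edges that exist in the graph, so every node that is
    left has at least one successor.
    """
    steps = path[:-1]
    if not steps:
        return 0.0
    return sum(math.log2(len(graph[node])) for node in steps) / len(steps)


def histogram_features(values, n_bins, lo, hi):
    """Fixed-width histogram of values over [lo, hi], normalised to sum to 1."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp out-of-range values into the first or last bin.
        idx = max(0, min(int((v - lo) / width), n_bins - 1))
        counts[idx] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


# Toy trace (illustrative labels, not data from the paper).
trace = ["CreateFile", "WriteFile", "CloseHandle",
         "CreateFile", "ReadFile", "CloseHandle"]
graph = build_graph(trace)
paths = [
    ["CreateFile", "WriteFile", "CloseHandle"],
    ["CloseHandle", "CreateFile", "ReadFile", "CloseHandle"],
]
scores = [avg_log_branching(graph, p) for p in paths]
features = histogram_features(scores, n_bins=4, lo=0.0, hi=2.0)
```

Under these assumptions, a vector such as `features` would be computed per sample and passed to a random forest classifier. How typical paths are selected via AEP is not specified in the abstract and is left out of the sketch.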
