
Paper Abstract

Malware Detection Based on Longest Frequent API Sequence

Original Chinese title: 基于最长频繁序列挖掘的恶意代码检测 (Malware Detection Based on Longest Frequent Sequence Mining)

Authors: HUANG Kun-Ming (黄琨茗), ZHANG Lei (张磊), ZHAO Kui (赵奎), LIU Liang (刘亮); all with the College of Cybersecurity, Sichuan University

Received: 2019-10-17          Year, Volume(Issue): Pages: 2020, 57(4): 681-688

Journal Name: Journal of Sichuan University (Natural Science Edition) (四川大学学报: 自然科学版)

Key words: malware; longest frequent sequence; sequence mining; bag-of-words model; random forest algorithm

Funding: Provincial Natural Science Foundation; National High Technology Research and Development Program

Abstract

Existing malware detection methods based on dynamic API sequence mining do not consider the behavioral differences between different categories of malware. As a result, they mine the malicious sequences that represent malicious behavior poorly, and their detection efficiency is low. This paper introduces target-oriented association mining and proposes a longest frequent sequence mining algorithm; the longest frequent sequences it mines serve as features for malware detection. First, the method extracts the dynamic API sequences of the sample files and preprocesses them. Then, it uses the longest frequent sequence mining algorithm to mine the longest frequent sequence sets of multiple categories. Finally, it builds a bag-of-words model from the mined longest frequent sequence sets, converts each sample file's dynamic API sequence into a vector according to that model, and trains a random forest classifier to detect malware. Experiments on the dataset provided by Alibaba Cloud (the Aliyun Security Algorithms Challenge) give a detection accuracy of 95.6% and an AUC (Area Under Curve) of 0.99. The results show that the proposed method detects malware effectively.
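The three steps in the abstract (preprocess, mine per-category longest frequent sequences, vectorize with a bag-of-words model) can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not define its preprocessing or mining procedure, so this sketch assumes that preprocessing collapses consecutive repeated calls, that a "frequent sequence" is a contiguous run of API calls appearing in at least `min_support` of one category's samples, and that "longest" means not contained in any longer frequent run. All function names, parameters, and the toy API traces are hypothetical.

```python
from collections import Counter


def dedupe_consecutive(seq):
    """Collapse runs of the same API call (an assumed preprocessing step)."""
    out = []
    for api in seq:
        if not out or out[-1] != api:
            out.append(api)
    return out


def ngrams(seq, n):
    """All contiguous length-n windows of seq, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def longest_frequent_sequences(samples, min_support=0.5, max_len=6):
    """Frequent contiguous patterns of one category that no longer one contains."""
    threshold = min_support * len(samples)
    frequent = []
    for n in range(1, max_len + 1):
        support = Counter()
        for seq in samples:
            support.update(set(ngrams(seq, n)))  # count each sample at most once
        level = [p for p, count in support.items() if count >= threshold]
        if not level:
            break  # no frequent n-gram implies no frequent (n+1)-gram
        frequent.extend(level)
    return [p for p in frequent
            if not any(len(q) > len(p) and contains(q, p) for q in frequent)]


def build_vocabulary(samples_by_class, **kwargs):
    """Union of the per-category longest frequent sequences, in a stable order."""
    vocab = []
    for label in sorted(samples_by_class):
        for p in sorted(longest_frequent_sequences(samples_by_class[label], **kwargs)):
            if p not in vocab:
                vocab.append(p)
    return vocab


def vectorize(seq, vocab):
    """Bag-of-words vector: occurrences of each vocabulary pattern in seq."""
    return [sum(1 for g in ngrams(seq, len(p)) if g == p) for p in vocab]


# Toy traces for illustration only; they are not taken from the paper's dataset.
raw = {
    "ransomware": [
        ["CreateFile", "ReadFile", "ReadFile", "CryptEncrypt", "WriteFile"],
        ["RegOpenKey", "CreateFile", "ReadFile", "CryptEncrypt", "WriteFile"],
    ],
    "benign": [
        ["CreateFile", "ReadFile", "CloseHandle"],
        ["RegOpenKey", "CreateFile", "ReadFile", "CloseHandle"],
    ],
}
samples_by_class = {k: [dedupe_consecutive(s) for s in v] for k, v in raw.items()}
vocab = build_vocabulary(samples_by_class, min_support=1.0)
vector = vectorize(
    dedupe_consecutive(["CreateFile", "ReadFile", "CryptEncrypt", "WriteFile", "CloseHandle"]),
    vocab,
)
print(vocab)
print(vector)
```

Mining each category separately is what lets a pattern that is common only within one malware family survive the support threshold. The resulting vectors are ordinary fixed-length feature rows, so they could be passed to any random forest implementation (for example, scikit-learn's `RandomForestClassifier`) for the classification step the abstract describes.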


Copyright © 2020 四川大学期刊社. All rights reserved.

Address: No. 24, South Section 1, Yihuan Road, Chengdu

Postal code: 610065