Malware Clustering Based on Graph Convolutional Networks

Authors: Liu Kai (College of Electronics and Information Engineering, Sichuan University); Fang Yong (College of Cybersecurity, Sichuan University); Zhang Lei (College of Cybersecurity, Sichuan University); Zuo Zheng (College of Electronics and Information Engineering, Sichuan University); Liu Liang (College of Cybersecurity, Sichuan University)

Received: 2019-01-09. Year, volume (issue), pages: 2019, 56(4): 654-660

Journal Name: Journal of Sichuan University (Natural Science Edition)

Key words: malware; graph convolutional network (GCN); clustering; API call graph; convolutional neural network (CNN)

Funding: National Key Research and Development Program of China (2017YFB0802904)

Abstract

Many new malware samples are modified by attackers from existing malware, so analyzing the family homology of malware helps in studying its evolutionary trends and tracing its origin. Starting from the API call graphs of malware and using graph convolutional networks (GCN), this paper designs a similarity calculation and family clustering model for malware. First, the API calls of each sample are extracted with a disassembly tool, and the API functions are labeled with attributes. Then, key API functions are selected according to their contribution to the malware families, and an API call graph is constructed for each sample. A GCN combined with a convolutional neural network (CNN) serves as the similarity model: it takes the API call graphs as input and computes the similarity between samples. Finally, the DBSCAN clustering algorithm groups the samples into families. Experimental results show that the proposed method reaches a clustering accuracy of 87.3% and can effectively cluster malware by family.
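Two stages of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The `gcn_layer` function uses the commonly cited graph-convolution propagation rule with symmetric normalization and treats the call graph as undirected; the abstract does not say which GCN variant or graph orientation the paper uses, so both are assumptions. The CNN-based similarity model is omitted, and the similarity matrix, `eps`, and `min_samples` values are made up for illustration.

```python
import numpy as np


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees are >= 1 here
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)


def dbscan(dist, eps, min_samples):
    """DBSCAN over a precomputed distance matrix; label -1 marks noise."""
    n = dist.shape[0]
    labels = [None] * n                            # None = not yet visited
    # Each point counts as its own neighbor because dist[i, i] == 0.
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1                         # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster                # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:   # core point: keep expanding
                queue.extend(neighbors[j])
    return labels


# Hypothetical 3-node API call graph (path 0 - 1 - 2) with one-hot node features.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
hidden = gcn_layer(adj, np.eye(3), np.eye(3))

# Made-up pairwise similarities for five samples; distance = 1 - similarity.
sim = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.85, 0.2, 0.1],
    [0.8, 0.85, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.3],
    [0.2, 0.1, 0.1, 0.3, 1.0],
])
labels = dbscan(1.0 - sim, eps=0.3, min_samples=2)
print(labels)  # [0, 0, 0, -1, -1]
```

In this toy run the first three samples form one family and the last two are left as noise. DBSCAN suits this setting because it needs only pairwise distances and does not require the number of families in advance. In practice, a library implementation such as scikit-learn's `DBSCAN` with `metric="precomputed"` would likely replace the hand-written loop.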
