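Syntax-Based Malicious JavaScript Code Detection Method

Original Chinese title: 基于语义分析的恶意JavaScript代码检测方法 (literally, "A Malicious JavaScript Code Detection Method Based on Semantic Analysis")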

Authors: Qiu Yaoyao (邱瑶瑶; College of Electronics and Information Engineering, Sichuan University); Fang Yong (方勇; College of Cybersecurity, Sichuan University); Huang Cheng (黄诚; College of Cybersecurity, Sichuan University); Liu Liang (刘亮; College of Cybersecurity, Sichuan University); Zhang Xing (张星; NSFOCUS Information Technology Co., Ltd.)

Received: 2018-12-13. Year, volume (issue), pages: 2019, 56(2): 273-278

Journal Name: Journal of Sichuan University (Natural Science Edition) (四川大学学报: 自然科学版)

Key words: Malicious JavaScript code detection; Abstract syntax tree; Long short-term memory; Deep learning

Funding: CCF-NSFOCUS "Kunpeng" Fund (2018008)

Abstract

JavaScript is a dynamic scripting language used to improve the interactivity of web pages. However, attackers exploit its dynamic nature to execute malicious code in web pages, which poses a serious threat. Traditional detection methods based on static features have difficulty detecting obfuscated malicious code, while methods based on dynamic analysis suffer from problems such as low efficiency. This paper proposes a static detection model based on semantic analysis. The model extracts lexical unit sequence features from the abstract syntax tree, trains a word vector model with word2vec, and feeds the resulting sequence vectors into an LSTM network to detect malicious JavaScript. Experiments show that the model detects obfuscated malicious JavaScript code effectively and improves detection speed, with a precision of 99.94% and a recall of 98.33%.
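
The abstract does not spell out how the unit sequence is derived from the abstract syntax tree. The following is a minimal sketch in Python of one plausible reading, assuming an ESTree-style AST represented as nested dicts and lists, and a pre-order traversal that records each node's `type`. The function name `node_types` and the hand-written sample AST are hypothetical and are not taken from the paper.

```python
from typing import Any, Iterator


def node_types(node: Any) -> Iterator[str]:
    """Yield ESTree node types in pre-order (parent before children).

    Assumption: every AST node is a dict with a string "type" field,
    and child nodes live in dict values or in lists of dicts.
    """
    if isinstance(node, dict):
        if isinstance(node.get("type"), str):
            yield node["type"]
        # Plain values (strings, numbers) yield nothing when visited.
        for value in node.values():
            yield from node_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from node_types(item)


# Hand-written ESTree-style AST for the snippet `eval(x)`.
# In practice a JavaScript parser would produce this structure.
ast = {
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "CallExpression",
                "callee": {"type": "Identifier", "name": "eval"},
                "arguments": [{"type": "Identifier", "name": "x"}],
            },
        }
    ],
}

sequence = list(node_types(ast))
print(sequence)
```

Because the sequence records node types rather than identifier names or string contents, renaming variables would leave it unchanged, which is one reason such a representation might tolerate simple obfuscation. A sequence like this would then be embedded with word2vec and passed to an LSTM classifier, as the abstract describes.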
