Cyber security named entity recognition based on deep active learning
Authors: Peng Jia-yi (College of Electronics and Information Engineering, Sichuan University); Fang Yong (College of Cybersecurity, Sichuan University); Huang Cheng (College of Cybersecurity, Sichuan University); Liu Liang (College of Cybersecurity, Sichuan University); Jiang Zheng-wei (Institute of Information Engineering, Chinese Academy of Sciences; Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences)
Received: 2018-11-22. Year, volume (issue), pages: 2019, 56(3): 457-462
Journal: Journal of Sichuan University (Natural Science Edition)
Keywords: cyber security; named entity recognition; active learning; neural network; Bi-LSTM; CRF
Funding: Open Project Fund of the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences
Chinese abstract (translated)
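General-domain models do not handle named entity recognition (NER) in the cyber security domain well. To address this, a cyber security NER method is proposed that combines character-level features with a bidirectional long short-term memory network (Bi-LSTM) and a conditional random field (CRF). The method does not rely on manually selected features: a neural network model labels the sequence, and the CRF then constrains the dependencies between sequence labels, which improves labeling accuracy. In addition, to address the shortage of labeled samples in the cyber security domain, an active learning method is adopted that achieves good sequence labeling performance with only a small number of labeled samples.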
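The label-dependency constraint attributed to the CRF layer can be illustrated with Viterbi decoding. The sketch below is a minimal pure-Python illustration, not the authors' implementation: the tag set (`O`, `B-ENT`, `I-ENT`), the hand-set transition scores, and the emission values (standing in for per-character Bi-LSTM outputs) are all hypothetical. In a trained CRF the transition scores are learned rather than fixed.

```python
# Hypothetical character-level BIO tag set; the abstract does not list the paper's entity types.
TAGS = ["O", "B-ENT", "I-ENT"]
NEG_INF = float("-inf")


def transition_matrix(tags):
    """Build trans[i][j], the score for moving from tags[i] to tags[j].

    Assumption for illustration: every valid move scores 0.0 and the invalid
    BIO move O -> I-* is forbidden with -inf (a learned CRF would use real scores).
    """
    n = len(tags)
    trans = [[0.0] * n for _ in range(n)]
    for i, prev in enumerate(tags):
        for j, cur in enumerate(tags):
            if prev == "O" and cur.startswith("I-"):
                trans[i][j] = NEG_INF
    return trans


def viterbi(emissions, trans, tags):
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][j] is the score of tag j for character t (e.g. a Bi-LSTM output).
    """
    n = len(tags)
    # A sequence may not begin with an I- tag.
    score = [NEG_INF if tags[j].startswith("I-") else emissions[0][j] for j in range(n)]
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n):
            # Best previous tag for current tag j, including the transition score.
            best_i = max(range(n), key=lambda i: score[i] + trans[i][j])
            new_score.append(score[best_i] + trans[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Trace the best path backwards from the best final tag.
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


emissions = [[2.0, 1.5, 0.0], [0.0, 1.0, 2.0], [0.0, 0.0, 2.0]]
print(viterbi(emissions, transition_matrix(TAGS), TAGS))
```

With these emissions, choosing the best tag for each character independently yields `O, I-ENT, I-ENT`, an invalid BIO sequence; Viterbi decoding under the transition constraints returns `B-ENT, I-ENT, I-ENT` instead.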
English abstract
To address the low accuracy of general-domain named entity recognition (NER) models on cyber security text, a deep active learning method for cyber security NER is proposed, based on character features, a bidirectional long short-term memory network (Bi-LSTM) and a conditional random field (CRF). The neural network model performs sequence labeling and the CRF constrains the dependencies between labels, which improves labeling accuracy. Furthermore, because labeled samples are scarce in the cyber security field, the proposed active learning method achieves good sequence labeling performance with only a small number of labeled samples.
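The abstract does not state which query strategy selects the samples to be labeled in the active learning loop. One common choice for CRF taggers is maximum normalized log-probability (MNLP): compute the log-probability of the model's best tag sequence, divide by sentence length, and send the lowest-scoring (least confident) sentences to annotators. The sketch below assumes that strategy and a generic linear-chain CRF without start/end transitions; the function names and inputs are hypothetical, and the emission scores stand in for Bi-LSTM outputs.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions, trans):
    """Forward algorithm: log of the summed exp-scores of all tag sequences."""
    n = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [logsumexp([alpha[i] + trans[i][j] for i in range(n)]) + emissions[t][j]
                 for j in range(n)]
    return logsumexp(alpha)


def best_path_score(emissions, trans):
    """Viterbi recursion: the same as the forward pass with max instead of logsumexp."""
    n = len(emissions[0])
    delta = list(emissions[0])
    for t in range(1, len(emissions)):
        delta = [max(delta[i] + trans[i][j] for i in range(n)) + emissions[t][j]
                 for j in range(n)]
    return max(delta)


def mnlp(emissions, trans):
    """Length-normalized log-probability of the best tag sequence (always <= 0)."""
    log_p = best_path_score(emissions, trans) - log_partition(emissions, trans)
    return log_p / len(emissions)


def select_queries(pool, trans, k):
    """Return indices of the k unlabeled sentences with the lowest MNLP score."""
    scores = [mnlp(em, trans) for em in pool]
    return sorted(range(len(pool)), key=lambda idx: scores[idx])[:k]


# Two hypothetical tags with neutral transitions; three unlabeled sentences.
trans = [[0.0, 0.0], [0.0, 0.0]]
pool = [
    [[5.0, 0.0]],              # one confident character
    [[0.0, 0.0]],              # one maximally uncertain character
    [[4.0, 0.0], [4.0, 0.0]],  # two fairly confident characters
]
print(select_queries(pool, trans, k=2))
```

Normalizing by length keeps long sentences from being selected merely because their joint probability is a product of many per-character terms; in each round the selected sentences would be labeled, added to the training set, and the model retrained.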