Paper Abstract

Named Entity Recognition in Chinese Microblogs

Authors: HAN Chun-Yan (Department of Computer Science, Sichuan University for Nationalities); LIU Yu-Jiao (College of Computer Science, Sichuan University); JU Sheng-Gen (College of Computer Science, Sichuan University); LI Ruo-Chen (College of Computer Science, Sichuan University); SU Chong (College of Computer Science, Sichuan University)

Received: 2015-01-04          Year, Volume(Issue): Pages: 2015, 52(3): 511-516

Journal: Journal of Sichuan University (Natural Science Edition)

Keywords: microblog; conditional random fields; named entity; three-level granularity features; short text

Funding: National Natural Science Foundation of China (61332066, 81373239)

Abstract

In recent years, the rapid growth of microblogs has given named entity recognition (NER) a new source of text, while the characteristics of microblog text also pose challenges for NER research. To address these characteristics, this paper proposes a clustering method based on pinyin similarity distance and text similarity distance to normalize microblog text, removing the interference caused by non-standard language. The paper also proposes feature extraction at three levels of granularity (document level, sentence level, and word level), trains a conditional random field (CRF) model on these features to recognize named entities, and corrects the recognized entity types using entity relation classes obtained by clustering similar microblog texts. Because large amounts of labeled microblog training data are not available, the model is trained within a semi-supervised learning framework. Experiments on Sina Weibo data show that the method effectively improves named entity recognition on microblogs.
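The abstract does not define the pinyin similarity distance, so the following is only a minimal sketch of the general idea, assuming the distance is a length-normalized edit distance between pinyin strings. The `PINYIN` table, the `STANDARD` vocabulary, the `threshold` value, and all function names are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, Iterable

# Hypothetical toy pinyin table (assumption); a real system would use a full lexicon.
PINYIN: Dict[str, str] = {
    "神马": "shenma",   # non-standard spelling seen in informal text
    "什么": "shenme",   # standard word
    "童鞋": "tongxie",  # non-standard spelling
    "同学": "tongxue",  # standard word
    "微博": "weibo",
}

# Hypothetical vocabulary of standard words (assumption).
STANDARD = ["什么", "同学", "微博"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def pinyin_distance(w1: str, w2: str) -> float:
    """Edit distance between pinyin strings, scaled to the range [0, 1]."""
    p1, p2 = PINYIN[w1], PINYIN[w2]
    return edit_distance(p1, p2) / max(len(p1), len(p2))


def normalize(word: str, vocabulary: Iterable[str], threshold: float = 0.25) -> str:
    """Map a word to its closest standard word if the pinyin distance is small enough."""
    candidates = [v for v in vocabulary if v in PINYIN]
    if word in candidates or word not in PINYIN or not candidates:
        return word  # already standard, or no pinyin available
    best = min(candidates, key=lambda v: pinyin_distance(word, v))
    return best if pinyin_distance(word, best) <= threshold else word
```

Under these assumptions, `normalize("神马", STANDARD)` maps the informal spelling to "什么", because the two pinyin strings differ by a single letter. The paper combines this kind of distance with a text similarity distance inside a clustering step, which this sketch does not attempt to reproduce.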
