
Article Abstract

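Research on Word-level Contextual Paraphrase Retrieving with Five-features

(Original Chinese title: 一种句词五特征融合模型的复述研究, "Paraphrase Research Based on a Model Fusing Five Sentence- and Word-level Features")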

Authors: He Xianjiang (School of Computer Science, Sichuan University); He Weiwei (School of Computer Science, Sichuan University); Zuo Hang (School of Computer Science, Sichuan University)

Received: 2012-06-27          Year, Volume (Issue), Pages: 2012, 44(6): 127-132

Journal Name: Advanced Engineering Sciences (工程科学与技术)

Key words: Chinese paraphrase; five-feature fusion; intelligent identification; binary classification

Funding: Sichuan Provincial Science and Technology Platform Support Program (JCPT2011-7)

Abstract

The Chinese synonym dictionary Tongyici Cilin cannot be used as a context-dependent paraphrase corpus. To address this limitation and improve the accuracy of Chinese paraphrase extraction, a word-level paraphrase method is proposed. Given a target word and its context sentence, the target word and its paraphrase candidates are extracted from a large Chinese corpus. The target word is then paired with each candidate, and a hierarchical probabilistic model that fuses word-level and sentence-level information is built. The model defines five features that measure word- and sentence-level paraphrase similarity, drawing on the target word, its context sentence, and the candidate. The values of these five features are used to train a binary classifier, which then filters the paraphrase candidates. Experimental results show that: 1) mining a large corpus for paraphrase candidates is of practical value, yielding on average 3.1 correct paraphrases per target word in its given context sentence; 2) the binary classifier is effective for confirming paraphrases, reaching a precision of 0.65; 3) 32% of the extracted paraphrases cannot be found in the Expanded Chinese Synonym Dictionary (中文同义词扩展词林), so the method effectively extends traditional synonym-based paraphrasing.
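The filtering step described in the abstract (five feature values per target/candidate pair, a trained binary classifier, then evaluation by precision and by the share of accepted paraphrases missing from a synonym dictionary) can be illustrated with the minimal Python sketch below. The abstract names neither the five features nor the classifier, so everything here is an assumption: logistic regression trained by plain gradient descent stands in for the binary classifier, and the feature values, candidate words, gold labels and the small dictionary set are invented toy data, not results from the paper.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# One hypothetical 5-dimensional feature vector per (target word, candidate) pair.
# The abstract does not define the five features; all values below are invented.
Features = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: List[float], b: float, x: Features) -> float:
    """Predicted probability that a candidate is a valid paraphrase in context."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(xs: List[Features], ys: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the logistic (log) loss.

    Logistic regression is a stand-in: the paper only says "binary classifier".
    """
    n, m = len(xs[0]), len(xs)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for x, y in zip(xs, ys):
            err = score(w, b, x) - y  # derivative of log loss w.r.t. the logit
            for j in range(n):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / m for wj, g in zip(w, gw)]
        b -= lr * gb / m
    return w, b


def filter_candidates(cands: Dict[str, Features], w: List[float], b: float,
                      threshold: float = 0.5) -> List[str]:
    """Keep candidates whose predicted paraphrase probability reaches the threshold."""
    return sorted(c for c, x in cands.items() if score(w, b, x) >= threshold)


def precision(kept: List[str], gold: Set[str]) -> float:
    """Fraction of accepted candidates that are correct paraphrases."""
    return sum(c in gold for c in kept) / len(kept) if kept else 0.0


def novel_fraction(kept: List[str], dictionary: Set[str]) -> float:
    """Fraction of accepted paraphrases absent from a synonym dictionary."""
    return sum(c not in dictionary for c in kept) / len(kept) if kept else 0.0


# Toy training pairs: label 1 = paraphrase, 0 = not a paraphrase (invented data).
train_x = [
    [0.90, 0.80, 0.85, 0.90, 0.80], [0.80, 0.90, 0.80, 0.85, 0.90],
    [0.85, 0.75, 0.90, 0.80, 0.85], [0.70, 0.80, 0.75, 0.80, 0.70],
    [0.10, 0.20, 0.15, 0.10, 0.20], [0.20, 0.10, 0.10, 0.20, 0.15],
    [0.15, 0.25, 0.20, 0.10, 0.10], [0.30, 0.20, 0.25, 0.30, 0.20],
]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logreg(train_x, train_y)

# Hypothetical candidates for the target word 提高 ("raise") in one context sentence.
candidates = {
    "提升": [0.90, 0.85, 0.85, 0.90, 0.85],
    "增强": [0.80, 0.75, 0.80, 0.80, 0.75],
    "改进": [0.75, 0.85, 0.80, 0.85, 0.80],
    "降低": [0.10, 0.15, 0.10, 0.15, 0.10],
    "提出": [0.25, 0.20, 0.20, 0.15, 0.25],
}
gold = {"提升", "改进"}          # hand-labelled correct paraphrases (invented)
cilin_like = {"提升", "增强"}    # stand-in for synonym-dictionary entries (invented)

kept = filter_candidates(candidates, w, b)
print(kept, precision(kept, gold), novel_fraction(kept, cilin_like))
```

The 0.5 threshold is a conventional default rather than a value from the paper; raising it would typically trade recall for higher precision.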


Copyright © 2020 Sichuan University Journal Press. All rights reserved.

Address: No. 24, South Section 1, First Ring Road, Chengdu

Postcode: 610065