Paper Abstract

Paraphrase Identification Based on Sentence Semantic Distances

Authors: Huang Jiangping (Wuhan University); Ji Donghong (School of Computer Science, Wuhan University)

Received: 2016-03-11          Year, Volume(Issue): Pages: 2016, 48(6): 202-207

Journal Name: Advanced Engineering Sciences (工程科学与技术)

Key words: paraphrase identification; word vector; sentence semantic distances; Twitter

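Funding: Key Program of the National Natural Science Foundation of China (61133012); General Program of the National Natural Science Foundation of China (61373108); Key Project of the National Social Science Foundation of China (11&ZD89)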

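Chinese Abstract

To address the problem of how to learn contextual semantics in the paraphrase identification task, a model is proposed that uses word vectors to represent sentence semantic distance. First, word2vec is used to train a large-scale word vector model, so that the semantic information of words is captured in a distributed vector representation. Next, the travel cost between the words of two sentences is computed with the Euclidean distance. Finally, the step from word-level semantic distance to sentence-level semantic distance is modeled on the basis of EMD: a sentence transportation matrix measures the semantic distance between sentences, and paraphrase identification is then carried out in terms of semantic similarity. The experiments are based on the SemEval-2015 PIT task and use logistic regression and weighted matrix factorization as baselines. In the supervised experiments the score of the proposed model is very close to the baseline, and in the unsupervised experiments the score improves by 5.8%.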

English Abstract

To learn the contextual semantic information of words for paraphrase identification, a model is proposed that represents sentence semantic distances on the basis of word embeddings. First, large-scale word vectors are trained with the word2vec model, which embeds the semantic information of words in a distributed representation. Then, the travel cost between the words of two sentences is computed as the Euclidean distance in the word2vec embedding space. Finally, a model that goes from word embeddings to sentence distances is built on the Earth Mover's Distance (EMD), and a sentence transportation matrix is introduced as the distance metric between sentences. The resulting sentence semantic distances are used for paraphrase recognition. Experiments on the SemEval-2015 PIT task show that the proposed model comes close to the baseline in the supervised setting and gives an improvement of 5.8% in the unsupervised setting, compared with weighted matrix factorization.
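The three steps in the abstract (word vectors, Euclidean travel cost between words, EMD over the words of two sentences) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 2-D vectors in `TOY_VECTORS` stand in for trained word2vec embeddings, every word in a sentence is given equal weight, and the transportation problem is solved with a small successive-shortest-path routine written for this example. The function name `sentence_distance` is likewise hypothetical.

```python
import math

# Hypothetical 2-D "word vectors" standing in for trained word2vec embeddings.
TOY_VECTORS = {
    "obama": (1.0, 0.0), "president": (1.0, 0.5),
    "speaks": (0.0, 1.0), "greets": (0.5, 1.0),
    "media": (-1.0, 0.0), "press": (-1.0, 0.5),
}

def euclidean(u, v):
    """Travel cost between two words: Euclidean distance of their vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def sentence_distance(words_a, words_b, vectors):
    """EMD between two non-empty sentences, assuming uniform word weights.

    Returns (distance, transport), where transport[i][j] is the share of
    total mass moved from words_a[i] to words_b[j].
    """
    n, m = len(words_a), len(words_b)
    cost = [[euclidean(vectors[a], vectors[b]) for b in words_b] for a in words_a]
    # Scale the uniform weights 1/n and 1/m to integers (total mass n*m).
    supply, demand = [m] * n, [n] * m
    flow = [[0] * m for _ in range(n)]
    eps = 1e-12  # tolerance against floating-point noise in relaxations
    while any(supply):
        # Bellman-Ford over the residual graph, started from every word
        # of sentence A that still has mass to send.
        dist_a = [0.0 if supply[i] else math.inf for i in range(n)]
        dist_b = [math.inf] * m
        prev_a, prev_b = [None] * n, [None] * m
        for _ in range(n + m):
            changed = False
            for i in range(n):
                for j in range(m):
                    if dist_a[i] + cost[i][j] < dist_b[j] - eps:
                        dist_b[j], prev_b[j] = dist_a[i] + cost[i][j], i
                        changed = True
                    if flow[i][j] > 0 and dist_b[j] - cost[i][j] < dist_a[i] - eps:
                        dist_a[i], prev_a[i] = dist_b[j] - cost[i][j], j
                        changed = True
            if not changed:
                break
        # Cheapest word of sentence B that still needs mass.
        j = min((k for k in range(m) if demand[k]), key=lambda k: dist_b[k])
        sink, amount, path = j, demand[j], []
        while True:  # walk the shortest path back to its source word
            i = prev_b[j]
            path.append((i, j, 1))
            if prev_a[i] is None:
                break
            j = prev_a[i]
            amount = min(amount, flow[i][j])
            path.append((i, j, -1))
        amount = min(amount, supply[i])
        for a, b, sign in path:
            flow[a][b] += sign * amount
        supply[i] -= amount
        demand[sink] -= amount
    total = n * m
    distance = sum(flow[i][j] * cost[i][j] for i in range(n) for j in range(m)) / total
    transport = [[f / total for f in row] for row in flow]
    return distance, transport

d, t = sentence_distance(["obama", "speaks", "media"],
                         ["president", "greets", "press"], TOY_VECTORS)
print(round(d, 6))  # each word moves to its nearest counterpart at cost 0.5
```

A smaller distance indicates sentences that are semantically closer. How the distance is turned into a paraphrase decision (for example, a threshold in the unsupervised setting or a feature for a classifier in the supervised one) is not specified in the abstract, so it is left out of the sketch.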

