Paper Abstract

Optimization of the Maximum Discrimination Feature Selection Algorithm for Text Classification

Bayesian classifier-based maximum discriminant feature selection algorithm for text categorization

Authors: Liu Yun (Kunming University of Science and Technology); Huang Rongcheng (Kunming University of Science and Technology)

Received: 2018-04-10          Year, Volume(Issue): Pages: 2019, 56(1): 65-70

Journal Name: Journal of Sichuan University (Natural Science Edition)

Key words: KL divergence (relative entropy); Jeffreys divergence; multinomial naive Bayes classifier; feature selection

Funding: National Natural Science Foundation of China (61262040)

Abstract

When a naive Bayes classifier is used for text classification, the quality of the feature selection method directly affects classifier performance. This paper proposes a maximum discrimination (MD) feature selection algorithm. After the probability distributions of the N classes are obtained through training, samples are tested to measure how well each feature term in the feature vector d discriminates between classes, and a new feature vector ε is constructed for classification so that the feature terms selected from it have the maximum class-discrimination capacity. Simulation results show that, compared with the cMFD, CSFS, and CMFS feature selection algorithms, the MD algorithm achieves higher classification accuracy while selecting fewer feature terms.
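
The abstract does not give the MD scoring formula, so the following is only an illustrative sketch of divergence-based feature ranking in the spirit of the key words, not the authors' algorithm. It assumes Laplace-smoothed multinomial estimates of P(term | class) and scores each term by its summed contribution to the pairwise Jeffreys divergence, J(P, Q) = KL(P||Q) + KL(Q||P) = Σ (p − q) log(p / q). The function names (`class_term_probs`, `jeffreys_term`, `rank_features`), the smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def class_term_probs(docs, labels, vocab, alpha=1.0):
    """Laplace-smoothed multinomial estimates of P(term | class)."""
    counts = {c: Counter() for c in set(labels)}
    for tokens, c in zip(docs, labels):
        counts[c].update(tokens)
    probs = {}
    for c, cnt in counts.items():
        total = sum(cnt[t] for t in vocab) + alpha * len(vocab)
        probs[c] = {t: (cnt[t] + alpha) / total for t in vocab}
    return probs


def jeffreys_term(p, q):
    """Contribution of a single term to the Jeffreys divergence J(P, Q)."""
    return (p - q) * math.log(p / q)


def rank_features(docs, labels, k, alpha=1.0):
    """Return the k terms with the largest summed pairwise Jeffreys score.

    Assumption: this per-term score stands in for the paper's
    discrimination measure, which the abstract does not specify.
    """
    vocab = sorted({t for tokens in docs for t in tokens})
    probs = class_term_probs(docs, labels, vocab, alpha)
    classes = sorted(probs)
    scores = {}
    for t in vocab:
        score = 0.0
        for i, a in enumerate(classes):
            for b in classes[i + 1:]:
                score += jeffreys_term(probs[a][t], probs[b][t])
        scores[t] = score
    # Highest score first; ties are broken alphabetically.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    docs = [
        ["goal", "match", "team"],
        ["team", "goal", "score"],
        ["stock", "market", "price"],
        ["market", "price", "team"],
    ]
    labels = ["sport", "sport", "finance", "finance"]
    print(rank_features(docs, labels, k=3))
```

In this toy corpus, terms that occur mostly in one class (such as "goal" or "market") score higher than "team", which appears in both classes. A real evaluation would train a multinomial naive Bayes classifier on the selected terms and compare accuracy against other selection methods, as the abstract describes.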

Copyright © 2020 Sichuan University Journal Press. All rights reserved.

Address: No. 24, South Section 1, Yihuan Road, Chengdu

Postal code: 610065