A Goal-oriented AdaBoost Algorithm with Prior Probabilities
Authors: Zhao Xianghui, Yao Yu, Fu Zhongliang, Miao Qing, Xie Huiyun (1. Chengdu Computer Applications Inst., Chinese Academy of Sciences, Chengdu 610041, China; 2. Graduate Univ. of Chinese Academy of Sciences, Beijing 100049, China)
Received: 2009-06-18; Published: 2010, 42(2): 139-144
Journal Name: Advanced Engineering Sciences (工程科学与技术)
Key words: ensemble learning; AdaBoost algorithm; combination of classifiers; prior probability
Foundation items: National Basic Research Program of China (973 Program) (2004CB318003); Science and Technology Support Program of Sichuan Province (2008SZ0100, 2009SZ0214); West Light Foundation of the Chinese Academy of Sciences
Abstract
To find the best combination of multiple classifiers in ensemble learning, this paper improves the traditional AdaBoost algorithm. The classifiers to be combined are trained on a sample set, and the ratio of the different target classes in that set reflects the prior probability of each class. Using this parameter, new formulas are derived for the combination coefficients and the voting threshold. The sample weights are exploited by appending them to the sample attributes for training and learning, and a new strategy is adopted for selecting base classifiers, yielding a goal-oriented AdaBoost algorithm with prior probabilities (the GWPP AdaBoost algorithm) and the best combination of classifiers. The error rates and performance of the traditional AdaBoost algorithm, the Bagging algorithm, and the GWPP AdaBoost algorithm were compared on UCI datasets, verifying the effectiveness of GWPP AdaBoost.
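The abstract states that class priors (the ratio of target classes in the training set) enter both the combination parameters and the voting threshold, but the exact GWPP formulas appear only in the full paper. The sketch below is a minimal illustration of the general idea, not the paper's method: it uses standard AdaBoost with decision stumps, with two assumed prior-aware choices — initial sample weights that give each class equal total weight, and a final vote threshold shifted by the log-odds of the class prior.

```python
import numpy as np

def stump_predict(X, feat, thresh, polarity):
    """Predict +/-1 with a one-feature threshold rule (decision stump)."""
    pred = np.ones(len(X))
    pred[polarity * X[:, feat] < polarity * thresh] = -1
    return pred

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (1.0, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for polarity in (1, -1):
                err = np.sum(w[stump_predict(X, feat, thresh, polarity) != y])
                if err < best[0]:
                    best = (err, feat, thresh, polarity)
    return best

def adaboost_with_prior(X, y, rounds=10):
    """AdaBoost variant where the class prior shapes the initial weights
    and the final voting threshold. Both choices are illustrative
    assumptions, not the GWPP formulas from the paper."""
    n_pos, n_neg = np.sum(y == 1), np.sum(y == -1)
    # assumed prior-aware initialization: total weight 1/2 per class
    w = np.where(y == 1, 0.5 / n_pos, 0.5 / n_neg)
    ensemble = []
    for _ in range(rounds):
        err, feat, thresh, pol = train_stump(X, y, w)
        err = max(err, 1e-10)          # avoid division by zero
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, feat, thresh, pol)
        w *= np.exp(-alpha * y * pred)  # standard AdaBoost reweighting
        w /= w.sum()
        ensemble.append((alpha, feat, thresh, pol))
    # assumed prior-shifted vote threshold (log-odds of the class ratio)
    theta = 0.5 * np.log(n_pos / n_neg)
    def predict(Xq):
        score = sum(a * stump_predict(Xq, f, t, p) for a, f, t, p in ensemble)
        return np.where(score >= theta, 1, -1)
    return predict
```

On a balanced training set the threshold `theta` is zero and this reduces to ordinary AdaBoost; with imbalanced classes the shifted threshold biases the vote toward the rarer class, which is the kind of goal-oriented adjustment the abstract describes.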