Imputing missing value by class center based on grey relational analysis

Authors: LIU Sha (School of Mathematics and Statistics, Xidian University, Xi'an 710126, China); YANG You-Long (School of Mathematics and Statistics, Xidian University, Xi'an 710126, China)

Received: 2019-06-04. Year, volume (issue), pages: 2020, 57(5): 871-878

Journal name: Journal of Sichuan University (Natural Science Edition)

Key words: Data mining; Incomplete data; Missing value imputation; Class center; Grey relational analysis

Funding: National Natural Science Foundation of China (61573266)

Abstract

Many data mining techniques cannot be applied directly to incomplete datasets that contain missing values, and missing values significantly reduce the effectiveness of the algorithms, so handling missing data is an indispensable preprocessing step. This paper proposes an imputation method based on statistical measures, called grey class center missing value imputation (GCCMVI). Missing values are imputed from the class center and the standard deviation of the data points. Whether the standard deviation is added (or subtracted) is decided by comparing a threshold with the relevance between the instance and its class center, where the relevance is computed by grey relational analysis. After the missing values are filled in, the complete dataset is used to train a support vector machine (SVM) classifier. Comparative experiments are carried out on three datasets of different types, with classification accuracy, imputation performance, and imputation time as the evaluation criteria. The experimental results show that the proposed algorithm significantly improves classification accuracy and imputation performance.
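The abstract gives only the outline of the procedure, so the following Python sketch is an illustration under stated assumptions rather than the authors' algorithm. It assumes the relevance is Deng's grey relational grade with distinguishing coefficient `rho = 0.5`, computed over observed attributes only; that the threshold is the median grade of the complete instances in the class; and that the shift direction follows the side of the class center on which the observed attributes lie. The function names `grey_relational_grades` and `gccmvi_sketch` and all three rules are hypothetical choices made for this example. It also assumes every instance has at least one observed attribute and every attribute is observed at least once per class.

```python
import numpy as np


def grey_relational_grades(reference, instances, rho=0.5):
    """Deng's grey relational grade of each row against `reference`.

    NaN entries are ignored, so each grade is averaged over the
    observed attributes of that row (an assumption of this sketch).
    """
    delta = np.abs(instances - reference)          # NaN where a value is missing
    d_min, d_max = np.nanmin(delta), np.nanmax(delta)
    if d_max == 0.0:                               # every row equals the reference
        return np.ones(instances.shape[0])
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return np.nanmean(coeff, axis=1)


def gccmvi_sketch(X, y, rho=0.5):
    """Fill NaNs with the class center, optionally shifted by one std."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    filled = X.copy()
    for label in np.unique(y):
        rows = np.flatnonzero(y == label)
        block = X[rows]
        missing = np.isnan(block)
        center = np.nanmean(block, axis=0)         # per-attribute class center
        std = np.nanstd(block, axis=0)             # per-attribute class std
        safe_std = np.where(std > 0, std, 1.0)     # avoid division by zero
        grades = grey_relational_grades(center, block, rho)
        complete = ~missing.any(axis=1)
        # Hypothetical threshold: median grade of the complete instances.
        threshold = np.median(grades[complete]) if complete.any() else -np.inf
        candidate = np.where(missing, center, block)
        for i in np.flatnonzero(~complete):
            if grades[i] < threshold:              # weakly related to the center
                obs = ~missing[i]
                z = (block[i, obs] - center[obs]) / safe_std[obs]
                # Hypothetical rule: add std if the observed values lie above
                # the center on balance, subtract it if they lie below.
                candidate[i, missing[i]] += np.sign(z.sum()) * std[missing[i]]
        filled[rows] = candidate
    return filled


if __name__ == "__main__":
    X = np.array([[1.0, 2.0], [1.2, 2.2], [0.8, 1.8], [3.0, np.nan]])
    y = np.array([0, 0, 0, 0])
    print(gccmvi_sketch(X, y))
```

The returned array has no missing entries and could then be passed to a classifier such as scikit-learn's `SVC`, in line with the evaluation the abstract describes.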
