Paper Abstract

基于等差隐私预算分配的大数据决策树算法

Big Data Decision Tree Algorithm Based on Equal-arrival Privacy Budget Allocation

Authors: SHANG Tao (School of Cyber Sci. and Technol., Beihang Univ., Beijing 100083, China); ZHAO Zheng (School of Electronic and Info. Eng., Beihang Univ., Beijing 100083, China); SHU Wangwei (School of Electronic and Info. Eng., Beihang Univ., Beijing 100083, China); LIU Jianwei (School of Cyber Sci. and Technol., Beihang Univ., Beijing 100083, China)

Received: 2018-09-25          Year, Volume(Issue): Pages: 2019, 51(2): 130-136

Journal Name: Advanced Engineering Sciences (工程科学与技术)

Key words: classification; decision tree; differential privacy; big data

Funding: Supported by the National Key Research and Development Program of China (2016YFC1000307)

Abstract

Traditional differential privacy schemes allocate the privacy budget layer by layer, giving each layer half of the remaining budget, i.e., an equal-ratio (geometric) allocation. When such a scheme is applied to a decision tree, the budget allocated to the top layer becomes too small as the tree height increases, the random noise becomes too large, and classification accuracy suffers. To address this problem, the authors propose a scheme that combines differential privacy with the mainstream C4.5 decision tree classification algorithm and uses an arithmetic ("equal-arrival", i.e., equal-difference) allocation of the privacy budget according to the height of the decision tree. The Laplace mechanism and the exponential mechanism of differential privacy ensure the security of decision tree classification. The scheme runs on the MapReduce framework of the Hadoop big data platform: the main program configures the MapReduce parameters and runs the outer loop. At each node, the main program passes the task of counting the dataset attributes to the Mapper class; the Reducer class receives the counts from the Mapper class and adds random noise using the Laplace mechanism; the noisy results are then returned to the main program as the parameters for calculating the information gain ratio. The main program uses the exponential mechanism to select the best split, and the recursion stops when the number of samples reaches 0. The scheme was tested on the car data set from the UCI database, comparing the classification accuracy obtained with the equal-ratio and arithmetic allocations under different privacy budgets. The experimental results show that the proposed algorithm satisfies differential privacy with an acceptable reduction in classification accuracy, and that it achieves higher classification accuracy than the traditional privacy budget allocation under the same privacy budget. For the car data set, the algorithm balances the security and the utility of the data set well when the privacy budget is 0.7 or 0.8.
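The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Hadoop implementation: the function names, the sensitivity value of 1.0 for both the counts and the gain-ratio utility, and the example scores are assumptions made for illustration only. The attribute names are those of the UCI car data set.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count, eps, rng, sensitivity=1.0):
    """Laplace mechanism: perturb a count with noise of scale sensitivity/eps.

    A sensitivity of 1.0 is assumed here (one record changes a count by 1).
    """
    return true_count + laplace_noise(sensitivity / eps, rng)


def exponential_mechanism(scores, eps, rng, sensitivity=1.0):
    """Exponential mechanism: pick a key with probability proportional to
    exp(eps * score / (2 * sensitivity)).

    The utility sensitivity of 1.0 is a placeholder assumption; the source
    does not state the value used for the information gain ratio.
    """
    names = list(scores)
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = [
        math.exp(eps * (scores[name] - top) / (2.0 * sensitivity))
        for name in names
    ]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)

# Hypothetical gain-ratio scores for the six car attributes (made-up values).
example_scores = {
    "buying": 0.05, "maint": 0.04, "doors": 0.01,
    "persons": 0.14, "lug_boot": 0.02, "safety": 0.17,
}
print(noisy_count(100, eps=0.5, rng=rng))
print(exponential_mechanism(example_scores, eps=0.5, rng=rng))
```

In a MapReduce setting, the role of `noisy_count` would correspond to the Reducer step and the role of `exponential_mechanism` to the split selection in the main program. A smaller per-node budget `eps` gives a larger noise scale and a flatter selection distribution, which is why the per-layer allocation matters.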
Therefore, the arithmetic ("equal-arrival") allocation of the privacy budget according to decision tree height can improve classification accuracy to a certain extent, and it can be applied in practice to decision tree classification algorithms.
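The difference between the two allocation strategies can be sketched as follows. The abstract does not give the exact arithmetic formula, so the version below is an assumption: the per-layer budgets form an arithmetic sequence with weights h, h-1, ..., 1 normalized to sum to the total budget. The function names are hypothetical.

```python
def geometric_budgets(total_eps, height):
    """Equal-ratio allocation: each layer gets half of the remaining budget."""
    budgets, remaining = [], total_eps
    for _ in range(height):
        share = remaining / 2
        budgets.append(share)
        remaining -= share
    return budgets


def arithmetic_budgets(total_eps, height):
    """Arithmetic allocation (assumed form): shares proportional to
    height, height - 1, ..., 1, so consecutive layers differ by a constant."""
    total_weight = height * (height + 1) / 2
    return [total_eps * w / total_weight for w in range(height, 0, -1)]


def laplace_scale(eps, sensitivity=1.0):
    """Noise scale of the Laplace mechanism for a given per-layer budget."""
    return sensitivity / eps


# Compare the smallest per-layer budget and the resulting noise scale.
for h in (3, 5, 7):
    g = min(geometric_budgets(0.8, h))
    a = min(arithmetic_budgets(0.8, h))
    print(h, g, laplace_scale(g), a, laplace_scale(a))
```

Under these assumptions, the smallest geometric share shrinks as 1/2^h with tree height h, while the smallest arithmetic share shrinks only as 2/(h(h+1)), so the worst-case Laplace noise grows far more slowly for tall trees.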
