一种Spark作业配置参数智能优化方法
A Smart Optimization Method for Spark Jobs’ Configuration Parameters
作者:阮树骅(四川大学 网络空间安全学院,四川 成都 610065;四川大学 网络空间安全研究院,四川 成都 610065);潘梵梵(四川大学 网络空间安全学院,四川 成都 610065);陈兴蜀(四川大学 网络空间安全学院,四川 成都 610065;四川大学 网络空间安全研究院,四川 成都 610065);罗永刚(四川大学 网络空间安全研究院,四川 成都 610065);吴天雄(四川大学 计算机学院,四川 成都 610065)
Author:RUAN Shuhua(College of Cybersecurity, Sichuan Univ., Chengdu 610065, China;Cyber Security Research Inst., Sichuan Univ., Chengdu 610065, China);PAN Fanfan(College of Cybersecurity, Sichuan Univ., Chengdu 610065, China);CHEN Xingshu(College of Cybersecurity, Sichuan Univ., Chengdu 610065, China;Cyber Security Research Inst., Sichuan Univ., Chengdu 610065, China);LUO Yonggang(Cyber Security Research Inst., Sichuan Univ., Chengdu 610065, China);WU Tianxiong(College of Computer, Sichuan Univ., Chengdu 610065, China)
Received: 2019-03-23  Year, Volume(Issue): Pages: 2020, 52(1): 191-197
期刊名称:工程科学与技术
Journal Name:Advanced Engineering Sciences
关键字:Spark;配置参数;性能预测;智能优化
Key words:Spark;configuration parameters;performance prediction;smart optimization
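Funding: Young Scientists Fund of the National Natural Science Foundation of China (61802270); Fundamental Research Funds for the Central Universities, Basic Research Project (SCU2018D018)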
中文摘要
Spark的配置参数对作业运行性能有较大影响,针对配置参数种类多、参数搜索空间大、参数间相互影响导致人工配置参数调优效率低下的问题,提出了一种Spark作业配置参数智能优化方法。首先,在Spark众多配置参数中选择对作业运行性能影响较大的关键配置参数,建立典型Spark作业的运行数据集,利用支持向量回归算法,构建作业性能预测模型,通过改变数据集的规模,对比分析了模型预测值和作业的真实运行时间,模型评估指标证明了作业性能预测模型的有效性和准确性。其次,基于作业性能预测模型,设计并实现了基于爬山算法、模拟退火算法、递归随机搜索算法以及粒子群算法的配置参数优化算法,并对4种算法的求解质量进行对比分析,实验表明递归随机搜索算法在3种不同类型的作业上收敛结果较优且标准差较小,证明该算法对不同类型作业的适应性较强、稳定性较好。将本文的智能优化配置与传统经验优化配置相比,实验结果表明,智能优化配置为典型Spark作业分别带来了4%、15%、22%的平均性能提升,证明智能优化配置能够高效地获取到具备较好作业适应性的配置,提升作业运行性能。
英文摘要
Manual tuning of Spark job configuration parameters is inefficient because the parameters are numerous, the search space is large, and the parameters influence one another. To address this problem, a smart optimization method for Spark job configuration parameters is proposed. First, a set of key configuration parameters with a large effect on job performance was selected from Spark's many parameters, a data set of runs of typical Spark jobs was built, and a job performance prediction model was constructed using the support vector regression algorithm. The model's predictions were compared with the actual job running times while varying the size of the data set, and the evaluation metrics confirmed the effectiveness and accuracy of the prediction model. Second, on top of the prediction model, configuration parameter optimization algorithms based on hill climbing, simulated annealing, recursive random search and particle swarm optimization were designed and implemented, and the solution quality of the four algorithms was compared. The experiments showed that recursive random search converged to better results with a smaller standard deviation on three different types of jobs, indicating good adaptability to different job types and good stability. Compared with the traditional experience-based configuration, the smart optimization configuration brought average performance improvements of 4%, 15% and 22%, respectively, to typical Spark jobs, which demonstrates that it can efficiently obtain configurations well suited to the given jobs and improve their running performance.
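As a rough illustration of the second step, the sketch below runs a simplified recursive random search over a configuration space, minimizing a predicted running time. It is not the authors' implementation: the parameter names and ranges, the `predicted_runtime` function (a made-up stand-in for a trained support vector regression model) and all search settings are assumptions chosen only to make the example self-contained.

```python
import random

# Hypothetical parameter ranges (illustrative only, not taken from the paper).
PARAM_SPACE = {
    "executor_cores": (1, 8),
    "executor_memory_gb": (1, 16),
    "shuffle_partitions": (50, 1000),
}


def predicted_runtime(cfg):
    """Stand-in for a trained performance model (e.g. an SVR predictor).

    A made-up smooth function with its minimum (100.0) at
    cores=4, memory=8 GB, partitions=400.
    """
    return (
        100.0
        + 3.0 * (cfg["executor_cores"] - 4) ** 2
        + 1.5 * (cfg["executor_memory_gb"] - 8) ** 2
        + 0.0005 * (cfg["shuffle_partitions"] - 400) ** 2
    )


def sample(rng, center=None, scale=1.0):
    """Draw an integer configuration, either uniformly from the full space
    or from a box of relative size `scale` around `center`."""
    cfg = {}
    for name, (lo, hi) in PARAM_SPACE.items():
        if center is None:
            a, b = lo, hi
        else:
            half = scale * (hi - lo) / 2.0
            a = max(lo, center[name] - half)
            b = min(hi, center[name] + half)
        cfg[name] = int(round(rng.uniform(a, b)))
    return cfg


def recursive_random_search(objective, rng, restarts=5, explore=30,
                            shrink=0.5, min_scale=0.01, patience=10):
    """Simplified recursive random search: explore globally, then sample in a
    shrinking neighborhood of the best point, and restart several times."""
    best_cfg, best_val = None, float("inf")
    for _ in range(restarts):
        # Exploration: random samples over the whole space.
        center = min((sample(rng) for _ in range(explore)), key=objective)
        center_val = objective(center)
        # Exploitation: re-center on improvements, shrink after repeated failures.
        scale, fails = 0.5, 0
        while scale > min_scale:
            cand = sample(rng, center, scale)
            val = objective(cand)
            if val < center_val:
                center, center_val, fails = cand, val, 0
            else:
                fails += 1
                if fails >= patience:
                    scale *= shrink
                    fails = 0
        if center_val < best_val:
            best_cfg, best_val = center, center_val
    return best_cfg, best_val


best_cfg, best_val = recursive_random_search(predicted_runtime, random.Random(0))
print(best_cfg, round(best_val, 2))
```

In a real setting, `predicted_runtime` would be replaced by the prediction of a model trained on measured job runs, so each candidate configuration can be scored without launching a Spark job.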