
Paper Abstract

A multi-feature fully convolutional network method for speech enhancement in air-ground voice communication

Authors: Gao Dengfeng (College of Computer Science, Sichuan University; National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University); Yang Bo (National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University); Yang Hongyu (National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University); Liu Hong (National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University)

Received: 2019-03-28          Year, volume(issue), pages: 2020, 57(2): 289-296

Journal Name: Journal of Sichuan University (Natural Science Edition)

Keywords: speech enhancement; speech separation; fully convolutional neural network; air-ground communication; multi-feature joint learning

Funding: Joint Fund of the National Natural Science Foundation of China and the Civil Aviation Administration of China (U1833115)

Abstract

To study speech enhancement in air traffic control (ATC) while saving storage resources, a new speech enhancement method is proposed. Skip connections are added to a fully convolutional network (FCN), and a secondary feature is introduced for joint learning. Specifically, the log power spectrum (LPS) of speech is used as the primary training feature, and the logarithmic Mel-frequency cepstrum (L-MFCC) is introduced as a secondary training feature to jointly optimize the parameters of the FCN. Experiments show that the multi-feature architecture combining LPS and L-MFCC achieves better speech enhancement performance than the architecture with LPS as its only input feature, and that the L-MFCC secondary feature can also be used for other purposes. Experiments also show that adding skip connections improves the performance of the FCN, and that the new network structure outperforms the baseline deep neural network (DNN) model with the same number of parameters.
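As a rough illustration of the two ideas in the abstract, the sketch below computes a log power spectrum for a single frame and combines a primary and a secondary feature error into one training objective. It is only a minimal sketch under stated assumptions: the abstract does not give the loss function, so the weighted-sum form, the function names (`log_power_spectrum`, `joint_loss`) and the weight `alpha` are hypothetical and are not taken from the paper. A naive DFT from the standard library is used so the example stays self-contained.

```python
import cmath
import math


def log_power_spectrum(frame, eps=1e-10):
    """Log power spectrum of one real-valued frame, via a naive DFT."""
    n = len(frame)
    lps = []
    # A real-valued frame only needs the first n // 2 + 1 frequency bins.
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        # eps avoids log(0) for bins with no energy.
        lps.append(math.log(abs(coeff) ** 2 + eps))
    return lps


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def joint_loss(lps_pred, lps_clean, aux_pred, aux_clean, alpha=0.1):
    """Primary LPS error plus a down-weighted secondary-feature error.

    Assumption: a weighted sum with a hypothetical weight alpha; the
    paper's actual objective may differ.
    """
    return mse(lps_pred, lps_clean) + alpha * mse(aux_pred, aux_clean)
```

In a setup like this, both error terms would be back-propagated through the shared network layers, so the secondary feature acts as an additional training signal on top of the LPS target.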

