Voice Activity Detection Method Based on the Noise Classification and Double Adaptive Threshold Decision

Authors: YAO Rui, ZENG Zeqing, DU Junjie (all: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China)

Received: 2017-04-18. Year, volume (issue), pages: 2018, 50(4): 170-178

Journal: Advanced Engineering Sciences (工程科学与技术)

Keywords: voice activity detection; double adaptive threshold; noise classification; feature combination

Funding: National Natural Science Foundation of China (61402226)

Abstract

To address the low hit rate of voice activity detection (VAD) in complex background noise, an environment-aware VAD algorithm is proposed. Because the single threshold used in common algorithms has poor noise immunity, different thresholds are applied to the two transitions between speech frames and noise frames, and both thresholds are updated adaptively. To overcome the inability of any single feature to cope with complex environments, a feature combination method is proposed that joins features such as the statistical-model likelihood ratio, the energy entropy feature, and the mean harmonic number feature. Environmental noise classification is then introduced: a support vector machine classifies the noise environment, and the optimal feature combination is selected according to the noise type, which further improves performance. Simulation experiments on the NOIZEUS speech database, with five noise types (babble, pink, white, f16, and volvo) as background noise, evaluate the proposed algorithm and compare the hit rates of the various feature combinations. The results show that the proposed method outperforms existing algorithms, achieves an overall hit rate of about 80% across the noise types, and better balances the speech hit rate against the false alarm rate.
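The double-threshold idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the update rule or any parameter values, so everything specific here is an assumption made for illustration: the function name `dual_threshold_vad`, the per-frame `scores` input (standing in for whatever combined feature the method uses), the two margins, and the exponential-smoothing noise estimate used to adapt both thresholds. It is not the authors' algorithm, only one plausible instance of a hysteresis decision with adaptive thresholds.

```python
from typing import Iterable, List


def dual_threshold_vad(
    scores: Iterable[float],
    enter_margin: float = 3.0,   # hypothetical margin for noise -> speech
    exit_margin: float = 1.5,    # hypothetical margin for speech -> noise
    alpha: float = 0.9,          # hypothetical smoothing factor
    init_noise_level: float = 0.0,
) -> List[bool]:
    """Label each frame as speech (True) or noise (False).

    Two thresholds are derived from a running noise-level estimate, so
    both adapt as the background changes. Entering speech requires a
    higher score than staying in speech, which gives hysteresis.
    """
    noise_level = init_noise_level
    in_speech = False
    decisions: List[bool] = []

    for score in scores:
        # Both thresholds follow the current noise estimate (assumed rule).
        enter_threshold = noise_level + enter_margin
        exit_threshold = noise_level + exit_margin

        if in_speech:
            # Speech -> noise uses the lower threshold.
            in_speech = score >= exit_threshold
        else:
            # Noise -> speech uses the higher threshold.
            in_speech = score > enter_threshold

        if not in_speech:
            # Update the noise estimate only on frames judged to be noise.
            noise_level = alpha * noise_level + (1 - alpha) * score

        decisions.append(in_speech)

    return decisions
```

With the default margins, a frame whose score sits between the two thresholds keeps the previous state: it continues a speech segment that has already started but cannot start one. Calling `dual_threshold_vad([0.0, 0.2, 4.0, 2.0, 1.0, 0.1])` shows this, since the score of 2.0 is kept as speech only because the preceding frame was speech.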
