Research Objective
To propose an improved ranking-based feature enhancement approach for robust speaker recognition, addressing the loss of robustness that automatic speaker and speech recognition systems suffer because of the non-linear effect of noise.
Research Findings
The proposed ranking-based feature enhancement method significantly improves the robustness of speaker recognition systems in noisy conditions by leveraging the autocorrelation of ranking sequences and the insensitivity of rank correlation to abnormal data. It outperforms existing methods in various noise conditions, though its effectiveness is contingent upon accurate mask estimation and could benefit from more robust threshold estimation methods and algorithms for estimating unreliable ranking features in highly non-stationary noise.
Research Limitations
The effectiveness of the proposed method is limited by the accuracy of the mask estimation step, which divides the central frame into reliable and unreliable elements. A false detection can either shrink the set of reliable elements or admit noise-dominated elements into it. The method is also less effective in highly non-stationary noise types like white and pink noise.
1:Experimental Design and Method Selection:
The study involves the design of a feature enhancement method that labels the central frame in a sliding window as reliable or unreliable based on SNR-based estimation. In the unreliable case, the ranking is estimated using linear interpolation along time based on rank correlation.
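The labelling and interpolation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the 0 dB threshold, the function names, and the rule of copying the nearest reliable value at the edges are all hypothetical, and the rank-correlation weighting mentioned in the text is not specified in this summary, so it is left out.

```python
import math

def snr_db(speech_power, noise_power, eps=1e-12):
    """Local SNR in dB from estimated speech and noise power."""
    return 10.0 * math.log10((speech_power + eps) / (noise_power + eps))

def estimate_mask(speech_powers, noise_powers, threshold_db=0.0):
    """Label each frame reliable (True) when its SNR exceeds the threshold.

    threshold_db=0.0 is an assumed value, not taken from the source.
    """
    return [snr_db(s, n) > threshold_db
            for s, n in zip(speech_powers, noise_powers)]

def interpolate_unreliable(ranks, mask):
    """Replace unreliable ranks by linear interpolation along time.

    Each unreliable frame is estimated from its nearest reliable
    neighbours; at the edges the nearest reliable value is copied.
    """
    reliable = [t for t, ok in enumerate(mask) if ok]
    out = list(ranks)
    if not reliable:
        return out  # nothing to interpolate from
    for t, ok in enumerate(mask):
        if ok:
            continue
        left = max((i for i in reliable if i < t), default=None)
        right = min((i for i in reliable if i > t), default=None)
        if left is None:
            out[t] = ranks[right]
        elif right is None:
            out[t] = ranks[left]
        else:
            w = (t - left) / (right - left)
            out[t] = (1.0 - w) * ranks[left] + w * ranks[right]
    return out
```

In this sketch a frame with equal speech and noise power sits exactly at 0 dB and is treated as unreliable; a real system would tune the threshold on held-out data.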
2:Sample Selection and Data Sources:
Speech signals corrupted by additive noise from the NOISEX database are used. The database includes 140 speakers with 200 utterances each, sampled at 16 kHz.
3:List of Experimental Equipment and Materials:
The study uses VOICEBOX, a speech processing toolbox, to estimate the power spectra of the noise and the clean speech.
4:Experimental Procedures and Operational Workflow:
The proposed method involves three steps: mask estimation, estimation of the ranking feature based on rank correlation, and mapping of the estimated ranking to a warped feature using a standard normal distribution.
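The final mapping step can be illustrated with a short sketch of rank-to-normal warping. It assumes the common feature-warping convention in which an ascending rank r within a window of N values is mapped to the standard normal quantile of (r - 0.5) / N; the window length, the function names, and the handling of edge frames are assumptions for illustration only.

```python
from statistics import NormalDist

def warp_rank(rank, window_len):
    """Map an ascending rank in 1..window_len to a standard normal quantile."""
    p = (rank - 0.5) / window_len  # keeps p strictly inside (0, 1)
    return NormalDist().inv_cdf(p)

def warp_feature_stream(values, window_len=5):
    """Warp one feature dimension frame by frame.

    For each frame, the value is ranked within a sliding window
    centred on it (truncated at the edges), then mapped to a
    standard normal quantile.
    """
    half = window_len // 2
    warped = []
    for t, value in enumerate(values):
        window = values[max(0, t - half):min(len(values), t + half + 1)]
        rank = 1 + sum(1 for v in window if v < value)
        warped.append(warp_rank(rank, len(window)))
    return warped
```

Because only the rank enters the mapping, any monotonic distortion of the values inside a window leaves the warped feature unchanged, which is the property the ranking-based approach relies on.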
5:Data Analysis Methods:
The proposed method is evaluated by recognition accuracy in an open-set speaker recognition system based on UBM-GMM, and compared against systems that use MFCC, feature warping, and a missing-data method based on linear interpolation.
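For readers unfamiliar with open-set evaluation, the decision rule typically used with a UBM-GMM back end can be sketched as below. This is a generic illustration, not the scoring procedure from the paper: the scores are assumed to be average log-likelihoods already computed by the speaker models and the UBM, and the function name and threshold are hypothetical.

```python
def open_set_decision(speaker_scores, ubm_score, threshold=0.0):
    """Pick the best-scoring enrolled speaker, or reject as unknown.

    speaker_scores: dict mapping speaker id -> average log-likelihood
                    of the test utterance under that speaker's GMM.
    ubm_score:      average log-likelihood under the UBM.
    Returns (speaker id or None, log-likelihood ratio).
    """
    best_id, best_score = max(speaker_scores.items(), key=lambda kv: kv[1])
    llr = best_score - ubm_score  # log-likelihood ratio against the UBM
    accepted = llr > threshold    # below threshold: out-of-set speaker
    return (best_id if accepted else None), llr
```

Recognition accuracy in such a setup counts both correct identifications of enrolled speakers and correct rejections of unknown ones, so the threshold trades one error type against the other.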