Research Objective
To investigate how the turbo principle can be applied to automatic speech recognition (ASR) in order to improve recognition performance through information fusion.
Research Findings
The turbo ASR approach significantly improves recognition performance over conventional methods, demonstrating the effectiveness of iterative information fusion in both audio-visual and audio-only tasks. The proposed turbo FBA and Viterbi algorithms outperform reference systems, achieving relative WER reductions of 22.4% and 18.2%, respectively.
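The relative WER reductions quoted above follow the usual definition: the drop in word error rate divided by the reference system's word error rate. A minimal sketch of that arithmetic is given here; the function name and the example values (20.0% and 15.0%) are hypothetical and are not figures from the study.

```python
def relative_wer_reduction(wer_reference: float, wer_proposed: float) -> float:
    """Relative WER reduction in percent, given two WERs in percent.

    Assumes the standard definition (reference - proposed) / reference.
    """
    if wer_reference <= 0:
        raise ValueError("reference WER must be positive")
    return 100.0 * (wer_reference - wer_proposed) / wer_reference


# Hypothetical values for illustration only, not results from the study.
reduction = relative_wer_reduction(20.0, 15.0)
print(f"relative WER reduction: {reduction:.1f}%")
```

Note that a relative reduction says nothing about the absolute WER on its own, so the reference system's WER is needed to interpret it.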
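Limitations
The study acknowledges the computational complexity of the approach and raises the question of whether the state transition matrix T is necessary in practical large-vocabulary continuous speech recognition (LVCSR) applications. It suggests further investigation into the role and structure of T.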
1: Experimental Design and Method Selection:
The study applies the turbo principle, which originates in digital communications, to ASR. It focuses on the forward-backward algorithm (FBA) and the Viterbi algorithm for iterative decoding and information fusion.
2: Sample Selection and Data Sources:
The experiments are based on the GRID audio-visual speech corpus, which involves 34 native English speakers, with noise interference taken from the AURORA-2 database.
3: List of Experimental Equipment and Materials:
The study uses audio and video recordings, with feature extraction techniques including MFCC and Gabor features.
4: Experimental Procedures and Operational Workflow:
The methodology involves iterative decoding between two component recognizers (CRs), each processing different feature sequences, with extrinsic information exchanged to improve recognition accuracy.
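The exchange of extrinsic information between two component recognizers can be sketched as follows. This is a simplified illustration, not the study's implementation: it assumes both recognizers share one HMM state space, the same initial distribution `pi` and the same transition matrix `A`, and it forms the extrinsic information by dividing each state posterior by the recognizer's own observation likelihood and by the prior it received. All function and variable names are hypothetical.

```python
import numpy as np


def forward_backward(pi, A, lik):
    """Scaled forward-backward algorithm.

    pi:  (N,) initial state probabilities
    A:   (N, N) transitions, A[i, j] = P(state j at t+1 | state i at t)
    lik: (T, N) per-frame state likelihoods (any positive scaling)
    Returns gamma, the (T, N) state posteriors, each row summing to 1.
    """
    T, N = lik.shape
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * lik[t]
        alpha[t] /= alpha[t].sum()  # rescale to avoid underflow
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


def extrinsic(gamma, lik, prior, eps=1e-12):
    """Strip own likelihood and received prior from the posteriors."""
    ext = gamma / (lik * prior + eps)
    return ext / ext.sum(axis=1, keepdims=True)


def turbo_decode(pi, A, lik_1, lik_2, n_iter=4):
    """Iterate between two recognizers, passing extrinsic information."""
    prior_1 = np.ones_like(lik_1)  # first pass: no information from CR 2 yet
    for _ in range(n_iter):
        gamma_1 = forward_backward(pi, A, lik_1 * prior_1)
        prior_2 = extrinsic(gamma_1, lik_1, prior_1)
        gamma_2 = forward_backward(pi, A, lik_2 * prior_2)
        prior_1 = extrinsic(gamma_2, lik_2, prior_2)
    return gamma_2
```

Passing only the extrinsic part, rather than the full posterior, keeps each recognizer from receiving its own evidence back in later iterations, which is the core idea of the turbo principle.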
5: Data Analysis Methods:
Performance is evaluated using word recognition accuracy, comparing turbo ASR approaches to baseline and reference methods across different SNR conditions and noise types.
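Word recognition accuracy is commonly computed from the word-level edit distance between the reference transcript and the recognizer output. The sketch below assumes that common definition (one minus the number of substitutions, deletions and insertions divided by the number of reference words); the study's exact scoring procedure is not given here, and the function name and example sentences are hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - (S + D + I) / N for a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return 1.0 - prev[-1] / len(ref)


# Hypothetical command-style sentence with one substituted word.
print(word_accuracy("bin blue at f two now", "bin blue at f too now"))
```

Because insertions count as errors, this measure can become negative when the hypothesis is much longer than the reference.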