Research Objective
To present an end-to-end multi-language ASR architecture that allows users to select arbitrary combinations of spoken languages, with recognition accuracy similar to that of a monolingual system and nearly identical latency characteristics.
Research Findings
The proposed multilingual ASR architecture supports multiple languages with minimal impact on accuracy and latency compared to monolingual systems. It selects the language in real time using a DNN-based language identification (LID) classifier together with ASR confidences.
Limitations
The system's performance degrades as the number of candidate languages increases, and it is computationally expensive to run many monolingual speech recognizers in parallel.
1: Experimental Design and Method Selection:
The architecture combines a DNN-based LID classifier with transcription confidences from individual speech recognizers.
2: Sample Selection and Data Sources:
The Google 5M LID corpus and the Google Multilang Corpus were used for training and testing.
3: List of Experimental Equipment and Materials:
DNNs for LID and ASR, with specific architectures and training procedures.
4: Experimental Procedures and Operational Workflow:
The language is selected in real time from scores that combine LID and ASR confidences, under various timeout strategies.
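The selection step can be illustrated with a minimal sketch. The summary does not state how the scores are combined or how the timeout strategies work, so everything here is an assumption made for illustration: the linear interpolation, the `lid_weight` parameter, the function name `select_language`, and the fallback to LID alone when no recognizer has returned a confidence before the timeout.

```python
from typing import Dict


def select_language(
    lid_posteriors: Dict[str, float],
    asr_confidences: Dict[str, float],
    lid_weight: float = 0.5,
) -> str:
    """Pick a language from LID posteriors and per-recognizer confidences.

    Assumption (not from the source): scores are combined by linear
    interpolation. `asr_confidences` holds only the recognizers that
    returned a result before the timeout expired.
    """
    if not lid_posteriors:
        raise ValueError("at least one candidate language is required")

    # Languages whose recognizer finished in time.
    finished = [lang for lang in lid_posteriors if lang in asr_confidences]

    # Hypothetical timeout fallback: no recognizer finished, so use LID alone.
    if not finished:
        return max(lid_posteriors, key=lid_posteriors.get)

    combined = {
        lang: lid_weight * lid_posteriors[lang]
        + (1.0 - lid_weight) * asr_confidences[lang]
        for lang in finished
    }
    return max(combined, key=combined.get)
```

For example, with LID posteriors of 0.6 for `en-US` and 0.4 for `es-ES`, and ASR confidences of 0.3 and 0.9 respectively, an equal weighting selects `es-ES`. With no ASR confidences available, the same call falls back to `en-US`.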
5: Data Analysis Methods:
Evaluation metrics include language identification (LID) accuracy, word error rate (WER), and real-time factor (RTF).
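For reference, the three metrics can be computed as in the sketch below, which uses their standard definitions: LID accuracy as the fraction of utterances with a correctly identified language, WER as word-level edit distance divided by the reference length, and RTF as processing time divided by audio duration. The function names are illustrative, and the summary does not describe the authors' own scoring tools or text normalization.

```python
from typing import Sequence


def lid_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of utterances whose predicted language matches the label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

Note that WER can exceed 1 when the hypothesis contains many inserted words, since the denominator is the reference length only.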