-
[IEEE 2019 Conference on Lasers and Electro-Optics Europe & European Quantum Electronics Conference (CLEO/Europe-EQEC) - Munich, Germany (2019.6.23-2019.6.27)] 2019 Conference on Lasers and Electro-Optics Europe & European Quantum Electronics Conference (CLEO/Europe-EQEC) - A Multi-Copy Approach to Quantum Entanglement Characterization
Abstract: Automatic speech recognition (ASR) systems are used daily by millions of people worldwide to dictate messages, control devices, initiate searches or to facilitate data input in small devices. The user experience in these scenarios depends on the quality of the speech transcriptions and on the responsiveness of the system. For multilingual users, a further obstacle to natural interaction is the monolingual character of many ASR systems, in which users are constrained to a single preset language. In this work, we present an end-to-end multi-language ASR architecture, developed and deployed at Google, that allows users to select arbitrary combinations of spoken languages. We leverage recent advances in language identification and a novel method of real-time language selection to achieve recognition accuracy similar to, and latency characteristics nearly identical to, those of a monolingual system.
Keywords: Automatic speech recognition (ASR), multilingual, deep neural network (DNN), language identification (LID)
Updated 2025-09-23 15:21:01
-
[IEEE 2019 IEEE International Conference on Sensors and Nanotechnology (SENSORS & NANO) - Penang, Malaysia (2019.7.24-2019.7.25)] 2019 IEEE International Conference on Sensors and Nanotechnology - Fabrication and Characterization of Back-Gate Controlled Silicon Nanowire based Field-effect pH Sensor
Abstract: Acoustic metrics extracted from speech have the potential to serve as novel biomarkers for a variety of neurological and neurodevelopmental conditions, as is evidenced by the rapidly growing corpus of research articles studying the links between brain impairments and speech. In this paper, we discuss the advantages and the disadvantages of speech biomarkers and the various challenges in the design and the implementation of portable speech-based diagnostic and assessment tools. Furthermore, we provide a case study, presenting our experiences in developing an assessment tool for the detection of mild traumatic brain injuries (concussions) and discuss the challenges in obtaining and analyzing large sets of speech recordings that can be used to study the impact of brain injuries on vocal features.
Keywords: vocal features, speech recognition, concussions, mild traumatic brain injuries, speech analysis
Updated 2025-09-23 15:19:57
-
A Model for an Interconnected Photovoltaic System using an Off-grid Inverter as a Reference Node in Island Mode
Abstract: This paper presents a new feature extraction algorithm called power-normalized cepstral coefficients (PNCC) that is motivated by auditory processing. Major new features of PNCC processing include the use of a power-law nonlinearity that replaces the traditional log nonlinearity used in MFCC coefficients, a noise-suppression algorithm based on asymmetric filtering that suppresses background excitation, and a module that accomplishes temporal masking. We also propose the use of medium-time power analysis in which environmental parameters are estimated over a longer duration than is commonly used for speech, as well as frequency smoothing. Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing, and without degrading the recognition accuracy that is observed while training and testing using clean speech. PNCC processing also provides better recognition accuracy in noisy environments than techniques such as vector Taylor series (VTS) and the ETSI advanced front end (AFE) while requiring much less computation. We describe an implementation of PNCC using “online processing” that does not require future knowledge of the input.
Keywords: modulation filtering, feature extraction, rate-level curve, on-line speech processing, robust speech recognition, temporal masking, asymmetric filtering, spectral weight smoothing, power function, physiological modeling, medium-time power estimation
Updated 2025-09-19 17:13:59
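The power-law nonlinearity mentioned in the PNCC abstract above can be illustrated with a minimal sketch. It only contrasts the two compression functions on a few channel power values; it is not the PNCC pipeline itself. The exponent 1/15, the `floor` value, and the function names are assumptions made for illustration, since the abstract does not state them.

```python
import math

def log_compress(power, floor=1e-10):
    """Log nonlinearity as used in MFCC-style front ends.

    `floor` is a hypothetical guard against log(0).
    """
    return math.log(max(power, floor))

def power_law_compress(power, exponent=1.0 / 15.0):
    """Power-law nonlinearity; the exponent is an assumed value."""
    return max(power, 0.0) ** exponent

# Hypothetical channel powers spanning a wide dynamic range.
powers = [1e-8, 1e-4, 1.0, 1e4]
log_out = [log_compress(p) for p in powers]
pow_out = [power_law_compress(p) for p in powers]

for p, lo, po in zip(powers, log_out, pow_out):
    print(f"power={p:g}  log={lo:.3f}  power_law={po:.3f}")
```

One visible difference is the behavior near zero: the log output grows without bound in the negative direction as the power shrinks, while the power-law output stays bounded and approaches zero.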
-
[IEEE 2019 IEEE Conference on Electrical Insulation and Dielectric Phenomena (CEIDP) - Richland, WA, USA (2019.10.20-2019.10.23)] 2019 IEEE Conference on Electrical Insulation and Dielectric Phenomena (CEIDP) - The Anti-Interference Method of Michelson Optical Fiber Interferometer for GIS Partial Discharge Ultrasonic Detection
Abstract: Performance of automatic speech recognition (ASR) systems can be significantly improved by integrating further sources of information such as additional modalities, or acoustic channels, or acoustic models. Given the arising problem of information fusion, striking parallels to problems in digital communications are exhibited, where the discovery of the turbo codes by Berrou et al. was a groundbreaking innovation. In this paper, we show how to successfully apply the turbo principle to the domain of ASR and thereby provide solutions to the above-mentioned information fusion problem. The contribution of our work is fourfold: First, we review the turbo decoding forward-backward algorithm (FBA), giving detailed insights into turbo ASR, and providing a new interpretation and formulation of the so-called extrinsic information being passed between the recognizers. Second, we present a real-time capable turbo-decoding Viterbi algorithm suitable for practical information fusion and recognition tasks. Then we present simulation results for a multimodal example of information fusion. Finally, we prove the suitability of both our turbo FBA and turbo Viterbi algorithm also for a single-channel multimodel recognition task obtained by using two acoustic feature extraction methods. On a small vocabulary task (challenging, since spelling is included), our proposed turbo ASR approach outperforms even the best reference system on average over all SNR conditions and investigated noise types by a relative word error rate (WER) reduction of 22.4% (audio-visual task) and 18.2% (audio-only task), respectively.
Keywords: hidden Markov models, speech recognition, multimedia systems, robustness, iterative decoding
Updated 2025-09-19 17:13:59
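The relative WER reduction quoted in the abstract above is a ratio against the reference system's WER, not a difference in percentage points. A minimal sketch of the arithmetic follows. The baseline and improved WER values are hypothetical numbers chosen only so that the ratio comes out near 22.4%; the abstract does not report absolute WERs.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical absolute WERs in percent, for illustration only.
baseline = 10.0
improved = 7.76
reduction = relative_wer_reduction(baseline, improved)
print(f"relative WER reduction: {reduction:.1%}")
```

With these assumed values the absolute improvement is 2.24 percentage points, while the relative reduction is about 22.4%.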
-
[IEEE TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON) - Kochi, India (2019.10.17-2019.10.20)] TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON) - Speech Enabled Visual Question Answering using LSTM and CNN with Real Time Image Capturing for assisting the Visually Impaired
Abstract: The proposed work helps visually impaired individuals identify objects and visualize scenarios around them without any external support. In such a situation, the user can capture the surroundings and ask the application an open-ended question, classification question, counting question or yes/no question by speech input. The proposed application uses Visual Question Answering (VQA) to integrate image processing and natural language processing. It also supports speech-to-text and text-to-speech conversion, which helps to identify, recognize and thus obtain details of any particular image. The work uses a classical CNN-LSTM model in which image features and language features are computed separately and combined at a later stage: the image features are joined with the word embedding obtained from the question, and a multilayer perceptron is run on the combined features to obtain the results. The model achieves an accuracy of 57 percent. The model could also be used to help develop cognitive interpretation in children. As the application is speech enabled and has an easy-to-use GUI, it is well suited to the visually impaired.
Keywords: VGG16, Visually Impaired, Keras Neural Network Library, ImageNet, gTTS, Feature extraction, Image Recognition, VQA, Word2Vec, Speech Recognition, GloVe vector, CNN, Multi Layer Perceptron, LSTM
Updated 2025-09-16 10:30:52
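The late-fusion step described in the abstract above (separately computed image and question features, combined and passed to a multilayer perceptron) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' model: fusion by concatenation, the layer sizes, the random untrained weights, and all function names are hypothetical, and the short feature vectors merely stand in for real CNN and LSTM outputs.

```python
import math
import random

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(rng, n_out, n_in):
    """Untrained placeholder weights, used only to show the shapes."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def answer_probabilities(image_features, question_features, hidden, output):
    # Late fusion by concatenation (an assumption; other schemes exist).
    fused = image_features + question_features
    h = relu(dense(fused, *hidden))
    return softmax(dense(h, *output))

rng = random.Random(0)
image_features = [rng.random() for _ in range(8)]     # stand-in for CNN output
question_features = [rng.random() for _ in range(4)]  # stand-in for LSTM output
hidden = random_layer(rng, 6, 12)   # 12 = 8 image + 4 question features
output = random_layer(rng, 3, 6)    # 3 hypothetical candidate answers
probs = answer_probabilities(image_features, question_features, hidden, output)
```

Here `probs` holds one probability per candidate answer; in a trained system the highest-scoring entry would be returned as the answer and converted to speech.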
-
[IEEE 2019 IEEE 8th International Conference on Advanced Optoelectronics and Lasers (CAOL) - Sozopol, Bulgaria (2019.9.6-2019.9.8)] 2019 IEEE 8th International Conference on Advanced Optoelectronics and Lasers (CAOL) - Laser-controlled Interaction of Cytocrome c with Lipids May Not Disrupt Apoptotic Pathway
Abstract: Automatic speech recognition (ASR) systems are used daily by millions of people worldwide to dictate messages, control devices, initiate searches or to facilitate data input in small devices. The user experience in these scenarios depends on the quality of the speech transcriptions and on the responsiveness of the system. For multilingual users, a further obstacle to natural interaction is the monolingual character of many ASR systems, in which users are constrained to a single preset language. In this work, we present an end-to-end multi-language ASR architecture, developed and deployed at Google, that allows users to select arbitrary combinations of spoken languages. We leverage recent advances in language identification and a novel method of real-time language selection to achieve recognition accuracy similar to, and latency characteristics nearly identical to, those of a monolingual system.
Keywords: Automatic speech recognition (ASR), multilingual, deep neural network (DNN), language identification (LID)
Updated 2025-09-16 10:30:52
-
A Fusion Firefly Algorithm with Simplified Propagation for Photovoltaic MPPT under Partial Shading Conditions
Abstract: Performance of automatic speech recognition (ASR) systems can be significantly improved by integrating further sources of information such as additional modalities, or acoustic channels, or acoustic models. Given the arising problem of information fusion, striking parallels to problems in digital communications are exhibited, where the discovery of the turbo codes by Berrou et al. was a groundbreaking innovation. In this paper, we show how to successfully apply the turbo principle to the domain of ASR and thereby provide solutions to the above-mentioned information fusion problem. The contribution of our work is fourfold: First, we review the turbo decoding forward-backward algorithm (FBA), giving detailed insights into turbo ASR, and providing a new interpretation and formulation of the so-called extrinsic information being passed between the recognizers. Second, we present a real-time capable turbo-decoding Viterbi algorithm suitable for practical information fusion and recognition tasks. Then we present simulation results for a multimodal example of information fusion. Finally, we prove the suitability of both our turbo FBA and turbo Viterbi algorithm also for a single-channel multimodel recognition task obtained by using two acoustic feature extraction methods. On a small vocabulary task (challenging, since spelling is included), our proposed turbo ASR approach outperforms even the best reference system on average over all SNR conditions and investigated noise types by a relative word error rate (WER) reduction of 22.4% (audio-visual task) and 18.2% (audio-only task), respectively.
Keywords: robustness, speech recognition, multimedia systems, iterative decoding, hidden Markov models
Updated 2025-09-16 10:30:52