- Title
- Abstract
- Keywords
- Experimental protocol
- Product
-
[IEEE 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) - Nanjing, China (2018.8.27-2018.8.31)] A Speech-Driven Pupil Response System with Affective Expression Using Hemispherical Displays
Abstract: We developed an expressive pupil response interface that uses hemispherical displays to enhance human-robot communication. The interface looks like a robot's eyeballs and expresses vivid pupil responses driven by speech input; in particular, it can express exaggerated pupil responses that humans cannot. As basic research toward realizing affinitive interaction in human-robot communication, we analyzed pupil responses accompanying affective expression during utterances using a pupil measurement device. Based on the results of this analysis, we developed a speech-driven pupil response system with affective expression using hemispherical displays. The system expresses pupil responses with various affective expressions in response to speech input. We carried out an evaluation experiment based on a sensory evaluation with the system acting as the speaker. The results demonstrate that the system with affective expression is effective for enhancing affinitive interaction.
Keywords: pupil response, speech-driven system, affective expression, human-robot communication, hemispherical displays
Updated 2025-09-23 15:21:21
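The abstract above describes a pupil response driven by speech input but does not give the mapping itself. Below is a minimal, hypothetical sketch of one way such a mapping could work: short-term speech loudness is converted to a smoothed pupil diameter that a display renderer could animate. The window length, diameter range, and smoothing factor are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch: mapping short-term speech loudness to a pupil diameter.
# Window length, diameter range, and smoothing factor are illustrative only.
import numpy as np

def pupil_diameter_from_speech(samples, sr=16000, win_ms=50,
                               d_min=2.0, d_max=8.0, smooth=0.9):
    """Return one pupil diameter (mm) per analysis window."""
    samples = np.asarray(samples, dtype=float)
    win = int(sr * win_ms / 1000)
    n_frames = len(samples) // win
    diameters = []
    state = d_min
    for i in range(n_frames):
        frame = samples[i * win:(i + 1) * win]
        rms = np.sqrt(np.mean(frame ** 2) + 1e-12)
        # Map RMS (roughly 0..1 for normalized audio) onto the diameter range,
        # then low-pass the result so the rendered pupil moves smoothly.
        target = d_min + (d_max - d_min) * min(rms * 5.0, 1.0)
        state = smooth * state + (1 - smooth) * target
        diameters.append(state)
    return np.array(diameters)
```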
-
[IEEE 2020 International Conference on Innovative Trends in Communication and Computer Engineering (ITCE) - Aswan, Egypt (2020.2.8-2020.2.9)] Graphene AMC Array As A Ground Plane for Beam-Switching at Terahertz Band
Abstract: Practically no knowledge exists on the effects of speech coding and recognition for narrow-band transmission of speech signals within certain frequency ranges, especially in relation to the recognition of paralinguistic cues in speech. We therefore investigated the impact of narrow-band standard speech coders on the machine-based classification of affective vocalizations and clinical vocal recordings. In addition, we analyzed the effect of low-pass filtering of speech at a set of different cut-off frequencies, either chosen as static values in the 0.5–5 kHz range or given dynamically by different upper limits from the first five speech formants (F1–F5). Speech coding and recognition were tested, first, for short-term speaker states using the affective vocalizations of the Geneva Multimodal Emotion Portrayals and, second, for long-term speaker traits using vocal recordings from clinical populations with speech impairments, as found in the Child Pathological Speech Database. We employ a large acoustic feature space derived from the Interspeech Computational Paralinguistics Challenge. Besides analyzing the sheer corruption outcome, we analyzed the potential of matched and multi-condition training as opposed to mismatched conditions. The results show, first, that multi-condition and matched-condition training significantly increase performance compared with mismatched conditions. Second, downgrades in classification accuracy occur only at comparably severe levels of low-pass filtering; they appear especially for multi-categorical rather than binary decisions and can be handled reasonably well by the aforementioned training strategies.
Keywords: speech analysis, speech coding, emotion recognition, computational paralinguistics
Updated 2025-09-23 15:21:01
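As an illustration of the static filtering condition mentioned in the abstract above (cut-offs chosen in the 0.5–5 kHz range), here is a minimal sketch that band-limits a speech signal before feature extraction. The Butterworth design and filter order are assumptions; the paper's exact coder and filter settings are not reproduced.

```python
# Minimal sketch of the static low-pass filtering condition: band-limit a
# speech signal at a chosen cut-off before feature extraction. The filter
# order and Butterworth design are assumptions; the cut-off grid follows
# the 0.5-5 kHz range given in the abstract.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_speech(samples, sr, cutoff_hz, order=8):
    """Zero-phase low-pass filter a 1-D speech signal at cutoff_hz."""
    sos = butter(order, cutoff_hz, btype="low", fs=sr, output="sos")
    return sosfiltfilt(sos, np.asarray(samples, dtype=float))

# Example: produce several band-limited versions of one signal.
cutoffs = [500, 1000, 2000, 3000, 4000, 5000]  # Hz, static condition
# filtered = {fc: lowpass_speech(signal, 16000, fc) for fc in cutoffs}
```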
-
[IEEE 2019 Conference on Lasers and Electro-Optics Europe & European Quantum Electronics Conference (CLEO/Europe-EQEC) - Munich, Germany (2019.6.23-2019.6.27)] A Multi-Copy Approach to Quantum Entanglement Characterization
Abstract: Automatic speech recognition (ASR) systems are used daily by millions of people worldwide to dictate messages, control devices, initiate searches, or facilitate data input on small devices. The user experience in these scenarios depends on the quality of the speech transcriptions and on the responsiveness of the system. For multilingual users, a further obstacle to natural interaction is the monolingual character of many ASR systems, which constrains users to a single preset language. In this work, we present an end-to-end multi-language ASR architecture, developed and deployed at Google, that allows users to select arbitrary combinations of spoken languages. We leverage recent advances in language identification and a novel method of real-time language selection to achieve recognition accuracy similar to, and latency characteristics nearly identical to, those of a monolingual system.
Keywords: automatic speech recognition (ASR), multilingual, deep neural network (DNN), language identification (LID)
Updated 2025-09-23 15:21:01
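The abstract above mentions real-time language selection on top of language identification. The sketch below illustrates the general idea under simplifying assumptions: the utterance is decoded by one recognizer per candidate language and the hypothesis of the highest-scoring language is returned. The recognize() and language_scores() interfaces are hypothetical placeholders, not the deployed system's APIs.

```python
# Hypothetical sketch of LID-driven selection among per-language recognizers.
from concurrent.futures import ThreadPoolExecutor

def multilingual_recognize(audio, recognizers, lid_model, languages):
    """recognizers: dict lang -> object with .recognize(audio) -> str (assumed)."""
    with ThreadPoolExecutor() as pool:
        futures = {lang: pool.submit(recognizers[lang].recognize, audio)
                   for lang in languages}
        hypotheses = {lang: f.result() for lang, f in futures.items()}
    # lid_model.language_scores(audio) -> dict lang -> probability (assumed)
    scores = lid_model.language_scores(audio)
    best = max(languages, key=lambda lang: scores.get(lang, 0.0))
    return best, hypotheses[best]
```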
-
[IEEE 2019 Compound Semiconductor Week (CSW) - Nara, Japan (2019.5.19-2019.5.23)] Electric-field control of optical-spin injection from an InGaAs quantum well to p-doped quantum dots
Abstract: Features for speech emotion recognition are usually dominated by spectral magnitude information, while the phase spectrum is ignored because of the difficulty of interpreting it properly. Motivated by recent successes of phase-based features for speech processing, this paper investigates the effectiveness of phase information for whispered speech emotion recognition. We select two types of phase-based features (i.e., modified group delay features and all-pole group delay features), both of which have shown wide applicability across many kinds of speech analysis and are studied here for whispered speech emotion recognition. Building on these features, we propose a new speech emotion recognition framework that employs the outer product in combination with power and L2 normalization. This technique encodes any variable-length sequence of phase-based features into a vector of fixed dimension, regardless of the length of the input sequence. The resulting representation is used to train a classification model with a linear kernel. Experimental results on the Geneva Whispered Emotion Corpus, which includes normal and whispered phonation, demonstrate the effectiveness of the proposed method compared with other modern systems. They also show that combining phase information with magnitude information can significantly improve performance over common systems that rely on magnitude information alone.
Keywords: whispered speech emotion recognition, phase-based features, outer product
Updated 2025-09-23 15:21:01
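A minimal sketch of the fixed-length encoding step described in the abstract above, under the assumption that per-frame phase features are pooled by averaging their outer products and then passed through power and L2 normalization; the exponent used for power normalization here is illustrative, not necessarily the value used in the paper.

```python
# Sketch: encode a variable-length sequence of per-frame features into a
# fixed-dimension vector via averaged outer products, power normalization,
# and L2 normalization.
import numpy as np

def outer_product_encoding(frames, alpha=0.5):
    """frames: (T, D) array of per-frame phase features -> (D*D,) vector."""
    frames = np.asarray(frames, dtype=float)
    G = np.einsum("td,te->de", frames, frames) / len(frames)  # (D, D) average
    v = G.reshape(-1)
    v = np.sign(v) * np.abs(v) ** alpha        # signed power normalization
    return v / (np.linalg.norm(v) + 1e-12)     # L2 normalization
```

The resulting vector has dimension D*D regardless of the number of frames T, so it can be fed directly to a linear-kernel classifier as the abstract describes.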
-
[ACM 2018 Symposium on Eye Tracking Research & Applications (ETRA '18) - Warsaw, Poland (2018.06.14-2018.06.17)] Hands-free web browsing
Abstract: Hands-free browsers provide an effective tool for Web interaction and accessibility, removing the need for conventional input devices. Current approaches to hands-free interaction are primarily either voice-based or gaze-based. In this work, we investigate how these two modalities can be integrated to provide a better hands-free experience for end users. We demonstrate a multimodal browsing approach that combines eye gaze and voice input for optimized interaction while still accommodating user preferences for the benefits of each single modality. An initial assessment with five participants indicates improved performance for the multimodal prototype compared with single modalities for hands-free Web browsing.
Keywords: voice input, Web accessibility, speech commands, multimodal interfaces, eye tracking, hands-free interaction
Updated 2025-09-23 15:21:01
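One plausible way to combine the two modalities described above is to let gaze fixations nominate the on-screen target while short voice commands trigger actions. The sketch below is a hypothetical event loop; GazeTracker, SpeechListener, and the page interface are assumed placeholders, not the prototype's actual implementation.

```python
# Hypothetical gaze + voice event loop: gaze selects, voice confirms.
def hands_free_loop(gaze_tracker, speech_listener, page):
    while True:
        fixation = gaze_tracker.current_fixation()        # (x, y) or None (assumed API)
        target = page.element_at(fixation) if fixation else None
        command = speech_listener.poll()                  # e.g. "click", "back" (assumed API)
        if command == "click" and target is not None:
            page.activate(target)                         # act on the fixated element
        elif command == "scroll down":
            page.scroll(+1)
        elif command == "back":
            page.history_back()
        elif command == "quit":
            break
```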
-
[IEEE 2019 IEEE International Conference on Sensors and Nanotechnology (SENSORS & NANO) - Penang, Malaysia (2019.7.24-2019.7.25)] Fabrication and Characterization of Back-Gate Controlled Silicon Nanowire based Field-effect pH Sensor
Abstract: Acoustic metrics extracted from speech have the potential to serve as novel biomarkers for a variety of neurological and neurodevelopmental conditions, as evidenced by the rapidly growing body of research studying the links between brain impairments and speech. In this paper, we discuss the advantages and disadvantages of speech biomarkers and the various challenges in the design and implementation of portable speech-based diagnostic and assessment tools. Furthermore, we provide a case study presenting our experience in developing an assessment tool for the detection of mild traumatic brain injuries (concussions), and we discuss the challenges in obtaining and analyzing the large sets of speech recordings needed to study the impact of brain injuries on vocal features.
Keywords: vocal features, speech recognition, concussions, mild traumatic brain injuries, speech analysis
Updated 2025-09-23 15:19:57
-
[IEEE 2018 International Conference Laser Optics (ICLO) - St. Petersburg (2018.6.4-2018.6.8)] Optical Repetition Rate Locking of Ultrafast Yb Doped All Fiber Oscillator for High Intensity OPCPA Systems
Abstract: Practically no knowledge exists on the effects of speech coding and recognition for narrow-band transmission of speech signals within certain frequency ranges, especially in relation to the recognition of paralinguistic cues in speech. We therefore investigated the impact of narrow-band standard speech coders on the machine-based classification of affective vocalizations and clinical vocal recordings. In addition, we analyzed the effect of low-pass filtering of speech at a set of different cut-off frequencies, either chosen as static values in the 0.5–5 kHz range or given dynamically by different upper limits from the first five speech formants (F1–F5). Speech coding and recognition were tested, first, for short-term speaker states using the affective vocalizations of the Geneva Multimodal Emotion Portrayals and, second, for long-term speaker traits using vocal recordings from clinical populations with speech impairments, as found in the Child Pathological Speech Database. We employ a large acoustic feature space derived from the Interspeech Computational Paralinguistics Challenge. Besides analyzing the sheer corruption outcome, we analyzed the potential of matched and multi-condition training as opposed to mismatched conditions. The results show, first, that multi-condition and matched-condition training significantly increase performance compared with mismatched conditions. Second, downgrades in classification accuracy occur only at comparably severe levels of low-pass filtering; they appear especially for multi-categorical rather than binary decisions and can be handled reasonably well by the aforementioned training strategies.
Keywords: speech analysis, speech coding, computational paralinguistics, emotion recognition
Updated 2025-09-23 15:19:57
-
Visualizing Speech-Generated Oral Fluid Droplets with Laser Light Scattering
Abstract: Aerosols and droplets generated during speech have been implicated in the person-to-person transmission of viruses, and there is current interest in understanding the mechanisms responsible for the spread of Covid-19 by these means. The act of speaking generates oral fluid droplets that vary widely in size, and these droplets can harbor infectious virus particles. Whereas large droplets fall quickly to the ground, small droplets can dehydrate and linger as “droplet nuclei” in the air, where they behave like an aerosol and thereby expand the spatial extent of emitted infectious particles. We report the results of a laser light-scattering experiment in which speech-generated droplets and their trajectories were visualized.
Keywords: Covid-19, speech, virus transmission, laser light scattering, droplets, aerosols
Updated 2025-09-23 15:19:57
-
A Model for an Interconnected Photovoltaic System using an Off-grid Inverter as a Reference Node in Island Mode
Abstract: This paper presents a new feature extraction algorithm called power-normalized cepstral coefficients (PNCC) that is motivated by auditory processing. Major new features of PNCC processing include the use of a power-law nonlinearity that replaces the traditional log nonlinearity used in MFCC features, a noise-suppression algorithm based on asymmetric filtering that suppresses background excitation, and a module that accomplishes temporal masking. We also propose the use of medium-time power analysis, in which environmental parameters are estimated over a longer duration than is commonly used for speech, as well as frequency smoothing. Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared with MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing, and without degrading the recognition accuracy observed when training and testing on clean speech. PNCC processing also provides better recognition accuracy in noisy environments than techniques such as vector Taylor series (VTS) and the ETSI advanced front end (AFE), while requiring much less computation. We describe an implementation of PNCC using “online processing” that does not require future knowledge of the input.
Keywords: modulation filtering, feature extraction, rate-level curve, online speech processing, robust speech recognition, temporal masking, asymmetric filtering, spectral weight smoothing, power function, physiological modeling, medium-time power estimation
Updated 2025-09-19 17:13:59
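To make the key difference from MFCC processing concrete, the sketch below compares the power-law compression stage described in the abstract above with conventional log compression, applied to a band-integrated power spectrogram. Only this single stage is shown; asymmetric noise suppression, temporal masking, and medium-time analysis are omitted, and the 1/15 exponent is the value commonly cited for PNCC rather than one taken from this text.

```python
# Sketch: power-law compression (PNCC-style) vs. log compression (MFCC-style)
# applied to band-integrated power values, followed by a DCT.
import numpy as np
from scipy.fftpack import dct

def power_law_cepstra(power_spectrogram, num_ceps=13, exponent=1.0 / 15.0):
    """power_spectrogram: (frames, bands) mel/gammatone-integrated powers."""
    compressed = np.power(np.maximum(power_spectrogram, 1e-12), exponent)
    return dct(compressed, type=2, axis=1, norm="ortho")[:, :num_ceps]

def log_cepstra(power_spectrogram, num_ceps=13):
    """MFCC-style log compression, shown for comparison."""
    compressed = np.log(np.maximum(power_spectrogram, 1e-12))
    return dct(compressed, type=2, axis=1, norm="ortho")[:, :num_ceps]
```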
-
Postmetallization “Passivated Edge Technology” for Separated Silicon Solar Cells
Abstract: Features for speech emotion recognition are usually dominated by spectral magnitude information, while the phase spectrum is ignored because of the difficulty of interpreting it properly. Motivated by recent successes of phase-based features for speech processing, this paper investigates the effectiveness of phase information for whispered speech emotion recognition. We select two types of phase-based features (i.e., modified group delay features and all-pole group delay features), both of which have shown wide applicability across many kinds of speech analysis and are studied here for whispered speech emotion recognition. Building on these features, we propose a new speech emotion recognition framework that employs the outer product in combination with power and L2 normalization. This technique encodes any variable-length sequence of phase-based features into a vector of fixed dimension, regardless of the length of the input sequence. The resulting representation is used to train a classification model with a linear kernel. Experimental results on the Geneva Whispered Emotion Corpus, which includes normal and whispered phonation, demonstrate the effectiveness of the proposed method compared with other modern systems. They also show that combining phase information with magnitude information can significantly improve performance over common systems that rely on magnitude information alone.
Keywords: whispered speech emotion recognition, phase-based features, outer product
Updated 2025-09-19 17:13:59