- Title
- Abstract
- Keywords
- Experimental Protocol
- Product
Title: Speech Enabled Visual Question Answering using LSTM and CNN with Real Time Image Capturing for Assisting the Visually Impaired. In TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON), Kochi, India, 17-20 October 2019.
Abstract: The proposed work helps visually impaired individuals identify objects and visualize the scenarios around them without depending on external support. The user can capture an image of the surroundings in real time and ask the application an open-ended, classification, counting, or yes/no question through speech input. The proposed application uses Visual Question Answering (VQA) to integrate image processing and natural language processing, and supports both speech-to-text and text-to-speech translation, which helps to identify, recognize, and obtain details of any particular image. The work uses a classical CNN-LSTM model in which image features and language features are computed separately: image features are extracted with a CNN, word embeddings are obtained from the question, the two are combined at a later stage, and a multilayer perceptron is run on the combined features to obtain the result. The model achieves an accuracy of 57 per cent. The model can also be used to help develop cognitive interpretation in children. As the application is speech enabled and has an easy-to-use GUI, it is well suited for the visually impaired.
Keywords: VGG16, Visually Impaired, Keras Neural Network Library, ImageNet, gTTS, Feature extraction, Image Recognition, VQA, Word2Vec, Speech Recognition, Glove vector, CNN, Multi Layer Perceptron, LSTM
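The abstract describes the classical CNN-LSTM VQA baseline: CNN (VGG16) image features and LSTM-encoded question embeddings are fused and passed to a multilayer perceptron that classifies over a fixed set of answers. The sketch below, assuming Keras/TensorFlow, illustrates such a model; the layer sizes, vocabulary size, answer set size, and element-wise-product fusion are illustrative assumptions rather than the authors' exact configuration.

```python
# A minimal sketch of a CNN-LSTM VQA baseline, assuming Keras/TensorFlow.
# Hyperparameters below (VOCAB_SIZE, EMBED_DIM, MAX_Q_LEN, NUM_ANSWERS,
# layer widths, fusion by element-wise product) are assumptions for
# illustration, not the paper's reported settings.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import (Input, Embedding, LSTM, Dense,
                                     Dropout, Multiply)
from tensorflow.keras.models import Model

VOCAB_SIZE = 10000   # question vocabulary size (assumed)
EMBED_DIM = 300      # Word2Vec/GloVe embedding dimension
MAX_Q_LEN = 26       # maximum question length in tokens (assumed)
NUM_ANSWERS = 1000   # most frequent answers treated as classes (assumed)

# Image branch: 4096-d VGG16 fc2 features, projected to 1024-d.
image_features = Input(shape=(4096,), name="vgg16_fc2_features")
img_dense = Dense(1024, activation="tanh")(image_features)

# Question branch: word embeddings fed through an LSTM encoder.
question_tokens = Input(shape=(MAX_Q_LEN,), name="question_token_ids")
q_embed = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(question_tokens)
q_encoding = LSTM(1024)(q_embed)

# Fusion: element-wise product of the two 1024-d representations (assumed).
fused = Multiply()([img_dense, q_encoding])

# Multilayer perceptron over the fused features, softmax over answer classes.
x = Dense(1024, activation="tanh")(fused)
x = Dropout(0.5)(x)
x = Dense(1024, activation="tanh")(x)
x = Dropout(0.5)(x)
answer = Dense(NUM_ANSWERS, activation="softmax", name="answer")(x)

vqa_model = Model(inputs=[image_features, question_tokens], outputs=answer)
vqa_model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])

# Optional: an fc2 feature extractor built from pre-trained VGG16
# (ImageNet weights) to produce the 4096-d image vectors offline.
vgg = VGG16(weights="imagenet", include_top=True)
fc2_extractor = Model(inputs=vgg.input, outputs=vgg.get_layer("fc2").output)
```

In the application described by the abstract, such a model would sit between a speech-to-text front end (the spoken question transcribed to token IDs) and a text-to-speech back end such as gTTS that reads the predicted answer aloud to the user.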