
oe1 (光电查) - Scientific Papers

1 result
  • [IEEE TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON) - Kochi, India (2019.10.17-2019.10.20)] Speech Enabled Visual Question Answering using LSTM and CNN with Real Time Image Capturing for assisting the Visually Impaired

    Abstract: The proposed work helps visually impaired individuals identify objects and visualize the scenes around them without external support. The user can capture an image of the surroundings and ask the application an open-ended, classification, counting, or yes/no question by speech input. The application uses Visual Question Answering (VQA) to integrate image processing and natural language processing, and it also supports speech-to-text and text-to-speech conversion, which helps to identify, recognize, and obtain details of any particular image. The work uses a classical CNN-LSTM model in which image features and language features are computed separately and combined at a later stage: the image features are merged with the word embeddings obtained from the question, and a multilayer perceptron runs on the combined features to produce the answer. The model achieves an accuracy of 57 percent. The model can also be used to help develop cognitive interpretation in children. Because the application is speech enabled and has an easy-to-use GUI, it is well suited for the visually impaired.

    Keywords: VGG16, Visually Impaired, Keras Neural Network Library, ImageNet, gTTS, Feature extraction, Image Recognition, VQA, Word2Vec, Speech Recognition, GloVe vector, CNN, Multi Layer Perceptron, LSTM

    Updated 2025-09-16 10:30:52
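
The late-fusion step described in the abstract (image and question features computed separately, then combined and passed to a multilayer perceptron) could be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the abstract does not say how the features are combined, so concatenation is assumed here, and the feature vectors, layer sizes, and weights are random placeholders standing in for real CNN and LSTM outputs and trained parameters. All function and variable names are hypothetical.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax over a list of scores."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_in, n_out):
    """Untrained placeholder weights; a real model would learn these."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_classify(image_feat, question_feat, hidden_layer, output_layer):
    """Late fusion: concatenate both feature vectors, then run a small MLP."""
    fused = list(image_feat) + list(question_feat)  # assumed fusion: concatenation
    hidden = relu(dense(fused, *hidden_layer))
    return softmax(dense(hidden, *output_layer))


# Hypothetical toy dimensions, far smaller than real CNN/LSTM feature sizes.
IMAGE_DIM, QUESTION_DIM, HIDDEN_DIM, NUM_ANSWERS = 8, 6, 5, 4

rng = random.Random(0)
hidden_layer = random_layer(rng, IMAGE_DIM + QUESTION_DIM, HIDDEN_DIM)
output_layer = random_layer(rng, HIDDEN_DIM, NUM_ANSWERS)

# Random stand-ins for the CNN image features and the LSTM question encoding.
image_feat = [rng.random() for _ in range(IMAGE_DIM)]
question_feat = [rng.random() for _ in range(QUESTION_DIM)]

probs = fuse_and_classify(image_feat, question_feat, hidden_layer, output_layer)
best_answer = max(range(NUM_ANSWERS), key=probs.__getitem__)
print(best_answer, probs)
```

In a full pipeline, `image_feat` would presumably come from a pretrained CNN and `question_feat` from an LSTM over word embeddings, with `best_answer` indexing into a fixed answer vocabulary before being spoken back to the user.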