Research Objective
To develop a speech-enabled Visual Question Answering (VQA) application that helps visually impaired individuals identify objects and understand the scenes around them by integrating image processing and natural language processing.
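The summary does not specify the model stack behind this pipeline. The following is a minimal sketch of how such a speech-enabled VQA loop could be wired together, assuming a pretrained ViLT VQA model from Hugging Face transformers, the speech_recognition library for speech-to-text, and pyttsx3 for text-to-speech; all of these component choices are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a speech-enabled VQA pipeline.
# The model (ViLT) and speech libraries are assumptions for
# illustration, not the stack reported in the paper.
import speech_recognition as sr
import pyttsx3
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
tts = pyttsx3.init()

def listen_for_question() -> str:
    """Capture a spoken question from the microphone and transcribe it."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)

def answer_question(image_path: str, question: str) -> str:
    """Run the VQA model on an image/question pair and return the top answer."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    logits = model(**inputs).logits
    answer_id = logits.argmax(-1).item()
    return model.config.id2label[answer_id]

def speak(text: str) -> None:
    """Read the answer aloud for the user."""
    tts.say(text)
    tts.runAndWait()

if __name__ == "__main__":
    question = listen_for_question()                 # e.g. "What is in front of me?"
    answer = answer_question("scene.jpg", question)  # "scene.jpg" stands in for a camera frame
    speak(answer)
```

In a deployed application this loop would run continuously, capturing a fresh camera frame for each spoken question rather than reading from a fixed image file.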
Research Findings
The developed VQA model, with an overall accuracy of 57.45%, assists visually impaired individuals by answering their spoken questions about their surroundings through speech output. The application also shows potential for improving cognitive skills in children. Future enhancements could focus on increasing accuracy, reducing response time, and adding features such as proximity sensors for better environmental guidance.
Research Limitations
The model's overall accuracy of 57.45% leaves substantial room for improvement in both the accuracy and the confidence of the answers it provides. The application's response time could also be reduced for better real-time operation.