Journal of Data Acquisition and Processing

1 Jan 2024, Volume 39 Issue 1

Article

MULTIMODAL FEATURE FUSION FOR IMAGE RETRIEVAL USING DEEP LEARNING

Vani Golagana1, Prof. S. Viziananda Row2, Prof. P. Srinivasa Rao3

Journal of Data Acquisition and Processing, 2024, 39 (1): 823-839.

Abstract

Multimodal data analysis processes information from multiple modalities, such as text and images, and is central to applications that involve both. As sentiment analysis gains popularity and multimedia data becomes ubiquitous, integrating images and text benefits fields such as image retrieval, image captioning, sentiment analysis, and recommender systems. In this study, we apply several image search methods that draw on both visual and textual features, with the primary objective of analyzing features in text and images to retrieve relevant images. Our approach follows a three-part strategy. First, we use text input vectors to retrieve images from large databases. Second, we compare text input vectors with combined text-image vectors. Third, we propose directly comparing the fused text and image vectors of a given query with the fused vectors in the database. This approach lets us explore the relationships between textual and visual elements comprehensively. Our work concentrates on two tasks: individual feature extraction using encoding techniques, and fusion strategies that concatenate the text and image vectors. To capture detailed information, we incorporate both visual and semantic features. Natural language processing (NLP) techniques and convolutional neural networks (CNNs) are used to extract features from text and image data, respectively. After feature extraction, the features from the two modalities are fused by concatenation in our proposed Holistic Fusion Retrieval (HFR) model. This fusion improves the relevance of the retrieved images by providing a more comprehensive representation of the underlying data. The HFR model outperforms the other methods, achieving an average accuracy of 93% across five different contexts, which indicates its effectiveness in diverse scenarios relative to existing approaches.
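
The third strategy described in the abstract, comparing a fused query vector against fused vectors in the database, can be illustrated with a minimal sketch. It is not the authors' HFR implementation: the abstract specifies only that text and image features are fused by concatenation, so the L2 normalization, the cosine similarity ranking, the function names, and the toy vectors below are all assumptions made for illustration. In practice the text and image vectors would come from an NLP encoder and a CNN rather than being written by hand.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumption: keeps either modality from dominating)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, image_vec):
    """Fuse two modalities by concatenating their normalized feature vectors."""
    return l2_normalize(text_vec) + l2_normalize(image_vec)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed ranking measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, database, top_k=3):
    """Return the ids of the top_k database entries most similar to the query."""
    scored = [(cosine(query_vec, vec), item_id) for item_id, vec in database.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:top_k]]


# Hypothetical toy features: 2-d "text" vectors and 2-d "image" vectors.
database = {
    "beach": fuse([1.0, 0.0], [0.9, 0.1]),
    "forest": fuse([0.0, 1.0], [0.1, 0.9]),
}
query = fuse([0.9, 0.1], [1.0, 0.0])
best_match = retrieve(query, database, top_k=1)
```

Because both the query and the database entries are fused in the same way, the comparison takes place in a single joint feature space. The first two strategies would instead build the query vector from text features alone.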

Keywords

multimodal, feature extraction, fusion techniques
