Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Distributed by:
China: All Local Post Offices
      1 Jan 2024, Volume 39 Issue 1   

Vani Golagana¹, Prof. S. Viziananda Row², Prof. P. Srinivasa Rao³
Journal of Data Acquisition and Processing, 2024, 39(1): 823-839.


Abstract: Multimodal data analysis, essential for processing information from diverse modalities such as text and images, plays a crucial role in applications that involve both. As sentiment analysis gains popularity and multimedia data becomes ubiquitous, the integration of images and text proves beneficial across fields such as image retrieval, image captioning, sentiment analysis, and recommender systems. In this study, we apply multiple image search methods that address both visual and textual aspects. The primary objective is to analyze features in texts and images for the retrieval of relevant images. Our approach follows a tripartite strategy. First, we use text input vectors to retrieve images from extensive databases. Second, we compare text input vectors with combined text-image vectors. Third, we propose directly comparing the fused text and image vectors of a given query with the fused vectors in the database. This multifaceted approach enables us to explore the relationships between textual and visual elements comprehensively. Our work concentrates on two tasks: individual feature extraction using encoding techniques, and fusion strategies that concatenate text and image vectors. To capture detailed information, we incorporate both visual and semantic features. Natural language processing (NLP) and convolutional neural networks (CNNs) are employed to extract features from text and image data, respectively. After feature extraction, the features from the multiple modalities are fused using concatenation in our proposed Holistic Fusion Retrieval (HFR) model. This fusion enhances the relevance of the retrieved images, providing a more comprehensive representation of the underlying data. The HFR model outperforms other methods, achieving an average accuracy of 93% across five different contexts. This underscores its effectiveness in diverse scenarios and its superiority over existing approaches.
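The concatenation-fusion retrieval described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random vectors stand in for the outputs of the NLP and CNN encoders, and the names `fuse` and `retrieve` are hypothetical.

```python
import numpy as np

def l2_normalize(v):
    # Scale a vector to unit length so dot products equal cosine similarity.
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def fuse(text_vec, image_vec):
    # Concatenation fusion: join the normalized text and image
    # feature vectors into one holistic representation.
    return np.concatenate([l2_normalize(text_vec), l2_normalize(image_vec)])

def retrieve(query_fused, db_fused, top_k=3):
    # Rank database items by cosine similarity to the fused query vector.
    q = l2_normalize(query_fused)
    sims = np.array([np.dot(q, l2_normalize(d)) for d in db_fused])
    return np.argsort(-sims)[:top_k]

# Toy database with random stand-ins for real encoder outputs.
rng = np.random.default_rng(0)
text_q, img_q = rng.normal(size=128), rng.normal(size=256)
db = [fuse(rng.normal(size=128), rng.normal(size=256)) for _ in range(10)]
db[4] = fuse(text_q + 0.05 * rng.normal(size=128),
             img_q + 0.05 * rng.normal(size=256))  # near-duplicate of the query
ranked = retrieve(fuse(text_q, img_q), db)
print(ranked[0])  # index of the near-duplicate, which ranks first
```

Normalizing each modality before concatenation keeps one modality's scale from dominating the similarity score, which is one common design choice for concatenation fusion.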


Keywords: multimodal, feature extraction, fusion techniques



