Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
   
      30 Dec 2022, Volume 37 Issue 5   
    Article

    AN ANALYSIS OF WORD EMBEDDING MODELS WIDE-RANGING OF SOTA TRANSFORMERS
    T. Priyanka, A. Mary Sowjanya
    Journal of Data Acquisition and Processing, 2022, 37 (5): 1763-1780.

    Abstract

    Research on word representation has long been an important area of interest in the history of Natural Language Processing (NLP). Interpreting such intricate linguistic data is essential, since it carries a wealth of information useful to many applications. In NLP, deep learning manifests as word embeddings, which represent the words of a document as multi-dimensional numeric vectors in place of traditional word representations. In deep learning models, word embeddings are a crucial source of input features for downstream tasks such as sequence labeling and text classification. Using these approaches, large amounts of text can be converted into effective vector representations that capture the underlying semantic information, and a range of learning algorithms can then use such representations for NLP-related tasks. The effectiveness, or accuracy, of an embedding can be established if, when transferred to a downstream NLP task, it surpasses the performance levels reached by traditional machine learning algorithms. Over the past decade a number of word embedding methods, falling mainly into the traditional and context-based categories, have been proposed. In this study, we examine different word representation models in terms of their expressive power, from historical models to today's state-of-the-art word representation language models.
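    The core idea above can be sketched in a few lines: each word maps to a numeric vector, and semantic relatedness is measured by vector closeness. This is a minimal toy sketch with hand-picked 3-dimensional vectors (not a trained model; real embeddings such as word2vec or GloVe typically have 50-300 dimensions, and the words and values here are illustrative assumptions):

    ```python
    import math

    # Hypothetical toy embeddings: each word is a point in a 3-d space.
    embeddings = {
        "king":  [0.80, 0.65, 0.10],
        "queen": [0.75, 0.70, 0.15],
        "apple": [0.10, 0.20, 0.90],
    }

    def cosine(u, v):
        """Cosine similarity: a standard closeness measure for word vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    # Semantically related words receive more similar vectors.
    print(cosine(embeddings["king"], embeddings["queen"]))  # high (~0.997)
    print(cosine(embeddings["king"], embeddings["apple"]))  # low  (~0.31)
    ```

    A downstream model (e.g. a text classifier) would consume such vectors as input features instead of one-hot or count-based word representations.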

    Keywords

    NLP, machine learning, word embedding, deep learning, language model.





         

E-mail: info@sjcjycl.cn
 