Journal of Data Acquisition and Processing

30 Dec 2022, Volume 37 Issue 5

Article

MACHINE LEARNING AND EASYOCR BASED LANGUAGE RECOGNITION FOR HANDWRITTEN INDIC EXTRACTION AND CLASSIFICATION

Sakuldeep Singh, Dr. R. B. Singh

Journal of Data Acquisition and Processing, 2022, 37 (5): 1823-1835.

Abstract

Handwritten character and numeral recognition for offline Indic scripts remains challenging after decades of study, because Indic scripts share a similar structure and contain many characters that look very similar to one another. As in other computer vision tasks, deep learning-based methods have achieved state-of-the-art results in handwritten Indic script recognition, even though their application to this problem is relatively recent. However, developing a successful handcrafted machine learning model for various Indian languages from scratch involves a great deal of trial and error and requires considerable expertise in the problem domain. We used an evolutionary meta-heuristic approach to streamline this search, combining machine learning with EasyOCR to automatically improve text extraction and language recognition. We focused on Hindi, Malayalam, Kannada, and Tamil, using the EasyOCR library to extract text from images and machine learning models to detect the languages present. Of the five models proposed, Naive Bayes and Random Forest perform best, with accuracies of 98.70% and 98%, respectively, and a 100% detection and text-extraction rate. By comparison, previous work focused on single languages or scripts such as Bengali, Gujarati, and Devanagari rather than on Hindi and the Dravidian languages.
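The abstract does not state which features the classifiers use. As a minimal, hypothetical sketch of the language-recognition step, the Python below assumes text has already been extracted from an image by an OCR stage (for example EasyOCR) and classifies it with a multinomial Naive Bayes model over character n-grams. The class `NaiveBayesLanguageID`, the helper `char_ngrams`, the n-gram feature choice, and the tiny `TRAIN` word list are illustrative assumptions, not the authors' implementation or data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_values=(1, 2)):
    """Yield character n-grams (assumed features) from each whitespace-separated token."""
    for token in text.split():
        for n in n_values:
            for i in range(len(token) - n + 1):
                yield token[i:i + n]


class NaiveBayesLanguageID:
    """Hypothetical multinomial Naive Bayes over character n-grams with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # additive smoothing constant
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.totals = Counter()             # language -> total n-gram count
        self.docs = Counter()               # language -> number of training samples
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, samples):
        """Count n-grams per language from (text, language) pairs."""
        for text, lang in samples:
            grams = list(char_ngrams(text))
            self.counts[lang].update(grams)
            self.totals[lang] += len(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        """Return the language with the highest log posterior for `text`."""
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang in self.counts:
            score = math.log(self.docs[lang] / n_docs)  # log prior
            denom = self.totals[lang] + self.alpha * v
            for g in char_ngrams(text):
                # Smoothed log likelihood; unseen n-grams get alpha / denom.
                score += math.log((self.counts[lang][g] + self.alpha) / denom)
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


# Toy, illustrative training strings (not the paper's dataset).
TRAIN = [
    ("नमस्ते भारत", "hi"), ("हिंदी भाषा", "hi"),
    ("வணக்கம் இந்தியா", "ta"), ("தமிழ் மொழி", "ta"),
    ("ನಮಸ್ಕಾರ ಭಾರತ", "kn"), ("ಕನ್ನಡ ಭಾಷೆ", "kn"),
    ("നമസ്കാരം ഭാരതം", "ml"), ("മലയാളം ഭാഷ", "ml"),
]

model = NaiveBayesLanguageID().fit(TRAIN)
for word in ["भारत", "தமிழ்", "ಕನ್ನಡ", "മലയാളം"]:
    print(word, model.predict(word))
```

Because Hindi (Devanagari), Tamil, Kannada, and Malayalam are written in distinct Unicode blocks, character unigrams alone already separate them in this toy setting; bigrams matter more when languages share a script. A real pipeline would train on OCR output from labelled images and could substitute a library implementation such as scikit-learn's `MultinomialNB`.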

Keywords

OCR, Indian Handwritten Scripts, Machine Learning, EasyOCR, Naïve Bayes, Random Forest.
