Bimonthly, since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
    30 Dec 2022, Volume 37, Issue 5
    Article

    MACHINE LEARNING AND EASYOCR BASED LANGUAGE RECOGNITION FOR HANDWRITTEN INDIC EXTRACTION AND CLASSIFICATION
    Sakuldeep Singh, Dr R.B.Singh
    Journal of Data Acquisition and Processing, 2022, 37 (5): 1823-1835.

    Abstract

    Handwritten character and numeral recognition for offline Indic scripts remains challenging after decades of study, because Indic scripts share a similar structure and contain many characters that look very similar to one another. As in other computer vision tasks, deep learning-based methods have achieved state-of-the-art results in handwritten Indic script recognition, even though this line of work is relatively new. However, developing a successful handcrafted machine learning model for various Indian languages from scratch involves a great deal of trial and error and calls for considerable expertise in the problem domain. By employing an evolutionary meta-heuristic approach, we streamlined this search process and found a solution. This method, which combines machine learning with the EasyOCR library, automatically improved our text-extraction and language-recognition capabilities. We focused on Hindi, Malayalam, Kannada, and Tamil, using machine learning models to detect the languages present in images processed with EasyOCR. Of the five distinct models proposed, Naive Bayes and Random Forest performed best, with accuracies of 98.70% and 98%, respectively, and a 100% detection and text-extraction rate. By contrast, previous work focused on single languages such as Bengali, Gujarati, and Devanagari rather than on Hindi and the Dravidian languages.
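The abstract does not say how a language label is assigned to the extracted text. As a hedged illustration only, the sketch below trains a multinomial Naive Bayes classifier (Laplace smoothing) on character unigrams to label a string as Hindi, Malayalam, Kannada, or Tamil. It assumes the OCR step (EasyOCR in the paper) has already returned the text as a Unicode string. The class name `CharNaiveBayes`, the smoothing parameter `alpha`, and the toy training sentences are hypothetical; they are not the authors' features, data, or model.

```python
import math
from collections import Counter, defaultdict


class CharNaiveBayes:
    """Hypothetical sketch: multinomial Naive Bayes over character unigrams."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per language
        self.char_counts = defaultdict(Counter)   # character counts per language
        self.vocab = set()
        for text, label in zip(texts, labels):
            chars = [c for c in text if not c.isspace()]
            self.char_counts[label].update(chars)
            self.vocab.update(chars)
        # Total character tokens seen for each language
        self.totals = {lbl: sum(cnt.values()) for lbl, cnt in self.char_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            denom = self.totals[label] + self.alpha * v
            for c in text:
                # Skip whitespace and characters never seen in training
                if c.isspace() or c not in self.vocab:
                    continue
                score += math.log((self.char_counts[label][c] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data standing in for OCR output (hypothetical, not the paper's dataset)
TEXTS = [
    "नमस्ते दुनिया", "भारत एक देश है",          # Hindi
    "வணக்கம் உலகம்", "தமிழ் மொழி",             # Tamil
    "ನಮಸ್ಕಾರ ಜಗತ್ತು", "ಕನ್ನಡ ಭಾಷೆ",           # Kannada
    "നമസ്കാരം ലോകം", "മലയാളം ഭാഷ",            # Malayalam
]
LABELS = ["hindi", "hindi", "tamil", "tamil",
          "kannada", "kannada", "malayalam", "malayalam"]

model = CharNaiveBayes().fit(TEXTS, LABELS)
print(model.predict("हिंदी भाषा"))  # → hindi
```

Because the four scripts occupy distinct Unicode blocks, character unigrams separate them easily once text has been extracted; the harder problem the abstract describes, recognising visually similar handwritten characters, belongs to the OCR stage and is not modelled here.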

    Keyword

    OCR, Indian Handwritten Scripts, Machine Learning, EasyOCR, Naïve Bayes, Random Forest.



E-mail: info@sjcjycl.cn