05 July 2023, Volume 38 Issue 3
Abstract
Emotion recognition from speech is a rapidly expanding topic of study in artificial intelligence. In this study, we developed a deep learning model to identify voice cues that indicate emotions. For training and testing, we used four datasets: RAVDESS, SAVEE, CREMA-D, and TESS. Because these datasets differ in speakers, languages, and emotions, building a single model that generalises across all of them is difficult. To improve data quality, we pre-processed the audio signals with a variety of methods, including downsampling, normalisation, and feature extraction. Using Mel-Frequency Cepstral Coefficients (MFCCs) as input features, the proposed deep neural network achieved an accuracy of 80.08%. Our results show that the proposed model can accurately classify emotions from voice data. The model can be applied in a variety of situations where emotion recognition is essential, including speech therapy, customer service, and human-robot interaction. In conclusion, the proposed emotion detection system identified emotions from speech signals with encouraging success. The study emphasises the value of feature engineering, model selection, and dataset selection in creating an efficient emotion recognition system. The technology has potential for several uses, including social robots, mental-health monitoring, and customised user interfaces.
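The pre-processing steps named in the abstract (downsampling and normalisation ahead of MFCC extraction) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 16 kHz target rate is an assumption, and the integer-stride decimation stands in for a proper resampler (e.g. `librosa.resample`, which would typically precede `librosa.feature.mfcc` for the MFCC stage).

```python
import numpy as np

def preprocess(y: np.ndarray, sr: int, target_sr: int = 16000):
    """Downsample and peak-normalise a raw waveform.

    Hypothetical helper illustrating the abstract's pre-processing;
    target_sr = 16000 is an assumed value, not taken from the paper.
    """
    factor = max(sr // target_sr, 1)
    y = y[::factor]                         # crude integer-stride downsampling
    y = y / (np.max(np.abs(y)) + 1e-9)      # peak normalisation to [-1, 1]
    return y, sr // factor

# Example: one second of a 440 Hz tone at 48 kHz as a stand-in for speech
sr = 48000
t = np.linspace(0.0, 1.0, sr, endpoint=False)
y, new_sr = preprocess(0.5 * np.sin(2 * np.pi * 440 * t), sr)
print(new_sr, len(y))
```

The normalised, resampled signal would then be passed to an MFCC extractor, and the resulting coefficient vectors fed to the deep neural network as input features.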