Deep Learning-Based Speech Emotion Recognition: Leveraging Diverse Datasets and Augmentation Techniques for Robust Modeling

Ayush Porwal; Praveen Kumar Tyagi; Ajay Sharma; Dheeraj Kumar Agarwal

Supervised by: Ministry of Industry and Information Technology of the People's Republic of China. Sponsored by: Harbin Institute of Technology. Editor-in-chief: Yu Zhou. ISSN 1005-9113. CN 23-1378/T.

Related citation:

Ayush Porwal, Praveen Kumar Tyagi, Ajay Sharma, Dheeraj Kumar Agarwal. Deep Learning-Based Speech Emotion Recognition: Leveraging Diverse Datasets and Augmentation Techniques for Robust Modeling[J]. Journal of Harbin Institute of Technology (New Series), 2025, 32(3): 54-65. DOI: 10.11916/j.issn.1005-9113.2024005.

Author Name	Affiliation
Ayush Porwal	Department of Electronics and Instrumentation Engineering, Shri G.S. Institute of Technology and Science, Indore 452001, Madhya Pradesh, India
Praveen Kumar Tyagi	Department of Electronics and Communication Engineering, Maulana Azad National Institute of Technology, Bhopal 462003, Madhya Pradesh, India
Ajay Sharma	School of Computing Science and Engineering, VIT Bhopal University, Sehore 466114, Madhya Pradesh, India
Dheeraj Kumar Agarwal	Department of Electronics and Communication Engineering, Maulana Azad National Institute of Technology, Bhopal 462003, Madhya Pradesh, India

Abstract:

In recent years, Speech Emotion Recognition (SER) has become an essential tool for interpreting human emotions from audio data. This research develops an SER system using deep learning and multiple datasets of emotional speech. Its primary objective is to investigate the use of Convolutional Neural Networks (CNNs) for sound feature extraction. Time stretching, pitch manipulation, and noise injection are among the augmentation techniques used to improve the data quality. Features including Zero Crossing Rate, Chroma_stft, Mel-scale Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS), and Mel-Spectrogram are extracted, transforming the audio signals into features that can be used to train the model. The study concludes with a thorough evaluation of the model's performance: the model achieved an accuracy of 94.57% on the test dataset. The proposed work was also validated on the EMO-BD and IEMOCAP datasets. Directions for future development include further data augmentation, feature engineering, and hyperparameter optimization. By following these development paths, SER systems can be deployed in real-world scenarios with greater accuracy and resilience.
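
As a rough illustration of two of the steps named in the abstract, the sketch below applies noise injection to a synthetic tone and computes frame-wise Zero Crossing Rate and Root Mean Square features. It is not the authors' implementation: the abstract gives no parameter values, so the function names, the noise factor, the frame and hop lengths, and the test signal are all assumptions chosen for illustration. Time stretching, pitch manipulation, MFCC, chroma, and mel-spectrogram extraction are omitted here; in practice they would normally come from an audio processing library.

```python
import math
import random


def inject_noise(signal, noise_factor=0.005, seed=0):
    """Add zero-mean Gaussian noise scaled to the signal's peak amplitude.

    noise_factor is an assumed value, not one taken from the paper.
    """
    rng = random.Random(seed)
    peak = max(abs(x) for x in signal)
    return [x + noise_factor * peak * rng.gauss(0.0, 1.0) for x in signal]


def frames(signal, frame_length, hop_length):
    """Yield consecutive, overlapping frames; a trailing partial frame is dropped."""
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        yield signal[start:start + frame_length]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def root_mean_square(frame):
    """Square root of the mean squared amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def extract_features(signal, frame_length=400, hop_length=200):
    """Return [mean ZCR, mean RMS] averaged over all frames of the signal."""
    zcrs, energies = [], []
    for frame in frames(signal, frame_length, hop_length):
        zcrs.append(zero_crossing_rate(frame))
        energies.append(root_mean_square(frame))
    return [sum(zcrs) / len(zcrs), sum(energies) / len(energies)]


# Synthetic stand-in for a speech clip: one second of a 100 Hz sine at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]

# Augmentation: the noisy copy would be added to the training set next to the original.
noisy = inject_noise(tone)

clean_features = extract_features(tone)
noisy_features = extract_features(noisy)
```

Averaging each feature over frames gives a fixed-length vector per clip regardless of clip duration, which is one common way to prepare inputs for a classifier; whether the paper averages over time or keeps the full frame sequence is not stated in the abstract.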

Key words: voice signal; emotion recognition; deep learning; CNN

DOI: 10.11916/j.issn.1005-9113.2024005

CLC Number: TN18, TN912.3

Fund:
