Supervised by the Ministry of Industry and Information Technology of the People's Republic of China
Sponsored by Harbin Institute of Technology
Editor-in-chief: Yu Zhou
ISSN 1005-9113   CN 23-1378/T

Improved MFCC features and TWM model for speech emotion recognition
Authors (name: affiliation):
  • Liyan Zhang*: School of Railway Intelligent Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China (postcode 116028)
  • Jiaxin Du: School of Railway Intelligent Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China
  • Shuang Chen: School of Railway Intelligent Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China
  • Jiayan Li: School of Railway Intelligent Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China
Abstract:
Static MFCC features cannot fully represent the dynamic characteristics of speech. To address this, this paper applies first-order and second-order differencing to the static MFCC features to extract dynamic MFCC features, and builds on the TIM-NET network a hybrid model (TWM) that combines a multi-head attention mechanism with a Wasserstein Generative Adversarial Network with gradient penalty (WGAN-GP). The multi-head attention mechanism helps prevent vanishing gradients, allows deeper networks to be built, and captures long-range dependencies by learning from information at different time steps, which improves the accuracy of the model. WGAN-GP addresses the shortage of training samples by generating higher-quality speech samples. Experimental results on the RAVDESS and EMO-DB datasets show that the method significantly improves the accuracy and robustness of speech emotion recognition.
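The dynamic MFCC features described in the abstract come from first- and second-order differencing of the static coefficients. A minimal NumPy sketch follows, assuming the common regression-based delta formula d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2) with half-window N = 2 and edge padding; the paper's exact window and padding are not stated here, the function names `delta` and `dynamic_mfcc` are hypothetical, and the static MFCC matrix is a random placeholder rather than output from a real front end.

```python
import numpy as np


def delta(features: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression-based difference along the time axis (axis 0).

    features: (num_frames, num_coeffs), e.g. static MFCCs.
    n: half-width of the regression window (assumed; not given in the paper).
    """
    num_frames = features.shape[0]
    # Repeat the first/last frame so every frame has n neighbours on each side
    padded = np.pad(features, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, n + 1):
        # c[t+k] - c[t-k] for every frame t, weighted by k
        out += k * (padded[n + k : n + k + num_frames] - padded[n - k : n - k + num_frames])
    return out / denom


def dynamic_mfcc(static: np.ndarray, n: int = 2) -> np.ndarray:
    """Stack static, first-order and second-order difference features per frame."""
    d1 = delta(static, n)  # first-order difference (velocity)
    d2 = delta(d1, n)      # second-order difference (acceleration)
    return np.concatenate([static, d1, d2], axis=1)


rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))  # placeholder: 100 frames x 13 static MFCCs
feats = dynamic_mfcc(mfcc)
print(feats.shape)  # (100, 39)
```

With 13 static coefficients per frame the stacked vector has 39 dimensions. Computing the second-order term as the delta of the delta keeps a single differencing routine for both orders.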
Key words: Dynamic features; Speech emotion recognition; Multi-head attention mechanism; Generative adversarial networks
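The abstract credits the multi-head attention mechanism with capturing long-range dependencies across time steps and with allowing deeper networks. The sketch below shows standard scaled dot-product multi-head self-attention over a sequence of frame features in NumPy. It is not the TWM implementation: the residual connection with layer normalization (the usual Transformer arrangement, and the part that eases gradient flow in deep stacks) is an assumption about how such a block is wired, and all sizes and random weights are placeholders.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Normalise each time step's feature vector (no learnable scale/shift here)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (T, d_model) frame features; each weight matrix: (d_model, d_model)."""
    t, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(m):
        # (T, d_model) -> (num_heads, T, d_head)
        return m.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    # Each head scores every pair of time steps, so distant frames interact directly
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (H, T, T)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                     # (H, T, d_head)
    concat = heads.transpose(1, 0, 2).reshape(t, d_model)  # (T, d_model)
    return concat @ w_o, weights


def attention_block(x, w_q, w_k, w_v, w_o, num_heads):
    # Assumed residual + layer-norm wiring: the identity path gives gradients a short route
    attn_out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
    return layer_norm(x + attn_out), weights


rng = np.random.default_rng(0)
T, D, H = 50, 64, 4                  # placeholder sequence length, width, head count
x = rng.standard_normal((T, D))      # placeholder frame-level features
ws = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4)]
out, attn = attention_block(x, *ws, num_heads=H)
print(out.shape, attn.shape)         # (50, 64) (4, 50, 50)
```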
DOI:10.11916/j.issn.1005-9113.24051
CLC Number: TP183
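Abstract (Chinese version, translated):
  To solve the problem that traditional MFCC features cannot fully represent the dynamic characteristics of speech, this paper introduces first-order and second-order differences on top of static MFCC features to extract dynamic MFCC features, and builds on the TIM-NET network a hybrid model (TWM) that combines a multi-head attention mechanism with an improved Wasserstein generative adversarial network (WGAN-GP). WGAN-GP addresses the problem of insufficient sample size by improving the quality of generated speech samples. Experimental results show that the method significantly improves the accuracy and robustness of speech emotion recognition on the RAVDESS and EMO-DB datasets.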
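WGAN-GP replaces the weight clipping of the original Wasserstein GAN with a gradient penalty on the critic, evaluated at random interpolations between real and generated samples. The sketch below computes the critic and generator losses in NumPy. To stay self-contained it uses a hypothetical one-hidden-layer `ToyCritic` whose input gradient is written out analytically; a real implementation would obtain that gradient through automatic differentiation in a deep-learning framework. The critic, sizes, and data are placeholders, not the paper's speech-sample generator, and the penalty weight λ = 10 follows the value suggested in the original WGAN-GP paper.

```python
import numpy as np


class ToyCritic:
    """Hypothetical one-hidden-layer critic D(x) = v . tanh(W x + b)."""

    def __init__(self, dim: int, hidden: int, rng: np.random.Generator):
        self.W = rng.standard_normal((hidden, dim)) / np.sqrt(dim)
        self.b = np.zeros(hidden)
        self.v = rng.standard_normal(hidden) / np.sqrt(hidden)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, dim) -> one unbounded score per sample (no sigmoid in a WGAN critic)
        return np.tanh(x @ self.W.T + self.b) @ self.v

    def input_grad(self, x: np.ndarray) -> np.ndarray:
        # Analytic dD/dx per sample: ((1 - h^2) * v) @ W, shape (batch, dim)
        h = np.tanh(x @ self.W.T + self.b)
        return ((1.0 - h ** 2) * self.v) @ self.W


def gradient_penalty(critic, real, fake, rng, lam=10.0):
    # Random points on straight lines between paired real and generated samples
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake
    grad_norm = np.linalg.norm(critic.input_grad(x_hat), axis=1)
    # Penalise deviation of the gradient norm from 1 (soft 1-Lipschitz constraint)
    return lam * np.mean((grad_norm - 1.0) ** 2)


def wgan_gp_losses(critic, real, fake, rng, lam=10.0):
    critic_loss = (critic(fake).mean() - critic(real).mean()
                   + gradient_penalty(critic, real, fake, rng, lam))
    generator_loss = -critic(fake).mean()
    return critic_loss, generator_loss


rng = np.random.default_rng(0)
critic = ToyCritic(dim=39, hidden=32, rng=rng)
real = rng.standard_normal((16, 39))        # placeholder "real" feature vectors
fake = rng.standard_normal((16, 39)) + 0.5  # placeholder generator output
d_loss, g_loss = wgan_gp_losses(critic, real, fake, rng)
print(d_loss, g_loss)
```

The critic is trained to minimise `critic_loss` and the generator to minimise `generator_loss`, typically with several critic updates per generator update.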
