| Author Name | Affiliation | Postcode |
| --- | --- | --- |
| Liyan Zhang* | School of Railway Intelligent Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China | 116028 |
| Jiaxin Du | School of Railway Intelligent Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China | |
| Shuang Chen | School of Railway Intelligent Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China | |
| Jiayan Li | School of Railway Intelligent Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China | |
|
| Abstract: |
| Traditional static MFCC features cannot fully represent the dynamic characteristics of speech. To address this, the paper applies first-order and second-order differencing to the static MFCC features to extract dynamic MFCC features, and builds a hybrid model (TWM) on the TIM-NET network that combines a multi-head attention mechanism with an improved Wasserstein generative adversarial network (WGAN-GP). In this model, the multi-head attention mechanism effectively prevents vanishing gradients and allows deeper networks to be built; these networks capture long-range dependencies and learn from information at different time steps, which improves the accuracy of the model. WGAN-GP addresses the problem of insufficient sample size by improving the quality of the generated speech samples. The experimental results show that the method significantly improves the accuracy and robustness of speech emotion recognition on the RAVDESS and EMO-DB datasets. |
| Key words: Dynamic features; Speech emotion recognition; Multi-head attention mechanism; Generative adversarial networks |
| DOI:10.11916/j.issn.1005-9113.24051 |
| CLC Number: TP183 |
| Fund: |
|
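| Description (translated from the Chinese): |
| To address the problem that traditional MFCC features cannot fully represent dynamic speech features, this paper introduces first-order and second-order differences on top of static MFCC features to extract dynamic MFCC features, and, building on the TIM-NET network, constructs a hybrid model (TWM) that combines a multi-head attention mechanism with an improved Wasserstein generative adversarial network (WGAN-GP). WGAN-GP addresses the problem of insufficient sample size by improving the quality of the generated speech samples. The experimental results show that the method significantly improves the accuracy and robustness of speech emotion recognition on the RAVDESS and EMO-DB datasets. |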
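The dynamic-feature step described in the abstract (first-order and second-order differencing of static MFCCs) can be illustrated with a minimal NumPy sketch. The abstract does not state the exact differencing formula, window width, or number of coefficients, so the regression-style delta below, the `width=2` default, and the names `delta` and `dynamic_mfcc` are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-style difference along the time axis.

    features: array of shape (n_frames, n_coeffs), e.g. static MFCCs.
    width:    number of frames used on each side (assumed value, not from the paper).
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    n_frames = features.shape[0]
    # Repeat the first and last frames so every frame has `width` neighbours.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def dynamic_mfcc(static: np.ndarray) -> np.ndarray:
    """Stack static, first-order, and second-order features per frame."""
    d1 = delta(static)   # first-order difference
    d2 = delta(d1)       # second-order difference (difference of the difference)
    return np.concatenate([static, d1, d2], axis=1)
```

Under these assumptions, an input of shape `(n_frames, 13)` yields an output of shape `(n_frames, 39)`: the static coefficients followed by their first-order and second-order differences, which is the kind of feature matrix a downstream classifier would consume.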