Speech emotion estimation in PAD 3D emotion space
Authors:

Chen Yiling, Cheng Yanfen, Chen Xianqiao, Wang Hongxia, Li Chao

Affiliation:

(School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China)

Author biography:

Chen Yiling (born 1995), female, master's student

Corresponding author:

Wang Hongxia, 99575522@qq.com

CLC number:

TN912.34

Funding:

National Natural Science Foundation of China (51179146)

    Abstract:

    Discrete emotion description models label human emotions with discrete adjectives, so they can represent only a limited number of single, explicit emotion types. Dimensional emotion models instead quantify the implicit state of complex emotions along several dimensions. In addition, the commonly used speech emotion feature, Mel-frequency cepstral coefficients (MFCC), ignores the correlation between the spectral features of adjacent frames because of frame-by-frame processing, and therefore tends to lose useful information. To address this, the paper proposes an improved method that extracts two features from the spectrogram, the time firing series and the firing position information, to supplement MFCC. Each of the three features is used separately for speech emotion estimation. Based on the predicted values, a correlation analysis is performed on each of the three dimensions of the PAD model, P (Pleasure-displeasure), A (Arousal-nonarousal), and D (Dominance-submissiveness), to obtain weight coefficients for the features. Weighted fusion then yields the final PAD values of the emotional speech, which are mapped into the PAD 3D emotion space. Experiments show that the two added features not only detect the speaker's emotional state but also take into account the cross-correlation between adjacent spectra, complementing the MFCC features. Besides improving discrete recognition of basic emotion types, the method represents the results as coordinate points in the PAD 3D emotion space, uses a quantitative approach to reveal the positions of and relations among emotions in that space, and exposes the mixed emotional content of emotional speech. The study lays a foundation for subsequent research on classifying and recognizing complex speech emotions.
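    The abstract describes correlation-based weighting followed by weighted fusion but gives no formulas. The following is a minimal sketch of one way such a step could look, not the authors' implementation. It assumes that each of the three features yields its own P, A, D predictions, that the weights come from the Pearson correlation between those predictions and annotated PAD values, that negative correlations are clipped to zero, and that weights are normalized to sum to one per dimension. The function names `correlation_weights` and `fuse_pad` are hypothetical.

```python
import numpy as np


def correlation_weights(preds, targets):
    """Derive per-dimension fusion weights from Pearson correlations.

    preds:   array of shape (n_features, n_samples, 3), one PAD prediction
             per feature type (e.g. MFCC, time firing series, firing position).
    targets: array of shape (n_samples, 3), annotated PAD values.
    Returns an array of shape (n_features, 3) whose columns sum to 1.
    """
    n_features, _, n_dims = preds.shape
    corr = np.zeros((n_features, n_dims))
    for f in range(n_features):
        for d in range(n_dims):
            # Pearson correlation between predicted and annotated values.
            corr[f, d] = np.corrcoef(preds[f, :, d], targets[:, d])[0, 1]

    # Assumption: a negatively correlated feature gets zero weight.
    corr = np.clip(corr, 0.0, None)

    totals = corr.sum(axis=0, keepdims=True)
    safe_totals = np.where(totals > 0, totals, 1.0)
    # Assumption: fall back to uniform weights if no feature correlates.
    uniform = np.full_like(corr, 1.0 / n_features)
    return np.where(totals > 0, corr / safe_totals, uniform)


def fuse_pad(preds, weights):
    """Weighted sum over features, giving one (P, A, D) point per sample."""
    return np.einsum("fsd,fd->sd", preds, weights)
```

    Each row of the fused output can then be read as a coordinate in the PAD 3D emotion space. In practice the weights would be estimated on held-out annotated data rather than on the samples being fused.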

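Cite this article

Chen Yiling, Cheng Yanfen, Chen Xianqiao, Wang Hongxia, Li Chao. Speech emotion estimation in PAD 3D emotion space[J]. Journal of Harbin Institute of Technology, 2018, 50(11): 160. DOI: 10.11918/j.issn.0367-6234.201806131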

History
  • Received: 2018-06-21
  • Published online: 2018-10-17