Tag Archive: Emotion Recognition

Quantifying Intimacy in Language

Intimacy is a fundamental aspect of how we relate to others in social settings. Language encodes the social information of intimacy through both topics and other more subtle cues (such as linguistic hedging and swearing). Here, we introduce a new computational framework for studying expressions of intimacy in language, with an accompanying dataset and deep learning model for accurately predicting the intimacy level of questions (Pearson's r = 0.87). Through analyzing a dataset of 80.5M questions across social media, books, and films, we show that individuals employ interpersonal pragmatic moves in their language to align their intimacy with social settings. Then, in three studies, we further demonstrate how individuals modulate their intimacy to match social norms around gender, social distance, and audience, each validating key findings from studies in social psychology. Our work demonstrates that intimacy is a pervasive and impactful social dimension of language.

https://arxiv.org/pdf/2011.03020.pdf

Intimacy is a fundamental aspect of how we interact socially, and language carries this social information through topics as well as subtler cues. We therefore design a framework for quantifying intimacy in language, consisting of a dataset and a deep-learning model for predicting the intimacy level of questions (Pearson's r = 0.87). By analyzing 80.5 million questions across social media, books, and films, we find that people use interpersonal pragmatic moves in their language to match the intimacy expected in a social setting. We also find that people adjust their intimacy to fit social norms around gender, social distance, and audience. Our work shows that intimacy expressed in language is a pervasive and impactful social dimension.
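As a rough illustration of the framing, the sketch below treats question intimacy as single-output regression and evaluates predictions with Pearson's r, the metric reported above. The backbone checkpoint, example questions, gold scores, and the assumed [-1, 1] intimacy scale are placeholders, not the authors' released model or data; fine-tuning on annotated questions is omitted.

    # Illustrative sketch only: a transformer regression head for question intimacy,
    # evaluated with Pearson's r as in the paper. Checkpoint, questions, and gold
    # scores are placeholders, not the authors' released resources.
    import torch
    from scipy.stats import pearsonr
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    checkpoint = "roberta-base"  # placeholder backbone; the paper's exact setup may differ
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=1)  # 1 label => regression
    model.eval()

    questions = [
        "What's your favorite color?",
        "How was your weekend?",
        "Have you ever cried in front of someone you love?",
    ]
    gold = [-0.5, -0.2, 0.8]  # made-up annotations on an assumed [-1, 1] intimacy scale

    with torch.no_grad():
        batch = tokenizer(questions, padding=True, truncation=True, return_tensors="pt")
        preds = model(**batch).logits.squeeze(-1).tolist()

    r, _ = pearsonr(preds, gold)
    print(f"Pearson's r between predictions and gold scores: {r:.2f}")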

M3ER: Multiplicative Multimodal Emotion Recognition using Facial, Textual, and Speech Cues

Pipeline overview (figure caption): We use three modalities: speech, text, and facial features. We first extract feature vectors f_s, f_t, and f_f from the raw inputs i_s, i_t, and i_f (purple box). Each feature vector is then checked for effectiveness with an indicator function I_e (Equation 1 in the paper) (yellow box). The checked vectors are passed into M3ER's classification and fusion network to predict the emotion (orange box). At inference time, if a modality is noisy, we regenerate a proxy feature vector (p_s, p_t, or p_f) for that particular modality (blue box).
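A toy sketch of what such a check and proxy step could look like, not the paper's implementation: the top canonical correlation between two modalities' feature matrices stands in for the indicator function I_e, and a linear least-squares map stands in for proxy-feature regeneration. The shapes, threshold, and helper names here are assumptions.

    # Toy sketch, not the paper's code: a CCA-style effectiveness check between two
    # modalities and a simple linear map used as a stand-in proxy-feature generator.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    def modality_effective(feat_a, feat_b, threshold=0.3):
        """Treat feat_b as effective if its top canonical correlation with feat_a is high enough."""
        cca = CCA(n_components=1)
        u, v = cca.fit_transform(feat_a, feat_b)        # (n_samples, 1) canonical projections
        corr = np.corrcoef(u[:, 0], v[:, 0])[0, 1]
        return corr >= threshold

    def fit_proxy_map(feat_src, feat_dst):
        """Least-squares linear map from a reliable modality to the corrupted one (toy proxy)."""
        w, *_ = np.linalg.lstsq(feat_src, feat_dst, rcond=None)
        return w

    rng = np.random.default_rng(0)
    f_s = rng.normal(size=(2000, 8))                                        # toy speech features
    f_t = f_s @ rng.normal(size=(8, 6)) + 0.1 * rng.normal(size=(2000, 6))  # text features correlated with speech
    f_bad = rng.normal(size=(2000, 6))                                      # a corrupted modality

    print(modality_effective(f_s, f_t))    # expected True: modalities agree
    print(modality_effective(f_s, f_bad))  # expected False: flag as ineffective
    p_t = f_s @ fit_proxy_map(f_s, f_t)    # regenerate a proxy feature vector for text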

We present M3ER, a learning-based method for emotion recognition from multiple input modalities. Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and is also more robust than other methods to sensor noise in any of the individual modalities. M3ER models a novel, data-driven multiplicative fusion method to combine the modalities, which learns to emphasize the more reliable cues and suppress the others on a per-sample basis. By introducing a check step which uses Canonical Correlation Analysis to differentiate between ineffective and effective modalities, M3ER is robust to sensor noise. M3ER also generates proxy features in place of the ineffective modalities. We demonstrate the efficiency of our network through experiments on two benchmark datasets, IEMOCAP and CMU-MOSEI. We report a mean accuracy of 82.7% on IEMOCAP and 89.0% on CMU-MOSEI, which, collectively, is an improvement of about 5% over prior work.

We present M3ER, a learning-based method for multimodal emotion recognition. Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and is more robust to sensor noise in any individual modality than other methods. M3ER uses a novel, data-driven multiplicative fusion method that emphasizes the more reliable cues and suppresses the others on a per-sample basis. A check step based on Canonical Correlation Analysis distinguishes ineffective from effective modalities, making M3ER robust to sensor noise, and M3ER generates proxy features to replace the ineffective modalities. We demonstrate the network on two benchmark datasets, IEMOCAP and CMU-MOSEI, reporting mean accuracies of 82.7% and 89.0% respectively, an overall improvement of about 5% over prior work.
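M3ER's multiplicative fusion is a learned, data-driven scheme; the toy sketch below only shows the underlying idea of combining per-modality class scores multiplicatively (a product-of-experts-style combination), with shapes and variable names chosen purely for illustration.

    # Toy illustration of multiplicative-style fusion of per-modality emotion scores.
    # A product-of-experts-style combination for intuition, not M3ER's learned fusion.
    import torch
    import torch.nn.functional as F

    def multiplicative_fuse(logits_per_modality):
        """Combine [batch, num_classes] logits from each modality by multiplying class probabilities."""
        log_probs = [F.log_softmax(l, dim=-1) for l in logits_per_modality]
        fused_log = torch.stack(log_probs, dim=0).sum(dim=0)  # sum of logs == product of probabilities
        return F.softmax(fused_log, dim=-1)                   # renormalize into a distribution

    # Hypothetical per-modality outputs for a batch of 2 samples and 4 emotion classes.
    speech_logits, text_logits, face_logits = (torch.randn(2, 4) for _ in range(3))
    fused = multiplicative_fuse([speech_logits, text_logits, face_logits])
    print(fused.sum(-1))     # each row sums to 1
    print(fused.argmax(-1))  # fused emotion prediction per sample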

No code resources found yet.