Analysis of infoware and software for human affective states recognition
- Authors: Dvoynikova A.A.¹, Markitantov M.V.¹, Ryumina E.V.¹, Uzdiaev M.Yu.¹, Velichko A.N.¹, Ryumin D.A.¹, Lyakso E.E.², Karpov A.A.¹
Affiliations:
- ¹ St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
- ² St. Petersburg State University
- Issue: Vol 21, No 6 (2022)
- Pages: 1097-1144
- Section: Artificial intelligence, knowledge and data engineering
- URL: https://journal-vniispk.ru/2713-3192/article/view/267197
- DOI: https://doi.org/10.15622/ia.21.6.2
- ID: 267197
About the authors
A. A. Dvoynikova
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: dvoynikova.a@iias.spb.su
14th Line V.O., 39
M. V. Markitantov
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: m.markitantov@yandex.ru
14th Line V.O., 39
E. V. Ryumina
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: ryumina.e@iias.spb.su
14th Line V.O., 39
M. Yu. Uzdiaev
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: uzdyaev.m@iias.spb.su
14th Line V.O., 39
A. N. Velichko
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: velichko.a.n@mail.ru
14th Line V.O., 39
D. A. Ryumin
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: ryumin.d@iias.spb.su
14th Line V.O., 39
E. E. Lyakso
St. Petersburg State University
Email: lyakso@gmail.com
University Emb., 7-9
A. A. Karpov
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: karpov@iias.spb.su
14th Line V.O., 39
References
- Picard R.W. Affective Computing for HCI // HCI (1). 1999. pp. 829-833.
- Vilyunas V.K. Emotions // Big Psychological Dictionary / Eds. B.G. Meshcheryakov, V.P. Zinchenko. St. Petersburg: Prime-EVROZNAK. 2007. pp. 565-568. (In Russ.)
- Krafft-Ebing R. Textbook of Psychiatry. 1897. 698 p. (In Russ.)
- Ilyin E.P. Emotions and Feelings. Piter Publishing House. 2011. 782 p. (In Russ.)
- Tkhostov A.Sh., Kolymba I.G. Emotions and affects: general psychological and pathological aspects // Psychological Journal. 1998. no. 4. pp. 41-48. (In Russ.)
- Enikolopov S.N. The concept of aggression in modern psychology // Applied Psychology. 2001. no. 1. pp. 60-72. (In Russ.)
- Verkholyak O.V., Karpov A.A. Chapter "Automatic analysis of emotionally colored speech" in the monograph "Voice Portrait of a Child with Typical and Atypical Development" / E.E. Lyakso, O.V. Frolova, S.V. Grechany, Yu.N. Matveev, O.V. Verkholyak, A.A. Karpov; Eds. E.E. Lyakso, O.V. Frolova. St. Petersburg: Publishing and Printing Association of Higher Educational Institutions. 2020. 204 p. (In Russ.)
- Dvoynikova A.A., Karpov A.A. An analytical review of approaches to sentiment recognition in Russian-language text data // Information and Control Systems. 2020. no. 4 (107). pp. 20-30. (In Russ.)
- Velichko A.N., Karpov A.A. An analytical review of automatic systems for depression detection from speech // Informatics and Automation. 2021. vol. 20. pp. 497-529. (In Russ.)
- American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM-5). American Psychiatric Publishing, Arlington, VA. 2013.
- Tzirakis P., Trigeorgis G., Nicolaou M.A., et al. End-to-end multimodal emotion recognition using deep neural networks // IEEE Journal of Selected Topics in Signal Processing. 2017. vol. 11. no. 8. pp. 1301-1309.
- Dhall A., Goecke R., Gedeon T. Collecting large, richly annotated facial-expression databases from movies // IEEE Multimedia. 2012. vol. 19. no. 03. pp. 34-41.
- Kossaifi J., Tzimiropoulos G., Todorovic S., et al. AFEW-VA database for valence and arousal estimation in-the-wild // Image and Vision Computing. 2017. vol. 65. pp. 23-36.
- Kollias D., Zafeiriou S. Aff-Wild2: Extending the Aff-Wild database for affect recognition // arXiv preprint arXiv:1811.07770. 2018.
- Lien J.J., Kanade T., Cohn J.F., et al. Automated facial expression recognition based on FACS action units // Proceedings of third IEEE international conference on automatic face and gesture recognition. IEEE. 1998. pp. 390-395.
- Busso C., Bulut M., Lee C.-C., et al. IEMOCAP: Interactive emotional dyadic motion capture database // Language Resources and Evaluation. 2008. vol. 42. no. 4. pp. 335-359.
- Ringeval F., Sonderegger A., Sauer J., et al. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions // Proceedings of the 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). IEEE. 2013. pp. 1-8.
- Kossaifi J., Walecki R., Panagakis Y., et al. SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild // IEEE Transactions on Pattern Analysis & Machine Intelligence. 2021. vol. 43. no. 03. pp. 1022-1040.
- McKeown G., Valstar M.F., Cowie R., et al. The SEMAINE corpus of emotionally coloured character interactions // Proceedings of the IEEE International Conference on Multimedia and Expo. IEEE. 2010. pp. 1079-1084.
- Perepelkina O., Kazimirova E., Konstantinova M. RAMAS: Russian multimodal corpus of dyadic interaction for affective computing // Proceedings of the International Conference on Speech and Computer. Springer, Cham. 2018. pp. 501-510.
- Poria S., Hazarika D., Majumder N., et al. MELD: A multimodal multi-party dataset for emotion recognition in conversations // Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. pp. 527-536.
- Zadeh A.B., Liang P.P., Poria S., et al. Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph // Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018. pp. 2236-2246.
- Pérez-Rosas V., Mihalcea R., Morency L.P. Utterance-level multimodal sentiment analysis // Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. vol. 1. 2013. pp. 973-982.
- Zadeh A., Zellers R., Pincus E., et al. Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages // IEEE Intelligent Systems. 2016. vol. 31. no. 6. pp. 82-88.
- Morency L.P., Mihalcea R., Doshi P. Towards multimodal sentiment analysis: Harvesting opinions from the web // Proceedings of the 13th International Conference on Multimodal Interfaces. 2011. pp. 169-176.
- Yu W., Xu H., Meng F., et al. CH-SIMS: A Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality // Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. pp. 3718-3727.
- Lefter I., Rothkrantz L.J.M., Burghouts G., et al. Addressing multimodality in overt aggression detection // Proceedings of the International Conference on Text, Speech and Dialogue. Springer, Berlin, Heidelberg. 2011. pp. 25-32.
- Lefter I., Burghouts G.J., Rothkrantz L.J.M. An audio-visual dataset of human–human interactions in stressful situations // Journal on Multimodal User Interfaces. 2014. vol. 8. no. 1. pp. 29-41.
- Lefter I., Rothkrantz L.J.M. Multimodal cross-context recognition of negative interactions // Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW). IEEE. 2017. pp. 56-61.
- Lefter I., Jonker C.M., Tuente S.K., et al. NAA: A multimodal database of negative affect and aggression // Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE. 2017. pp. 21-27.
- Nievas E.B., Déniz-Suárez O., Garcia G., et al. Violence detection in video using computer vision techniques // Proceedings of the International Conference on Computer Analysis of Images and Patterns. Springer, Berlin, Heidelberg. 2011. pp. 332-339.
- Perez M., Kot A.C., Rocha A. Detection of real-world fights in surveillance videos // Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2019. pp. 2662-2666.
- Cheng M., Cai K., Li M. RWF-2000: An open large scale video database for violence detection // Proceedings of the 25th International Conference on Pattern Recognition (ICPR). IEEE. 2021. pp. 4183-4190.
- Kumar R., Reganti A.N., Bhatia A., et al. Aggression-annotated corpus of Hindi-English code-mixed data // arXiv preprint arXiv:1803.09402. 2018.
- Bozyiğit A., Utku S., Nasibov E. Cyberbullying detection: Utilizing social media features // Expert Systems with Applications. 2021. vol. 179. p. 115001.
- Gratch J., Artstein R., Lucas G., et al. The Distress Analysis Interview Corpus of Human and Computer Interviews // Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). Reykjavik, Iceland. 2014. pp. 3123-3128.
- Valstar M., Schuller B., Smith K., et al. AVEC 2013: the continuous audio/visual emotion and depression recognition challenge // Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge (AVEC’13). Association for Computing Machinery, New York, NY, USA. 2013. pp. 3–10.
- Yang Y., Fairbairn C., Cohn J. Detecting depression severity from vocal prosody // IEEE Transactions on Affective Computing. 2013. vol. 4. no. 2. pp. 142–150.
- Alghowinem S., Goecke R., Wagner M., et al. From joyous to clinically depressed: Mood detection using spontaneous speech // Proceedings of FLAIRS Conference, G.M. Youngblood and P.M. McCarthy, Eds. AAAI Press. 2012. pp. 141–146.
- Huang Z., Epps J., Joachim D., et al. Depression detection from short utterances via diverse smartphones in natural environmental conditions // Proceedings of Interspeech. 2018. pp. 3393–3397.
- Ryumina E., Karpov A. Facial expression recognition using distance importance scores between facial landmarks // CEUR Workshop Proceedings. 2020. vol. 274. pp. 1-10.
- Axyonov A., Ryumin D., Kagirov I. Method of Multi-Modal Video Analysis of Hand Movements for Automatic Recognition of Isolated Signs of Russian Sign Language // Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2021. vol. XLIV-2/W1-2021. pp. 7–13.
- He K., Zhang X., Ren S., et al. Deep residual learning for image recognition // Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. pp. 770–778.
- Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition // Proceedings of the 3rd International Conference on Learning Representations (ICLR). 2015. pp. 1–14.
- Niu B., Gao Z., Guo B. Facial expression recognition with LBP and ORB features // Computational Intelligence and Neuroscience. 2021. vol. 2021.
- Verma S., Wang J., Ge Zh., et al. Deep-HOSeq: Deep higher order sequence fusion for multimodal sentiment analysis // Proceedings of IEEE International Conference on Data Mining (ICDM). IEEE. 2020. pp. 561-570.
- Eyben F., Weninger F., Gross F., et al. Recent developments in openSMILE, the Munich open-source multimedia feature extractor // Proceedings of ACM International Conference on Multimedia. 2013. pp. 835–838.
- Schuller B.W., Batliner A., Bergler C., et al. The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates // Proceedings of Interspeech. 2021. pp. 431–435.
- Eyben F., Scherer K.R., Schuller B.W., et al. The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing // IEEE Transactions on Affective Computing. 2015. vol. 7. no. 2. pp. 190–202.
- Schmitt M., Ringeval F., Schuller B.W. At the border of acoustics and linguistics: Bag-of-Audio-Words for the recognition of emotions in speech // Proceedings of Interspeech. 2016. pp. 495–499.
- Kaya H., Karpov A.A., Salah A.A. Fisher vectors with cascaded normalization for paralinguistic analysis // Proceedings of Interspeech. 2015. pp. 909–913.
- Zhao Z., Zhao Y., Bao Z., et al. Deep spectrum feature representations for speech emotion recognition // Proceedings of Workshop on Affective Social Multimedia Computing and First Multi-Modal Affective Computing of Large-Scale Multimedia Data. 2018. pp. 27–33.
- Freitag M., Amiriparian S., Pugachevskiy S., et al. AuDeep: Unsupervised learning of representations from audio with deep recurrent neural networks // The Journal of Machine Learning Research. 2017. vol. 18. no. 1. pp. 6340–6344.
- Shor J., Jansen A., Maor R., et al. Towards Learning a Universal Non-Semantic Representation of Speech // Proceedings of Interspeech. 2020. pp. 140–144.
- Wagner J., Triantafyllopoulos A., Wierstorf H., et al. Dawn of the transformer era in speech emotion recognition: closing the valence gap // arXiv preprint arXiv:2203.07378. 2022. pp. 1-25.
- Degottex G., Kane J., Drugman T., et al. COVAREP – A collaborative voice analysis repository for speech technologies // Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2014. pp. 960-964.
- Sogancioglu G., Verkholyak O., Kaya H., et al. Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition // Proceedings of Interspeech. 2020. pp. 2097-2101.
- Sebastian J., Pierucci P. Fusion Techniques for Utterance-Level Emotion Recognition Combining Speech and Transcripts // Proceedings of Interspeech. 2019. pp. 51-55.
- Xu H., Zhang H., Han K., et al. Learning alignment for multimodal emotion recognition from speech // arXiv preprint arXiv:1909.05645. 2019.
- Dellaert F., Polzin T., Waibel A. Recognizing emotion in speech // Proceedings of the 4th Int. Conf. Spoken Lang. Process (ICSLP). 1996. pp. 1970–1973.
- Neiberg D., Elenius K., Laskowski K. Emotion recognition in spontaneous speech using GMMs // Proceedings of the 9th Int. Conf. Spoken Lang. Process. 2006. pp. 809–812.
- Nogueiras A., Moreno A., Bonafonte A., et al. Speech emotion recognition using hidden Markov models // Proceedings of the 7th Eur. Conf. Speech Commun. Technol. 2001. pp. 746–749.
- Raudys Š. On the universality of the single-layer perceptron model // Neural Networks and Soft Computing. Physica, Heidelberg. 2003. pp. 79-86.
- Wang J., Lu S., Wang S.-H., et al. A review on extreme learning machine // Multimedia Tools and Applications. 2021. pp. 1-50.
- Kruse R., Borgelt C., Klawonn F., et al. Multi-layer perceptrons // Computational Intelligence. Springer, Cham. 2022. pp. 53-124.
- Sainath T.N., Vinyals O., Senior A., et al. Convolutional, long short-term memory, fully connected deep neural networks // Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2015. pp. 4580–4584.
- Kim J., Truong K.P., Englebienne G., et al. Learning spectro-temporal features with 3D CNNs for speech emotion recognition // Proceedings of the 7th International Conference on Affective Computing and Intelligent Interaction (ACII). 2017. pp. 383–388.
- Chao L., Tao J., Yang M., et al. Long short term memory recurrent neural network based multimodal dimensional emotion recognition // Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge. 2015. pp. 65-72.
- Wang J., Xue M., Culhane R., et al. Speech emotion recognition with dual-sequence LSTM architecture // Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2020. pp. 6474–6478.
- Chen Q., Huang G. A novel dual attention-based BLSTM with hybrid features in speech emotion recognition // Engineering Applications of Artificial Intelligence. 2021. vol. 102. p. 104277.
- Zhao J., Mao X., Chen L. Speech emotion recognition using deep 1D & 2D CNN LSTM networks // Biomedical Signal Processing and Control. 2019. vol. 47. pp. 312–323.
- Milner R., Jalal M.A., Ng R.W., et al. A cross-corpus study on speech emotion recognition // Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). 2019. pp. 304-311.
- Vaswani A., Shazeer N., Parmar N., et al. Attention is all you need // Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017). 2017. vol. 30. pp. 1-11.
- Ho N.H., Yang H.J., Kim S.H., et al. Multimodal approach of speech emotion recognition using multi-level multi-head fusion attention-based recurrent neural network // IEEE Access. 2020. vol. 8. pp. 61672-61686.
- Hsu W.N., Bolte B., Tsai Y.-H.H., et al. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units // IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2021. vol. 29. pp. 3451–3460.
- Siriwardhana S., Reis A., Weerasekera R., et al. Jointly fine-tuning “BERT-like” self-supervised models to improve multimodal speech emotion recognition // arXiv preprint arXiv:2008.06682. 2020. pp. 1-5.
- Kratzwald B., Ilic S., Kraus M., et al. Deep learning for affective computing: Text-based emotion recognition in decision support // Decision Support Systems. 2018. vol. 115. pp. 24–35.
- Stappen L., Baird A., Christ L., et al. The MuSe 2021 multimodal sentiment analysis challenge: Sentiment, emotion, physiological-emotion, and stress // Proceedings of the 29th ACM International Conference on Multimedia (ACM MM). 2021. pp. 5706–5707.
- Dresvyanskiy D., Ryumina E., Kaya H., et al. End-to-End Modeling and Transfer Learning for Audiovisual Emotion Recognition in-the-Wild // Multimodal Technologies and Interaction. 2022. vol. 6. no. 2. p. 11.
- Fedotov D., Kaya H., Karpov A. Context Modeling for Cross-Corpus Dimensional Acoustic Emotion Recognition: Challenges and Mixup // Proceedings of 20th International Conference on Speech and Computer (SPECOM-2018). 2018. pp. 155-165.
- Wu C.H., Lin J.C., Wei W.L. Survey on audiovisual emotion recognition: databases, features, and data fusion strategies // APSIPA Transactions on Signal and Information Processing. 2014. vol. 3. pp. 18.
- Al Osman H., Falk T.H. Multimodal affect recognition: Current approaches and challenges // Emotion and Attention Recognition Based on Biological Signals and Images. 2017. pp. 59-86.
- Liu D., Wang Z., Wang L., et al. Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning // Frontiers in Neurorobotics. 2021. pp. 13.
- Zhang C., Yang Z., He X., et al. Multimodal intelligence: Representation learning, information fusion, and applications // IEEE Journal of Selected Topics in Signal Processing. 2020. vol. 14. no. 3. pp. 478-493.
- Markitantov M., Ryumina E., Ryumin D., et al. Biometric Russian Audio-Visual Extended MASKS (BRAVE-MASKS) Corpus: Multimodal Mask Type Recognition Task // Proceedings of Interspeech. 2022. pp. 1756-1760.
- Yang L., Sahli H., Xia X., et al. Hybrid depression classification and estimation from audio video and text information // Proceedings of 7th Annual Workshop on Audio/Visual Emotion Challenge (AVEC). 2017. pp. 45-51.
- Mai S., Hu H., Xing S. Modality to modality translation: An adversarial representation learning and graph fusion network for multimodal fusion // Proceedings of the AAAI Conference on Artificial Intelligence. 2020. vol. 34. no. 01. pp. 164-172.
- Ghosal D., Akhtar M.S., Chauhan D., et al. Contextual inter-modal attention for multi-modal sentiment analysis // Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2018. pp. 3454-3466.
- Akhtar M.S., Chauhan D.S., Ghosal D., et al. Multi-task learning for multi-modal emotion recognition and sentiment analysis // arXiv preprint arXiv:1905.05812. 2019. pp. 1-10.
- Sun Z., Sarma P., Sethares W., et al. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis // Proceedings of the AAAI Conference on Artificial Intelligence. 2020. vol. 34. no. 05. pp. 8992-8999.
- Mai S., Hu H., Xing S. Divide, conquer and combine: Hierarchical feature fusion network with local and global perspectives for multimodal affective computing // Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. pp. 481-492.
- Chauhan D.S., Akhtar M.S., Ekbal A., et al. Context-aware interactive attention for multi-modal sentiment and emotion analysis // Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. pp. 5647-5657.
- Delbrouck J.B., Tits N., Brousmiche M., et al. A transformer-based joint-encoding for emotion recognition and sentiment analysis // arXiv preprint arXiv:2006.15955. 2020.
- Khare A., Parthasarathy S., Sundaram S. Self-Supervised learning with cross-modal transformers for emotion recognition // IEEE Spoken Language Technology Workshop (SLT). IEEE. 2021. pp. 381-388.
- Zaib S., Asif M., Arooj M. Development of Aggression Detection Technique in Social Media // International Journal of Information Technology and Computer Science. 2019. vol. 5. no. 8. pp. 40-46.
- Levonevskiy D.K., Saveliev A.I. An approach and architecture for systematizing and detecting signs of aggression in Russian-language textual content // Tomsk State University Journal of Control and Computer Science. 2021. no. 54. pp. 56-64. (In Russ.)
- Sadiq S., Mehmood A., Ullah S., et al. Aggression detection through deep neural model on twitter // Future Generation Computer Systems. 2021. vol. 114. pp. 120-129.
- Tommasel A., Rodriguez J.M., Godoy D. Textual Aggression Detection through Deep Learning // TRAC@COLING 2018. 2018. pp. 177-187.
- Mandl T., Modha S., Shahi G.K., et al. Overview of the HASOC subtrack at FIRE 2021: Hate speech and offensive content identification in English and Indo-Aryan languages // arXiv preprint arXiv:2112.09301. 2021.
- Potharaju Y., Kamsali M., Kesavari C.R. Classification of Ontological Violence Content Detection through Audio Features and Supervised Learning // International Journal of Intelligent Engineering and Systems. 2019. vol. 12. no. 3. pp. 20-30.
- Sahoo S., Routray A. Detecting aggression in voice using inverse filtered speech features // IEEE Transactions on Affective Computing. 2016. vol. 9. no. 2. pp. 217-226.
- Santos F., Durães D., Marcondes F.M., et al. In-car violence detection based on the audio signal // Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning. Springer, Cham. 2021. pp. 437-445.
- Liang Q., Li Y., Chen B., et al. Violence behavior recognition of two-cascade temporal shift module with attention mechanism // Journal of Electronic Imaging. 2021. vol. 30. no. 4. p. 043009.
- Uzdiaev M.Yu. A neural network model of multimodal human aggression recognition // Bulletin KRASEC. Physical and Mathematical Sciences. 2020. vol. 33. no. 4. pp. 132-149. (In Russ.)
- Yao Y., Papakostas M., Burzo M., et al. MUSER: MUltimodal Stress detection using Emotion Recognition as an Auxiliary Task // Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). 2021. pp. 2714-2725.
- Sangwan S., Chauhan D.S., Akhtar M., et al. Multi-task gated contextual cross-modal attention framework for sentiment and emotion analysis // Proceedings of International Conference on Neural Information Processing. 2019. pp. 662-669.
- Kollias D., Zafeiriou S. Expression, affect, action unit recognition: Aff-Wild2, multi-task learning and ArcFace // arXiv preprint arXiv:1910.04855. 2019. pp. 1-15.
- Li Y., Zhao T., Kawahara T. Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning // Proceedings of Interspeech. 2019. pp. 2803-2807.
- Yu W., Xu H., Yuan Z., et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis // Proceedings of the AAAI Conference on Artificial Intelligence. 2021. vol. 35. no. 12. pp. 10790-10797.
- Vu M.T., Beurton-Aimar M., Marchand S. Multitask multi-database emotion recognition // Proceedings of IEEE/CVF International Conference on Computer Vision. 2021. pp. 3637-3644.
- Velichko A., Markitantov M., Kaya H., et al. Complex Paralinguistic Analysis of Speech: Predicting Gender, Emotions and Deception in a Hierarchical Framework // Proceedings of Interspeech. 2022. pp. 4735-4739.