Analysis of infoware and software for human affective states recognition
- Authors: Dvoynikova A.A.¹, Markitantov M.V.¹, Ryumina E.V.¹, Uzdiaev M.Yu.¹, Velichko A.N.¹, Ryumin D.A.¹, Lyakso E.E.², Karpov A.A.¹
Affiliations:
- ¹ St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
- ² St. Petersburg State University
- Issue: Vol 21, No 6 (2022)
- Pages: 1097-1144
- Section: Artificial intelligence, knowledge and data engineering
- URL: https://journal-vniispk.ru/2713-3192/article/view/267197
- DOI: https://doi.org/10.15622/ia.21.6.2
- ID: 267197
About the authors
A. A. Dvoynikova
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: dvoynikova.a@iias.spb.su
14th Line V.O., 39
M. V. Markitantov
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: m.markitantov@yandex.ru
14th Line V.O., 39
E. V. Ryumina
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: ryumina.e@iias.spb.su
14th Line V.O., 39
M. Yu. Uzdiaev
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: uzdyaev.m@iias.spb.su
14th Line V.O., 39
A. N. Velichko
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: velichko.a.n@mail.ru
14th Line V.O., 39
D. A. Ryumin
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: ryumin.d@iias.spb.su
14th Line V.O., 39
E. E. Lyakso
St. Petersburg State University
Email: lyakso@gmail.com
University Emb., 7-9
A. A. Karpov
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
Email: karpov@iias.spb.su
14th Line V.O., 39
References
- Picard R.W. Affective Computing for HCI // HCI (1). 1999. pp. 829-833.
- Vilyunas V.K. Emotions // Big Psychological Dictionary / Eds. B.G. Meshcheryakov, V.P. Zinchenko. St. Petersburg: Prime-EVROZNAK. 2007. pp. 565-568. (In Russ.)
- Krafft-Ebing R. Textbook of Psychiatry. 1897. 698 p. (In Russ.)
- Ilyin E.P. Emotions and Feelings. Piter Publishing House. 2011. 782 p. (In Russ.)
- Tkhostov A.Sh., Kolymba I.G. Emotions and affects: general psychological and pathological aspects // Psychological Journal. 1998. no. 4. pp. 41-48. (In Russ.)
- Enikolopov S.N. The concept of aggression in modern psychology // Applied Psychology. 2001. no. 1. pp. 60-72. (In Russ.)
- Verkholyak O.V., Karpov A.A. Chapter "Automatic analysis of emotionally colored speech" in the monograph "Voice Portrait of a Child with Typical and Atypical Development" / E.E. Lyakso, O.V. Frolova, S.V. Grechany, Yu.N. Matveev, O.V. Verkholyak, A.A. Karpov; Eds. E.E. Lyakso, O.V. Frolova. St. Petersburg: Publishing and Printing Association of Higher Educational Institutions. 2020. 204 p. (In Russ.)
- Dvoynikova A.A., Karpov A.A. An analytical review of approaches to sentiment recognition in Russian-language text data // Information and Control Systems. 2020. no. 4 (107). pp. 20-30. (In Russ.)
- Velichko A.N., Karpov A.A. An analytical review of automatic systems for depression detection from speech // Informatics and Automation. 2021. vol. 20. pp. 497-529. (In Russ.)
- American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM-5). American Psychiatric Publishing, Arlington, VA. 2013.
- Tzirakis P., Trigeorgis G., Nicolaou M.A., et al. End-to-end multimodal emotion recognition using deep neural networks // IEEE Journal of Selected Topics in Signal Processing. 2017. vol. 11. no. 8. pp. 1301-1309.
- Dhall A., Goecke R., Gedeon T. Collecting large, richly annotated facial-expression databases from movies // IEEE Multimedia. 2012. vol. 19. no. 03. pp. 34-41.
- Kossaifi J., Tzimiropoulos G., Todorovic S., et al. AFEW-VA database for valence and arousal estimation in-the-wild // Image and Vision Computing. 2017. vol. 65. pp. 23-36.
- Kollias D., Zafeiriou S. Aff-Wild2: Extending the Aff-Wild database for affect recognition // arXiv preprint arXiv:1811.07770. 2018.
- Lien J.J., Kanade T., Cohn J.F., et al. Automated facial expression recognition based on FACS action units // Proceedings of third IEEE international conference on automatic face and gesture recognition. IEEE. 1998. pp. 390-395.
- Busso C., Bulut M., Lee C.-C., et al. IEMOCAP: Interactive emotional dyadic motion capture database // Language Resources and Evaluation. 2008. vol. 42. no. 4. pp. 335-359.
- Ringeval F., Sonderegger A., Sauer J., et al. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions // Proceedings of the 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). IEEE. 2013. pp. 1-8.
- Kossaifi J., Walecki R., Panagakis Y., et al. SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild // IEEE Transactions on Pattern Analysis & Machine Intelligence. 2021. vol. 43. no. 03. pp. 1022-1040.
- McKeown G., Valstar M.F., Cowie R., et al. The SEMAINE corpus of emotionally coloured character interactions // Proceedings of the IEEE International Conference on Multimedia and Expo. IEEE. 2010. pp. 1079-1084.
- Perepelkina O., Kazimirova E., Konstantinova M. RAMAS: Russian multimodal corpus of dyadic interaction for affective computing // Proceedings of the International Conference on Speech and Computer. Springer, Cham. 2018. pp. 501-510.
- Poria S., Hazarika D., Majumder N., et al. MELD: A multimodal multi-party dataset for emotion recognition in conversations // Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. pp. 527-536.
- Zadeh A.B., Liang P.P., Poria S., et al. Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph // Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018. pp. 2236-2246.
- Pérez-Rosas V., Mihalcea R., Morency L.P. Utterance-level multimodal sentiment analysis // Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. vol. 1. 2013. pp. 973-982.
- Zadeh A., Zellers R., Pincus E., et al. Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages // IEEE Intelligent Systems. 2016. vol. 31. no. 6. pp. 82-88.
- Morency L.P., Mihalcea R., Doshi P. Towards multimodal sentiment analysis: Harvesting opinions from the web // Proceedings of the 13th International Conference on Multimodal Interfaces. 2011. pp. 169-176.
- Yu W., Xu H., Meng F., et al. CH-SIMS: A Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality // Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. pp. 3718-3727.
- Lefter I., Rothkrantz L.J.M., Burghouts G., et al. Addressing multimodality in overt aggression detection // Proceedings of the International Conference on Text, Speech and Dialogue. Springer, Berlin, Heidelberg. 2011. pp. 25-32.
- Lefter I., Burghouts G.J., Rothkrantz L.J.M. An audio-visual dataset of human–human interactions in stressful situations // Journal on Multimodal User Interfaces. 2014. vol. 8. no. 1. pp. 29-41.
- Lefter I., Rothkrantz L.J.M. Multimodal cross-context recognition of negative interactions // Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW). IEEE. 2017. pp. 56-61.
- Lefter I., Jonker C.M., Tuente S.K., et al. NAA: A multimodal database of negative affect and aggression // Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE. 2017. pp. 21-27.
- Nievas E.B., Déniz-Suárez O., Garcia G., et al. Violence detection in video using computer vision techniques // Proceedings of the International Conference on Computer Analysis of Images and Patterns. Springer, Berlin, Heidelberg. 2011. pp. 332-339.
- Perez M., Kot A.C., Rocha A. Detection of real-world fights in surveillance videos // Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2019. pp. 2662-2666.
- Cheng M., Cai K., Li M. RWF-2000: An open large scale video database for violence detection // Proceedings of the 25th International Conference on Pattern Recognition (ICPR). IEEE. 2021. pp. 4183-4190.
- Kumar R., Reganti A.N., Bhatia A., et al. Aggression-annotated corpus of Hindi-English code-mixed data // arXiv preprint arXiv:1803.09402. 2018.
- Bozyiğit A., Utku S., Nasibov E. Cyberbullying detection: Utilizing social media features // Expert Systems with Applications. 2021. vol. 179. p. 115001.
- Gratch J., Artstein R., Lucas G., et al. The Distress Analysis Interview Corpus of Human and Computer Interviews // Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). Reykjavik, Iceland. 2014. pp. 3123-3128.
- Valstar M., Schuller B., Smith K., et al. AVEC 2013: the continuous audio/visual emotion and depression recognition challenge // Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge (AVEC’13). Association for Computing Machinery, New York, NY, USA. 2013. pp. 3–10.
- Yang Y., Fairbairn C., Cohn J. Detecting depression severity from vocal prosody // IEEE Transactions on Affective Computing. 2013. vol. 4. no. 2. pp. 142–150.
- Alghowinem S., Goecke R., Wagner M., et al. From joyous to clinically depressed: Mood detection using spontaneous speech // Proceedings of FLAIRS Conference, G.M. Youngblood and P.M. McCarthy, Eds. AAAI Press. 2012. pp. 141–146.
- Huang Z., Epps J., Joachim D., et al. Depression detection from short utterances via diverse smartphones in natural environmental conditions // Proceedings of Interspeech. 2018. pp. 3393–3397.
- Ryumina E., Karpov A. Facial expression recognition using distance importance scores between facial landmarks // CEUR Workshop Proceedings. 2020. vol. 274. pp. 1-10.
- Axyonov A., Ryumin D., Kagirov I. Method of Multi-Modal Video Analysis of Hand Movements for Automatic Recognition of Isolated Signs of Russian Sign Language // Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2021. vol. XLIV-2/W1-2021. pp. 7–13.
- He K., Zhang X., Ren S., et al. Deep residual learning for image recognition // Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. pp. 770–778.
- Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition // Proceedings of the 3rd International Conference on Learning Representations (ICLR). 2015. pp. 1–14.
- Niu B., Gao Z., Guo B. Facial expression recognition with LBP and ORB features // Computational Intelligence and Neuroscience. 2021. vol. 2021.
- Verma S., Wang J., Ge Zh., et al. Deep-HOSeq: Deep higher order sequence fusion for multimodal sentiment analysis // Proceedings of IEEE International Conference on Data Mining (ICDM). IEEE. 2020. pp. 561-570.
- Eyben F., Weninger F., Gross F., et al. Recent developments in openSMILE, the Munich open-source multimedia feature extractor // Proceedings of ACM International Conference on Multimedia. 2013. pp. 835–838.
- Schuller B.W., Batliner A., Bergler C., et al. The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates // Proceedings of Interspeech. 2021. pp. 431–435.
- Eyben F., Scherer K.R., Schuller B.W., et al. The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing // IEEE Transactions on Affective Computing. 2015. vol. 7. no. 2. pp. 190–202.
- Schmitt M., Ringeval F., Schuller B.W. At the border of acoustics and linguistics: Bag-of-Audio-Words for the recognition of emotions in speech // Proceedings of Interspeech. 2016. pp. 495–499.
- Kaya H., Karpov A.A., Salah A.A. Fisher vectors with cascaded normalization for paralinguistic analysis // Proceedings of Interspeech. 2015. pp. 909–913.
- Zhao Z., Zhao Y., Bao Z., et al. Deep spectrum feature representations for speech emotion recognition // Proceedings of Workshop on Affective Social Multimedia Computing and First Multi-Modal Affective Computing of Large-Scale Multimedia Data. 2018. pp. 27–33.
- Freitag M., Amiriparian S., Pugachevskiy S., et al. AuDeep: Unsupervised learning of representations from audio with deep recurrent neural networks // The Journal of Machine Learning Research. 2017. vol. 18. no. 1. pp. 6340–6344.
- Shor J., Jansen A., Maor R., et al. Towards Learning a Universal Non-Semantic Representation of Speech // Proceedings of Interspeech. 2020. pp. 140–144.
- Wagner J., Triantafyllopoulos A., Wierstorf H., et al. Dawn of the transformer era in speech emotion recognition: closing the valence gap // arXiv preprint arXiv:2203.07378. 2022. pp. 1-25.
- Degottex G., Kane J., Drugman T., et al. COVAREP – A collaborative voice analysis repository for speech technologies // Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2014. pp. 960-964.
- Sogancioglu G., Verkholyak O., Kaya H., et al. Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition // Proceedings of Interspeech. 2020. pp. 2097-2101.
- Sebastian J., Pierucci P. Fusion Techniques for Utterance-Level Emotion Recognition Combining Speech and Transcripts // Proceedings of Interspeech. 2019. pp. 51-55.
- Xu H., Zhang H., Han K., et al. Learning alignment for multimodal emotion recognition from speech // arXiv preprint arXiv:1909.05645. 2019.
- Dellaert F., Polzin T., Waibel A. Recognizing emotion in speech // Proceedings of the 4th Int. Conf. Spoken Lang. Process (ICSLP). 1996. pp. 1970–1973.
- Neiberg D., Elenius K., Laskowski K. Emotion recognition in spontaneous speech using GMMs // Proceedings of the 9th Int. Conf. Spoken Lang. Process. 2006. pp. 809–812.
- Nogueiras A., Moreno A., Bonafonte A., et al. Speech emotion recognition using hidden Markov models // Proceedings of the 7th Eur. Conf. Speech Commun. Technol. 2001. pp. 746–749.
- Raudys Š. On the universality of the single-layer perceptron model // Neural Networks and Soft Computing. Physica, Heidelberg. 2003. pp. 79-86.
- Wang J., Lu S., Wang S.-H., et al. A review on extreme learning machine // Multimedia Tools and Applications. 2021. pp. 1-50.
- Kruse R., Borgelt C., Klawonn F., et al. Multi-layer perceptrons // Computational Intelligence. Springer, Cham. 2022. pp. 53-124.
- Sainath T.N., Vinyals O., Senior A., et al. Convolutional, long short-term memory, fully connected deep neural networks // Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2015. pp. 4580–4584.
- Kim J., Truong K.P., Englebienne G., et al. Learning spectro-temporal features with 3D CNNs for speech emotion recognition // Proceedings of the 7th International Conference on Affective Computing and Intelligent Interaction (ACII). 2017. pp. 383–388.
- Chao L., Tao J., Yang M., et al. Long short term memory recurrent neural network based multimodal dimensional emotion recognition // Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge. 2015. pp. 65-72.
- Wang J., Xue M., Culhane R., et al. Speech emotion recognition with dual-sequence LSTM architecture // Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2020. pp. 6474–6478.
- Chen Q., Huang G. A novel dual attention-based BLSTM with hybrid features in speech emotion recognition // Engineering Applications of Artificial Intelligence. 2021. vol. 102. p. 104277.
- Zhao J., Mao X., Chen L. Speech emotion recognition using deep 1D & 2D CNN LSTM networks // Biomedical Signal Processing and Control. 2019. vol. 47. pp. 312–323.
- Milner R., Jalal M.A., Ng R.W., et al. A cross-corpus study on speech emotion recognition // Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). 2019. pp. 304-311.
- Vaswani A., Shazeer N., Parmar N., et al. Attention is all you need // Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017). 2017. vol. 30. pp. 1-11.
- Ho N.H., Yang H.J., Kim S.H., et al. Multimodal approach of speech emotion recognition using multi-level multi-head fusion attention-based recurrent neural network // IEEE Access. 2020. vol. 8. pp. 61672-61686.
- Hsu W.N., Bolte B., Tsai Y.-H.H., et al. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units // IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2021. vol. 29. pp. 3451–3460.
- Siriwardhana S., Reis A., Weerasekera R., et al. Jointly fine-tuning “BERT-like” self-supervised models to improve multimodal speech emotion recognition // arXiv preprint arXiv:2008.06682. 2020. pp. 1-5.
- Kratzwald B., Ilic S., Kraus M., et al. Deep learning for affective computing: Text-based emotion recognition in decision support // Decision Support Systems. 2018. vol. 115. pp. 24–35.
- Stappen L., Baird A., Christ L., et al. The MuSe 2021 multimodal sentiment analysis challenge: Sentiment, emotion, physiological-emotion, and stress // Proceedings of the 29th ACM International Conference on Multimedia (ACM MM). 2021. pp. 5706–5707.
- Dresvyanskiy D., Ryumina E., Kaya H., et al. End-to-End Modeling and Transfer Learning for Audiovisual Emotion Recognition in-the-Wild // Multimodal Technologies and Interaction. 2022. vol. 6. no. 2. p. 11.
- Fedotov D., Kaya H., Karpov A. Context Modeling for Cross-Corpus Dimensional Acoustic Emotion Recognition: Challenges and Mixup // Proceedings of 20th International Conference on Speech and Computer (SPECOM-2018). 2018. pp. 155-165.
- Wu C.H., Lin J.C., Wei W.L. Survey on audiovisual emotion recognition: databases, features, and data fusion strategies // APSIPA Transactions on Signal and Information Processing. 2014. vol. 3. pp. 18.
- Al Osman H., Falk T.H. Multimodal affect recognition: Current approaches and challenges // Emotion and Attention Recognition Based on Biological Signals and Images. 2017. pp. 59-86.
- Liu D., Wang Z., Wang L., et al. Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning // Frontiers in Neurorobotics. 2021. pp. 13.
- Zhang C., Yang Z., He X., et al. Multimodal intelligence: Representation learning, information fusion, and applications // IEEE Journal of Selected Topics in Signal Processing. 2020. vol. 14. no. 3. pp. 478-493.
- Markitantov M., Ryumina E., Ryumin D., et al. Biometric Russian Audio-Visual Extended MASKS (BRAVE-MASKS) Corpus: Multimodal Mask Type Recognition Task // Proceedings of Interspeech. 2022. pp. 1756-1760.
- Yang L., Sahli H., Xia X., et al. Hybrid depression classification and estimation from audio video and text information // Proceedings of 7th Annual Workshop on Audio/Visual Emotion Challenge (AVEC). 2017. pp. 45-51.
- Mai S., Hu H., Xing S. Modality to modality translation: An adversarial representation learning and graph fusion network for multimodal fusion // Proceedings of the AAAI Conference on Artificial Intelligence. 2020. vol. 34. no. 01. pp. 164-172.
- Ghosal D., Akhtar M.S., Chauhan D., et al. Contextual inter-modal attention for multi-modal sentiment analysis // Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2018. pp. 3454-3466.
- Akhtar M.S., Chauhan D.S., Ghosal D., et al. Multi-task learning for multi-modal emotion recognition and sentiment analysis // arXiv preprint arXiv:1905.05812. 2019. pp. 1-10.
- Sun Z., Sarma P., Sethares W., et al. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis // Proceedings of the AAAI Conference on Artificial Intelligence. 2020. vol. 34. no. 05. pp. 8992-8999.
- Mai S., Hu H., Xing S. Divide, conquer and combine: Hierarchical feature fusion network with local and global perspectives for multimodal affective computing // Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. pp. 481-492.
- Chauhan D.S., Akhtar M.S., Ekbal A., et al. Context-aware interactive attention for multi-modal sentiment and emotion analysis // Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. pp. 5647-5657.
- Delbrouck J.B., Tits N., Brousmiche M., et al. A transformer-based joint-encoding for emotion recognition and sentiment analysis // arXiv preprint arXiv:2006.15955. 2020.
- Khare A., Parthasarathy S., Sundaram S. Self-Supervised learning with cross-modal transformers for emotion recognition // IEEE Spoken Language Technology Workshop (SLT). IEEE. 2021. pp. 381-388.
- Zaib S., Asif M., Arooj M. Development of Aggression Detection Technique in Social Media // International Journal of Information Technology and Computer Science. 2019. vol. 5. no. 8. pp. 40-46.
- Levonevskiy D.K., Saveliev A.I. An approach and architecture for systematizing and detecting signs of aggression in Russian-language textual content // Tomsk State University Journal of Control and Computer Science. 2021. no. 54. pp. 56-64. (In Russ.)
- Sadiq S., Mehmood A., Ullah S., et al. Aggression detection through deep neural model on twitter // Future Generation Computer Systems. 2021. vol. 114. pp. 120-129.
- Tommasel A., Rodriguez J.M., Godoy D. Textual Aggression Detection through Deep Learning // TRAC@COLING 2018. 2018. pp. 177-187.
- Mandl T., Modha S., Shahi G.K., et al. Overview of the HASOC subtrack at FIRE 2021: Hate speech and offensive content identification in English and Indo-Aryan languages // arXiv preprint arXiv:2112.09301. 2021.
- Potharaju Y., Kamsali M., Kesavari C.R. Classification of Ontological Violence Content Detection through Audio Features and Supervised Learning // International Journal of Intelligent Engineering and Systems. 2019. vol. 12. no. 3. pp. 20-30.
- Sahoo S., Routray A. Detecting aggression in voice using inverse filtered speech features // IEEE Transactions on Affective Computing. 2016. vol. 9. no. 2. pp. 217-226.
- Santos F., Durães D., Marcondes F.M., et al. In-car violence detection based on the audio signal // Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning. Springer, Cham. 2021. pp. 437-445.
- Liang Q., Li Y., Chen B., et al. Violence behavior recognition of two-cascade temporal shift module with attention mechanism // Journal of Electronic Imaging. 2021. vol. 30. no. 4. p. 043009.
- Uzdiaev M.Yu. A neural network model of multimodal human aggression recognition // Bulletin KRASEC. Physical and Mathematical Sciences. 2020. vol. 33. no. 4. pp. 132-149. (In Russ.)
- Yao Y., Papakostas M., Burzo M., et al. MUSER: MUltimodal Stress detection using Emotion Recognition as an Auxiliary Task // Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). 2021. pp. 2714-2725.
- Sangwan S., Chauhan D.S., Akhtar M., et al. Multi-task gated contextual cross-modal attention framework for sentiment and emotion analysis // Proceedings of International Conference on Neural Information Processing. 2019. pp. 662-669.
- Kollias D., Zafeiriou S. Expression, affect, action unit recognition: Aff-Wild2, multi-task learning and ArcFace // arXiv preprint arXiv:1910.04855. 2019. pp. 1-15.
- Li Y., Zhao T., Kawahara T. Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning // Proceedings of Interspeech. 2019. pp. 2803-2807.
- Yu W., Xu H., Yuan Z., et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis // Proceedings of the AAAI Conference on Artificial Intelligence. 2021. vol. 35. no. 12. pp. 10790-10797.
- Vu M.T., Beurton-Aimar M., Marchand S. Multitask multi-database emotion recognition // Proceedings of IEEE/CVF International Conference on Computer Vision. 2021. pp. 3637-3644.
- Velichko A., Markitantov M., Kaya H., et al. Complex Paralinguistic Analysis of Speech: Predicting Gender, Emotions and Deception in a Hierarchical Framework // Proceedings of Interspeech. 2022. pp. 4735-4739.