Wenwen Chen1

1The Music College of Jimei University (JMU), 21 Yindou Road, Jimei District, Xiamen City, Fujian Province, 361021, China


 

Received: April 11, 2022
Accepted: May 14, 2022
Publication Date: June 11, 2022

Copyright © The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


DOI: https://doi.org/10.6180/jase.202303_26(3).0008


ABSTRACT


Emotion recognition from audio/video media in affective computing has important application value for deep cognition in human-computer interaction (HCI), brain-computer interaction (BCI), and other fields. In modern distance education in particular, music emotion analysis can serve as an important technique for real-time evaluation of the teaching process. In complex dance scenes, however, traditional methods achieve low accuracy in music emotion analysis. This paper therefore proposes a novel long short-term memory (LSTM) network model for multimodal music emotion analysis in affective computing. A dual-channel LSTM simulates the human auditory and visual processing pathways, processing the emotional information of music and facial expressions respectively. The model is then trained and tested on an open bi-modal music dataset. On top of the LSTM model, the analytic hierarchy process (AHP) is introduced to fuse weighted features at the decision level. Experiments show that the proposed method effectively improves the recognition rate while saving considerable training time.


Keywords: Music emotion analysis, human-computer interaction, LSTM, analytic hierarchy process, affective computing
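
The pipeline described in the abstract pairs two modality-specific LSTM channels with AHP-weighted fusion at the decision level. The sketch below illustrates that idea in PyTorch; it is not the authors' code, and the feature dimensions, the number of emotion classes, and the pairwise-comparison judgments are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (assumed, not the authors' released code): two LSTM
# channels -- one for acoustic frame features, one for facial-expression
# features -- whose per-channel emotion probabilities are fused at the
# decision level with weights derived via the analytic hierarchy process.
import numpy as np
import torch
import torch.nn as nn

NUM_CLASSES = 4  # assumed number of emotion categories

class ChannelLSTM(nn.Module):
    """One modality channel: LSTM over a feature sequence -> class logits."""
    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, x):                  # x: (batch, time, feat_dim)
        _, (h, _) = self.lstm(x)           # h: (layers, batch, hidden)
        return self.head(h[-1])            # (batch, NUM_CLASSES)

def ahp_weights(pairwise: np.ndarray) -> np.ndarray:
    """AHP: weights are the normalized principal eigenvector of the
    pairwise-comparison matrix."""
    vals, vecs = np.linalg.eig(pairwise)
    principal = np.real(vecs[:, np.argmax(np.real(vals))])
    return principal / principal.sum()

# Illustrative judgment: audio deemed twice as important as video.
pairwise = np.array([[1.0, 2.0],
                     [0.5, 1.0]])
w_audio, w_video = map(float, ahp_weights(pairwise))  # 2/3 and 1/3

audio_net = ChannelLSTM(feat_dim=40)   # e.g. 40-dim MFCC frames (assumed)
video_net = ChannelLSTM(feat_dim=68)   # e.g. 68 facial landmarks (assumed)

audio_seq = torch.randn(8, 100, 40)    # dummy batch: 8 clips, 100 frames
video_seq = torch.randn(8, 100, 68)

# Decision-level fusion: AHP-weighted sum of per-channel probabilities.
p_audio = torch.softmax(audio_net(audio_seq), dim=-1)
p_video = torch.softmax(video_net(video_seq), dim=-1)
p_fused = w_audio * p_audio + w_video * p_video
pred = p_fused.argmax(dim=-1)          # fused emotion label per clip
```

Under these assumptions, the AHP step reduces fusion to choosing a small pairwise-comparison matrix: the normalized principal eigenvector of that matrix supplies the channel weights, so re-weighting the modalities only requires revising the comparison judgments rather than retraining either LSTM.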


REFERENCES


  1. [1] P. Stegman, C. Crawford, M. Andujar, A. Nijholt, and J. Gilbert, (2020) “Brain-computer interface software: A review and discussion" IEEE Transactions on Human-Machine Systems 50(2): 101–115. DOI: 10.1109/THMS.2020.2968411.
  2. [2] H. Kuai, N. Zhong, J. Chen, Y. Yang, X. Zhang, P. Liang, K. Imamura, L. Ma, and H. Wang, (2021) “Multi-source brain computing with systematic fusion for smart health" Information Fusion 75: 150–167. DOI: 10.1016/j.inffus.2021.03.009.
  3. [3] N. Yusupova, D. Bogdanova, N. Komendantova, and H. Hassani, (2021) “Extracting Information on Affective Computing Research from Data Analysis of Known Digital Platforms: Research into Emotional Artificial Intelligence" Digital 1(3): 162–172.
  4. [4] J. Han, Z. Zhang, M. Pantic, and B. Schuller, (2021) “Internet of emotional people: Towards continual affective computing cross cultures via audiovisual signals" Future Generation Computer Systems 114: 294–306. DOI: 10.1016/j.future.2020.08.002.
  5. [5] H. He, (2018) “Research on prediction of internet public opinion based on grey system theory and fuzzy neural network" Journal of Intelligent and Fuzzy Systems 35(1): 325–332. DOI: 10.3233/JIFS-169591.
  6. [6] Z. Long, X. Zhang, L. Zhang, G. Qin, S. Huang, D. Song, H. Shao, and G. Wu, (2021) “Motor fault diagnosis using attention mechanism and improved adaboost driven by multi-sensor information" Measurement: Journal of the International Measurement Confederation 170: DOI: 10.1016/j.measurement.2020.108718.
  7. [7] K. Zhang and S. Sun, (2013) “Web music emotion recognition based on higher effective gene expression programming" Neurocomputing 105: 100–106. DOI: 10.1016/j.neucom.2012.06.041.
  8. [8] R. Jeen Retna Kumar, M. Sundaram, N. Arumugam, and V. Kavitha, (2021) “Face feature extraction for emotion recognition using statistical parameters from subband selective multilevel stationary biorthogonal wavelet transform" Soft Computing 25(7): 5483–5501. DOI: 10.1007/s00500-020-05550-y.
  9. [9] J. Wathan, A. Burrows, B. Waller, and K. McComb, (2015) “EquiFACS: The equine facial action coding system" PLoS ONE 10(8): DOI: 10.1371/journal.pone.0131738.
  10. [10] J. Hu, P. Yan, Y. Su, D. Wu, and H. Zhou, (2021) “A Method for Classification of Surface Defect on MetalWorkpieces Based on Twin Attention Mechanism Generative Adversarial Network" IEEE Sensors Journal 21(12): 13430–13441. DOI: 10.1109/JSEN.2021.3066603.
  11. [11] H. Laiz, M. Klonz, E. Kessler, M. Kampik, and R. Lapuh, (2003) “Low-frequency ac-dc voltage transfer standards with new high-sensitivity and low-power-coefficient thin-film multijunction thermal converters" IEEE Transactions on Instrumentation and Measurement 52(2): 350–354. DOI: 10.1109/TIM.2003.810037.
  12. [12] S. Haggag, S. Mohamed, A. Bhatti, H. Haggag, and S. Nahavandi. “Noise level classification for EEG using Hidden Markov Models”. In: cited By 2. 2015, 439–444. DOI: 10.1109/SYSOSE.2015.7151974.
  13. [13] O. Chia Ai, M. Hariharan, S. Yaacob, and L. Sin Chee, (2012) “Classification of speech dysfluencies with MFCC and LPCC features" Expert Systems with Applications 39(2): 2157–2165. DOI: 10.1016/j.eswa.2011.07.065.
  14. [14] Y.-L. Hsu, J.-S. Wang, W.-C. Chiang, and C.-H. Hung, (2020) “Automatic ECG-Based Emotion Recognition in Music Listening" IEEE Transactions on Affective Computing 11(1): 85–99. DOI: 10.1109/TAFFC.2017.2781732.
  15. [15] C.Weng, B. Lu, and Q. Gu, (2022) “A multi-scale kernelbased network with improved attention mechanism for rotating machinery fault diagnosis under noisy environments" Measurement Science and Technology 33(5): DOI: 10.1088/1361-6501/ac4598.
  16. [16] Y.Wang and L. Guan, (2008) “Recognizing human emotional state from audiovisual signals" IEEE Transactions on Multimedia 10(4): 659–668. DOI: 10.1109/TMM.2008.921734.
  17. [17] Y. Chang, M. Vieira, M. Turk, and L. Velho, (2005) “Automatic 3D facial expression analysis in videos" Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3723 LNCS: 293–307. DOI: 10.1007/11564386_23.
  18. [18] S. Zhang, S. Zhang, T. Huang,W. Gao, and Q. Tian, (2018) “Learning Affective Features with a Hybrid Deep Model for Audio-Visual Emotion Recognition" IEEE Transactions on Circuits and Systems for Video Technology 28(10): 3030–3043. DOI: 10.1109/TCSVT.2017.2719043.
  19. [19] S. Sahoo and A. Routray. “Emotion recognition from audio-visual data using rule based decision level fusion”. In: cited By 18. 2017, 7–12. DOI: 10.1109/TechSym.2016.7872646.
  20. [20] Y. Yao, G. Marcialis, M. Pontil, P. Frasconi, and F. Roli, (2003) “Combining flat and structured representations for fingerprint classification with recursive neural networks and support vector machines" Pattern Recognition 36(2): 397–406. DOI: 10.1016/S0031-3203(02)00039-0.
  21. [21] A. Chashmi, V. Rahmati, B. Rezasoroush, M. Alamoti, M. Askari, and F. Khalili, (2021) “Predicting Customer Turnover Using Recursive Neural Networks"Wireless Communications and Mobile Computing 2021: DOI: 10.1155/2021/6623052.
  22. [22] A. Jisi and S. Yin, (2021) “A new feature fusion network for student behavior recognition in education" Journal of Applied Science and Engineering (Taiwan) 24(2):133–140. DOI: 10.6180/jase.202104_24(2).0002.
  23. [23] I. Huang, Y.-H. Lu, M. Shafiq, A. Laghari, and R. Yadav, (2021) “A Generative Adversarial Network Model Based on Intelligent Data Analytics for Music Emotion Recognition under IoT" Mobile Information Systems 2021: DOI: 10.1155/2021/3561829.
  24. [24] M. Li, H. Xu, X. Liu, and S. Lu, (2018) “Emotion recognition from multichannel EEG signals using K-nearest neighbor classification" Technology and Health Care 26(S1): S509–S519. DOI: 10.3233/THC-174836.
  25. [25] S. Khan and V. Ahmed. “Classification of pulmonary crackles and pleural friction rubs using MFCC statistical parameters”. In: cited By 3. 2016, 2437–2440. DOI: 10.1109/ICACCI.2016.7732422.
  26. [26] B. Ghosh, (2021) “Spatial mapping of groundwater potential using data-driven evidential belief function, knowledge-based analytic hierarchy process and an ensemble approach" Environmental Earth Sciences 80(18): DOI: 10.1007/s12665-021-09921-y.
  27. [27] W. Geping, R. Haojie, Q. Cheng, F. Wei, Y. Gongjin, and W. Guangyu, (2021) “Assessment method for wind resistance resilience of power grid based on extension analytic hierarchy process" International Journal of Industrial and Systems Engineering 38(4): 416–431. DOI: 10.1504/IJISE.2021.116949.
  28. [28] M. Zhang, X. Zhang, S. Guo, X. Xu, J. Chen, and W. Wang, (2021) “Urban micro-climate prediction through long short-term memory network with long-term monitoring for on-site building energy estimation" Sustainable Cities and Society 74: DOI: 10.1016/j.scs.2021.103227.
  29. [29] N. Jean Effil and R. Rajeswari, (2022) “Wavelet scattering transform and long short-term memory network-based noninvasive blood pressure estimation from photoplethysmograph signals" Signal, Image and Video Processing 16(1): DOI: 10.1007/s11760-021-01952-z.
  30. [30] D. Wu, Y. Zhang, M. Ourak, K. Niu, J. Dankelman, and E. Poorten, (2021) “Hysteresis Modeling of Robotic Catheters Based on Long Short-Term Memory Network for Improved Environment Reconstruction" IEEE Robotics and Automation Letters 6(2): 2106–2113. DOI: 10.1109/LRA.2021.3061069.
  31. [31] S. Yin and H. Li, (2020) “Hot Region Selection Based on Selective Search and Modified Fuzzy C-Means in Remote Sensing Images" IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13: 5862–5871. DOI: 10.1109/JSTARS.2020.3025582.
  32. [32] J. Luo and X. Zhang, (2022) “Convolutional neural network based on attention mechanism and Bi-LSTM for bearing remaining life prediction" Applied Intelligence 52(1): 1076–1091. DOI: 10.1007/s10489-021-02503-2.
  33. [33] S. Wang, J. Li, T. Cao, H. Wang, P. Tu, and Y. Li, (2020) “Dance Emotion Recognition Based on Laban Motion Analysis Using Convolutional Neural Network and Long Short-Term Memory" IEEE Access 8: 124928–124938. DOI: 10.1109/ACCESS.2020.3007956.
  34. [34] J. Kacur, B. Puterka, J. Pavlovicova, and M. Oravec, (2021) “On the speech properties and feature extraction methods in speech emotion recognition" Sensors 21(5):1–27. DOI: 10.3390/s21051888.
  35. [35] J. Luo, M. Wu, Z. Wang, Y. Chen, and Y. Yang, (2021) “Progressive low-rank subspace alignment based on semisupervised joint domain adaption for personalized emotion recognition" Neurocomputing 456: 312–326. DOI: 10.1016/j.neucom.2021.05.064.
  36. [36] Z. Peng, J. Dang, M. Unoki, and M. Akagi, (2021) “Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech" Neural Networks 140: 261–273. DOI: 10.1016/j.neunet.2021.03.027.