Music Sentiment Analysis Based on Multi-Modal Intelligent Computing and Deep Learning

2026-05-17

Lu Huang

JiLin Provincial Institute of Education, Changchun, Jilin 130022, China

Received: February 26, 2026
Accepted: April 4, 2026
Publication Date: May 17, 2026

Figure: Accuracy versus the number of iterations

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

DOI: http://dx.doi.org/10.6180/jase.202609_32.039

Intelligent computing and deep learning have become prominent research topics in both industry and academia in recent years. With ongoing advances in affective computing, the close connection between deep learning, multi-modal information, and emotion has gradually attracted researchers' attention. Existing methods, however, still show many shortcomings in how machines perceive, understand, and express emotion. This paper proposes a computational model of emotion that integrates emotion perception, information fusion, and deep learning. The model is a deep learning-oriented network perception model that accepts visual, auditory, and textual inputs in order to interpret uncertain emotions. Experiments demonstrate that the model performs well across a range of multi-modal emotion computation tasks. The work presented in this paper provides guidance for applying both multi-modal intelligent computing and deep learning.

Keywords: Intelligent Computing; Deep Learning; Music Sentiment Analysis; Multi-modal Information
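
The abstract describes a model that takes visual, auditory, and textual inputs and fuses them to reason about uncertain emotions, but it does not specify the fusion scheme. As a rough illustration only, the sketch below assumes a simple weighted late fusion of per-modality class probabilities and uses Shannon entropy as a stand-in for emotional uncertainty. The label set, the logits, the weights, and the function names are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical emotion labels; the paper's actual label set is not given here.
EMOTIONS = ["happy", "sad", "calm", "angry"]


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(modality_logits, weights):
    """Weighted late fusion: average the per-modality probability vectors.

    This is an assumed fusion rule for illustration, not the paper's method.
    """
    total_weight = sum(weights[name] for name in modality_logits)
    fused = [0.0] * len(EMOTIONS)
    for name, logits in modality_logits.items():
        share = weights[name] / total_weight
        for i, p in enumerate(softmax(logits)):
            fused[i] += share * p
    return fused


def entropy(probs):
    """Shannon entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Made-up scores standing in for the outputs of three modality encoders.
logits = {
    "visual": [2.0, 0.1, 0.5, 0.2],
    "audio": [1.5, 0.3, 1.2, 0.1],
    "text": [0.2, 0.1, 2.5, 0.3],
}
weights = {"visual": 1.0, "audio": 1.0, "text": 1.0}

fused = fuse(logits, weights)
label = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
print("fused probabilities:", [round(p, 3) for p in fused])
print("predicted label:", label)
print("uncertainty (entropy):", round(entropy(fused), 3))
```

When the modalities disagree, as in the made-up scores above, the fused distribution is flatter and its entropy rises, which is one simple way to expose an uncertain emotion rather than forcing a confident label.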