Multi-modal sentiment analysis based on Transformer and spatial transformation network in education management application

Chunlin Yuan

doi:10.6180/jase.202503_28(3).0011

Civil Engineering

Proposed model network

School of Civil Engineering and Architecture, Zhengzhou University of Science and Technology, Zhengzhou 450064, China

Received: March 1, 2024
Accepted: March 25, 2024
Publication Date: May 22, 2024

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

To address the problems of multi-modal sentiment analysis, such as scarce labeled data, insufficient inter-modal fusion, and information redundancy, this paper proposes a novel multi-modal sentiment analysis method based on the Transformer and a spatial transformation network (STN). The method first uses the STN to learn the location of the target in the image, which helps extract important local features. Second, it uses a Transformer-based interaction network to model the relationships among aspects, text, and images and thereby achieve multi-modal interaction. At the same time, similar information shared between the features of different modalities is supplemented, and the multiple features are fused by a multi-head attention mechanism to form the multi-modal representation. Finally, the emotion classification result is obtained through a Softmax layer. The proposed model is compared with several other advanced methods on the public CH-SIMS dataset. The experimental results show that the proposed method improves binary classification accuracy, three-class classification accuracy, and F1 score by 2.31%, 2.25%, and 1.57%, respectively.
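The fusion and classification steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single attention head instead of multi-head attention, omits the STN and the aspect input, and uses hypothetical names (`cross_attention`, `classify`, `class_weights`) and hand-picked toy values in place of learned parameters. Text features act as queries over image-region features, the pooled result is concatenated with the pooled text features, and a softmax layer turns the resulting scores into class probabilities.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        # Weighted sum of the value vectors, one output dimension at a time.
        fused.append([sum(w * v[d] for w, v in zip(weights, values))
                      for d in range(len(values[0]))])
    return fused

def mean_pool(rows):
    """Average a list of vectors into a single vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def classify(text_feats, image_feats, class_weights):
    """Text attends to image regions; pooled features feed a softmax layer."""
    attended = cross_attention(text_feats, image_feats, image_feats)
    # Concatenate pooled text features with pooled image-aware features.
    joint = mean_pool(text_feats) + mean_pool(attended)
    logits = [dot(w, joint) for w in class_weights]
    return softmax(logits)

# Toy inputs (hypothetical values, not taken from CH-SIMS).
text_feats = [[1.0, 0.0], [0.0, 1.0]]                # 2 tokens, dimension 2
image_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 image regions
class_weights = [[0.5, -0.5, 0.2, 0.1],              # one row per class
                 [-0.3, 0.4, 0.1, 0.2],
                 [0.1, 0.1, -0.2, 0.3]]

probs = classify(text_feats, image_feats, class_weights)
print(probs)  # three class probabilities that sum to 1
```

In a trained model the class weights and the query, key, and value projections would be learned, and several attention heads would run in parallel before their outputs are concatenated.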

Keywords: multi-modal sentiment analysis; spatial transformation network; Transformer; education management application
