Yueping Wang

Basic Department, Zhengzhou University of Science and Technology, Zhengzhou, China


 

Received: November 30, 2025
Accepted: January 18, 2026
Publication Date: February 4, 2026

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


DOI: https://doi.org/10.6180/jase.202608_31.014


To address the reliance on a single representation and the underuse of structural information in existing framework recognition methods under complex contexts, this paper proposes a novel end-to-end recognition framework driven by dual-channel representation fusion and structural information collaboration. The method first builds a visual-semantic dual-channel encoder. The visual channel uses object detection and instance segmentation to extract the spatial position, shape, and scale features of framework elements. The semantic channel uses a pre-trained language model to capture word-level and sentence-level contextual embeddings, and an attention mechanism strengthens its understanding of abstract roles and implicit relationships. To bridge the gap between the heterogeneous modalities, a cross-modal gated fusion module adaptively calibrates the weights of the two channels so that they complement each other. Second, a structure-aware graph convolutional network models candidate elements and context words as nodes and constructs edges from dependency syntax, co-occurrence statistics, and commonsense associations; it iteratively propagates topological priors, suppresses redundant nodes, and highlights key paths. Finally, a lightweight decoder jointly predicts framework categories and element roles from the fused features. The entire network is trained end-to-end without manual templates. Experiments show that the method achieves significant improvements on multiple datasets in fields such as event extraction and script understanding, which verifies the effectiveness of dual-channel fusion and structural information collaboration and demonstrates good interpretability and scalability.


Keywords: Framework recognition, dual-channel representation fusion, structural information, graph convolutional network
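
The two core mechanisms named in the abstract, gated fusion of two feature channels and graph-convolutional propagation over a node graph, can be illustrated with a minimal sketch. This is not the authors' implementation: the element-wise gate, the row-normalized propagation rule, and all names (`gated_fusion`, `gcn_layer`, `w_v`, `w_s`, `bias`, `weight`) are assumptions chosen for illustration, and the parameters are fixed by hand rather than learned.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, semantic, w_v, w_s, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed form (hypothetical, not taken from the paper):
        gate_i  = sigmoid(w_v[i] * visual[i] + w_s[i] * semantic[i] + bias[i])
        fused_i = gate_i * visual[i] + (1 - gate_i) * semantic[i]
    """
    fused = []
    for v, s, wv, ws, b in zip(visual, semantic, w_v, w_s, bias):
        gate = sigmoid(wv * v + ws * s + b)
        fused.append(gate * v + (1.0 - gate) * s)
    return fused


def gcn_layer(features, adjacency, weight):
    """One propagation step: ReLU(D^-1 (A + I) H W).

    features:  n x d list of node feature rows (H)
    adjacency: n x n list of non-negative edge weights (A)
    weight:    d x m projection matrix (W), assumed given
    """
    n = len(features)
    d = len(features[0])
    m = len(weight[0])
    out = []
    for i in range(n):
        # Add a self-loop so each node keeps its own features
        # and its degree is never zero.
        row_a = [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        degree = sum(row_a)
        # Degree-normalized aggregation over the node and its neighbors.
        agg = [sum(row_a[j] * features[j][k] for j in range(n)) / degree
               for k in range(d)]
        # Linear projection followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][c] for k in range(d)))
                    for c in range(m)])
    return out


# Toy usage: zero gate parameters give gate = 0.5, i.e. a plain average.
fused = gated_fusion([1.0, 0.0], [0.0, 1.0],
                     [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
print(fused)  # [0.5, 0.5]

# Two connected nodes, identity projection: each node ends up with
# the mean of both feature rows.
nodes = [fused, [1.0, 1.0]]
edges = [[0.0, 1.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(nodes, edges, identity))  # [[0.75, 0.75], [0.75, 0.75]]
```

In a trained model the gate parameters and the projection matrix would be learned end-to-end, and the edges would come from the dependency, co-occurrence, and commonsense relations the abstract mentions. Here they are hand-set so the arithmetic is easy to check.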

