Zhaoli Li1,2, Yuhao Zhang1,2, Rui Su1,2, Chuanming Zhang1,2, and Xuezeng Wang1,2

1School of Construction Machinery, Shandong Jiaotong University, Jinan 250023, China

2Shandong Provincial Engineering Research Center for Transportation Construction Equipment Technology and Intelligent Construction, Jinan 250357, China


 

Received: August 22, 2025
Accepted: February 7, 2026
Publication Date: March 21, 2026

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


DOI: https://doi.org/10.6180/jase.202608_31.066


In human-robot collaborative assembly, a collaborative robot can only understand and execute natural language instructions accurately if those instructions are parsed efficiently. When instructions are lengthy or contain multiple sets of actions, parsing systems often fail to extract the complete action sequence. This paper proposes BS-CasPNRel, an action sequence extraction model for natural language instructions based on the CasRel framework. The model introduces a feature enhancement module to enrich semantic representations and a feature fusion module to promote effective feature interaction between subject and object entities. In addition, it employs the GHM (Gradient Harmonizing Mechanism) loss to alleviate dataset imbalance. We construct a dataset tailored to human-robot collaborative assembly, and experimental results show that BS-CasPNRel achieves an F1 score of 82.2% on this self-built Chinese instruction dataset, outperforming several baseline models. Experiments on the public WebNLG dataset further demonstrate the model's generalization capability. Finally, simulated robotic pick-and-place tasks in a Gazebo environment built on the ROS middleware demonstrate the practical effectiveness and feasibility of the proposed approach.
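To make the reported F1 score concrete, the following minimal sketch shows how precision, recall, and F1 could be computed over extracted (subject, relation, object) triples. It assumes exact-match scoring at the triple level, which is common for CasRel-style models but is not confirmed by this abstract. The function `triple_f1`, the example instruction, and the relation labels are hypothetical illustrations, not taken from the paper's dataset.

```python
from typing import Set, Tuple

# A triple is (subject, relation, object); the labels used below are hypothetical.
Triple = Tuple[str, str, str]


def triple_f1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Return exact-match (precision, recall, F1) over two sets of triples."""
    correct = len(predicted & gold)  # a triple counts only if all three parts match
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical instruction: "Pick up the bolt and place it on the tray."
gold = {("robot", "pick_up", "bolt"), ("bolt", "place_on", "tray")}
# Assumed model output with one wrong object in the second action.
predicted = {("robot", "pick_up", "bolt"), ("bolt", "place_on", "table")}

precision, recall, f1 = triple_f1(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # prints "P=0.50 R=0.50 F1=0.50"
```

Under this assumed scoring, an instruction with several actions is only fully credited when every triple in the sequence is recovered, which is why long, multi-action instructions tend to lower the score.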


Keywords: Natural language instructions; Action sequence; Instruction parsing; Human-machine collaboration

