Journal of Applied Science and Engineering

Published by Tamkang University Press


Shu Ma

Shenyang Normal University, No. 253 Huanghe North Street, Shenyang 110034, China

Received: October 17, 2023
Accepted: November 3, 2023
Publication Date: November 30, 2023

Copyright: The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


DOI: https://doi.org/10.6180/jase.202408_27(8).0015


With the continuous development of machine learning and neural networks, neural machine translation (NMT) has come into wide use owing to its strong translation quality. However, tree-structured translation models overuse lexical information when constructing the internal nodes of the phrase structure: phrase-structure encoders can cause over-translation, the number of model parameters grows as grammatical structure is added, and phrase nodes are not always beneficial to the translation model. In a self-attention-based translation model, the absolute position of each word is represented by sine-cosine position encoding; while this encoding reflects the relative distance between words, it cannot express direction. We therefore propose a novel Chinese-English machine translation model that combines transfer learning with the self-attention mechanism. The model inherits the efficiency of self-attention while preserving both the distance and the direction information between words. Translation experiments show that the proposed transfer-learning model significantly outperforms traditional tree-based models.
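The distance-without-direction limitation mentioned above can be seen directly in the standard sine-cosine encoding. The following sketch (using NumPy; the function name is ours, not the paper's) builds the encoding and shows that dot products between position vectors depend only on the absolute offset between positions, so "three words to the left" and "three words to the right" look identical:

```python
import numpy as np

def sinusoidal_encoding(num_positions, d_model):
    """Standard sine-cosine positional encoding:
    PE[p, 2k]   = sin(p / 10000^(2k/d_model))
    PE[p, 2k+1] = cos(p / 10000^(2k/d_model))"""
    positions = np.arange(num_positions)[:, None]           # (P, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d/2)
    angles = positions / np.power(10000.0, dims / d_model)  # (P, d/2)
    enc = np.zeros((num_positions, d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

pe = sinusoidal_encoding(50, 64)
# pe[i] . pe[j] = sum_k cos((i - j) * w_k), and cosine is even,
# so the similarity of positions 10 and 13 equals that of 10 and 7:
# the encoding carries distance |i - j| but not direction.
```

Because the pairwise similarity is symmetric in the offset, any model that reads position only through such inner products cannot distinguish left context from right context, which motivates the direction-preserving encoding proposed in the paper.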


Keywords: Chinese-English machine translation, transfer learning, self-attention, sine-cosine position encoding
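The abstract does not detail how the proposed model preserves direction, but a common way to keep it (in the style of relative position representations, e.g. Shaw et al.) is to index learned embeddings by signed, clipped word offsets instead of absolute positions. A minimal sketch, assuming this standard technique rather than the authors' exact method:

```python
import numpy as np

def relative_position_ids(seq_len, max_dist):
    """Signed, clipped relative offsets between all word pairs.
    The sign keeps direction (left vs right neighbor); clipping to
    [-max_dist, max_dist] bounds the embedding table size."""
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]                    # rel[i, j] = j - i
    return np.clip(rel, -max_dist, max_dist) + max_dist  # shift to >= 0

ids = relative_position_ids(5, 2)
# ids[i, j] != ids[j, i] whenever i != j, so an attention head that
# looks up an embedding per id can tell left context from right context.
```

Each id would index into a learned embedding table of size `2 * max_dist + 1`, giving the attention mechanism both the distance and the direction between words.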

