Yongzhi Min1, Jicheng Guo1, and Kun Yang2

1School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2Guangdong Province Railway Construction and Investment Group Co., LTD, Guangdong 510665, China


Received: July 6, 2022
Accepted: August 12, 2022
Publication Date: September 14, 2022

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

DOI: https://doi.org/10.6180/jase.202306_26(6).0006


Real-time detection of rail surface defects is an important part of future automated railway inspection. Because rail surface defects are multi-scale and differ only slightly from the background, an improved YOLOX real-time defect detection algorithm is proposed. First, based on statistics of the defect sizes, we design a backbone feature network suited to detecting multi-scale defects. Second, in the feature fusion network, a multi-scale network combining a Feature Pyramid Network and a Path Aggregation Network is designed to extract richer semantic information and more accurate spatial information. Third, nine Convolution Block Attention Modules are added between the backbone network and the feature fusion network, as well as in the up-sampling process of the feature fusion network, to assist model training and weight updates. In addition, the Spatial Pyramid Pooling module in the original model is replaced with an Atrous Spatial Pyramid Pooling module, which speeds up model training and improves the training result. Finally, we modify the loss function: the Focal-Efficient Intersection over Union function is used as the regression loss, which yields more accurate regression boxes and addresses the imbalance between positive and negative samples. Experiments show that the resulting algorithm achieves 96.1% accuracy at 49.74 fps, making it an accurate and practical real-time rail surface defect detection algorithm.

Keywords: Rail surface defect; YOLOX; Real-time; Convolution Block Attention Module; Atrous Spatial Pyramid Pooling; Focal-Efficient Intersection Over Union
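
The abstract names the Focal-Efficient Intersection over Union (Focal-EIoU) regression loss without stating its form. The following is a minimal sketch of the commonly cited formulation for a single pair of boxes, not the authors' implementation. It assumes corner-format boxes `(x1, y1, x2, y2)`; the function name `focal_eiou_loss`, the default `gamma=0.5`, and the `eps` stabiliser are illustrative choices that do not come from this article.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIoU loss for one predicted box and one ground-truth box.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    gamma and eps are assumed defaults, not values from the article.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of both boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centres, normalised by the squared
    # diagonal of the enclosing box.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Separate width and height penalties (the "efficient" part of EIoU).
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    eiou = 1.0 - iou + center_term + width_term + height_term

    # Focal weighting: scale the loss by IoU ** gamma.
    return (iou ** gamma) * eiou


if __name__ == "__main__":
    print(focal_eiou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

In this form the `iou ** gamma` factor gives zero weight to box pairs that do not overlap at all, so a training pipeline would typically apply it only to predictions already assigned to a ground-truth box.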

