A lightweight model for multi-traffic object detection based on deep learning under complex traffic conditions

Guoqiang Chen; Yanan Cheng

doi:10.6180/jase.202206_25(3).0019


Mechanical Engineering


Guoqiang Chen¹ and Yanan Cheng¹

¹School of Mechanical and Power Engineering, Henan Polytechnic University, 2001 Century Avenue, Jiaozuo, Henan, China

Received: July 18, 2021
Accepted: September 17, 2021
Publication Date: October 25, 2021

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


ABSTRACT

Object detection is essential for environmental perception in autonomous driving. Besides vehicles and pedestrians, traffic signs and traffic lights are important objects to detect. This paper addresses how to achieve accurate multi-traffic object detection while minimizing model size. We propose YOLOv5s-Ghost-SE-DW, a deep learning network based on YOLOv5s that can detect all traffic objects, including traffic signs and lights. First, convolution layers are replaced with Ghost modules to reduce the number of parameters and the model size. Second, to improve accuracy and real-time performance, the squeeze-and-excitation attention mechanism (SELayer) is embedded to fuse more spatial features. Third, depthwise (DW) convolution is used to extract features and further reduce the number of parameters. Ablation experiments verify the effect of each module on the overall network. YOLOv5s-Ghost-SE-DW has a model size of 5.22 MB and runs in real time at 15.58 FPS on a CPU, an FPS increase of 27.5%.
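To make the parameter savings described in the abstract concrete, the following Python sketch counts the weights of a standard convolution, a Ghost module, and a depthwise separable convolution, plus the extra cost of an SE block, for one hypothetical 128-channel 3x3 layer. The layer size, the Ghost ratio s = 2, the 3x3 cheap-operation kernel, the SE reduction ratio r = 16, and bias-free convolutions are illustrative assumptions, not the exact configuration of YOLOv5s-Ghost-SE-DW.

```python
"""Rough parameter counts for the building blocks named in the abstract.

Illustrative assumptions (not from the paper): bias-free convolutions,
Ghost ratio s = 2, a 3x3 cheap depthwise kernel, SE reduction ratio r = 16.
"""


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Standard k x k convolution, no bias."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by a 1x1 pointwise conv."""
    return c_in * k * k + c_in * c_out


def ghost_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Ghost module: a primary k x k conv yields c_out // s intrinsic maps, then a cheap
    d x d depthwise conv derives the remaining ghost maps from them."""
    intrinsic = c_out // s                    # assumes c_out is divisible by s
    primary = conv_params(c_in, intrinsic, k)
    cheap = intrinsic * (s - 1) * d * d       # one d x d filter per ghost map
    return primary + cheap


def se_params(c: int, r: int = 16) -> int:
    """Squeeze-and-Excitation: global average pool, then FC c -> c // r -> c (with biases)."""
    hidden = c // r
    return (c * hidden + hidden) + (hidden * c + c)


def frame_time_ms(fps: float) -> float:
    """Average per-frame latency implied by a throughput figure."""
    return 1000.0 / fps


if __name__ == "__main__":
    c, k = 128, 3  # hypothetical layer: 128 -> 128 channels, 3x3 kernel
    base = conv_params(c, c, k)
    for name, n in [
        ("standard conv", base),
        ("ghost module", ghost_params(c, c, k)),
        ("depthwise separable", depthwise_separable_params(c, c, k)),
        ("SE block (added cost)", se_params(c)),
    ]:
        print(f"{name:>22}: {n:>7} params ({n / base:.1%} of standard conv)")
    # 15.58 FPS on CPU is the throughput reported in the abstract
    print(f"15.58 FPS -> {frame_time_ms(15.58):.1f} ms per frame")
```

For this layer, the Ghost module needs roughly half the weights of the standard convolution and the depthwise separable version about an eighth, while the SE block adds only a few thousand parameters. These are weight counts only; actual CPU speed also depends on memory access patterns and operator implementations.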

Keywords: ghost module; attention mechanism; DW convolution; real-time object detection; lightweight network; complex traffic conditions
