DDC-Net: Semantic Segmentation for Urban Roads Based on Improved Capsule Networks

Xianlei Ge; Xiaobo Shen; Yingxuan Zhou

doi:10.6180/jase.202510_28(10).0009

Electrical Engineering Computer Science and Information Engineering

Overall structure of the Localized Compressed Feature (LFCE) capture module. It contains a spatial branch (red) and a semantic feature fusion branch (blue); the symbols denote element-wise multiplication.

Xianlei Ge¹,², Xiaobo Shen¹,³, and Yingxuan Zhou¹

¹School of Electronic Engineering, Huainan Normal University, Huainan 232038, China

²College of Computing and Information Technologies, National University, Manila 1008, Philippines

³College of Industrial Education, Technological University of the Philippines, Manila 1000, Philippines

Received: January 6, 2024
Accepted: December 18, 2024
Publication Date: January 24, 2025

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

In recent years, as autonomous driving technology has advanced, semantic segmentation of road scenes, a core component of that technology, has become an active research topic. However, most convolutional neural network (CNN)-based methods are inefficient and costly because of the large volume of detection data and their complex structures, which limits their performance on fast-response (real-time) tasks. To address these problems, this paper proposes a capsule network-based semantic segmentation method for road images that achieves a good balance between recognition efficiency and detection speed. Specifically, DDC-Net, designed on the basis of a capsule network, serves as the baseline network, and different connection paths are dynamically selected according to pixel affinity during forward propagation. In addition, DDC-S and DDC-G are designed for spatial detail fusion and semantic fusion, respectively, and the local feature extraction module (LFCE) is designed with a two-branch structure. Extensive experiments show that the proposed method outperforms most current CNN-based methods in terms of model size, recognition flexibility, and overall performance. On the ADE20K and Cityscapes test datasets, the method achieves mean intersection over union (mIoU) scores of 74.5% and 79.4% at 63.9 fps and 64.8 fps, respectively, demonstrating its effectiveness.
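
For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of how mIoU is commonly computed from a confusion matrix. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou`, the use of NumPy, and the `ignore_index=255` default (a convention often used for unlabeled pixels) are assumptions made for illustration.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels per (true class, predicted class) pair.

    Assumes integer label maps of equal shape with predictions in
    [0, num_classes). Pixels whose target equals ignore_index are
    skipped (hypothetical default).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index
    # Encode each (target, pred) pair as a single index, then histogram.
    idx = target[valid] * num_classes + pred[valid]
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    tp = np.diag(conf).astype(float)
    # Column sums give TP + FP, row sums give TP + FN.
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0  # skip classes absent from both maps
    return float((tp[present] / union[present]).mean())


if __name__ == "__main__":
    pred = [[0, 1], [1, 1]]
    target = [[0, 1], [0, 1]]
    conf = confusion_matrix(pred, target, num_classes=2)
    print(f"mIoU = {mean_iou(conf):.4f}")
```

In practice the confusion matrix is accumulated over every image in the test set before the per-class ratios are averaged, so that small images do not carry the same weight as large ones.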

Keywords: image semantic segmentation; deep learning; autonomous driving; road scene detection; fast response
