Xianlei Ge1,2, Xiaobo Shen1,3, and Yingxuan Zhou1

1School of Electronic Engineering, Huainan Normal University, Huainan 232038, China

2College of Computing and Information Technologies, National University, Manila 1008, Philippines

3College of Industrial Education, Technological University of the Philippines, Manila 1000, Philippines


Received: January 6, 2024
Accepted: December 18, 2024
Publication Date: January 24, 2025

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

DOI: https://doi.org/10.6180/jase.202510_28(10).0009

In recent years, with the steady progress of autonomous driving technology, semantic segmentation of road scenes, a core component of this technology, has become a research hotspot. However, most current convolutional neural network (CNN)-based methods are inefficient and costly because of the large volume of data to be processed and their complex structures, which limits their performance on fast-response (real-time) tasks. To address these problems, this paper proposes a capsule network-based semantic segmentation method for road images that achieves a good balance between recognition efficiency and detection speed. Specifically, DDC-Net, designed on the basis of a capsule network, serves as the baseline network, and different connection paths are dynamically selected according to pixel affinity during forward propagation. In addition, DDC-S and DDC-G are designed for spatial detail fusion and semantic fusion, respectively, and the local feature extraction module (LFCE) is designed with a two-branch structure. Extensive experiments show that the proposed method outperforms most current CNN-based methods in terms of model size, recognition flexibility, and overall performance. On the ADE20K and Cityscapes test sets, the proposed method achieves mean intersection over union (mIoU) scores of 74.5% and 79.4% at 63.9 fps and 64.8 fps, respectively, demonstrating its effectiveness.
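For readers unfamiliar with the mIoU metric quoted above, the following minimal sketch shows how it is commonly computed: for each class, the number of pixels correctly assigned to that class is divided by the number of pixels that are predicted as or labeled with that class, and the per-class ratios are averaged. The function name `mean_iou` and the toy labels are illustrative assumptions, not taken from the paper, and the authors' exact evaluation protocol is not stated here. Benchmark evaluation scripts typically also accumulate counts over the whole dataset and exclude unlabeled pixels, which this sketch omits.

```python
from collections import Counter


def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two flat lists of integer class labels.

    Illustrative helper (hypothetical name), not the paper's evaluation code.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = Counter()  # pixels where prediction and label agree, per class
    pred_count = Counter()    # pixels predicted as each class
    target_count = Counter()  # pixels labeled as each class

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union == 0:
            continue  # class absent from both prediction and label: skip it
        ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (made-up labels for illustration).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(pred, target, 3), 4))  # per-class IoU: 1/3, 2/3, 1/2 -> 0.5
```

Because every class contributes equally to the average, a small class such as a traffic sign weighs as much as a large class such as the road surface, which is one reason the metric is widely used for road-scene benchmarks.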

Keywords: image semantic segmentation; deep learning; autonomous driving; road scene detection; fast response

