An interactive deep learning method for fine-grained image classification

2025

2026-04-03

Liumin Luo¹, Mingxia Wang², and Xiaoqing Liu¹

¹School of Mechanical and Electrical Engineering, Zhoukou Normal University, Zhoukou 466000 China

²Department of Mechanical and Electrical Engineering, PLA Army Special Operations College, Guilin 541000, Guangxi Province, China

Received: March 19, 2023
Accepted: April 24, 2024
Publication Date: April 3, 2026


Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

DOI: http://dx.doi.org/10.6180/jase.202504_28(4).0004


Fine-grained image classification refers to classifying subcategories within basic categories that have already been defined. It is a very challenging research task because fine-grained data exhibit small inter-class differences and large intra-class differences. Based on an analysis of existing fine-grained image classification algorithms, a novel fine-grained image classification method based on interactive deep learning is proposed. First, YOLOv5 is used as the backbone network to improve classification performance, and a random elimination enhancement selection strategy is designed. The interaction between the feature elimination branch and the feature enhancement branch encourages the network to learn more relevant information and to capture latent discriminative features. Then, a global diversified module is proposed to model feature maps at different levels, improving the network's ability to capture comparative cues. Finally, an internal standard imprinting dataset is established, and the fine-grained algorithm is applied to authenticity identification, demonstrating a practical application of fine-grained image classification in natural scenes. The model can be trained efficiently end to end without bounding boxes or other manual annotations. Experimental results show that the proposed algorithm reaches accuracies of 90.6%, 95.9% and 95.8% on three fine-grained image datasets, CUB-200-2011, Stanford Cars and FGVC-Aircraft, respectively.

Keywords: Fine-grained image classification; YOLOv5; interactive deep learning; feature enhancement
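The abstract above only outlines the random elimination enhancement selection strategy, so the following is a minimal sketch of the general idea, not the authors' implementation. It uses attention-guided erasing and cropping, a common technique in weakly supervised fine-grained classification, as a stand-in for the two branches. Several details are assumptions rather than facts from the paper: the activation map is taken as the channel mean of a C x H x W feature map, the normalized threshold defaults to 0.5, each sample is routed to one branch at random with probability `p_eliminate`, and the helper names (`activation_map`, `eliminate`, `enhance`, `random_select`) are hypothetical. Plain Python lists stand in for tensors to keep the example self-contained, and neither the training-time interaction between the branches nor the global diversified module is modeled.

```python
import random
from typing import List, Optional, Tuple

Map2D = List[List[float]]   # H x W
FeatureMap = List[Map2D]    # C x H x W


def activation_map(feats: FeatureMap) -> Map2D:
    """Channel-wise mean of a C x H x W feature map, used here as an
    assumed proxy for where the network is attending."""
    c, h, w = len(feats), len(feats[0]), len(feats[0][0])
    return [[sum(ch[i][j] for ch in feats) / c for j in range(w)] for i in range(h)]


def normalize(act: Map2D) -> Map2D:
    """Min-max normalize an activation map to [0, 1]."""
    flat = [v for row in act for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo if hi > lo else 1.0  # avoid division by zero on flat maps
    return [[(v - lo) / span for v in row] for row in act]


def eliminate(feats: FeatureMap, thresh: float = 0.5) -> FeatureMap:
    """Hypothetical elimination branch: zero every position whose normalized
    activation is >= thresh (in all channels), so the network has to rely
    on other, less dominant cues."""
    norm = normalize(activation_map(feats))
    return [[[0.0 if n >= thresh else v for v, n in zip(row, nrow)]
             for row, nrow in zip(ch, norm)] for ch in feats]


def enhance(feats: FeatureMap, thresh: float = 0.5) -> FeatureMap:
    """Hypothetical enhancement branch: crop all channels to the bounding box
    of the positions whose normalized activation is >= thresh."""
    norm = normalize(activation_map(feats))
    rows = [i for i, r in enumerate(norm) if any(n >= thresh for n in r)]
    cols = [j for j in range(len(norm[0])) if any(r[j] >= thresh for r in norm)]
    if not rows:  # flat map: nothing stands out, keep the input unchanged
        return feats
    r0, r1, c0, c1 = rows[0], rows[-1], cols[0], cols[-1]
    return [[row[c0:c1 + 1] for row in ch[r0:r1 + 1]] for ch in feats]


def random_select(feats: FeatureMap, p_eliminate: float = 0.5,
                  thresh: float = 0.5,
                  rng: Optional[random.Random] = None) -> Tuple[str, FeatureMap]:
    """Randomly route a feature map to the elimination or enhancement branch
    and return the branch name together with the transformed map."""
    rng = rng or random.Random()
    if rng.random() < p_eliminate:
        return "eliminate", eliminate(feats, thresh)
    return "enhance", enhance(feats, thresh)
```

For a single 3 x 3 channel `[[0.0, 0.1, 0.0], [0.1, 0.9, 0.8], [0.0, 0.7, 0.2]]`, the elimination branch zeros the three strongest cells (0.9, 0.8 and 0.7), while the enhancement branch crops to the 2 x 2 region `[[0.9, 0.8], [0.7, 0.2]]` that contains them. Routing each sample to only one branch is a design choice of this sketch; a real network would apply such operations to intermediate backbone features during training and add its own losses on top.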
