An interactive deep learning method for fine-grained image classification

Liumin Luo; Mingxia Wang; Xiaoqing Liu

doi:10.6180/jase.202504_28(4).0004

Liumin Luo¹, Mingxia Wang², and Xiaoqing Liu¹

¹School of Mechanical and Electrical Engineering, Zhoukou Normal University, Zhoukou 466000, China

²Department of Mechanical and Electrical Engineering, PLA Army Special Operations College, Guilin 541000, Guangxi Province, China

Received: March 19, 2023
Accepted: April 24, 2024
Publication Date: May 28, 2024

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

Fine-grained image classification refers to classifying subcategories within basic categories that have already been defined. It is a very challenging research task because such data exhibit small inter-class differences and large intra-class differences. Based on an analysis of existing fine-grained image classification algorithms, a novel fine-grained image classification method based on interactive deep learning is proposed. First, YOLOv5 is used as the backbone network to improve classification performance, and a random elimination enhancement selection strategy is designed. The interaction between the feature elimination branch and the feature enhancement branch encourages the network to learn more relevant information and to capture latent discriminative features. Then, a global diversified module is proposed to model feature maps from different levels, improving the network's ability to exploit comparative cues. Finally, an internal standard imprint dataset is established, and the fine-grained algorithm is applied to authenticity identification, demonstrating a practical application of fine-grained image classification in natural scenes. The model can be trained efficiently in an end-to-end manner without bounding boxes or additional annotations. Experimental results show that the proposed algorithm achieves accuracies of 90.6%, 95.9%, and 95.8% on three fine-grained image datasets, CUB-200-2011, Stanford Cars, and FGVC-Aircraft, respectively.
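The abstract does not spell out the internals of the random elimination enhancement selection strategy. As a rough illustration only, the sketch below assumes that one backbone channel is chosen at random as a guidance map. Its most strongly activated region (above a hypothetical threshold `theta`) is zeroed out for the elimination branch and amplified for the enhancement branch. The function name `random_eliminate_enhance`, the channel-as-attention choice, and the x2 amplification are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def random_eliminate_enhance(features, theta=0.5, rng=None):
    """Hypothetical sketch of a paired elimination/enhancement step.

    features: array of shape (C, H, W) taken from a backbone stage.
    theta: assumed threshold on the normalised guidance map.
    Returns (eliminated, enhanced), both with the same shape as features.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_channels = features.shape[0]

    # Assumption: pick one channel at random and use it as the guidance map.
    k = rng.integers(num_channels)
    guide = features[k]

    # Min-max normalise the guidance map to [0, 1] (epsilon avoids 0/0).
    guide = (guide - guide.min()) / (guide.max() - guide.min() + 1e-8)
    salient = (guide >= theta).astype(features.dtype)  # (H, W) binary mask

    # Elimination branch: suppress the most salient region so the network
    # has to look for other discriminative cues.
    eliminated = features * (1.0 - salient)  # broadcasts over channels

    # Enhancement branch: amplify the same salient region (assumed x2).
    enhanced = features * (1.0 + salient)
    return eliminated, enhanced
```

During training, feeding both outputs to the classifier is meant to push the network to rely on more than one discriminative region. In this sketch the only randomness is the choice of guidance channel, so passing a seeded `np.random.default_rng` makes a run reproducible.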

Keywords: Fine-grained image classification; YOLOv5; interactive deep learning; feature enhancement
