Mask RCNN-based Single Shot Multibox Detector For Gesture Recognition In Physical Education

Tao Feng

doi:10.6180/jase.202303_26(3).0009

Computer Science and Information Engineering

New SSD network

Tao Feng ¹

¹Department of Physical Education, Harbin Finance University, Harbin, 150000, China

Received: April 11, 2022
Accepted: May 4, 2022
Publication Date: June 11, 2022

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

ABSTRACT

Human-computer interaction (HCI) is an important supporting technology in computer vision, particularly in physical education, where it can make classes more efficient and thereby improve learning efficiency. HCI is developing towards naturalization, intelligence, high efficiency, and materialization. Gesture recognition is a key component of HCI and plays an important role in artistic understanding and image perception. Traditional gesture recognition methods are prone to misrecognition, which results in low accuracy. In this paper, we propose a new gesture recognition method for HCI based on Mask RCNN and the single shot multibox detector (SSD). First, feature extraction and region segmentation are performed on the three-channel red, green, and blue (RGB) images to obtain the hand instance segmentation and mask. We then modify the SSD model by adding a new convolution layer that fuses the shallow visual convolution layers with the deep semantic convolution layers in the network structure. To address the poor classification performance caused by the imbalance between positive and negative samples, we propose an improved loss function that strengthens the model's ability to classify target gestures. Experimental results show that, compared with state-of-the-art methods, the proposed method is more robust and detects faster while maintaining higher gesture detection accuracy.

Keywords: Human-computer interaction, gesture recognition, mask RCNN, single shot multibox detector, loss function, physical education
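
The abstract does not give the form of the improved loss function. As an illustration only, the sketch below uses the focal loss, a widely used remedy for positive/negative imbalance in single-stage detectors, as a stand-in. It is not the loss proposed in this paper. The function names and the parameters `alpha` and `gamma` are assumptions introduced for this example.

```python
import math

def cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over candidate boxes (baseline for comparison)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
        total += -math.log(p_t)
    return total / len(probs)

def focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss (assumed stand-in, not the paper's loss).

    probs  : predicted probability that each box contains a gesture
    labels : 1 for a positive (gesture) box, 0 for a negative (background) box
    alpha  : weight on positive boxes; negatives are weighted by 1 - alpha
    gamma  : focusing exponent; larger values shrink the loss of easy boxes
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
        a_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma is close to 0 for well-classified boxes
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

# One hard positive box among many easy background boxes (made-up values).
probs = [0.3] + [0.01] * 99
labels = [1] + [0] * 99
ce = cross_entropy(probs, labels)
fl = focal_loss(probs, labels)
```

With plain cross-entropy, the many easy background boxes together contribute a large share of the loss. The focusing term scales each of them down sharply, so the hard positive box dominates the focal loss instead.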
