Journal of Applied Science and Engineering

Published by Tamkang University Press


Jing Yu1, Hang Li2, Shou-Lin Yin2 and Shahid Karim3

1. Luxun Academy of Fine Arts, No.19, Miyoshi Street, HePing District, Shenyang 110034, China
2. Software College, Shenyang Normal University, No.253, HuangHe Bei Street, HuangGu District, Shenyang 110034, China
3. Institute of Image and Information Technology, Harbin Institute of Technology, No.92, XiDaZhi Street, NanGang District, Harbin 150000, China

Received: July 22, 2019
Accepted: October 19, 2019
DOI: https://doi.org/10.6180/jase.202003_23(1).0004

ABSTRACT


Gesture recognition currently offers a faster, simpler, and more natural way for human-computer interaction, and it has attracted wide attention because of its importance in everyday applications. The manual feature extraction used in traditional gesture recognition methods is time-consuming and laborious, and achieving high recognition accuracy demands a large number of high-quality handcrafted features, which is a bottleneck for these methods. We therefore propose a deep learning method for dynamic gesture recognition in human-to-computer interfaces. An improved inverted residual network architecture serves as the backbone of an SSD (Single Shot MultiBox Detector) network for feature extraction, and the convolution structure of the auxiliary prediction layers combines the inverted residual structure with dilated (atrous) convolution. This design exploits multi-scale information while reducing both the amount of computation and the number of parameters. Transfer learning is used to fine-tune the trained network model, which shortens training time and improves convergence. Experimental results show that the proposed method recognizes different gestures quickly and effectively.
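To make the described architecture concrete, the following is a minimal sketch in PyTorch (an assumption; the paper does not state its framework) of an inverted residual block whose depthwise convolution can be dilated, in the spirit of using such blocks in the SSD auxiliary prediction layers. The channel sizes, expansion ratio, and example feature-map size are illustrative only and are not the authors' exact configuration.

import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand -> depthwise (optionally dilated) conv -> project,
    with a skip connection when input and output shapes match."""
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6, dilation=1):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise expansion
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution; dilation enlarges the receptive
            # field without adding parameters (multi-scale context)
            nn.Conv2d(hidden, hidden, 3, stride=stride,
                      padding=dilation, dilation=dilation,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 pointwise projection back to a low-dimensional space
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out

# Illustrative SSD-style auxiliary stage built from such blocks
# (hypothetical channel counts and feature-map size).
aux = nn.Sequential(
    InvertedResidual(256, 256, dilation=2),
    InvertedResidual(256, 512, stride=2),
)
feat = aux(torch.randn(1, 256, 19, 19))   # -> torch.Size([1, 512, 10, 10])

Because the 3x3 convolution is depthwise and dilation adds no extra weights, a block of this kind enlarges the receptive field for multi-scale detection while keeping computation and parameter counts low, which matches the motivation stated in the abstract.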


Keywords: gesture recognition, deep learning, Human-to-Computer interfaces, feature extraction


