A New Feature Fusion Network for Student Behavior Recognition in Education

A. Jisi; Shoulin Yin

doi:10.6180/jase.202104_24(2).0002


Computer Science and Information Engineering

Jisi A.¹ and Shoulin Yin¹,²

¹School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150000, China
²Software College, Shenyang Normal University, Shenyang 110034, China

Received: September 20, 2020
Accepted: October 3, 2020
Publication Date: April 1, 2021

Copyright © The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


ABSTRACT

Behavior recognition is an active and challenging research topic in computer vision. In particular, student behavior analysis affects the efficiency of classroom teaching. To address the complex problem of recognizing student behavior in video, this paper proposes a new feature fusion network for student behavior recognition in education. The network has two main stages: feature extraction and classification. First, a spatial affine transformation network is combined with a convolutional neural network (CNN) to extract more detailed features. Then, the spatial and temporal features are fused by a weighted sum, and an improved softmax classifier performs the final classification to raise recognition accuracy. Experiments are conducted on the standard human behavior datasets HMDB51 and UCF101 and on a real student behavior dataset. The results show that the proposed algorithm achieves better recognition performance than other state-of-the-art algorithms.

Keywords: Behavior recognition; feature fusion; spatial affine transformation; spatial-temporal feature
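The abstract names two mechanisms without giving their equations: an affine spatial transformation applied alongside the CNN, and a weighted sum that fuses spatial and temporal features ahead of a softmax classifier. The NumPy sketch below illustrates both under stated assumptions. The 2×3 matrix `theta` is supplied directly rather than predicted by a learned localization network; the fusion weight `alpha` and the classifier parameters `W` and `b` are hypothetical placeholders; and the classifier is a plain softmax, not the improved softmax the paper proposes, whose details are not given here.

```python
import numpy as np

def affine_warp(image, theta):
    """Warp a 2-D image with a 2x3 affine matrix theta (spatial-transformer style).

    Assumption: theta is given, not predicted by a localization network.
    A normalized target grid in [-1, 1] is mapped through theta, and the
    source image is bilinearly sampled at the mapped points (zero padding).
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # shape (3, h*w)
    src = theta @ grid                                          # shape (2, h*w)
    # Map normalized source coordinates back to pixel coordinates
    sx = (src[0] + 1) * (w - 1) / 2
    sy = (src[1] + 1) * (h - 1) / 2
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = sx - x0, sy - y0

    def pixel(y, x):
        # Read pixels, returning 0 for coordinates outside the image
        valid = (x >= 0) & (x < w) & (y >= 0) & (y < h)
        out = np.zeros(sx.shape)
        out[valid] = image[y[valid], x[valid]]
        return out

    out = ((1 - wx) * (1 - wy) * pixel(y0, x0) + wx * (1 - wy) * pixel(y0, x1)
           + (1 - wx) * wy * pixel(y1, x0) + wx * wy * pixel(y1, x1))
    return out.reshape(h, w)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_classify(f_spatial, f_temporal, W, b, alpha=0.5):
    """Weighted-sum fusion of equal-length spatial and temporal feature
    vectors, followed by a plain softmax classifier (hypothetical W, b)."""
    fused = alpha * f_spatial + (1.0 - alpha) * f_temporal
    return softmax(W @ fused + b)  # class probabilities
```

With `theta = [[1, 0, 0], [0, 1, 0]]` the warp returns the input unchanged, and `[[-1, 0, 0], [0, 1, 0]]` mirrors it horizontally. In a trained network, `alpha` would be tuned or learned rather than fixed at 0.5.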
