TFMNet: Trimap-free real-time image matting algorithm based on deep learning

Ge Peng; Jingzong Yang

doi:10.6180/jase.202404_27(4).0009

Computer Science and Information Engineering

Ge Peng, Jingzong Yang

School of Big Data, Baoshan University, Baoshan Yunnan 678000, China

Received: January 16, 2023
Accepted: August 31, 2023
Publication Date: September 27, 2023

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

Conventional image matting algorithms require a manually produced Trimap as prior information to produce good matting results, which makes real-time matting impossible. To tackle this problem, this paper proposes a Trimap-free image matting network, TFMNet. The network consists of four modules: a ConvNeXt backbone module for image feature extraction, a Trimap prediction module for normalized Trimap generation, a glance matting module for rough matting prediction, and a post-processing module for producing the exact matting results. To further optimize the training of the model, an improved loss function based on frequency-domain information is proposed. Sets of experiments designed with the controlled-variable approach show that TFMNet performs well in real-time image matting. The TFMNet model achieves 8.99, 0.011, 12.31, and 11.15 on the accuracy metrics SAD, MSE, GRAD, and CONN, respectively; it takes 51 ms per image on average, which meets the real-time requirement, and the model size is 671M. In addition, further experiments comparing TFMNet with five state-of-the-art models on three typical matting databases demonstrate the superiority of the proposed algorithm.
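To make the first two accuracy metrics concrete, the following minimal sketch computes SAD (sum of absolute differences) and MSE (mean squared error) between a predicted and a ground-truth alpha matte. It uses the standard textbook definitions. The names `Matte`, `sad`, and `mse` and the 2x2 sample mattes are illustrative assumptions, not taken from the paper, and the sketch does not reproduce any rescaling or unknown-region masking that a particular benchmark may apply.

```python
from typing import Sequence

# Hypothetical type alias: an alpha matte as rows of values in [0, 1].
Matte = Sequence[Sequence[float]]


def sad(pred: Matte, truth: Matte) -> float:
    """Sum of absolute differences over all pixels."""
    return sum(
        abs(p - t)
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )


def mse(pred: Matte, truth: Matte) -> float:
    """Mean squared error over all pixels."""
    pixel_count = sum(len(row) for row in truth)
    squared_error = sum(
        (p - t) ** 2
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    return squared_error / pixel_count


# Illustrative 2x2 mattes (assumed values, not data from the paper).
pred = [[0.0, 0.5], [1.0, 1.0]]
truth = [[0.0, 1.0], [1.0, 0.5]]

print(sad(pred, truth))  # 1.0
print(mse(pred, truth))  # 0.125
```

Lower values are better for both metrics. Matting benchmarks commonly report SAD divided by 1000 and evaluate only the uncertain region of the matte; which convention the reported figures follow is not stated in the abstract.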

Keywords: real-time image matting; image semantic segmentation; convolutional neural network without pooling; image processing
