Journal of Applied Science and Engineering

Published by Tamkang University Press


Shizhen Huang, Enhao Tang, Shun Li

College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China


 

Received: December 5, 2022
Accepted: July 18, 2023
Publication Date: August 26, 2023

Copyright © The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


DOI: https://doi.org/10.6180/jase.202403_27(3).0010


Recently, Graph Attention Networks (GATs) have shown strong performance for representation learning on graphs. GATs leverage a masked self-attention mechanism to obtain richer feature representations than graph convolutional networks (GCNs). However, GAT inference incurs a large amount of irregularity in computation and memory access, which prevents efficient use of traditional neural network accelerators. Moreover, existing dedicated GAT accelerators demand high memory volumes and are difficult to implement on resource-limited edge devices. To address these issues, this paper proposes an FPGA-based accelerator, called H-GAT, which achieves high speed and energy efficiency for GAT inference. H-GAT decomposes the GAT operation into matrix multiplication and activation function units. We first design an efficient, fully pipelined processing element (PE) for sparse matrix multiplication (SpMM) and dense matrix-vector multiplication (DMVM). We further optimize the softmax dataflow so that its computational efficiency improves dramatically. We evaluate our design on a Xilinx Kintex-7 FPGA with three popular datasets. Compared with an existing CPU, a GPU, and a state-of-the-art FPGA-based GAT accelerator, H-GAT achieves speedups of up to 585×, 2.7×, and 11×, and improves power efficiency by up to 2095×, 173×, and 65×, respectively.
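To make the decomposition described in the abstract concrete, the following is a minimal NumPy sketch of the standard single-head GAT layer written in terms of the primitives H-GAT accelerates: a dense feature transform (the DMVM stage), a masked per-neighborhood softmax (stabilized by shifting logits by their maximum, the stage whose dataflow the paper optimizes), and a sparse weighted aggregation (the SpMM stage). This is an illustration of the textbook GAT computation, not the paper's hardware implementation; all names here (`gat_layer`, `a_src`, `a_dst`) are our own.

```python
import numpy as np
from scipy.sparse import csr_matrix

def leaky_relu(x, slope=0.2):
    # LeakyReLU, as used for the attention logits in the original GAT
    return np.where(x > 0.0, x, slope * x)

def gat_layer(A: csr_matrix, X: np.ndarray, W: np.ndarray,
              a_src: np.ndarray, a_dst: np.ndarray) -> np.ndarray:
    """Single-head GAT layer expressed as the primitives named in the
    abstract: dense transform (DMVM), masked softmax, sparse aggregation
    (SpMM). A is the adjacency matrix in CSR form."""
    H = X @ W                                   # dense transform: one DMVM per node
    out = np.zeros_like(H)
    for i in range(A.shape[0]):                 # one neighborhood at a time
        nbrs = A.indices[A.indptr[i]:A.indptr[i + 1]]
        if nbrs.size == 0:
            continue
        # masked self-attention: logits exist only for actual edges (i, j)
        e = leaky_relu(H[i] @ a_dst + H[nbrs] @ a_src)
        e = e - e.max()                         # numerically stable softmax
        alpha = np.exp(e)
        alpha /= alpha.sum()
        out[i] = alpha @ H[nbrs]                # one row of a weighted SpMM
    return out

# Tiny usage example on a 3-node graph with random parameters
rng = np.random.default_rng(0)
A = csr_matrix(np.array([[1, 1, 1],
                         [1, 1, 0],
                         [1, 0, 1]], dtype=np.float32))   # self-loops included
X = rng.standard_normal((3, 4)).astype(np.float32)
W = rng.standard_normal((4, 8)).astype(np.float32)
a_src, a_dst = rng.standard_normal(8), rng.standard_normal(8)
print(gat_layer(A, X, W, a_src, a_dst).shape)   # -> (3, 8)
```

The per-neighborhood loop makes the irregularity the abstract mentions visible: neighborhood sizes vary, so the softmax and aggregation have data-dependent trip counts, which is precisely what a fixed-shape dense accelerator handles poorly and what a pipelined SpMM/DMVM PE with a streamlined softmax dataflow targets.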


Keywords: Graph neural network; FPGA; sparse matrix-vector multiplication




    



 
