Abstract
There is increasing interest in developing embedded machine learning hardware, as it can offer better privacy, bandwidth efficiency, and scalability. Gradient-boosted decision trees (GBDTs) are a strong candidate because they employ less complex logic, but their efficient implementation on field-programmable gate arrays (FPGAs) needs
to be explored in detail. In this paper, we propose sophisticated quantisation approaches to balance the dual goals of efficiency and performance. In particular, we introduce quantisation-aware training of GBDTs for integer-only and binary arithmetic. Results are presented for implementations on a Zynq UltraScale+ MPSoC FPGA, with the best design using only 170 look-up tables and 233 flip-flops at a clock speed of 724 MHz. Implementations targeting network intrusion detection and jet-substructure classification for large-scale physics experiments are explored. An order of magnitude fewer FPGA resources are used whilst offering an extremely high throughput and maintaining accuracy. Code is available at https://github.com/malsharari/QATGBDT.
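To make the integer-only arithmetic idea concrete, here is a minimal illustrative sketch, not the paper's actual method: decision-tree split thresholds and input features are mapped to signed 8-bit integers with a shared scale, so inference reduces to integer comparisons, which is what makes GBDT inference cheap in FPGA look-up tables. The stump, threshold, and scale below are invented for illustration.

```python
# Illustrative sketch (not the paper's QAT scheme): quantising a decision
# stump's split threshold so inference uses only integer comparisons.

def quantise(value, scale=127.0, lo=-128, hi=127):
    """Map a float in roughly [-1, 1] to a signed 8-bit integer."""
    q = round(value * scale)
    return max(lo, min(hi, q))  # clamp to the int8 range

# A toy stump: split on one feature at a float threshold of 0.37.
threshold_q = quantise(0.37)  # integer threshold, computed once offline

def stump_predict_int(x_q):
    """Integer-only inference: a single integer comparison, no floats."""
    return 1 if x_q <= threshold_q else -1

# Inputs are quantised once at the boundary; the tree itself stays integer.
x_q = quantise(0.25)
print(stump_predict_int(x_q))  # matches the float comparison 0.25 <= 0.37
```

Because both sides of every comparison use the same scale, the quantised decision agrees with the float decision up to rounding of values very close to a threshold; a full GBDT ensemble would apply the same mapping to every split in every tree.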
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Circuits and Systems I: Regular Papers |
| Publication status | Accepted - 14 Aug 2024 |