Publications

BibTeX format

@article{Schulte:2026:10.1145/3801979,
author = {Schulte, JF and Ramhorst, B and Sun, C and Mitrevski, J and Ghielmetti, N and Lupi, E and Danopoulos, D and Lon{\v{c}}ar, V and Duarte, J and Burnette, D and Laatu, L and Tzelepis, S and Axiotis, K and Berthet, Q and Wang, H and White, P and Demirsoy, S and Colombo, M and Aarrestad, TK and Summers, S and Pierini, M and Di Guglielmo, G and Ngadiuba, J and Campos, J and Hawks, B and Gandrakota, A and Fahim, F and Tran, N and Constantinides, GA and Que, Z and Luk, W and Tapper, A and Hoang, D and Paladino, N and Harris, PC and Lai, BC and Valentin, M and Forelli, R and Ogrenci, S and Gerlach, L and Brooks, Flynn R and Liu, M and Diaz, D and Koda, EE and Quinnan, M and Solares, RM and Parajuli, S and Neubauer, MS and Herwig, C and Tsoi, HF and Rankin, D and Hsu, SC and Hauck, S},
doi = {10.1145/3801979},
journal = {ACM Transactions on Reconfigurable Technology and Systems},
title = {hls4ml: A Flexible, Open Source Platform for Deep Learning Acceleration on Reconfigurable Hardware},
url = {http://dx.doi.org/10.1145/3801979},
volume = {19},
year = {2026}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - We present hls4ml, a free and open source platform that translates machine learning (ML) models from modern deep learning frameworks into high-level synthesis (HLS) code that can be integrated into full designs for field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs). With its flexible and modular design, hls4ml supports a large number of deep learning frameworks and can target HLS compilers from several vendors, including Vitis HLS, Intel oneAPI and Catapult HLS. Together with a wider ecosystem for software-hardware co-design, hls4ml has enabled the acceleration of ML inference in a wide range of commercial and scientific applications where low latency, resource usage, and power consumption are critical. In this article, we describe the structure and functionality of the hls4ml platform. The overarching design considerations for the generated HLS code are discussed, together with selected performance results.
AU  - Schulte,JF
AU  - Ramhorst,B
AU  - Sun,C
AU  - Mitrevski,J
AU  - Ghielmetti,N
AU  - Lupi,E
AU  - Danopoulos,D
AU  - Lončar,V
AU  - Duarte,J
AU  - Burnette,D
AU  - Laatu,L
AU  - Tzelepis,S
AU  - Axiotis,K
AU  - Berthet,Q
AU  - Wang,H
AU  - White,P
AU  - Demirsoy,S
AU  - Colombo,M
AU  - Aarrestad,TK
AU  - Summers,S
AU  - Pierini,M
AU  - Di Guglielmo,G
AU  - Ngadiuba,J
AU  - Campos,J
AU  - Hawks,B
AU  - Gandrakota,A
AU  - Fahim,F
AU  - Tran,N
AU  - Constantinides,GA
AU  - Que,Z
AU  - Luk,W
AU  - Tapper,A
AU  - Hoang,D
AU  - Paladino,N
AU  - Harris,PC
AU  - Lai,BC
AU  - Valentin,M
AU  - Forelli,R
AU  - Ogrenci,S
AU  - Gerlach,L
AU  - Brooks,Flynn R
AU  - Liu,M
AU  - Diaz,D
AU  - Koda,EE
AU  - Quinnan,M
AU  - Solares,RM
AU  - Parajuli,S
AU  - Neubauer,MS
AU  - Herwig,C
AU  - Tsoi,HF
AU  - Rankin,D
AU  - Hsu,SC
AU  - Hauck,S
DO  - 10.1145/3801979
PY  - 2026///
SN  - 1936-7406
TI  - hls4ml: A Flexible, Open Source Platform for Deep Learning Acceleration on Reconfigurable Hardware
T2  - ACM Transactions on Reconfigurable Technology and Systems
UR  - http://dx.doi.org/10.1145/3801979
VL  - 19
ER  -
