TY - CPAPER
AB - In this paper we discuss a high performance implementation for Convolutional Neural Networks (CNNs) inference on the latest generation of Dataflow Engines (DFEs). We discuss the architectural choices made during the design phase taking into account the DFE chip properties. We then perform design space exploration, considering the memory bandwidth and resources utilisation constraints derived from the used DFE and the chosen architecture. Finally, we discuss the high performance implementation and compare the obtained performance against other implementations, showing that our proposed design reaches 2,450 GOPS when running VGG16 as a test case.
AU - Voss,N
AU - Bacis,M
AU - Mencer,O
AU - Gaydadjiev,G
AU - Luk,W
DO - 10.1109/ICCD.2017.77
EP - 438
PB - IEEE
PY - 2017///
SN - 1063-6404
SP - 435
TI - Convolutional Neural Networks on Dataflow Engines
UR - http://dx.doi.org/10.1109/ICCD.2017.77
UR - http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000424789300067&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=1ba7043ffcc86c417c072aa74d649202
UR - http://hdl.handle.net/10044/1/61591
ER -