Imperial College London


Faculty of Engineering, Department of Computing

Professor of Computer Engineering



+44 (0)20 7594 8313
w.luk




434 Huxley Building, South Kensington Campus






BibTeX format

author = {Denholm, S and Inoue, H and Takenaka, T and Becker, T and Luk, W},
doi = {10.1587/transinf.2014RCP0011},
journal = {IEICE Transactions on Information and Systems},
pages = {288--297},
title = {Network-level FPGA acceleration of low latency market data feed arbitration},
volume = {E98D},
year = {2015}

RIS format (EndNote, RefMan)

AB - Financial exchanges provide market data feeds to update their members about changes in the market. Feed messages are often used in time-critical automated trading applications, and two identical feeds (A and B feeds) are provided in order to reduce message loss. A key challenge is to support A/B line arbitration efficiently to compensate for missing packets, while offering flexibility for various operational modes such as prioritising for low latency or for high data reliability. This paper presents a reconfigurable acceleration approach for A/B arbitration operating at the network level, capable of supporting any messaging protocol. Two modes of operation are provided simultaneously: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. We also present a model for message feed processing latencies that is useful for evaluating scalability in future applications. We outline a new low latency, high throughput architecture and demonstrate a cycle-accurate testing framework to measure the actual latency of packets within the FPGA. We implement and compare the performance of the NASDAQ TotalView-ITCH, OPRA and ARCA market data feed protocols using a Xilinx Virtex-6 FPGA. For high reliability messages we achieve latencies of 42ns for TotalView-ITCH and 36.75ns for OPRA and ARCA. 6ns and 5.25ns are obtained for low latency messages. The most resource intensive protocol, TotalView-ITCH, is also implemented in a Xilinx Virtex-5 FPGA within a network interface card; it is used to validate our approach with real market data. We offer latencies 10 times lower than an FPGA-based commercial design and 4.1 times lower than the hardware-accelerated IBM PowerEN processor, with throughputs more than double the required 10Gbps line rate.
AU - Denholm, S
AU - Inoue, H
AU - Takenaka, T
AU - Becker, T
AU - Luk, W
DO - 10.1587/transinf.2014RCP0011
EP - 297
PY - 2015///
SN - 0916-8532
SP - 288
TI - Network-level FPGA acceleration of low latency market data feed arbitration
T2 - IEICE Transactions on Information and Systems
VL - E98D
ER -