Imperial College London

Professor Peter Y. K. Cheung

Faculty of Engineering, Dyson School of Design Engineering

Head of the Dyson School of Design Engineering

+44 (0)20 7594 6200
p.cheung

Mrs Wiesia Hsissen, +44 (0)20 7594 6261

910B, Electrical Engineering, South Kensington Campus

BibTeX format

@article{Ang2008,
author = {Ang, SS and Constantinides, GA and Luk, W and Cheung, PYK},
doi = {10.1007/s11554-008-0082-0},
pages = {289--302},
title = {Custom parallel caching schemes for hardware-accelerated image compression},
volume = {3},
year = {2008}
}

RIS format (EndNote, RefMan)

AB - In an effort to achieve lower bandwidth requirements, video compression algorithms have become increasingly complex. Consequently, deploying these algorithms on field programmable gate arrays (FPGAs) is becoming ever more attractive, because of the computational parallelism these platforms offer and the flexibility they afford designers. Typically, video data are stored in large, slow external memory arrays, but the impact of the memory access bottleneck can be reduced by buffering frequently used data in fast on-chip memories. However, in many compression algorithms the order of memory accesses depends on the input data (Jain in Proceedings of the IEEE, pp. 349–389, 1981). These data-dependent memory accesses complicate the exploitation of data re-use and thereby limit the extent to which an application can be accelerated. In this paper, we present a hybrid memory sub-system that captures data re-use effectively in spite of data-dependent memory accesses. This memory sub-system consists of a custom parallel cache and a scratchpad memory (SPM). Furthermore, the framework can exploit 2D spatial locality, which is frequently exhibited in the access patterns of image processing applications. In a case study involving the quad-tree structured differential pulse code modulation (QSDPCM) application, the impact of data dependence on memory accesses is shown to be significant. Compared with an implementation that employs only an SPM, performance improvements of up to 1.7× and 1.4× are observed in actual implementations on two modern FPGA platforms. These improvements are more pronounced for image sequences exhibiting greater inter-frame movement. In addition, on-chip memory resource usage can be reduced by up to 3.2× using this framework. These results indicate that, on custom hardware platforms, there is substantial scope for improvement in the capture of data re-use.
AU - Ang,SS
AU - Constantinides,GA
AU - Luk,W
AU - Cheung,PYK
DO - 10.1007/s11554-008-0082-0
EP - 302
PY - 2008///
SP - 289
TI - Custom parallel caching schemes for hardware-accelerated image compression
VL - 3
ER -