Imperial College London

Professor Peter Pietzuch

Faculty of Engineering, Department of Computing

Professor of Distributed Systems

Contact

+44 (0)20 7594 8314
prp Website


Location

442 Huxley Building, South Kensington Campus



Publications

Citation

BibTeX format

@article{Koliousis:2019:10.14778/3342263.3342276,
author = {Koliousis, A and Watcharapichat, P and Weidlich, M and Mai, L and Costa, P and Pietzuch, P},
doi = {10.14778/3342263.3342276},
journal = {Proceedings of the VLDB Endowment},
title = {Crossbow: scaling deep learning with small batch sizes on multi-GPU servers},
url = {http://dx.doi.org/10.14778/3342263.3342276},
volume = {12},
year = {2019}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - Deep learning models are trained on servers with many GPUs, and training must scale with the number of GPUs. Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of training data at a time, partitioned across GPUs, and average the resulting partial gradients to obtain an updated global model. To fully utilise all GPUs, systems must increase the batch size, which hinders statistical efficiency. Users tune hyper-parameters such as the learning rate to compensate for this, which is complex and model-specific. We describe CROSSBOW, a new single-server multi-GPU system for training deep learning models that enables users to freely choose their preferred batch size, however small, while scaling to multiple GPUs. CROSSBOW uses many parallel model replicas and avoids reduced statistical efficiency through a new synchronous training method. We introduce SMA, a synchronous variant of model averaging in which replicas independently explore the solution space with gradient descent, but adjust their search synchronously based on the trajectory of a globally-consistent average model. CROSSBOW achieves high hardware efficiency with small batch sizes by potentially training multiple model replicas per GPU, automatically tuning the number of replicas to maximise throughput. Our experiments show that CROSSBOW improves the training time of deep learning models on an 8-GPU server by 1.3–4× compared to TensorFlow.
AU  - Koliousis, A
AU  - Watcharapichat, P
AU  - Weidlich, M
AU  - Mai, L
AU  - Costa, P
AU  - Pietzuch, P
DO  - 10.14778/3342263.3342276
PY  - 2019///
SN  - 2150-8097
TI  - Crossbow: scaling deep learning with small batch sizes on multi-GPU servers
T2  - Proceedings of the VLDB Endowment
UR  - http://dx.doi.org/10.14778/3342263.3342276
UR  - https://dl.acm.org/doi/10.14778/3342263.3342276
UR  - http://hdl.handle.net/10044/1/75907
VL  - 12
ER  -
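
Illustrative sketch (Python, not from the paper)

The abstract describes SMA, a synchronous variant of model averaging: replicas independently explore the solution space with gradient descent, then synchronously adjust toward a globally-consistent average model. The following is a minimal, hypothetical sketch of that idea on a toy least-squares problem; it simplifies the method (the paper's SMA also tracks the trajectory of the average model), and all names and hyper-parameters (lr, alpha, n_replicas) are illustrative assumptions, not Crossbow's actual implementation or API.

# Minimal sketch of the SMA idea from the abstract, under the
# assumptions stated above: replicas take independent SGD steps,
# then are synchronously pulled toward the consistent average model.
import numpy as np

def grad(w, X, y):
    # Gradient of the least-squares loss 0.5 * mean((Xw - y)^2).
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = X @ rng.normal(size=10)

n_replicas, lr, alpha = 4, 0.1, 0.1   # alpha: strength of the pull toward the average
replicas = [rng.normal(size=10) for _ in range(n_replicas)]

for step in range(200):
    # Independent exploration: each replica takes an SGD step on its own small batch.
    for i, w in enumerate(replicas):
        idx = rng.choice(len(y), size=32, replace=False)
        replicas[i] = w - lr * grad(w, X[idx], y[idx])
    # Synchronous adjustment: pull every replica toward the globally-consistent
    # average model, so small-batch replicas do not drift apart.
    z = np.mean(replicas, axis=0)
    replicas = [w - alpha * (w - z) for w in replicas]

z = np.mean(replicas, axis=0)
print("loss of average model:", 0.5 * float(np.mean((X @ z - y) ** 2)))

In this simplified form, each replica can use an arbitrarily small batch (here 32) without the batch size having to grow with the number of replicas, which is the property the abstract highlights.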