Imperial College London

Professor Deniz Gunduz

Faculty of Engineering, Department of Electrical and Electronic Engineering

Professor in Information Processing

Contact

 

Tel: +44 (0)20 7594 6218
Email: d.gunduz

Assistant

 

Ms Joan O'Brien +44 (0)20 7594 6316

Location

 

1016, Electrical Engineering, South Kensington Campus


Publications

Citation

BibTeX format

@article{Buyukates:2023:10.1109/tcomm.2022.3166902,
author = {Buyukates, B and Ozfatura, E and Ulukus, S and Gunduz, D},
doi = {10.1109/tcomm.2022.3166902},
journal = {IEEE Transactions on Communications},
pages = {3317--3332},
title = {Gradient coding with dynamic clustering for straggler-tolerant distributed learning},
url = {http://dx.doi.org/10.1109/tcomm.2022.3166902},
volume = {71},
year = {2023}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB - Distributed implementations are crucial in speeding up large-scale machine learning applications. Distributed gradient descent (GD) is widely employed to parallelize the learning task by distributing the dataset across multiple workers. A significant performance bottleneck for the per-iteration completion time in distributed synchronous GD is straggling workers. Coded distributed computation techniques have been introduced recently to mitigate stragglers and to speed up GD iterations by assigning redundant computations to workers. In this paper, we introduce a novel paradigm of dynamic coded computation, which assigns redundant data to workers so as to gain the flexibility to dynamically choose from among a set of possible codes depending on the past straggling behavior. In particular, we propose gradient coding (GC) with dynamic clustering, called GC-DC, which regulates the number of stragglers in each cluster by dynamically forming the clusters at each iteration. Under time-correlated straggling behavior, GC-DC adapts over time: at each iteration, it aims to distribute the stragglers across clusters as uniformly as possible based on past straggler behavior. For both homogeneous and heterogeneous worker models, we numerically show that GC-DC provides significant improvements in the average per-iteration completion time without an increase in the communication load compared to the original GC scheme.
AU - Buyukates,B
AU - Ozfatura,E
AU - Ulukus,S
AU - Gunduz,D
DO - 10.1109/tcomm.2022.3166902
EP - 3332
PY - 2023///
SN - 0090-6778
SP - 3317
TI - Gradient coding with dynamic clustering for straggler-tolerant distributed learning
T2 - IEEE Transactions on Communications
UR - http://dx.doi.org/10.1109/tcomm.2022.3166902
UR - https://ieeexplore.ieee.org/document/9755943
UR - http://hdl.handle.net/10044/1/96960
VL - 71
ER -
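
The abstract above describes the GC-DC idea in words: exploit time-correlated straggling by re-forming the clusters at each iteration so that likely stragglers are spread across clusters as evenly as possible. As a rough illustration only, the Python sketch below simulates that idea with a toy model. The greedy round-robin cluster assignment, the two-state Markov straggler model, and every parameter value here are assumptions invented for this toy; it is not the paper's actual GC-DC algorithm, coding scheme, or code.

import numpy as np

rng = np.random.default_rng(0)
W, C, S, T = 12, 4, 1, 500   # workers, clusters, stragglers tolerated per cluster, iterations

def assign_clusters(straggle_est, num_clusters):
    # Greedy heuristic (a stand-in for the paper's assignment step):
    # deal workers out round-robin, most-likely stragglers first, so the
    # expected stragglers end up spread as evenly as possible.
    order = np.argsort(-straggle_est)
    return [order[i::num_clusters] for i in range(num_clusters)]

# Toy two-state Markov straggler model: a worker's state tends to persist
# across iterations, which is the time correlation GC-DC exploits.
p_up, p_down = 0.05, 0.2             # healthy->straggler, straggler->healthy rates
state = rng.random(W) < 0.2          # True = this worker straggles this iteration
est = np.full(W, 0.5)                # smoothed per-worker straggling estimate
static = assign_clusters(est, C)     # clusters fixed once = static GC baseline
stalls = {"static GC": 0, "dynamic GC-DC": 0}

for _ in range(T):
    u = rng.random(W)
    state = np.where(state, u >= p_down, u < p_up)   # Markov state transition
    dynamic = assign_clusters(est, C)                # uses only past behavior
    for name, clusters in (("static GC", static), ("dynamic GC-DC", dynamic)):
        # The iteration is slow iff some cluster exceeds its straggler budget S.
        if any(state[c].sum() > S for c in clusters):
            stalls[name] += 1
    est = 0.9 * est + 0.1 * state    # update the straggling-history estimate

for name, n in stalls.items():
    print(f"{name}: {n}/{T} slow iterations")

Under this toy model the dynamic assignment typically stalls less often than the fixed one, mirroring the qualitative claim in the abstract; the specific numbers mean nothing beyond the toy itself.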