IPC Lab Seminar

If you are interested in attending this event, please get in touch using the contact information listed.

Abstract

Most learning problems, from linear or logistic regression to deep learning, are formulated as optimization problems whose objective is to find a model that best describes the available data. At the same time, the trend toward an increasingly networked society has created a need for the development and analysis of decentralized learning algorithms, in which a collection of intelligent agents coordinates to solve a more challenging inference problem without relying on a central parameter server. Instead, agents exchange information only locally, as defined by a graph topology, resulting in scalable and robust mechanisms.
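
To make the local-exchange mechanism concrete, here is a minimal sketch, in Python, of a generic decentralized gradient method for least-squares over a ring topology; the data model, step size, and uniform combination weights are illustrative assumptions rather than details of the talk.

    import numpy as np

    rng = np.random.default_rng(0)
    K, d, N = 8, 5, 50                 # number of agents, model dimension, samples per agent
    w_true = rng.standard_normal(d)    # common underlying model (homogeneous case)

    # Each agent k observes its own local data (H[k], y[k]).
    H = [rng.standard_normal((N, d)) for _ in range(K)]
    y = [Hk @ w_true + 0.1 * rng.standard_normal(N) for Hk in H]

    # Ring topology: every agent combines with its two neighbours (doubly stochastic weights).
    A = np.zeros((K, K))
    for k in range(K):
        A[k, k] = A[k, (k - 1) % K] = A[k, (k + 1) % K] = 1.0 / 3

    W = np.zeros((K, d))               # one model estimate per agent
    mu = 0.05                          # step size (assumed)
    for _ in range(400):
        # Adapt: each agent takes a gradient step on its own least-squares loss.
        psi = W - mu * np.stack([H[k].T @ (H[k] @ W[k] - y[k]) / N for k in range(K)])
        # Combine: information is exchanged only with graph neighbours.
        W = A @ psi

    print("max deviation from the true model:", np.abs(W - w_true).max())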
 

We describe two recent directions in the study of learning algorithms over graphs. First, we will deviate from the widespread “consensus optimization” setting, in which agents are forced to agree on a common model, a constraint that results in poor performance in heterogeneous settings. We will show how different task-relatedness models give rise to a family of multi-task learning algorithms over graphs, which allow for improved learning performance without the need to enforce consensus. We will then show how, in the absence of a task-relatedness model, multi-task learning over graphs remains possible via a decentralized variant of model-agnostic meta-learning (MAML).
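
As a rough illustration of the contrast with consensus, the sketch below replaces the averaging step with a soft pull toward neighbouring models, so agents with different local optima remain related but are not forced to coincide; the quadratic coupling, its strength eta, and the heterogeneous data model are assumptions chosen for illustration, not the algorithms presented in the talk.

    import numpy as np

    rng = np.random.default_rng(1)
    K, d, N = 8, 5, 50
    mu, eta = 0.05, 1.0                # step size and coupling strength (assumed)

    # Heterogeneous setting: each agent has its own, related, true model.
    w_star = rng.standard_normal(d) + 0.3 * rng.standard_normal((K, d))
    H = [rng.standard_normal((N, d)) for _ in range(K)]
    y = [H[k] @ w_star[k] + 0.1 * rng.standard_normal(N) for k in range(K)]

    nbrs = [((k - 1) % K, (k + 1) % K) for k in range(K)]   # ring neighbourhoods

    W = np.zeros((K, d))
    for _ in range(1000):
        grads = np.stack([H[k].T @ (H[k] @ W[k] - y[k]) / N for k in range(K)])
        # Soft pull toward neighbouring models instead of full averaging:
        pull = np.stack([sum(W[k] - W[l] for l in nbrs[k]) for k in range(K)])
        W = W - mu * (grads + eta * pull)

    print("per-agent estimation error:", np.linalg.norm(W - w_star, axis=1).round(2))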

The second part will study the impact of the loss function chosen to quantify model fit. The dynamics of decentralized learning algorithms with convex loss functions are fairly well understood, yet performance guarantees in non-convex environments have long remained elusive. This is because non-convex loss functions can be riddled with local minima and saddle points, where the gradient vanishes (and hence gradient descent stagnates) while performance can be arbitrarily poor. This stands in contrast to the empirical success of deep learning, whose loss surfaces are non-convex, suggesting that stochastic gradient descent, as implemented via backpropagation, tends to avoid saddle points. We review recent results shedding light on these dynamics and show how decentralized algorithms can continue to match centralized ones, even when it comes to escaping saddle points.
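
The role of gradient noise near saddle points can be illustrated on a toy two-dimensional loss with a strict saddle at the origin; the specific function, step size, and noise level below are illustrative assumptions, not the analysis discussed in the talk.

    import numpy as np

    def grad(w):
        # f(w) = 0.5 * w[0]**2 + 0.25 * w[1]**4 - 0.5 * w[1]**2
        # has a strict saddle point at the origin and minima at w = (0, +/-1).
        return np.array([w[0], w[1] ** 3 - w[1]])

    rng = np.random.default_rng(0)
    mu, T = 0.1, 300                   # step size and number of iterations (assumed)

    w_gd = np.zeros(2)                 # both methods start exactly at the saddle
    w_sgd = np.zeros(2)
    for _ in range(T):
        w_gd = w_gd - mu * grad(w_gd)                    # gradient is zero here: no progress
        noise = 0.01 * rng.standard_normal(2)            # stand-in for stochastic gradient noise
        w_sgd = w_sgd - mu * (grad(w_sgd) + noise)

    print("gradient descent:           ", w_gd)          # remains at [0. 0.]
    print("perturbed (stochastic) step:", w_sgd)         # ends near a minimum, w[1] close to +/-1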

About the speaker

Stefan Vlaski received the B.Sc. degree in Electrical Engineering from Technical University Darmstadt, Germany, in 2013, and the M.S. degree in Electrical Engineering and the Ph.D. degree in Electrical and Computer Engineering from the University of California, Los Angeles, in 2014 and 2019, respectively. He is currently a postdoctoral researcher at the Adaptive Systems Laboratory, EPFL, Switzerland. His research interests are in machine learning, signal processing, and optimization. His current focus is on the development and study of learning algorithms, with a particular emphasis on adaptive and decentralized solutions.