14:00 – 15:00 – Adam Quinn Jaffe (Columbia University)

Title: Consistency and Inconsistency in K-Means Clustering

Abstract: A celebrated result of Pollard proves the asymptotic consistency of k-means clustering when the population distribution has finite variance. In this talk, we point out that the population-level k-means clustering problem is, in fact, well-posed whenever the population distribution has finite expectation, and we investigate whether some form of asymptotic consistency holds in this setting. Surprisingly, the answer is no: we construct examples in which there is a unique set of population cluster centers, but where the empirical cluster centers oscillate to plus and minus infinity as the number of data points increases. We show that this non-convergence is due to an extreme form of imbalance, whereby a few outlying samples create clusters that contain very few points. Based on joint work with Moïse Blanchard and Nikita Zhivotovskiy.
