Talk by Korbinian Strimmer (Imperial)
Title: Computationally Efficient Approaches for Genomic Signal Identification
Abstract: The analysis of large-scale genomics data requires methods that are not only statistically efficient in high-dimensional settings with rare and weak effects and complex dependencies, but also computationally efficient. In my talk I will discuss two complementary strategies. First, I will advocate the use of “higher criticism” (HC) as a simple yet remarkably effective approach for feature selection and signal identification. Upon closer inspection, HC also turns out to have a Bayesian interpretation and a connection with local false discovery rates. Second, I will discuss optimal whitening and decorrelation procedures for preprocessing, with the aim of facilitating subsequent variable selection. For orthogonalized variables designed to be, at the component level, maximally correlated with the original untransformed variables, this uniquely leads to correlation-adjusted test statistics (such as CAT and CAR scores), which offer a simple yet powerful means of ranking variables under correlation. CAT-CAR-type whitening can also be performed very effectively even for high-dimensional data, with computational complexity of order O(r^3), where r=min(n,d) is the rank of the data matrix.
Korbinian Strimmer (joint work with Alex Lewin, Agnan Kessy, Bernd Klaus, and Verena Zuber)
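
Illustration (not part of the talk material): a minimal sketch of how an HC threshold can be computed from a vector of p-values. It assumes the standard Donoho-Jin form of the HC objective, HC_i = sqrt(n) (i/n - p_(i)) / sqrt(p_(i) (1 - p_(i))), maximized over the smallest fraction alpha0 of the sorted p-values. The abstract does not say which HC variant the talk uses, so the formula, the function name higher_criticism_threshold and the parameter alpha0 are all assumptions made for this example.

```python
import numpy as np


def higher_criticism_threshold(pvalues, alpha0=0.5):
    """Return (max HC score, p-value threshold, indices of selected features).

    Hypothetical helper for illustration; assumes the Donoho-Jin HC objective.
    """
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)            # feature indices, smallest p-value first
    p_sorted = p[order]
    # Clip only inside the formula to avoid division by zero at p = 0 or 1.
    q = np.clip(p_sorted, 1e-12, 1.0 - 1e-12)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - q) / np.sqrt(q * (1.0 - q))
    # Search only the smallest alpha0 fraction of the p-values (at least one).
    k_max = max(1, int(np.floor(alpha0 * n)))
    k = int(np.argmax(hc[:k_max]))
    selected = order[: k + 1]        # features with p-value <= threshold
    return float(hc[k]), float(p_sorted[k]), selected


# Toy input: two very small p-values among otherwise unremarkable ones.
toy_p = [0.001, 0.002, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.4, 0.3]
hc_max, threshold, selected = higher_criticism_threshold(toy_p)
print(hc_max, threshold, sorted(selected.tolist()))
```

The only cost beyond evaluating the formula is sorting the p-values, which is what makes this kind of thresholding cheap for large feature sets.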