14:30 Joaquin Miguez (Universidad Carlos III de Madrid)
Title: Iterative importance sampling with transformed weights
Abstract: Iterative importance sampling schemes (often termed population Monte Carlo) are popular tools for problems that involve the approximation of posterior probability distributions in hierarchical or otherwise complex models. The technique is simple and conceptually appealing, yet limited by the weight degeneracy problem that plagues importance-sampling-based methods in general. Indeed, population Monte Carlo algorithms have been reported to perform poorly when the dimension of either the observations or the variables of interest is high. To alleviate this difficulty, we introduce a new method that performs a nonlinear transformation of the importance weights. This operation reduces the weight variation, thereby avoiding degeneracy and increasing the efficiency of the importance sampling scheme, especially when drawing from proposal functions that are poorly adapted to the true posterior. We have applied the methodology to the challenging problem of estimating the rate parameters of a stochastic kinetic model (SKM). SKMs are multivariate systems that model molecular interactions in biological and chemical problems. We introduce a particularization of the proposed algorithm to SKMs and present numerical results, including a comparison with a state-of-the-art particle MCMC algorithm that tackles the same problem.
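The abstract does not specify the transformation, but one simple nonlinear transformation of importance weights with the stated effect is clipping: the largest log-weights are flattened to a common threshold, reducing weight variation and raising the effective sample size. The sketch below is an illustrative assumption, not the speaker's exact algorithm; the function name `transformed_weights` and the choice `n_clip=50` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def transformed_weights(log_w, n_clip):
    """Clip the n_clip largest log-weights to a common threshold.

    Illustrative sketch only: one possible nonlinear transformation
    that tempers weight degeneracy; the talk's method may differ.
    """
    log_w = np.asarray(log_w, dtype=float)
    if n_clip < 2:
        return log_w
    # Threshold = the n_clip-th largest log-weight.
    thresh = np.partition(log_w, -n_clip)[-n_clip]
    return np.minimum(log_w, thresh)

def normalize(log_w):
    # Subtract the max for numerical stability before exponentiating.
    w = np.exp(log_w - np.max(log_w))
    return w / w.sum()

def ess(w):
    # Effective sample size: 1 / sum of squared normalized weights.
    return 1.0 / np.sum(w ** 2)

# Toy example: heavy-tailed raw weights from a poorly adapted proposal.
log_w = rng.normal(0.0, 5.0, size=1000)
w_raw = normalize(log_w)
w_clip = normalize(transformed_weights(log_w, n_clip=50))

print(f"ESS raw: {ess(w_raw):.1f}, ESS clipped: {ess(w_clip):.1f}")
```

With high weight variance, a handful of samples dominate `w_raw`, while clipping spreads mass over at least the top 50 samples, so the effective sample size increases substantially.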
16:00 Lewis Evans (Imperial College)
Title: Active Learning Performance Assessment
Abstract: Classification is a major area in statistical inference and machine learning. Within the context of classification, Active Learning (AL) methods seek to improve classifier performance when labels are expensive or scarce. Analysis of AL performance reveals many surprising intricacies. To address these intricacies, we present a new methodology for assessing AL performance. Many factors can be expected to influence AL performance, for example the classifier and the classification task, yet how AL performance depends on these factors remains an open question. To explore that question, we present a broad experimental investigation that systematically varies those factors.
16:00 Georg Hahn (Imperial College)
Title: Implementing False Discovery Rate Procedures for Simulation-Based Tests With Bounded Risk
Abstract: Consider multiple hypotheses to be tested for statistical significance using a procedure which controls the False Discovery Rate (FDR), e.g. the Benjamini-Hochberg method. Instead of observing all p-values directly, we consider the case where they can only be approximated by simulation. This occurs e.g. for bootstrap or permutation tests. Naively, one could use an equal number of samples for the estimation of the p-value of each hypothesis and then apply the original FDR procedure. This technique is certainly not the most efficient one, nor does it give any guarantee on how the results relate to the FDR procedure applied to the true p-values. This talk presents MMCTest, a more sophisticated approach that uses fewer samples for all those hypotheses which can already be classified with sufficient confidence and more samples for all those which are still unidentified. The algorithm is designed to give, with high probability, the same classification as the one based on the exact p-values. A simulation study on actual biological data, given by a microarray dataset of gene expressions, shows that for a realistic precision, MMCTest draws level with the performance of current methods which, unlike MMCTest, do not give a guarantee that their classifications are correct. An ad-hoc variant of MMCTest which forces a complete classification outperforms established methods. The idea behind MMCTest can also be extended to a wider class of multiple testing procedures. This is possible irrespective of the error criterion controlled by a particular procedure. For step-up and step-down procedures, a simple criterion suffices to check whether a procedure allows for classifying hypotheses using the MMCTest approach.
By verifying this condition, it can be shown that all common procedures allow for an efficient classification of multiple hypotheses, for example the Bonferroni or Šidák corrections controlling the FWER, the Holm and Hochberg procedures, the Hommel procedure and many more.
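The naive baseline described in the abstract can be sketched directly: estimate each p-value from an equal number of Monte Carlo samples, then apply the Benjamini-Hochberg step-up procedure to the estimates. This is the inefficient approach MMCTest improves upon, shown here only to make the setup concrete; the simulation model and `alpha` below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def benjamini_hochberg(p, alpha=0.1):
    """Benjamini-Hochberg step-up: reject the k smallest p-values,
    where k is the largest index with p_(k) <= k * alpha / n."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, n + 1) / n
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(n, dtype=bool)
    reject[order[:k]] = True
    return reject

def mc_pvalue(t_obs, t_sim):
    """Monte Carlo p-value estimate from B simulated test statistics,
    using the standard (1 + #exceedances) / (1 + B) correction."""
    return (1 + np.sum(t_sim >= t_obs)) / (1 + len(t_sim))

# Naive scheme: the same number of samples B for every hypothesis.
B = 1000
t_obs = np.array([3.5, 2.8, 0.2, -0.5])       # observed statistics (toy data)
p_hat = np.array([mc_pvalue(t, rng.normal(size=B)) for t in t_obs])
print(benjamini_hochberg(p_hat, alpha=0.1))
```

As the abstract notes, this gives no guarantee that the resulting classification matches the one based on the exact p-values; MMCTest instead allocates samples adaptively and bounds the probability of a different classification.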