STREAMING DATA ANALYSIS
Recent infrastructural leaps have made it possible to continually record vast amounts of real-time information in a variety of settings. My research focuses on statistical methodology applicable to this challenging data format, wherein:
1. models must be updated on-the-fly, without requiring arbitrary access to the data history, to minimise storage costs and ensure timely responses.
2. models must be able to handle the evolving nature of data streams, which includes unforeseen disturbances as well as gradual data shifts.
My work relies on hybridisation of Stochastic Approximation theory with state-space Bayesian dynamic modelling. Recent invited talks on the topic include:
- 08.2013 Strategies for Handling the Risk of Obsolete Information in Scorecards, Credit Scoring and Credit Control XIII, Edinburgh.
- 07.2013 Adaptive power priors for Bayesian updating in the presence of drift with applications, Google Tech Talk, Google, Mountain View, California.
- 07.2013 Adaptive power priors for Bayesian updating in the presence of drift, Statistics Section, University of British Columbia, Vancouver, Canada.
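To fix ideas, the first requirement above (constant-memory, single-pass updating) and the second (robustness to drift) can be illustrated with a toy estimator that maintains a running mean under exponential forgetting. This is a minimal sketch for illustration only, not the methodology described above; the function name and forgetting factor lambda are mine.

```python
def make_forgetful_mean(lam=0.95):
    """Streaming mean with exponential forgetting.

    Constant memory, one pass over the stream, no access to past data.
    With lam < 1, older observations are down-weighted geometrically,
    so the estimate tracks gradual drift and recovers from shocks.
    """
    state = {"w": 0.0, "s": 0.0}  # effective sample weight and weighted sum

    def update(x):
        state["w"] = lam * state["w"] + 1.0
        state["s"] = lam * state["s"] + x
        return state["s"] / state["w"]  # current estimate

    return update
```

For example, if a stream sits at 0 and then jumps to 10, the estimate forgets the old regime at a geometric rate governed by lam, whereas an unweighted running mean would adapt ever more slowly as the history grows.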
CLASSIFICATION PERFORMANCE MEASURES
Classification performance measures are a critical component of the statistics and machine learning literatures, as they offer the means by which novel classification methodology is assessed. This line of work revisits popular measures of classification performance to study their theoretical properties, revealing glaring incoherencies in some (e.g., the Area Under the ROC Curve, AUC) and novel properties in others. It also supports the adoption of the H-measure, a recently proposed coherent alternative to the AUC (www.hmeasure.net).
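As background for the discussion above, the AUC admits a simple probabilistic reading: it is the chance that a randomly chosen positive instance is scored above a randomly chosen negative one, with ties counting one half. A minimal sketch of this empirical form (illustrative only; the H-measure itself is not reproduced here, and the function name is mine):

```python
def empirical_auc(scores_pos, scores_neg):
    """Empirical AUC as a rank statistic: the fraction of
    positive/negative pairs in which the positive instance
    receives the higher score (ties contribute 0.5)."""
    total = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))
```

A classifier that ranks every positive above every negative attains an AUC of 1.0, while random scoring yields 0.5; the coherence critique concerns how the measure aggregates over operating thresholds, not this pairwise-ranking interpretation itself.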