Data gathering is inextricably linked with data processing; without the ability to recover useful information, all the effort spent designing and building a good instrument is wasted. As such, we often develop new algorithms and processing techniques, either to acquire data more efficiently or to make the most of it once it has been gathered. There is strong interest in the group in the processing of hyperspectral datasets, such as Raman maps, but this is by no means our only field of interest. Furthermore, we open-source all our code by default; it can be found on the Code and Resources page.

Hyperspectral imaging

Smoothing

Smoothing is a common step in almost all data processing tasks. Because of the subjective nature of smoothing, it is desirable to find some criterion by which the process can be judged. In this paper, we proposed analysing the residual (the result of subtracting the smoothed data from the raw data) as a metric for smoothing: when the residual is a good approximation of noise, the smoothing process terminates. The criterion used in the paper is the Anderson-Darling statistic, which tests, at a chosen significance level, whether a sample is consistent with being drawn from a given probability distribution (here, the normal, i.e. Gaussian, distribution).
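
As a rough illustration of the idea, the sketch below smooths a one-dimensional spectrum with a Savitzky-Golay filter, starting from a heavy smooth and relaxing it until the residual passes an Anderson-Darling test for normality. The function name, the choice of smoother, and the 5% significance level are illustrative assumptions, not details taken from the paper.

import numpy as np
from scipy.signal import savgol_filter
from scipy.stats import anderson

def smooth_by_residual(y, polyorder=3, max_window=101):
    """Relax the smoothing until the residual looks like Gaussian noise.

    Assumes y is a 1-D spectrum longer than max_window samples."""
    y = np.asarray(y, dtype=float)
    for window in range(max_window, polyorder + 2, -2):  # window lengths stay odd
        smoothed = savgol_filter(y, window_length=window, polyorder=polyorder)
        residual = y - smoothed
        result = anderson(residual, dist='norm')
        # A statistic below the 5% critical value means the residual is
        # consistent with Gaussian noise, so this level of smoothing is accepted.
        if result.statistic < result.critical_values[2]:
            return smoothed
    return y  # no window satisfied the test; fall back to the raw data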

Baseline correction

While nonzero baselines are common in many analytical techniques, they are especially problematic in Raman microscopy, most often as a result of unwanted sample fluorescence. Researchers have long sought to minimize this fluorescence by increasing the excitation wavelength (since fluorophores that absorb in the red and infrared are rare), but because the intensity of Raman scattering is inversely proportional to the fourth power of the excitation wavelength, the signal drops precipitously under these conditions. One partial solution is to simply correct for the baseline, but, as with smoothing, it is difficult to assess whether the baseline is being correctly subtracted.
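
To put a number on that trade-off, here is a quick back-of-the-envelope calculation; 532 nm and 785 nm are simply common excitation wavelengths, not values from the paper.

relative_intensity = (532 / 785) ** 4   # lambda^-4 scaling, roughly 0.21
print(f"785 nm excitation gives about {relative_intensity:.2f}x the scattered "
      f"intensity of 532 nm excitation, all else being equal")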

The approach taken was to perform a smoothing step (see above) and analyse the magnitude of the second derivative, averaged in such a way as to optimally approximate a Lorentzian function. Further details can be found in the paper.
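
The sketch below shows one generic way a smoothed second derivative can drive baseline estimation: low-curvature points are treated as baseline and a baseline is interpolated through them. It illustrates the general idea only, and is not a reproduction of the published algorithm; the function name, window length, and quantile threshold are all arbitrary choices.

import numpy as np
from scipy.signal import savgol_filter

def subtract_baseline(y, window=31, polyorder=3, quantile=0.5):
    """Treat low-curvature points as baseline and interpolate through them."""
    y = np.asarray(y, dtype=float)
    smoothed = savgol_filter(y, window_length=window, polyorder=polyorder)
    curvature = savgol_filter(y, window_length=window, polyorder=polyorder, deriv=2)
    # Points where the second derivative is small are unlikely to sit on a sharp peak
    is_baseline = np.abs(curvature) <= np.quantile(np.abs(curvature), quantile)
    x = np.arange(len(y))
    baseline = np.interp(x, x[is_baseline], smoothed[is_baseline])
    return y - baseline, baseline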

Source separation

Blind source separation is well explained in the context of the so-called cocktail party problem. In this model there are a number of speakers in a room and a number of microphones at different locations. The locations of the speakers and microphones are random and unknown, and the signal from each speaker at each microphone is a function of the distance between them. The measured data at each microphone is simply the sum of the signals from all the speakers. The goal is to isolate the independent signal from each speaker, given the data from all of the microphones. This can be achieved by summing together the data from the microphones, each weighted by a constant, such that all the unwanted speakers cancel out and only one remains. The difficulty therefore lies in determining what these weightings should be.
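
The toy example below makes the mixing model concrete for two speakers and two microphones. The mixing matrix is invented purely for illustration; with weightings given by its inverse, each weighted sum of microphone signals cancels one speaker and recovers the other. In the blind problem the mixing matrix is of course unknown, which is the whole difficulty.

import numpy as np

t = np.linspace(0, 1, 1000)
sources = np.vstack([np.sin(2 * np.pi * 5 * t),            # speaker 1
                     np.sign(np.sin(2 * np.pi * 3 * t))])   # speaker 2

mixing = np.array([[0.7, 0.3],    # how strongly each speaker reaches microphone 1
                   [0.4, 0.6]])   # ...and microphone 2
microphones = mixing @ sources    # each row is what one microphone records

weights = np.linalg.inv(mixing)   # the "right" (but normally unknown) weightings
recovered = weights @ microphones
print(np.allclose(recovered, sources))   # True: each speaker isolated exactly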

One approach, taken by a technique called Independent Component Analysis (ICA), is to find the weightings that produce the most non-Gaussian unmixed signals. This is justified by the central limit theorem, which states that the sum of many independent random variables tends towards a normal (Gaussian) distribution regardless of how the individual variables are distributed; the mixed signals are therefore more Gaussian than the sources. The approach proposed in our paper was instead to maximise the smoothness of the separated signals. It was later realized that this approach is by no means novel, although its application to hyperspectral datasets possibly is.
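
For completeness, the sketch below runs scikit-learn's FastICA on the same kind of toy mixture as above. It illustrates the non-Gaussianity route just described, not the smoothness-based method proposed in our paper.

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 1000)
sources = np.vstack([np.sin(2 * np.pi * 5 * t),
                     np.sign(np.sin(2 * np.pi * 3 * t))])
microphones = np.array([[0.7, 0.3], [0.4, 0.6]]) @ sources

ica = FastICA(n_components=2, random_state=0)
estimated = ica.fit_transform(microphones.T)   # one time sample per row
# 'estimated' matches the original sources only up to permutation and scaling,
# which is the usual, unavoidable ambiguity in blind source separation.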

Fixed pattern noise correction

This is work in progress - once we've published it, details will be available here.

Errors in multiple particle tracking

This is a very specific use case, but it still makes for an interesting discussion. Multiple particle tracking experiments work by placing many particles into a medium (usually a gel or a liquid) and tracking their motion with a camera. By very carefully fitting the image of each particle to a known reference, motion on the order of nanometers can be measured, and consequently parameters such as the mean squared displacement and the viscosity can be calculated. Because microscopes are often not stable to nanometer-scale motion, it is common to take the mean of all the particle motion (the so-called common mode), assume it is due to motion of the microscope itself, and subtract it; the result is taken to be the motion of the particles themselves. While this is valid for large numbers of particles, when only a few particles are tracked some of this common-mode signal is due to the motion of the particles themselves: it is entirely possible for them all to move in a similar direction by chance. Subtracting it therefore removes some genuine particle motion, and the resulting measurement underestimates the mean squared displacement.

Fortunately, it is possible to model the effect of this error and correct for it. The corrected data accurately estimate the viscosity of the medium, both in simulations and in real-world experiments; if you're interested in reading more, you can find the paper here.
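
The toy simulation below reproduces the effect for independent Brownian particles with no real stage drift. Subtracting the per-frame mean removes 1/N of each particle's own motion, so the single-frame mean squared displacement comes out low by a factor of (N - 1)/N; rescaling by N/(N - 1) restores it in this simple independent-particle case. The particle count, step size, and variable names are illustrative, and the paper's treatment may well be more general.

import numpy as np

rng = np.random.default_rng(1)
n_particles, n_frames, step_std = 5, 100_000, 1.0

# Independent Brownian steps for each particle, with no real microscope drift
steps = rng.normal(0.0, step_std, size=(n_frames, n_particles))
common_mode = steps.mean(axis=1, keepdims=True)   # per-frame "drift" estimate
corrected = steps - common_mode                   # common-mode-subtracted motion

true_msd = np.mean(steps ** 2)                    # ~ step_std**2
biased_msd = np.mean(corrected ** 2)              # ~ step_std**2 * (1 - 1/N)
rescaled_msd = biased_msd * n_particles / (n_particles - 1)

print(true_msd, biased_msd, rescaled_msd)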