I work on a range of statistical and mathematical problems in genomics. I am particularly interested in building mathematical models to identify genetic variants in high-throughput genomics data - including genotyping microarrays and next-generation sequencing data - with the ultimate aim of understanding the functional impact and evolutionary history of these variants.
ECCB 2012 report
Yhap software for identifying haplogroups from low coverage sequence data: Yhap_0.51
ExomeCNVTest software: ExomeCNVTest_0.51
SOAP-popIndel software for genotyping indels in exome data
cnvHiTSeq - software for detecting and genotyping CNVs in WGS data
cnvPipe - software to enable CNV meta-analysis
Software for converting IMPUTE format to the format used by the MultiPhen software
vntrTest is a program for assessing association of VNTR fragment length genotypes with either continuous or case-control outcomes.
cnvHap is a program for joint copy number genotyping, which uses a haplotype model of copy number variation and integrates data from multiple platforms. It also carries out copy number (CN) association testing.
polyHap is a program for phasing polyploids and copy number regions. See http://dx.doi.org/10.1186/1471-2105-9-513 for more details.
The first version was designed just for phasing polyploid regions (with the restriction that the ploidy is fixed across the entire region of analysis).
polyHap(v1) (right click, and save-as polyHap.zip)
We have extended polyHap to remove this restriction, so that it can phase CNV regions (from pre-calculated CNV/SNP genotypes):
polyHap(v2) (right click, and save-as polyHapv2.zip)
AncesHC is a program for determining the haplotype structure of a population sample from genotype data, and then testing for association of these haplotypes with either a binary or continuous outcome. See http://dx.doi.org/10.1093/bioinformatics/btn071 for more details.
metaMapper is a program for flexible, scalable GWAS meta-analysis and visualisation.
Software for simulating sequence-level data with inversions. See http://dx.doi.org/10.1093/bioinformatics/btq029 for more details. Developed in conjunction with Clive Hoggart and Paul O'Reilly.
Pseudogene inference from loss of constraint (PSILC)
Software for identifying pseudogenes via loss of evolutionary constraint: PSILC version 1.21
et al., 2016, Complete Genome Sequence of Klebsiella quasipneumoniae subsp. similipneumoniae Strain ATCC 700603, Genome Announc, Vol:4
et al., 2016, Point Mutations in Exon 1B of APC Reveal Gastric Adenocarcinoma and Proximal Polyposis of the Stomach as a Familial Adenomatous Polyposis Variant, American Journal of Human Genetics, Vol:98, ISSN:0002-9297, Pages:830-842
et al., 2016, Realtime analysis and visualization of MinION sequencing data with npReader, Bioinformatics, Vol:32, ISSN:1367-4803, Pages:764-766
et al., 2016, Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences, Nat Genet, Vol:48, Pages:593-599
et al., 2015, A global reference for human genetic variation, Nature, Vol:526, ISSN:0028-0836, Pages:68-+