Imperial College London

Dr Ed Curry

Faculty of Medicine, Department of Surgery & Cancer

Honorary Lecturer

Contact

+44 (0)20 7594 5943
e.curry


Location

Open Plan
Institute of Reproductive and Developmental Biology
Hammersmith Campus



Publications

Citation

BibTeX format

@article{Curry:2014:10.1186/s12859-014-0355-5,
author = {Curry, EWJ},
doi = {10.1186/s12859-014-0355-5},
journal = {BMC Bioinformatics},
title = {A framework for generalized subspace pattern mining in high-dimensional datasets},
url = {http://dx.doi.org/10.1186/s12859-014-0355-5},
volume = {15},
year = {2014}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - Background: A generalized notion of biclustering involves the identification of patterns across subspaces within a data matrix. This approach is particularly well-suited to analysis of heterogeneous molecular biology datasets, such as those collected from populations of cancer patients. Different definitions of biclusters will offer different opportunities to discover information from datasets, making it pertinent to tailor the desired patterns to the intended application. This paper introduces ‘GABi’, a customizable framework for subspace pattern mining suited to large heterogeneous datasets. Most existing biclustering algorithms discover biclusters of only a few distinct structures. However, by enabling definition of arbitrary bicluster models, the GABi framework enables the application of biclustering to tasks for which no existing algorithm could be used. Results: First, a series of artificial datasets were constructed to represent three clearly distinct scenarios for applying biclustering. With a bicluster model created for each distinct scenario, GABi is shown to recover the correct solutions more effectively than a panel of alternative approaches, where the bicluster model may not reflect the structure of the desired solution. Secondly, the GABi framework is used to integrate clinical outcome data with an ovarian cancer DNA methylation dataset, leading to the discovery that widespread dysregulation of DNA methylation associates with poor patient prognosis, a result that has not previously been reported. This illustrates a further benefit of the flexible bicluster definition of GABi, which is that it enables incorporation of multiple sources of data, with each data source treated in a specific manner, leading to a means of intelligent integrated subspace pattern mining across multiple datasets. Conclusions: The GABi framework enables discovery of biologically relevant patterns of any specified structure from large collections of genomic data. An R implemen
AU  - Curry,EWJ
DO  - 10.1186/s12859-014-0355-5
PY  - 2014///
SN  - 1471-2105
TI  - A framework for generalized subspace pattern mining in high-dimensional datasets
T2  - BMC Bioinformatics
UR  - http://dx.doi.org/10.1186/s12859-014-0355-5
UR  - http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000347428700001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=1ba7043ffcc86c417c072aa74d649202
UR  - http://hdl.handle.net/10044/1/51155
VL  - 15
ER  - 
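The abstract's central idea is that the bicluster definition is pluggable: the same search can look for different kinds of pattern depending on which bicluster model it is given. The Python sketch below illustrates only that idea and is not GABi or its R implementation. The names `constant_model`, `additive_model` and `best_bicluster`, the toy matrix, and the brute-force search over every subspace are all hypothetical stand-ins chosen for illustration. The additive model borrows the mean squared residue score associated with Cheng and Church. A realistic framework would need a scalable optimiser rather than exhaustive enumeration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]
# Assumption for illustration: a "bicluster model" is any function that scores
# the submatrix picked out by a row subset and a column subset (higher = better).
BiclusterModel = Callable[[Matrix, Sequence[int], Sequence[int]], float]


def submatrix(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Extract the values at the given rows and columns."""
    return [[data[r][c] for c in cols] for r in rows]


def constant_model(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Negative variance: the best possible score (0) means every value is identical."""
    values = [v for row in submatrix(data, rows, cols) for v in row]
    mean = sum(values) / len(values)
    return -sum((v - mean) ** 2 for v in values) / len(values)


def additive_model(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Negative mean squared residue: 0 means each value is exactly
    row effect + column effect, i.e. the rows shift in parallel."""
    sub = submatrix(data, rows, cols)
    n, m = len(sub), len(sub[0])
    row_means = [sum(row) / m for row in sub]
    col_means = [sum(sub[i][j] for i in range(n)) / n for j in range(m)]
    overall = sum(row_means) / n
    residue = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n)
        for j in range(m)
    )
    return -residue / (n * m)


def best_bicluster(
    data: Matrix, model: BiclusterModel, n_rows: int, n_cols: int
) -> Tuple[Tuple[int, ...], Tuple[int, ...], float]:
    """Brute-force stand-in for a real optimiser: score every n_rows x n_cols
    subspace with the supplied model and keep the first highest-scoring one."""
    best: Tuple[Tuple[int, ...], Tuple[int, ...], float] = ((), (), float("-inf"))
    for rows in combinations(range(len(data)), n_rows):
        for cols in combinations(range(len(data[0])), n_cols):
            score = model(data, rows, cols)
            if score > best[2]:
                best = (rows, cols, score)
    return best


if __name__ == "__main__":
    # Toy data: a constant block at rows 0-1 x cols 0-1 and an additive
    # (non-constant) block at rows 2-3 x cols 2-3; other values are filler.
    data = [
        [5.0, 5.0, 9.0, 2.0],
        [5.0, 5.0, 7.0, 8.0],
        [0.0, 11.0, 1.0, 4.0],
        [13.0, 6.0, 3.0, 6.0],
    ]
    print(best_bicluster(data, constant_model, 2, 2))
    print(additive_model(data, (2, 3), (2, 3)), constant_model(data, (2, 3), (2, 3)))
```

Only the model changes between the two calls; the search is untouched. The block at rows 2-3 x columns 2-3 scores a perfect 0 as an additive pattern but is penalised (variance 3.25) as a constant one. Because a constant block is also a perfect additive pattern, a search under `additive_model` on this toy matrix returns the constant block, with ties broken by enumeration order. Integrating a second data source, such as the clinical outcome data mentioned in the abstract, could be sketched as a model that adds a separately computed term for that source, though how GABi itself weights such terms is not described here.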