Publications from our Researchers

Several of our current PhD candidates and fellow researchers at the Data Science Institute have published, or are in the process of publishing, papers presenting their research.


    Munro RE, Guo Y, 2009, Solutions for complex, multi data type and multi tool analysis: principles and applications of using workflow and pipelining methods. Methods in Molecular Biology (Clifton, N.J.), Vol: 563, Pages: 259-271, ISSN: 1064-3745

    Analytical workflow technology, sometimes also called data pipelining, is the fundamental component that provides the scalable analytical middleware that can be used to enable the rapid building and deployment of an analytical application. Analytical workflows enable researchers, analysts and informaticians to integrate and access data and tools from structured and non-structured data sources so that analytics can bridge different silos of information; compose multiple analytical methods and data transformations without coding; rapidly develop applications and solutions by visually constructing analytical workflows that are easy to revise should the requirements change; access domain-specific extensions for specific projects or areas, for example, text extraction, visualisation, reporting, genetics, cheminformatics, bioinformatics and patient-based analytics; automatically deploy workflows directly into web portals and as web services to be part of a service-oriented architecture (SOA). By performing workflow building, using a middleware layer for data integration, it is a relatively simple exercise to visually design an analytical process for data analysis and then publish this as a service to a web browser. All this is encapsulated into what can be referred to as an 'Embedded Analytics' methodology which will be described here with examples covering different scientifically focused data analysis problems.
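The data-pipelining idea this abstract describes can be sketched minimally as composing named transformation steps into one runnable pipeline. All function and step names below are illustrative, not drawn from the paper or from any workflow product.

```python
# Minimal sketch of data pipelining: a workflow is an ordered list of
# named steps, each a function applied to the previous step's output.

def build_workflow(*steps):
    """Compose (name, function) steps into a single callable pipeline."""
    def run(data):
        for name, fn in steps:
            data = fn(data)  # each step transforms the data it receives
        return data
    return run

# Example steps: parse raw CSV-like text, drop incomplete rows, count.
parse = ("parse", lambda text: [line.split(",") for line in text.splitlines()])
clean = ("clean", lambda rows: [r for r in rows if all(r)])
count = ("count", lambda rows: len(rows))

pipeline = build_workflow(parse, clean, count)
print(pipeline("a,b\nc,\nd,e"))  # two complete rows survive -> 2
```

Because steps are plain functions, re-ordering or swapping a stage when requirements change is a one-line edit, which is the flexibility the abstract attributes to workflow construction.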

    Parkinson H, Kapushesky M, Kolesnikov N, Rustici G, Shojatalab M, Abeygunawardena N, Berube H, Dylag M, Emam I, Farne A, Holloway E, Lukk M, Malone J, Mani R, Pilicheva E, Rayner TF, Rezwan F, Sharma A, Williams E, Bradley XZ, Adamusiak T, Brandizi M, Burdett T, Coulson R, Krestyaninova M, Kurnosov P, Maguire E, Neogi SG, Rocca-Serra P, Sansone S-A, Sklyar N, Zhao M, Sarkans U, Brazma A et al., 2009, ArrayExpress update-from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Research, Vol: 37, Pages: D868-D872, ISSN: 0305-1048

    Curcin V, Ghanem M, Guo Y, Darlington J et al., 2008, Mining adverse drug reactions with e-science workflows. Proceedings of the 4th Cairo International Biomedical Engineering Conference (CIBEC 2008)

    Ghanem M, Curcin V, Guo Y, 2008, GoTag: A Case Study in Using a Shared UK e-Science Infrastructure for the Automatic Annotation of Medline Documents

    Ghanem M, Curcin V, Wendel P, Guo Y et al., 2008, Building and using analytical workflows in Discovery Net. In: Data Mining Techniques in Grid Environments, Dubitzky, Werner (Ed.), Publisher: Wiley-Blackwell, Pages: 119-140, ISBN: 9780470512586

    The Discovery Net platform is built around a workflow model for integrating distributed data sources and analytical tools. The platform was originally designed to support the design and execution of distributed data mining tasks within a grid-based environment. However, over the years it has evolved into a generic data analysis platform with applications in such diverse areas as bioinformatics, cheminformatics, text mining and business intelligence. In this work we present our experience in designing the platform and map out the evolution paths for a workflow language, and its architecture, that need to address the requirements of different scientific domains.
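The workflow model this abstract describes, analytical components wired together so that data flows between them, can be illustrated by a minimal dependency-ordered execution of a small task graph. The names and structure here are purely illustrative and do not reflect Discovery Net's actual API.

```python
# Illustrative sketch of a workflow as a small DAG of named nodes,
# executed in dependency order with results cached per node.

def run_dag(nodes, deps, target):
    """nodes: name -> function of its dependencies' results.
    deps: name -> list of dependency names (missing = no dependencies)."""
    cache = {}
    def eval_node(name):
        if name not in cache:
            inputs = [eval_node(d) for d in deps.get(name, [])]
            cache[name] = nodes[name](*inputs)
        return cache[name]
    return eval_node(target)

# Two data sources feed a merge node, whose output is analysed.
nodes = {
    "source_a": lambda: [1, 2, 3],
    "source_b": lambda: [4, 5],
    "merge":    lambda a, b: a + b,
    "analyse":  lambda xs: sum(xs) / len(xs),
}
deps = {"merge": ["source_a", "source_b"], "analyse": ["merge"]}
print(run_dag(nodes, deps, "analyse"))  # mean of 1..5 -> 3.0
```

The graph shape, rather than a fixed linear pipeline, is what lets such a platform integrate heterogeneous sources and tools, as the abstract notes for domains from bioinformatics to business intelligence.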

    Ma Y, Richards M, Ghanem M, Guo Y, Hassard J et al., 2008, Air pollution monitoring and mining based on sensor Grid in London. Sensors, Vol: 8, Pages: 3601-3623, ISSN: 1424-8220

    Curcin V, Ghanem M, Wendel P, Guo Y et al., 2007, Heterogeneous workflows in scientific workflow systems. 7th International Conference on Computational Science (ICCS 2007), Publisher: Springer-Verlag Berlin, Pages: 204+, ISSN: 0302-9743

    Darlington J, Guo Y, Rüger S, 2007, Scalable Query Assistance for Search Engines

    Syed J, Ghanem M, Guo Y, 2007, Supporting scientific discovery processes in Discovery Net

    Cohen J, James C, Rahman S, Curcin V, Ball B, Guo Y, Darlington J et al., 2006, Modelling rail passenger movements through e-science methods. 5th UK e-Science All Hands Meeting (AHM 2006), Publisher: National e-Science Centre, Pages: 445-448

    Curcin V, Ghanem M, Guo YK, Stathis K, Toni F et al., 2006, Building next generation Service-Oriented Architectures using argumentation agents. 3rd International Conference on Grid Services Engineering and Management (GSEM 2006), Publisher: Springer Verlag

    Davis N, Harkema H, Gaizauskas R, Guo Y, Ghanem M, Barnwell T, Guo YK, Ratcliffe J et al., 2006, Three Approaches to GO-Tagging Biomedical Abstracts. SMBM 2006: Second International Symposium on Semantic Mining in Biomedicine, ISSN: 1613-0073

    El-Shishiny H, Soliman THA, Emam I, 2006, Mining drug targets: The challenges and a proposed framework. Pages: 239-244, ISSN: 1530-1346

    Drug target identification, being the first phase in drug discovery, is becoming an overly time-consuming process and in many cases produces inefficient results due to the failure of conventional approaches to investigate large-scale data. The main goal of this work is to identify drug targets, which are genes or proteins associated with specific diseases. With the help of microarray technology, relationships between biological entities such as protein-protein, gene-gene and related chemical compounds are used as a means to identify drug targets. In this work, we focus on the challenges facing drug target discovery and propose a novel unified framework for mining disease-related drug targets. © 2006 IEEE.

    Kakas A, Tamaddoni Nezhad A, Muggleton S, Chaleil R et al., 2006, Application of abductive ILP to learning metabolic network inhibition from temporal data. Publisher: Springer, Pages: 209-230, ISSN: 0885-6125

    In this paper we use a logic-based representation and a combination of Abduction and Induction to model inhibition in metabolic networks. In general, the integration of abduction and induction is required when the following two conditions hold. Firstly, the given background knowledge is incomplete. Secondly, the problem must require the learning of general rules in the circumstance in which the hypothesis language is disjoint from the observation language. Both these conditions hold in the application considered in this paper. Inhibition is very important from the therapeutic point of view since many substances designed to be used as drugs can have an inhibitory effect on other enzymes. Any system able to predict the inhibitory effect of substances on the metabolic network would therefore be very useful in assessing the potential harmful side-effects of drugs. In modelling the phenomenon of inhibition in metabolic networks, background knowledge is used which describes the network topology and functional classes of inhibitors and enzymes. This background knowledge, which represents the present state of understanding, is incomplete. In order to overcome this incompleteness, hypotheses are considered which consist of a mixture of specific inhibitions of enzymes (ground facts) together with general (non-ground) rules which predict classes of enzymes likely to be inhibited by the toxin. The foreground examples are derived from in vivo experiments involving NMR analysis of time-varying metabolite concentrations in rat urine following injections of toxins. The model's performance is evaluated on training and test sets randomly generated from a real metabolic network. It is shown that even in the case where the hypotheses are restricted to be ground, the predictive accuracy increases with the number of training examples and in all cases exceeds the default (majority class). Experimental results also suggest that when sufficient training data is provided

    Liu J, Ghanem M, Curcin V, Haselwimmer C, Guo Y, Morgan G, Mish K et al., 2006, Achievements and Experiences from a Grid-Based Earthquake Analysis and Modelling Study. Publisher: IEEE Computer Society Press

    We have developed and used a grid-based geoinformatics infrastructure and analytical methods for investigating the relationship between macro and microscale earthquake deformational processes by linking geographically distributed and computationally intensive earthquake monitoring and modelling tools. Using this infrastructure, measurement of lateral co-seismic deformation is carried out with imageodesy algorithms running on servers at the London eScience Centre. The resultant deformation field is used to initialise geomechanical simulations of the earthquake deformation running on supercomputers based at the University of Oklahoma. This paper describes the details of our work, summarizes our scientific results and details our experiences from implementing and testing the distributed infrastructure and analysis workflow.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
