The UK Med-Bio project has established significant computational resources to help meet the project's goals of making major advances in understanding the causes and progression of common human diseases. These resources are available to project members and other affiliates.
The major computational resources of the project are maintained by the ICT HPC group and are integrated with the central college HPC service. These consist of:
- Commodity cluster compute nodes – 3,280 CPU cores (Dell PowerEdge C6220 dual-CPU nodes with 10-core Xeon E5-2660v2 2.2 GHz CPUs and 128 GB RAM)
- Cluster scratch disk – ~500 TB
- Cache-coherent memory nodes – SGI UV 2000 – 640 cores, 8 TB RAM, 450 TB usable locally attached scratch
There are additional infrastructure servers and storage maintained by UK Med-Bio personnel for hosting relational databases, web-based systems, etc. These consist of:
- 8 x Dell R920 servers, each with 4 x 10-core CPUs, 1–2 TB RAM, 36 TB local storage and FDR InfiniBand connectivity to the storage fabric
The project has established a multi-petabyte, tiered storage system, which is mirrored between two sites for resilience and to provide disaster-recovery capabilities.
The top tier of the storage system is based on IBM's GPFS, a scalable, high-performance clustered filesystem. The storage hardware is provided by DataDirect Networks (DDN) and consists of a GridScaler GS12K20 at the primary site and a GS7K-based system at the secondary site. Each provides around 800 TB of high-performance storage, and the two are connected by an FDR InfiniBand fabric. Access to the storage is provided via the CIFS and NFS protocols through separate file servers connected over InfiniBand.
The second storage tier is also provided by DDN, in the form of its WOS (Web Object Store) system. Each site has 7 WOS archive nodes, which provide 2 PB of storage. This gives a highly scalable storage capability in which extensive metadata can be stored alongside every object and accessed directly by data management software through a programmatic interface. DDN's WOSBridge software migrates data between tiers 1 and 2 according to policies based on factors such as data age and storage utilisation. This keeps active data on the faster storage, while inactive data is migrated to slower disk and then retrieved to tier 1 automatically as required.
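The age-based migration policy described above can be sketched as a toy model. This is purely illustrative: the class and function names below are hypothetical and do not come from DDN's WOSBridge API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Illustrative sketch of age-based tier migration; all names are hypothetical,
# not part of any DDN product API.

@dataclass
class StoredObject:
    name: str
    last_access: datetime
    metadata: dict = field(default_factory=dict)  # per-object metadata, as in an object store
    tier: int = 1  # 1 = fast parallel filesystem, 2 = object-store archive

def migrate_by_age(objects, max_age_days=90, now=None):
    """Demote tier-1 objects that have been idle longer than max_age_days."""
    now = now or datetime.now()
    for obj in objects:
        if obj.tier == 1 and (now - obj.last_access) > timedelta(days=max_age_days):
            obj.tier = 2

def recall(obj, now=None):
    """Bring an archived object back to tier 1 when it is accessed again."""
    obj.tier = 1
    obj.last_access = now or datetime.now()
```

In a real deployment the policy engine would also consider storage utilisation, as the text notes; the sketch shows only the data-age criterion.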
A tape-based system forms tier 3 of the storage, built around two Spectra Logic T950 tape libraries, each with 6 x LTO-6 drives and 670 licensed slots, providing ~2 PB of offline storage capacity. IBM's Tivoli Storage Manager (TSM) handles automatic data migration to tier 3, which is intended for long-term storage of infrequently required data.
All storage tiers are mirrored between the primary and secondary locations to provide robust, resilient and secure storage.
Additional high-performance SSD arrays are also available, connected to the storage fabric via InfiniBand:
- 1 x Huawei Dorado 2100G2 Flash Array with 5 TB of SLC SSDs
- 1 x Huawei Dorado 2100G2 Flash Array with 20 TB of MLC SSDs
Software and Databases
The project also holds licenses for a number of software packages and databases for use by project members.
- Ingenuity Pathway Assist (pathway analyses, curated data) – unlimited datasets, 4 concurrent users, 5 years
- MATLAB distributed computing – 128 cores, 3 years – plus toolboxes (Compiler, Bioinformatics, Parallel Computing, Fuzzy Logic, Neural Network, Statistics)
- KEGG (non-commercial) – 5 years
- USEARCH (clustering for metagenomics) – 40 cores, perpetual
- Partek (transcriptomics) – 1 floating license, 3 years
- Spotfire (data visualisation) – 5 licenses, 5 years