For the UK MED-BIO project, we aim to support all aspects of research using our resources, not just analyses that run on high-performance computing (HPC) clusters.
We have large-memory servers that are used to provide a range of additional services to the project. These are Dell R920 servers, each with 4 x 10-core CPUs with hyper-threading, 1-2 TB RAM, 36 TB of local (fibre-attached) storage and FDR InfiniBand connectivity that allows direct GPFS access to tier one of the MED-BIO storage systems. At present, 3 of these servers have been set up as an OpenNebula cluster, which is a high-availability (HA) virtualization platform. The three servers share access to a parallel file system (GlusterFS) built on the direct-attached local storage, and provide a platform for load-balanced, persistent virtual machines (VMs) that fail over to the remaining servers in the event of an individual server outage. More resources can be added to the cluster as required. At present it offers access to 120 cores and 3 TB RAM in total.
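As a rough illustration of what failover implies for capacity planning, the sketch below recomputes the cluster totals quoted above and the capacity that remains if one server is lost. It assumes the three clustered servers are identical and have 1 TB RAM each (inferred from the 3 TB total); the `Host` class and function names are hypothetical and are not part of OpenNebula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    cores: int      # physical cores, ignoring hyper-threading
    ram_tb: float   # installed RAM in TB


# Assumption: three identical hosts, 4 x 10-core CPUs and 1 TB RAM each.
HOSTS = [Host(cores=4 * 10, ram_tb=1.0) for _ in range(3)]


def total_capacity(hosts):
    """Return (cores, RAM in TB) summed over all hosts."""
    return sum(h.cores for h in hosts), sum(h.ram_tb for h in hosts)


def capacity_after_outage(hosts, failed=1):
    """Conservative capacity left after losing the `failed` largest hosts.

    Each resource is treated separately, so the result is a lower bound
    even when hosts differ in size.
    """
    keep = max(0, len(hosts) - failed)
    cores = sum(sorted(h.cores for h in hosts)[:keep])
    ram = sum(sorted(h.ram_tb for h in hosts)[:keep])
    return cores, ram


print(total_capacity(HOSTS))         # (120, 3.0)
print(capacity_after_outage(HOSTS))  # (80, 2.0)
```

Under these assumptions, VMs can all be restarted after a single server outage only if their combined allocation fits within 80 cores and 2 TB RAM.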
The major computational resources of the project are being maintained by the ICT RCS group, and are integrated with the central college RCS service. These consist of:
- Commodity cluster compute nodes - 3280 CPU cores (Dell PowerEdge C6220 dual-CPU nodes with 10-core Xeon E5-2660 v2 2.2 GHz processors and 128 GB RAM)
- Cluster scratch disk - ~500 TB
- Cache-coherent memory nodes - SGI UV 2000 - 640 cores, 8 TB RAM, 450 TB usable locally attached scratch
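The commodity cluster figures above imply a node count and an aggregate memory size that are not stated directly. The short calculation below derives them, assuming every node is an identical dual-CPU, 10-core-per-CPU machine with 128 GB RAM; the variable names are illustrative only.

```python
# Figures taken from the list above.
CORES_TOTAL = 3280
CPUS_PER_NODE = 2
CORES_PER_CPU = 10
RAM_PER_NODE_GB = 128

cores_per_node = CPUS_PER_NODE * CORES_PER_CPU

# Assumption: all nodes are identical, so the core count divides evenly.
nodes, remainder = divmod(CORES_TOTAL, cores_per_node)
total_ram_gb = nodes * RAM_PER_NODE_GB

print(nodes, remainder)   # 164 0
print(total_ram_gb)       # 20992
```

On that assumption the cluster comprises 164 nodes with roughly 21 TB of RAM in aggregate.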
The project has established a multi-petabyte, tiered storage system, which is mirrored between two sites for resilience and to provide disaster-recovery capabilities.
The top tier of the storage system is based on IBM's GPFS, a scalable, high-performance clustered filesystem. The storage hardware is provided by DataDirect Networks (DDN) and consists of a GRIDScaler GS12K20 at the primary site and a GS7K-based system at the secondary site. Each system provides around 800 TB of high-performance storage, and both are connected via an FDR InfiniBand fabric. Access to the storage is provided over the CIFS and NFS protocols through separate file servers connected via InfiniBand.
The second storage tier is also provided by DDN, in the form of the WOS (Web Object Scaler) system. Each site has 7 WOS archive nodes, which provide 2 PB of storage. This is a highly scalable object store that permits extensive metadata to be stored alongside every object, and it can be accessed directly through a programmatic interface by data management software. DDN's WOSBridge software is used for migrating data between tiers 1 and 2 according to policies that include data age and storage utilisation. This allows active data to be kept on the faster storage, while inactive data is migrated to slower disk and then retrieved to tier 1 automatically when required.
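The policy-driven migration described above can be sketched as a simple selection function. This is only an illustration of combining a data-age rule with a storage-utilisation rule; it is not WOSBridge's interface or its actual policy engine, and every name and threshold here (`FileRecord`, `select_for_migration`, the 90-day idle limit, the 80% and 70% water marks) is a hypothetical choice. Automatic recall to tier 1 is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    path: str
    size_bytes: int
    last_access: datetime


def select_for_migration(files, now, capacity_bytes,
                         max_idle=timedelta(days=90),
                         high_water=0.80, low_water=0.70):
    """Pick tier-1 files to move to tier 2 (hypothetical policy).

    1. Age rule: select every file idle for longer than `max_idle`.
    2. Utilisation rule: if tier 1 would still be above `high_water`,
       select least-recently-used files until it drops to `low_water`.
    """
    selected = [f for f in files if now - f.last_access > max_idle]
    chosen = {f.path for f in selected}
    remaining = sum(f.size_bytes for f in files if f.path not in chosen)

    if remaining > high_water * capacity_bytes:
        for f in sorted(files, key=lambda f: f.last_access):
            if remaining <= low_water * capacity_bytes:
                break
            if f.path not in chosen:
                selected.append(f)
                chosen.add(f.path)
                remaining -= f.size_bytes
    return selected


# Arbitrary example data: a 100-byte tier holding 95 bytes.
now = datetime(2015, 1, 1)
files = [
    FileRecord("a", 10, now - timedelta(days=200)),  # idle: age rule
    FileRecord("b", 50, now - timedelta(days=10)),   # oldest active file
    FileRecord("c", 35, now - timedelta(days=1)),    # most recently used
]
to_move = select_for_migration(files, now, capacity_bytes=100)
print([f.path for f in to_move])  # ['a', 'b']
```

In this example the age rule removes only `a`, leaving tier 1 at 85% full, so the utilisation rule also moves `b`, the least recently used of the remaining files.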
A tape-based system forms tier 3 of the storage, based around two Spectra Logic T950 tape libraries, each with 6 x LTO-6 drives and 670 licensed slots, providing ~2 PB of offline storage capacity. IBM's Tivoli Storage Manager (TSM) is used for automatic data migration to tier 3, which is intended for long-term storage of infrequently required data.
All storage tiers are mirrored between the primary and secondary locations to provide robust, resilient and secure storage.
Additional high-performance SSD arrays are also available, connected to the storage fabric via InfiniBand:
- 1 x Huawei Dorado 2100G2 Flash Array with 5 TB SLC SSDs
- 1 x Huawei Dorado 2100G2 Flash Array with 20 TB MLC SSDs