Intelligent Component-based Development of HPC Applications

libhpc visualisations and nekkloud web interface

The Libhpc project is developing a generic component-based framework for defining and running High Performance Computing applications on remote, distributed platforms such as clusters and infrastructure clouds. The libhpc framework will enable scientists to build, deploy and run HPC applications in a platform-agnostic manner while taking advantage of platform and hardware-specific optimisations and simplified deployment across a range of different computational infrastructure.

Libhpc is a partnership between groups in the Departments of Computing and Aeronautics, the Bioinformatics Support Service and the Faculty of Medicine at Imperial College London, and the Edinburgh Parallel Computing Centre (EPCC), University of Edinburgh. The first stage, libhpc I, ran from 2011 to 2013 and the current, second-stage project, libhpc II, runs from 2013 to 2015. Both projects have been funded by research grants [libhpc I][libhpc II] from the Engineering and Physical Sciences Research Council (EPSRC).

For its initial test case, libhpc worked with Nektar++, a high-order finite element analysis framework developed by the Department of Aeronautics, Imperial College London, and the University of Utah for research in Fluid Dynamics. libhpc II expanded the use cases to include a selection of Bioinformatics pipelines and applications in Epidemiology. We also aim to investigate a range of different application models for use cases and have been further validating our tools using the GROMACS molecular dynamics software.

Libhpc architecture

Tools and Services

The following tools and services have been or are currently being developed as part of the libhpc project:

Nekkloud

Nekkloud is a web-based environment, building on initial implementations of the libhpc services, for running finite element jobs using the Nektar++ software on cluster and cloud resources. The system provides a user-friendly interface for setting up jobs and allows users to select the platform on which they would like to run their job. Additional cloud platforms can be added to the system, and users can register their cloud account credentials with the system so that they are charged directly for the cloud resources used.

Nekkloud job creation screenshot

Libhpc coordination form and component software library (Python)

This is a Python software library providing developers with the ability to define component metadata and to build libhpc components and coordination form implementations. A set of coordination forms is provided as standard and this set can be extended. The library also contains a prototype component generator that can take an XML component description and convert it to a libhpc component definition in Python. The library can be used to build component pipelines and has been developed alongside our Bioinformatics use cases.
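
As an illustration only, the idea of components combined through coordination forms can be sketched in plain Python. This sketch does not use the libhpc library's actual API: the names Component, pipeline and par_map, and the metadata keys, are hypothetical and chosen purely for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Component:
    """A unit of computation plus descriptive metadata (hypothetical structure)."""
    name: str
    func: Callable[[Any], Any]
    metadata: Dict[str, str] = field(default_factory=dict)

    def __call__(self, value: Any) -> Any:
        return self.func(value)


def pipeline(*components: Component) -> Component:
    """Coordination form: feed the output of each component into the next."""
    def run(value: Any) -> Any:
        for component in components:
            value = component(value)
        return value
    name = " | ".join(c.name for c in components)
    return Component(name, run, {"form": "pipeline"})


def par_map(component: Component) -> Component:
    """Coordination form: apply one component to every item of a list.

    Items are processed sequentially here; a real framework could
    distribute them across the resources of the target platform.
    """
    def run(values: List[Any]) -> List[Any]:
        return [component(v) for v in values]
    return Component(f"map({component.name})", run, {"form": "map"})


# Toy components standing in for the steps of a bioinformatics-style pipeline.
trim = Component("trim", lambda s: s.strip(), {"input": "str", "output": "str"})
upper = Component("upper", lambda s: s.upper(), {"input": "str", "output": "str"})

# Coordination forms return components, so they can be nested freely.
clean_all = par_map(pipeline(trim, upper))
result = clean_all([" acgt ", "ttga\n"])
```

Because each coordination form here returns another component, an application is simply a nested expression of forms applied to components.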

Coordination form and component metadata repository

The repository is a Python-based service that is accessed over HTTP and has a REST interface. It stores details of coordination forms and components and their implementations. The REST interface allows for addition of new data and querying of existing data.
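
The add and query operations described above can be sketched with a small in-memory stand-in. This is not the repository's real interface: the class name, the field names and the suggested mapping of add to an HTTP POST and query to an HTTP GET are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class MetadataRepository:
    """In-memory stand-in for the metadata repository (hypothetical design)."""

    KINDS = ("component", "coordination_form")

    def __init__(self) -> None:
        self._entries: Dict[str, dict] = {}

    def add(self, name: str, kind: str, implementations: List[str]) -> dict:
        """Store a new entry; roughly what a POST request might do."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown kind: {kind}")
        entry = {
            "name": name,
            "kind": kind,
            "implementations": list(implementations),
        }
        self._entries[name] = entry
        return entry

    def query(self, kind: Optional[str] = None) -> List[dict]:
        """Return entries sorted by name, optionally filtered by kind.

        Roughly what a GET request with a query parameter might do.
        """
        return [
            entry
            for _, entry in sorted(self._entries.items())
            if kind is None or entry["kind"] == kind
        ]


repo = MetadataRepository()
repo.add("pipeline", "coordination_form", ["python"])
repo.add("trim", "component", ["python", "c"])

# Ask only for components, as an editor listing the available components might.
component_names = [entry["name"] for entry in repo.query(kind="component")]
```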

Libhpc CFEditor

The CFEditor is a coordination forms structure editor that allows developers and users to define applications by building expressions consisting of coordination forms and components. The editor guides the user through the process of selecting coordination forms and then applying them to components and is driven by the contents of the metadata repository introduced above. A resulting description can be output as Python code using the libhpc software library and then run as the user application.

Job Templates and Profiles

Job templates and profiles provide structures that are used to represent the set of parameters applying to a job or component. A template is a tree structure that represents all of an application's parameters that can be set to determine how the application runs and to tune performance. The parameters are grouped by area or functionality and are uninstantiated in the original template. A profile represents a template with some set of values instantiated. A valid profile is one where all required values are instantiated, so that the profile can be used to run a job. Profiles can be saved to be re-used by other users. Different classes of individual involved in the configuration and running of an application may collaborate to produce profiles, each handling the areas in which they have greatest expertise and then saving the profile for others to extend further.
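
The relationship between a template, a profile and profile validity can be sketched as follows. This is a minimal illustration under stated assumptions, not the format used by libhpc: the Node and Profile classes, the slash-separated parameter paths and the example solver parameters are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A template node: a group of child nodes, or a leaf parameter."""
    name: str
    required: bool = False
    children: List["Node"] = field(default_factory=list)

    def leaves(self, prefix: str = "") -> Dict[str, "Node"]:
        """Map each leaf parameter's path to its node."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        if not self.children:
            return {path: self}
        found: Dict[str, "Node"] = {}
        for child in self.children:
            found.update(child.leaves(path))
        return found


@dataclass
class Profile:
    """A template together with the values instantiated so far."""
    template: Node
    values: Dict[str, object] = field(default_factory=dict)

    def set_value(self, path: str, value: object) -> None:
        if path not in self.template.leaves():
            raise KeyError(f"no such parameter: {path}")
        self.values[path] = value

    def missing(self) -> List[str]:
        """Required parameters that have not been instantiated yet."""
        return sorted(
            path
            for path, node in self.template.leaves().items()
            if node.required and path not in self.values
        )

    def is_valid(self) -> bool:
        return not self.missing()


# Parameters grouped by area; every leaf starts out uninstantiated.
template = Node("solver", children=[
    Node("numerics", children=[
        Node("order", required=True),
        Node("timestep", required=True),
    ]),
    Node("output", children=[Node("format")]),
])

profile = Profile(template)
profile.set_value("solver/numerics/order", 4)
missing_before = profile.missing()   # one required value is still unset
profile.set_value("solver/numerics/timestep", 0.001)
valid_after = profile.is_valid()     # all required values are now set
```

A partially completed profile such as the one held after the first set_value call is what one collaborator could save for another to extend.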

TemPSS (Templates and Profiles for Scientific Software), a standalone service for working with libhpc templates and profiles to create application input files, is now available on GitHub: https://github.com/london-escience/tempss

TemPSS is also introduced in detail in J. Cohen, C. Cantwell, D. Moxey, J. Nowell, P. Austing, X. Guo, J. Darlington and S. Sherwin, "TemPSS: A Service Providing Software Parameter Templates and Profiles for Scientific HPC," 2015 IEEE 11th International Conference on e-Science (e-Science), Munich, 2015, pp. 78-87. http://dx.doi.org/10.1109/eScience.2015.43.