Background

In machine learning, it is important to share the trained model, the code used to train it and the training data alongside any journal article in which the model is used. This practice underpins the reproducibility and reusability of research outcomes, allowing other researchers to both verify and build upon the work. Complementary to this is the need to share trained models with researchers who may have no coding experience, or who want to run a model on a new input quickly with minimal setup.

Dr Kim Jelfs and her research group (Department of Chemistry) frequently build and publish machine learning models that take a simple representation of a molecule as input and give a score or categorisation as output. Examples include predicting whether a set of molecules will form a porous cage structure or simply collapse, and predicting how easy chemical precursors will be to synthesise. The group required a platform to host pre-trained models, with a front-end web app that could be used to interact with them.

Our Contribution

The RSE Team’s first task was to create a Dash app for one of the research group’s machine learning models and run it in a container using Docker. The repository for this app was set up as a self-explanatory template, allowing the research group to modify the code easily in order to publish further interactive models. We then provisioned a web server on a college virtual machine to host the app.
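As an illustration of the kind of app described above, the following is a minimal sketch of a Dash app that wraps a scoring function. It is not the SupraShare template itself: the function names, component IDs and port are assumptions made for this example, and score_molecule is a toy stand-in for a trained model rather than a real predictor.

```python
def score_molecule(smiles: str) -> float:
    """Toy stand-in for a trained model (hypothetical, NOT a real predictor)."""
    text = smiles.strip()
    if not text:
        raise ValueError("empty molecule string")
    # Toy score: fraction of characters that are 'C' or 'c'.
    return sum(ch in "Cc" for ch in text) / len(text)


def build_app():
    """Build a small Dash app with one text input and one result area."""
    # Imported here so score_molecule can be used without Dash installed.
    from dash import Dash, Input, Output, dcc, html

    app = Dash(__name__)
    app.layout = html.Div([
        html.H1("Example model"),
        dcc.Input(id="smiles", type="text", value="c1ccccc1"),
        html.Div(id="result"),
    ])

    # Recompute the score whenever the text input changes.
    @app.callback(Output("result", "children"), Input("smiles", "value"))
    def update(smiles):
        if not smiles or not smiles.strip():
            return "Enter a molecule."
        return f"Score: {score_molecule(smiles):.2f}"

    return app


def main():
    # Bind to all interfaces so the app is reachable from outside a container.
    # Port 8050 is an assumption for this sketch.
    build_app().run(host="0.0.0.0", port=8050)
```

Calling main() starts the web server. In a container, this would typically be the command the image runs on start-up, with a real model loaded in place of the toy function.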

A significant technical challenge in this project was the requirement to host several machine learning models, each needing a different programming environment, in one place. The solution was to use Docker Compose to manage a separate container for each app and Caddy as a reverse proxy to forward URL requests to the correct application. A continuous deployment (CD) pipeline was set up so that new models are automatically packaged and ready to be deployed on the virtual machine.
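The routing idea behind the reverse proxy can be sketched in a few lines of Python. This is only a conceptual illustration: in the deployed system the forwarding is handled by Caddy's own configuration, not by code like this, and the path prefixes and container addresses below are hypothetical.

```python
# Hypothetical mapping from URL path prefix to the container serving that app.
ROUTES = {
    "/cage-prediction": "http://cage-app:8050",
    "/synthesis-score": "http://synth-app:8050",
}


def resolve_upstream(path, routes=ROUTES):
    """Return the upstream for the longest matching path prefix, or None."""
    best = None
    for prefix in routes:
        # Match the prefix exactly or as a leading path segment.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else None
```

Because each app runs in its own container, a request is simply handed to whichever container owns the matching prefix, so the apps never need to share a programming environment.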

A key aspect of our engagement was to provide training and documentation to the research group, enabling them to create and deploy apps themselves in the future with minimal input from us.

Outcomes

The SupraShare website now hosts three separate apps, allowing anyone to interact with the machine learning models published by Dr Kim Jelfs and her group. The platform is extensible, so the group can continue to add apps as they publish more models. The SupraShare template repository is fully open-source and released under a permissive licence, allowing other researchers to use and adapt the code.

Testimonials

Dr Kim Jelfs, Scientific Project Manager, SupraShare:

“It’s been great to have the Research Software Engineering team’s support in this, helping us ensure our models are open-source, while allowing us to continue to focus on our ongoing research.”

Steven Bennett, PhD student:

“The tools provided by the Research Software Engineering team now allow us to share our models with ease, enhancing the reproducibility and accessibility of our research. The system developed by the team was both easy-to-use and expandable, allowing group members to share results no matter the computational tools used to generate them.”