Big Data and Analytical Unit (BDAU)
IGHI's Big Data & Analytical Unit (BDAU) is a multidisciplinary team of data specialists that collaborates with a large network of researchers across the College. The BDAU is primarily a data services unit with a remit to provide technical and analytical support to its customers, partners, and collaborators across the data lifecycle, from data application and data access to data analysis and visualisation of outputs. This is done primarily through the BDAU Secure Environment (BDAU SE), which provides a standard platform for researchers to securely hold and analyse data.
BDAU Secure Environment (BDAU SE)
The BDAU SE is an ISO 27001:2013 certified research environment that is compliant with the NHS Digital Data Security and Protection Toolkit (EE133887-BDAU). It provides:
• a standard operating/access model,
• a secure data storage and processing environment, and
• analysis software (including R, Python, Stata, SPSS and MATLAB).
Can I use the BDAU SE to store and analyse my data?
You may use the BDAU SE if your research project involves personal data that needs to be handled securely to ensure compliance with regulations such as GDPR, or if the data provider for a specific dataset requires their data to be stored in a secure environment that meets a set of data security standards. Data that is publicly available or aggregated typically does not need to be stored in the BDAU SE.
The BDAU SE can be used for data storage and analysis while your project is ongoing. It is not a solution for long-term storage of data and archiving.
Under our current policies, personally identifiable data (e.g. name, date of birth, IP address) cannot be hosted in the BDAU SE. Data therefore needs to be fully de-identified before it can be transferred to the BDAU SE.
Imaging, audio, and video data generally cannot be hosted in the BDAU SE.
Our services
- Data Requirements Definition – We consult with researchers at the feasibility scoping, data application and similar early stages of projects to help determine the precise data requirements for the successful completion of a given research project. We ask researchers to set out their research criteria and then advise them on the datasets they would need, the time period to search within those datasets, the variables of interest and other relevant issues. This service is currently only available for CPRD and NRLS data.
- Data Acquisition – Initiating, advising on and implementing the process of obtaining the datasets required for individual or multiple studies. This includes applying to bodies such as NHS Digital and MHRA for use of restricted datasets such as HES and CPRD. Once applications are approved, the BDAU can extract the required data and store it in the BDAU SE.
- Secure Data Storage – The BDAU has resources for securely storing and analysing de-identified data within its certified research environment, the BDAU SE.
- Data Analysis – The BDAU can partner with researchers to support quantitative and qualitative analysis of data. We have a wide range of expertise in statistical programs, programming languages, and advanced analytical techniques such as natural language processing and machine learning.
- Data Visualisation – The BDAU helps with visualisation of data to aid in answering research questions. This is mostly done using interactive data visualisation tools such as Tableau and Power BI, but expertise is also available in other statistical programs.
Our data sources
1. NHS Digital
- Hospital Episode Statistics (HES) – HES is an administrative dataset that records the details of all inpatient, outpatient and A&E attendances at NHS hospitals in England. It has been widely used by researchers to assess usage levels by patients and costs incurred due to hospital treatment within the NHS in England.
- Data Access – To access NHS Digital datasets such as HES, researchers need to complete the Data Access Request Service (DARS) process. If you are planning to use the BDAU SE for storage and analysis of HES data, please contact us so we can assist with completing the relevant sections of your DARS application. More information about the DARS process and NHS Digital charges is available here.
2. Medicines and Healthcare products Regulatory Agency (MHRA)
- Clinical Practice Research Datalink (CPRD) – CPRD is a clinical dataset composed of anonymised primary care records. Research using CPRD data has resulted in over 2,800 publications, which have led to improvements in drug safety, best practice, and clinical guidelines. CPRD can be linked to other health-related patient datasets such as HES, the ONS Death Registry and socio-economic measures to provide a fuller picture of the patient care record.
- Data Access – Imperial has a multi-study annual licence agreement with CPRD, which is managed by the BDAU. We have trained fob-holders who can extract the data for researchers against specific study specifications, following protocol approval via CPRD's Research Data Governance (RDG) process (previously ISAC). Please contact us at firstname.lastname@example.org when you are planning your project to discuss the process and the associated costs.
3. NHS England and NHS Improvement
- National Reporting & Learning System (NRLS) – NRLS is a central database of patient safety incidents, reported by healthcare staff, that occur in hospitals in England and Wales. NRLS contains 10 million coded records of patient safety incidents from 2004 onwards.
- Data Access – The BDAU team has experience applying for, storing and analysing NRLS datasets for research purposes. If you are interested in using NRLS data for your research, please contact us at email@example.com.
4. Imperial College Healthcare NHS Trust
- Imperial College Healthcare NHS Trust (ICHT) – ICHT has a database of patient health records collected within the Trust. Applicants can request to use ICHT data for research, service evaluation or clinical audit.
- Data Access – If you are interested in using data from ICHT, please click here for more information.
5. Open Data Sets
- These are open-access datasets provided by national, regional and local government entities that can be used for any purpose. They mostly contain anonymised, aggregate-level data and can, by being linked with other datasets, help to provide deeper insights into healthcare and other issues. An example is the UK Government Open Data Repository available by clicking here.
Please contact us at firstname.lastname@example.org for further information or if you have any questions.