Big data and analytical unit (BDAU)
The Big Data and Analytical Unit is a multidisciplinary team that collaborates with a large network of researchers across the College to ensure the maximum use, impact and dissemination of research using healthcare data.
We currently provide the only fully certified ISO 27001:2013 research environment within Imperial College. We are also 100% compliant with NHS IG Toolkit Level 3 (EE133887).
- Information Governance Advice – The BDAU provides advice to researchers on information governance issues such as maintaining patient privacy, data confidentiality, anonymisation, and pseudonymisation. In addition, we can help with the drafting of MoUs for the purpose of sharing data between collaborators.
- Data Protection and Ethics Advice – Advice is provided to researchers on data protection and research ethics.
- Data Sharing and Dissemination – The BDAU works with researchers to build re-usable datasets and metadata which meet the criteria of the FAIR principles. The FAIR principles state that data should be Findable, Accessible, Interoperable and Reusable/Reproducible. As part of our mission, we will seek to help researchers share their data both with the wider academic community and with the public at large. Although we respect the requirements of data provided under consent or contract, we are committed to the use and creation of open data whenever possible.
- Data Requirements Definition – We consult with researchers at the feasibility scoping, grant application and similar early stages of projects to help determine the precise data requirements for the successful completion of a given research project. This means that we ask researchers to set out their research criteria and then advise them on the datasets they would need, the time period to search within those datasets, the variables of interest and other relevant issues.
- Data Acquisition – The BDAU initiates, advises on and implements the process of obtaining the datasets required for individual or multiple studies. This includes applying to bodies such as the HSCIC and MHRA for use of the restricted datasets which they hold, as well as making the best use of open-access datasets. Once applications are approved, the BDAU can extract the required data and store it securely.
- Secure Data Storage – The BDAU has resources for securely storing and accessing data. We have several database options available for storing data (including Oracle) which are automated for access and can be controlled by individual research groups.
- Data Analysis – The BDAU is able to assist researchers with both quantitative and qualitative analysis of data. We have a wide range of expertise in statistical programs, programming languages, and advanced statistical techniques such as natural language processing and machine learning.
- Data Visualisation – The BDAU helps with visualisation of data to aid in providing answers to research questions. This is mostly done using the interactive data visualisation tool Tableau, but expertise is also available in other tools, including statistical programs.
- Software Development – The BDAU carries out innovative analyses to gain insight from a range of data sources including social media content and metadata, such as the feeds from the microblogging tool Twitter. We have developed software in a range of languages and for a range of functions. The existing software can also be used by researchers as required.
- Security Consultancy – The BDAU can advise on security requirements to help ensure better compliance with ISO 27001:2013 and NHS IG Toolkit Level 3 standards.
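To illustrate the pseudonymisation mentioned under Information Governance Advice, here is a minimal sketch of one common approach: replacing a direct identifier with a keyed hash (HMAC-SHA256), so that the same patient always maps to the same pseudonym without the identifier itself being stored. This is an assumption for illustration only and not a description of the BDAU's own procedure. The function name `pseudonymise`, the key and the identifier are all hypothetical.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    # Remove spaces so that differently formatted copies of the same
    # identifier produce the same pseudonym.
    normalised = identifier.replace(" ", "").strip()
    digest = hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key and identifier, for illustration only. In practice the
# key would be generated randomly and held separately from the data.
EXAMPLE_KEY = b"replace-with-a-securely-stored-key"
pseudonym_a = pseudonymise("943 476 5919", EXAMPLE_KEY)
pseudonym_b = pseudonymise("9434765919", EXAMPLE_KEY)
```

Because the hash is keyed, someone without the key cannot rebuild the mapping simply by hashing candidate identifiers. Whether this is sufficient for a given dataset depends on the applicable governance requirements.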
Our data sources
- Hospital Episode Statistics (HES) - HES is an administrative dataset that records the details of all inpatient, outpatient and A&E attendances at NHS hospitals in England. It is a records-based system that covers all NHS trusts in England, including acute hospitals, primary care trusts and mental health trusts. It has been widely used by researchers to assess usage levels by patients and costs incurred due to hospital treatment within the NHS in England.
- Clinical Practice Research Datalink (CPRD) - CPRD is a clinical dataset that comprises anonymised primary care records. Research using CPRD data has resulted in over 1,500 publications which have led to improvements in drug safety, best practice, and clinical guidelines. CPRD can be linked to HES to provide greater insight into surgical outcomes.
- National Reporting & Learning System (NRLS) - NRLS is a central database of patient safety incidents reported by healthcare staff in hospitals in England & Wales. NRLS contains 10 million coded records of patient safety incidents from 2004 onwards.
- Office for National Statistics (ONS) - Among the many things this dataset contains are records of births and deaths, which allow researchers to determine mortality-based outcome measures such as 30-day, 90-day, 1-year and 5-year survival rates following surgery.
- Patient Reported Outcome Measures (PROMs) - PROMs measure health gain in patients undergoing certain types of surgery in England, based on responses to questionnaires pre- and post-surgery. PROMs have been collected by all providers of NHS-funded care since 2009 and provide an indication of the outcomes and quality of care delivered to NHS patients.
- National Cancer Intelligence Network (NCIN) - The NCIN is a UK-wide initiative, working to drive improvements in standards of cancer care and clinical outcomes by improving and using the information collected about cancer patients for analysis, publication, and research.
- Open Data Sets - These are open-access datasets provided by national, regional and local government entities that can be used for any purpose. They mostly contain anonymised, aggregate-level data and can, by being linked with other datasets, help to provide deep insights into healthcare and other issues. An example is the UK Government Open Data Repository, available at www.data.gov.uk.
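The mortality-based outcome measures mentioned under the ONS entry, such as 30-day and 90-day survival following surgery, can be sketched as below. This is a simplified illustration and not the BDAU's method. It assumes each record has already been linked into a pair of surgery date and date of death (or `None` if no death is recorded), and that every patient was followed up for the whole window, so censoring is ignored. The function names and the records are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

Record = Tuple[date, Optional[date]]  # (surgery date, date of death or None)


def survived(surgery: date, death: Optional[date], days: int) -> bool:
    """True if the patient was still alive `days` days after surgery."""
    if death is None:
        return True
    return (death - surgery).days > days


def survival_rate(records: List[Record], days: int) -> float:
    """Proportion of patients alive `days` days after surgery."""
    if not records:
        raise ValueError("no records supplied")
    survivors = sum(survived(s, d, days) for s, d in records)
    return survivors / len(records)


# Hypothetical linked records, for illustration only.
records = [
    (date(2015, 1, 10), None),
    (date(2015, 2, 1), date(2015, 2, 20)),  # died 19 days after surgery
    (date(2015, 3, 5), date(2015, 5, 15)),  # died 71 days after surgery
    (date(2015, 4, 1), None),
]
rate_30 = survival_rate(records, 30)  # 3 of 4 alive at 30 days
rate_90 = survival_rate(records, 90)  # 2 of 4 alive at 90 days
```

A real analysis would also need to handle patients whose follow-up ends before the window closes, for example with a Kaplan-Meier estimate.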