The following guidance will help you develop your understanding of anonymisation and pseudonymisation, and highlights practical techniques to accomplish both.
Content
- Key Concepts
- Motivated Intruder Test
- Data Type Specific Considerations
- Common Approaches to Anonymise Datasets
- More resources
Key Concepts
- Anonymous data / information – the UK GDPR describes this as “…information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.” While data protection legislation does not apply to anonymised data, the act of turning personal data into anonymised data is likely to be considered data processing and should be documented.
- Anonymisation – this is the act of rendering personal data anonymous so that it cannot be tracked back to an individual. This action can become more complex if you have large datasets that contain a wide range of data.
- Pseudonymisation – this is the replacement of directly identifying values with pseudonyms. The pseudonyms still act as unique identifiers, so pseudonymised data is classed as personal data under data protection legislation.
- De-identification – often used to describe the removal of all uniquely personal characteristics from data so that they can no longer be linked to a specific individual. However, use of the term is discouraged because it is often applied interchangeably to both pseudonymised and anonymised data. In contrast, re-identification is the process of identifying individuals from pseudonymised or anonymised data, which can be a criminal offence under Section 171 of the Data Protection Act 2018.
- Data Value – it should be assumed that the greater the perceived value of the data to a motivated intruder, the greater the capabilities, tools and resources the intruder is likely to have at their disposal. Considering value also helps in understanding an intruder’s possible motives: for example, whether the data represents a potential financial or political reward, an opportunity for malicious use, or simply a curiosity.
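The distinction between pseudonymisation and anonymisation described above can be illustrated with a minimal Python sketch. It assumes a keyed hash (HMAC-SHA256) as the pseudonymisation method, which is one common choice rather than a prescribed one; the function name `pseudonymise`, the key and the record values are all hypothetical and made up for illustration.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; a real project would decide the length deliberately.
    return digest.hexdigest()[:16]


# Hypothetical key: in practice it would be stored separately from the data.
SECRET_KEY = b"example-key-store-separately"

# Made-up records; the identifier values are not real NHS Numbers.
records = [
    {"nhs_number": "0000000001", "diagnosis": "asthma"},
    {"nhs_number": "0000000001", "diagnosis": "eczema"},
    {"nhs_number": "0000000002", "diagnosis": "asthma"},
]

pseudonymised = [
    {"pid": pseudonymise(r["nhs_number"], SECRET_KEY), "diagnosis": r["diagnosis"]}
    for r in records
]
```

The same person always receives the same pseudonym, so their records can still be linked, and anyone holding the key can recompute the mapping from identifier to pseudonym. This is why the output of a step like this remains personal data rather than anonymous data.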
Whether information is identifiable depends on the context of the data, how it was collected, and any other information that is held (or likely to be held) by the organisation holding the data. As such, every processing activity must be assessed on its individual merits, with an understanding of how the dataset as a whole can or will interact with other datasets held and with any external datasets that could be made available (for example, data provided by the NHS, NHS Trusts and/or Biobank).
Obvious identifiers include name, address, postcode, date of birth and NHS Number. However, combinations of less obvious data items can also make information identifiable and make anonymisation difficult to accomplish, for example where a person has a rare condition or a unique combination of data points or traits.
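One way to look for risky combinations of less obvious data items is to count how many records share each combination of values. The sketch below is illustrative only: the function name `unique_combinations`, the field names and the rows are hypothetical, and a count of one is used as the simplest possible signal that a record may stand out.

```python
from collections import Counter


def unique_combinations(rows, quasi_identifiers):
    """Return the value combinations that occur in exactly one row."""
    keys = [tuple(row[field] for field in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    return [key for key, n in counts.items() if n == 1]


# Made-up rows for illustration.
rows = [
    {"age_band": "30-39", "district": "SW7", "condition": "asthma"},
    {"age_band": "30-39", "district": "SW7", "condition": "asthma"},
    {"age_band": "30-39", "district": "SW7", "condition": "rare condition X"},
    {"age_band": "70-79", "district": "W12", "condition": "asthma"},
]

risky = unique_combinations(rows, ["age_band", "district", "condition"])
```

Here the row with the rare condition and the single row from the second district are each unique on the chosen fields, even though no name or NHS Number is present. A check like this does not replace an assessment of what other datasets an intruder could link to.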
Annex 3 of the Information Commissioner's Office (ICO) anonymisation guide [URL] gives some practical examples of anonymisation procedures which could be used to minimise the identifiability of data. To support any such process, the ICO recommends undertaking a Motivated Intruder Test to assess the effectiveness of the anonymisation.
Motivated Intruder Test
A method widely used to review anonymisation processes is the Motivated Intruder Test. This is an activity which considers all the practical steps and all the means that are reasonably likely to be used by someone who is motivated to identify the people whose personal data the anonymous information is derived from. The test is used to assess the identifiability risk of (proposed) anonymous information.
The approach assumes that the ‘motivated intruder’:
- is reasonably competent,
- has access to resources such as the internet, libraries, and all public documents, and
- would employ investigative techniques such as making enquiries of people who may have additional knowledge of the identity of the data subject, or advertising for anyone with information to come forward.
The ‘motivated intruder’ is not assumed to have:
- specialist knowledge such as computer hacking skills,
- access to specialist equipment, or
- intent towards criminality such as burglary, to gain access to data that is kept securely.
Techniques often considered:
- use of AI.
- online searches for key identifiers (postcodes, birth dates).
- local and national press.
- trawling social media for associations that aid re-identification.
See full details in the Information Commissioner's Office (ICO) guidance on "How do we ensure anonymisation is effective?" [URL]. To support personnel in undertaking the Motivated Intruder Test within Imperial, we have created an Anonymisation Form Template [DOCX] which can be uploaded alongside the relevant Data Activity Risk-assessment Tool (DART) [URL] entry. The Anonymisation Form Template can also be accessed via Templates [URL].
Data Type Specific Considerations
Genomic Data
The Information Commissioner's Office (ICO) guidance on special category data [URL] states: "However, the definition of personal data also includes identification by reference to “one or more factors specific to the genetic identity of that natural person”, even without their name or other identifier. So, in practice, genetic analysis which includes enough genetic markers to be unique to an individual is personal data and special category genetic data, even if you have removed other names or identifiers."
It seems likely that some types of genetic data cannot be traced back to an individual. If the university wishes to classify any types of genetic data as non-personal data then documentation of the reasoning taken by expert individuals to reach this position is important evidence of good data stewardship.
The Genomics Information Governance Guidance developed in Imperial provides specific advice around genomic data types.
Face data
The widespread use of online platforms that show names alongside facial images, combined with advances in facial recognition, means that face data would not satisfy the motivated intruder test. Care must also be taken with any imaging data that could be used to reconstruct a face, such as sequential MRI scans of faces.
Voice data
As with face data, voice data is becoming increasingly identifiable as video with audio becomes more widely available online.
Geographic
Fine-grained geographic data can be used to identify individuals’ places of work, homes or facilities they are known to attend, which may directly identify an individual. Even coarse geographic data can be combined with other available data to break the anonymity of a dataset.
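As a rough illustration of reducing geographic precision, the sketch below keeps only the outward part of a UK postcode (everything except the final three characters) and rounds coordinates to fewer decimal places. The function names and the choice of one decimal place are assumptions made for this example, and the postcode helper assumes its input is already a valid full postcode.

```python
def outward_code(postcode: str) -> str:
    """Keep only the outward part of a full UK postcode, e.g. the district."""
    compact = postcode.replace(" ", "").upper()
    # The inward part of a full UK postcode is always its last three characters.
    return compact[:-3]


def coarsen_coordinates(lat: float, lon: float, decimals: int = 1):
    """Round latitude and longitude to reduce spatial precision."""
    return (round(lat, decimals), round(lon, decimals))


district = outward_code("SW7 2AZ")
approx_location = coarsen_coordinates(51.4988, -0.1749)
```

Coarsening of this kind lowers precision but does not guarantee anonymity: a district with few residents, or a coarse location combined with other fields, may still single someone out.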
Free Text / Survey Data
Free-text and survey data can be difficult to collect anonymously or to anonymise after collection. Respondents’ wording may itself reveal context, answers may include directly or indirectly identifying details, and reviewing responses can be resource intensive depending on the volume and subject matter. It is therefore crucial to give respondents specific guidance on what not to include, and a review process after collection is vital to prevent personal data being accidentally included in a supposedly ‘anonymised’ dataset.
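A post-collection review can be supported, though not replaced, by simple automated screening. The following sketch flags text that looks like an email address, a ten-digit number or a UK postcode so that a person can review it. The patterns are deliberately simplified assumptions, the names `PATTERNS` and `flag_possible_identifiers` are hypothetical, and the approach will not detect names, job titles or contextual clues.

```python
import re

# Simplified, illustrative patterns; they will produce false positives and misses.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ten_digit_number": re.compile(r"\b\d{3}\s?\d{3}\s?\d{4}\b"),
    "uk_postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b", re.IGNORECASE),
}


def flag_possible_identifiers(text: str):
    """Return (label, matched_text) pairs that a human reviewer should check."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


# Made-up survey response for illustration.
response = "Email jane@example.org, I live near SW7 2AZ. Ref 000 000 0001."
hits = flag_possible_identifiers(response)
```

Flagged responses would still need manual review, since an unflagged response can identify someone through its content alone.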
Other Data Types which may reveal personal information or identify an individual
- Omics – proteomics.
- AI / machine learning.
- Data generated from processing biological samples.
- Movement data / gait analysis.
- Other freely available open datasets.
- Combining datasets.
Common Approaches to Anonymise Datasets
- Minimisation of accuracy.
- Minimisation of fields.
- Masking.
- Imaging – obscuring identifiable features.
- Voice data – transforming / masking.
- Aggregation of data.
- Uniqueness tests.
- Removal of linking fields.
- Avoidance of small numbers.
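Three of the approaches listed above (minimisation of accuracy, aggregation of data and avoidance of small numbers) can be sketched together in Python. This is an illustration under stated assumptions rather than a recommended implementation: the function names, the ten-year band width and the threshold of 5 are hypothetical choices, and the appropriate values depend on the dataset and its context.

```python
from collections import Counter
from datetime import date


def age_band(date_of_birth: date, on: date, width: int = 10) -> str:
    """Replace an exact date of birth with an age band (minimisation of accuracy)."""
    had_birthday = (on.month, on.day) >= (date_of_birth.month, date_of_birth.day)
    age = on.year - date_of_birth.year - (0 if had_birthday else 1)
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def aggregate_with_suppression(values, threshold: int = 5):
    """Count each value, replacing counts below the threshold with a marker."""
    counts = Counter(values)
    return {
        value: (n if n >= threshold else f"<{threshold}")
        for value, n in counts.items()
    }


# Made-up dates of birth for illustration.
reference_date = date(2024, 6, 14)
dates_of_birth = [date(1990, 6, 15)] * 6 + [date(1950, 1, 1)] * 2

bands = [age_band(dob, reference_date) for dob in dates_of_birth]
table = aggregate_with_suppression(bands, threshold=5)
```

The resulting table reports a count for the larger group and only a "less than threshold" marker for the smaller one, so the published figures no longer expose the exact date of birth or a very small group size.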
More resources
- The Information Commissioner’s Office (ICO) Glossary [URL]
- The Information Commissioner’s Office (ICO) Anonymisation Guide [URL]
- The Information Commissioner’s Office (ICO) Big data, Artificial Intelligence, Machine learning and data protection guide [PDF]
- Office for National Statistics (ONS) Statistical guidance and methodology [URL]
- Information Standards Board (ISB) Anonymisation Standard ISB1523 [URL]
- The European Data Protection Board (EDPB) [URL]