Data scraping or mining personal information
Personal information capable of identifying living individuals is granted special protection by most legal regimes, including that of the United Kingdom. This captures both explicitly identifying information and information which could identify an individual when cross-referenced with other sources. For example, an ‘anonymised’ database of all incidents of childhood leukaemia for both sexes by year for a particular postal area by census ward could be cross-referenced with postcode information to identify those who had suffered leukaemia. Information which has been completely anonymised is usually not subject to this protection, but pseudonymised data (which can still be reversed, e.g. when a researcher allocates a pseudonym but keeps a key to allow re-identification) is still protected.
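The difference between pseudonymised and anonymised data can be illustrated with a minimal Python sketch. It assumes a hypothetical researcher who replaces names with random pseudonyms but retains a lookup key; the function names and the sample names are invented for illustration and do not come from any particular tool or dataset. Because the key exists, the pseudonyms can be reversed, which is why such data is still treated as personal information.

```python
import secrets


def pseudonymise(names):
    """Replace each name with a random pseudonym.

    Returns the pseudonymised records and the key that maps each
    pseudonym back to the real name (hypothetical helper).
    """
    key = {}         # pseudonym -> real name: the retained 'key'
    pseudonyms = {}  # real name -> pseudonym, so repeats map consistently
    for name in names:
        if name not in pseudonyms:
            pseudonym = "P-" + secrets.token_hex(4)
            while pseudonym in key:  # regenerate on the rare collision
                pseudonym = "P-" + secrets.token_hex(4)
            pseudonyms[name] = pseudonym
            key[pseudonym] = name
    return [pseudonyms[name] for name in names], key


def reidentify(pseudonym, key):
    """Anyone holding the key can reverse the pseudonym."""
    return key[pseudonym]


# Invented sample data: the same person appears twice.
records, key = pseudonymise(["Alice Example", "Bob Example", "Alice Example"])
```

In this sketch the published `records` contain no names, but whoever holds `key` can call `reidentify` on any record. Only if the key were destroyed, and no other source allowed cross-referencing, would the data move towards being anonymised.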
The collection and use of personal information must meet certain legal criteria – this is highly likely to apply to those engaging in data scraping and data mining where the data contains personal information.
The legal principles are:
- Data use must be lawful – as well as fair and transparent. There must be a specified legal basis for using the data.
- Data can only be used for the purpose it was collected for.
- Data should be relevant and limited to what is necessary to meet the purpose.
- Data should be kept up to date and accurate.
- Data should not be kept for longer than required.
- Data must be protected by good security measures.
If you identify that data scraping or data mining will involve use of personal information, you must think very carefully about how you will comply with these legal principles. The Information Commissioner’s Office is the regulator of data use in the UK and you may have to register with them.
Specified legal basis
There are 6 reasons that you might be allowed to use personal data for data scraping or mining:
- You have informed and freely given consent from the individual to scrape or mine the data;
- You have a contract with the individual that requires you to scrape or mine their data;
- You have a legal obligation to scrape or mine the data;
- It is necessary to scrape or mine the data to protect someone’s life;
- It is necessary to scrape or mine the data to perform a task in the public interest or for official functions;
- It is necessary to scrape or mine the data for your legitimate interests, unless there is a good reason to protect the individual’s information which overrides that legitimate interest.
Note that regardless of the legal basis, the individual must usually have been told about the proposed data mining or scraping at the point when they provided the information.
There are also additional protections that apply to:
- ‘Special category’ personal information – data which reveals racial or ethnic origin, data about political opinions, religious or philosophical beliefs, trade union membership, genetic and biometric data, data about health, data about a person’s sex life and data about a person’s sexual orientation.
- Information about criminal offences.
The ICO has a lawful basis interactive guidance tool you could use to think about this in more depth.
If you propose to scrape or mine data from a dataset that you have not been involved in creating or compiling, it may be difficult to establish a specified legal basis that allows you to scrape or mine it.
Even if your proposed scraping or mining has a legal basis, you will also have to consider carefully how to comply with the other principles, such as minimisation, accuracy, and security. If you propose to deploy artificial intelligence or other software tools to engage in scraping or mining, these must ensure adequate security, and you will also have to think about whether the data will be transferred to another country – common with cloud-based platforms where data servers may be located outside the UK.
There are a small number of exceptions to the provisions of data protection law. The most relevant to data scraping and data mining are likely to be those found in paragraphs 26-28 of Schedule 2 to the Data Protection Act 2018:
- ‘Special purposes’ – journalism, academic, artistic and literary purposes;
- Scientific, historical or statistical purposes;
- Archiving purposes in the public interest.
It is best to seek specific legal advice on conducting data scraping or data mining activity in relation to personal information because even these exceptions do not have general application and depend on a variety of factors.
Scraping or mining data which contains personal information involves complex considerations about compliance with data protection law, in addition to the rules on protected databases and the other considerations that apply to data scraping or mining.
In all cases you would have to look very closely at what the individuals were told about the proposed purpose when they provided their data, and you may have to implement new measures and processes to ensure that you can meet the legal standards for protection of personal information.
Much more detailed information about data protection law is available from the Information Commissioner’s Office. You may also be able to speak to an internal Data Protection Officer about your proposed data scraping or mining plans – most institutions, organisations and businesses will have appointed someone to manage compliance.
Data scraping and data mining are likely to be classified as ‘data processing’ – ‘any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction’ (Article 4(2) of the General Data Protection Regulation 2016).
The individual whose personal information is used is called a ‘data subject’ – an ‘identified or identifiable natural person’ (Article 4(1) of the General Data Protection Regulation 2016).
In the case Common Services Agency v Scottish Information Commissioner [2008] UKHL 47, which was decided before the implementation of the General Data Protection Regulation 2016 into UK law in the new Data Protection Act 2018, the data sought by a ‘Freedom of Information Request’ for all incidents of childhood leukaemia for both sexes by year for a particular postal area by census ward was found to potentially be personal information capable of identifying individuals. A process called ‘barnardisation’ of data was ordered to make the information truly anonymous: ‘Barnardisation is a method of rendering the information, so far as it is possible to do so, anonymous. If the definition of “personal data” can be read in a way that excludes information that has been rendered fully anonymous in the sense that it is information from which the data subject is no longer identifiable, putting it into that form will take it outside the scope of the Agency’s duty as data controller’ (Lord Hope of Craighead at paragraph 23 of the judgment in Common Services Agency v Scottish Information Commissioner [2008] UKHL 47).
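As a rough illustration of the idea behind barnardisation, the following Python sketch randomly adjusts small counts in a table by -1, 0 or +1 while leaving zero counts untouched. This is a simplified sketch under stated assumptions: the function name, the probability `p` and the sample counts are hypothetical, and the exact rules applied by the Agency in the case are not reproduced here.

```python
import random


def barnardise(counts, p=0.1, rng=None):
    """Perturb each non-zero count by -1 or +1, each with probability p.

    Zero counts are left unchanged. The value of p is an assumption
    made for illustration, not a figure taken from the case.
    """
    if rng is None:
        rng = random.Random()
    published = []
    for count in counts:
        if count == 0:
            published.append(0)          # zeros are never modified
            continue
        draw = rng.random()
        if draw < p:
            published.append(count - 1)  # count >= 1, so result is >= 0
        elif draw < 2 * p:
            published.append(count + 1)
        else:
            published.append(count)      # left as it was
    return published


# Invented example: incidents per census ward in one year.
true_counts = [0, 1, 3, 0, 2]
published = barnardise(true_counts, p=0.25, rng=random.Random(0))
```

A reader of the published table cannot be sure whether a small count is the true figure, which makes it harder to single out an individual. Whether such a table is anonymous in the legal sense still depends on what other information is available for cross-referencing.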
The Data Protection Act 2018 is the UK’s implementation of the General Data Protection Regulation (GDPR) and is available at https://www.legislation.gov.uk/ukpga/2018/12/contents/enacted
The General Data Protection Regulation 2016 (GDPR) is available at https://eur-lex.europa.eu/eli/reg/2016/679/oj
This regulatory space is evolving fast, nationally and internationally.
‘Data adequacy’ is an important concept under which data can flow freely between jurisdictions that offer similar levels of protection: https://ec.europa.eu/commission/presscorner/detail/ro/ip_21_3183
While the EU and UK provisions are currently treated as equivalent, post-Brexit the UK is also seeking new global data adequacy partnerships that are based on principles that depart from EU law: https://www.gov.uk/government/news/uk-unveils-post-brexit-global-data-plans-to-boost-growth-increase-trade-and-improve-healthcare
The European Commission’s proposed EU Data Act is a new regulation to establish a harmonised framework for the access to and use of data generated in the EU across all economic sectors: https://digital-strategy.ec.europa.eu/en/library/data-act-proposal-regulation-harmonised-rules-fair-access-and-use-data.
It is important for users to have a better understanding of how to deal with the influence of algorithms on the production, distribution and consumption of creative works, and the implications for copyright.
Students and researchers often need to make use of materials which are copyright protected. In the context of their research or study, they may have to make copies or use extracts of those materials.
The electronic analysis of large amounts of copyright works allows researchers to discover patterns, trends and other useful information that cannot be detected through usual ‘human’ reading. This process, known as ‘text and data mining’…