Data scraping or mining copyright protected works


Copyright protected data

As explained elsewhere on Copyright User, copyright protects certain categories of works, many of which need to be ‘original’ to qualify for protection. Often data in a database does not meet that standard for protection – for example, much data will consist of unoriginal, repeated entries of generic information.

It is generally accepted that, in most instances, one-word entries in data fields are not sufficient to qualify for protection as ‘literary’ works under copyright law. Even so, a lack of copyright protection does not mean that no other form of protection exists for the dataset.

It may also be possible to compile or access a dataset made up of works that are themselves protected by copyright – for example, anthologies of poems, songs or photographs. Here each individual work attracts its own copyright protection, which limits the use that can be made of the compilation as a dataset.


Copyright protected databases

As well as potentially protecting any works in the data, copyright law also protects a database where the database structure or arrangement is in some way the ‘intellectual creation’ of the author. This is relevant if data scraping or data mining requires copying the entire dataset, including its structure and arrangement as well as its content, because a copyright owner has the right to control reproduction.


Is data scraping or mining of copyright protected datasets lawful?

Data scraping and data mining often involve acts of reproduction. For example, the data is usually reproduced on screen, and data mining usually involves making a copy of the whole dataset.

However, the right to control reproduction of a copyright protected work belongs to the copyright owner. If a dataset contains copyright protected material, data scraping and mining may therefore require permission from the copyright owner, or must fall within an exception that permits the use. Otherwise, your activity might infringe copyright and be subject to enforcement by the copyright owner.

There is a specific exception that allows use of a copyright protected work (including a copyright protected database) for text and data mining, but it is very limited and applies only if the following conditions are met:

  • You must already have lawful access to the work.
  • The exception is only available where the purpose is for non-commercial research.
  • Attribution of the author must be included.

You can read more about the copyright exception for text and data mining here:

The movement for ‘Open Data’ seeks to overcome some of the difficulties around use of potentially protected data through use of Open Standards and encouraging open data opportunities that anyone can use. Making data ‘open’ is a form of collective licensing that can make use of otherwise copyright protected datasets lawful.

Many governmental agencies, as well as academic and research institutions, adopt open access policies and practices and make their resources freely available to use. The UK Data Service, for example, is open access and some of its collections are open data. The open access initiative in the Netherlands aims to grant free and open online access to academic data and publications. Google Scholar, Internet Archive, and Wikimedia Commons are examples of open access databases.


Legal language

In 2009, in Infopaq International A/S v Danske Dagblades Forening, the CJEU stated that ‘words, considered in isolation, are not as such an intellectual creation of the author who employs them’.

In Football Dataco Ltd v Yahoo! UK Ltd and others [2012], the CJEU confirmed that the criterion for copyright protection for the selection and arrangement of a database was ‘not satisfied when the setting up of the database is dictated by technical considerations, rules or constraints which leave no room for creative freedom’.

Please note that the CJEU decisions mentioned above were made when the UK was still part of the EU and therefore bound by EU law. Following the end of the transition period (31 December 2020), the Supreme Court and Court of Appeal in the UK can decide to depart from pre-Brexit CJEU case law if they consider it ‘appropriate to do so’. However, until either of these senior courts decides to do so, the UK courts will continue to follow pre-Brexit CJEU judgments.


Legal references

Section 3A of the UK Copyright, Designs and Patents Act 1988 covers the protection for the structure and arrangement of a database.

The UK Copyright and Rights in Databases Regulations 1997 insert the specific provisions for databases into the 1988 Act.

Section 29A and Schedule 2(2)1D of the Copyright, Designs and Patents Act 1988 set out the text and data mining exception.

The UK Copyright, Designs and Patents Act 1988 is available at:

The UK Copyright and Rights in Databases Regulations 1997 are available at:

This regulatory space is evolving fast, nationally and internationally.

‘Data adequacy’ is an important concept under which data can flow freely between jurisdictions that offer similar levels of protection:

While the EU and UK provisions are currently treated as equivalent, post-Brexit the UK is also seeking new global data adequacy partnerships that are based on principles that depart from EU law:

The European Commission’s proposed EU Data Act is a new regulation to establish a harmonised framework for the access to and use of data generated in the EU across all economic sectors:




