2. I want to scrape or use data which contains copyright protected works

Data scraping or mining copyright protected works

Copyright protected data

As explained elsewhere on Copyright User, copyright protects certain categories of works, many of which need to be ‘original’ to qualify for protection. Often data in a database does not meet that standard for protection – for example, much data will consist of unoriginal, repeated entries of generic information.

It is generally accepted that, in most instances, one-word entries in data fields will not be sufficient to qualify for protection as ‘literary’ works under copyright law. However, a lack of copyright protection does not mean that the dataset is unprotected: other forms of protection may still exist.

It may also be possible to compile or access a dataset that is a compilation of works that are protected by copyright – for example, anthologies of poems, songs or photographs. Here each individual copyright work will attract its own protection, thereby limiting the use that can be made of the compilation as a dataset.

Copyright protected databases

As well as potentially protecting any works in the data, copyright law also protects a database where the database structure or arrangement is in some way the ‘intellectual creation’ of the author. This is relevant if data scraping or data mining requires copying the entire dataset, in its structure and arrangement as well as its content, because a copyright owner has the right to control reproduction.

Is data scraping or mining of copyright protected datasets lawful?

Data scraping and data mining often involve acts of reproduction. For example, the data is usually reproduced on screen, and data mining usually involves making a copy of the whole dataset.

However, the right to control reproduction of a copyright protected work belongs to the copyright owner. If a dataset contains copyright protected material, data scraping and mining may therefore require permission from the copyright owner, or must fall within an exception that allows the use. Otherwise, your activity might infringe copyright and be subject to enforcement by the copyright owner.

There is a specific exception that allows use of a copyright protected work (including a copyright protected database) for text and data mining, but it is very limited:

You must already have lawful access to the work.
The exception is only available where the purpose is for non-commercial research.
Attribution of the author must be included.
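
As a rough illustration only, the three conditions listed above can be treated as a checklist to run through before starting a mining project. The sketch below is a minimal Python example under that assumption; the names `MiningPlan` and `unmet_conditions` are hypothetical, and the checklist covers only the conditions named on this page, not the full wording of the exception.

```python
from dataclasses import dataclass


@dataclass
class MiningPlan:
    """Hypothetical description of a planned text and data mining project."""
    has_lawful_access: bool          # you already have lawful access to the work
    non_commercial_research: bool    # the purpose is non-commercial research
    attributes_author: bool          # attribution of the author is included


def unmet_conditions(plan: MiningPlan) -> list[str]:
    """Return the conditions listed on this page that the plan does not meet."""
    checks = [
        (plan.has_lawful_access, "lawful access to the work"),
        (plan.non_commercial_research, "non-commercial research purpose"),
        (plan.attributes_author, "attribution of the author"),
    ]
    return [label for ok, label in checks if not ok]


# Example: a commercial project with lawful access and attribution.
plan = MiningPlan(
    has_lawful_access=True,
    non_commercial_research=False,
    attributes_author=True,
)
print(unmet_conditions(plan))
```

An empty list means none of the three listed conditions is obviously unmet; any entry in the list suggests that permission from the copyright owner may be needed instead.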

You can read more about the copyright exception for text and data mining here: https://www.copyrightuser.org/understand/exceptions/text-data-mining/

The movement for ‘Open Data’ seeks to overcome some of the difficulties around use of potentially protected data through use of Open Standards and encouraging open data opportunities that anyone can use. Making data ‘open’ is a form of collective licensing that can make use of otherwise copyright protected datasets lawful.

Many governmental agencies, as well as academic and research institutions, adopt open access policies and practices and make their resources freely available to use. The UK Data Service, for example, is open access and some of its collections are open data. The open access initiative in the Netherlands aims to grant free and open online access to academic data and publications. Google Scholar, Internet Archive, and Wikimedia Commons are examples of open access databases.

Legal language

In 2009, in Infopaq International A/S v Danske Dagblades Forening, the CJEU stated that ‘words, considered in isolation, are not as such an intellectual creation of the author who employs them’.

In Football Dataco Ltd v Yahoo! UK Ltd and others [2012], the CJEU confirmed that the criterion for copyright protection for the selection and arrangement of a database was ‘not satisfied when the setting up of the database is dictated by technical considerations, rules or constraints which leave no room for creative freedom’.

Please note that the CJEU decisions mentioned above were made when the UK was still part of the EU and therefore bound by EU law. Following the end of the transition period (31 December 2020), the Supreme Court and Court of Appeal in the UK can decide to depart from pre-Brexit CJEU case law if they consider it ‘appropriate to do so’. However, until either of these senior courts decides to do so, the UK courts will continue to follow pre-Brexit CJEU judgments.

Legal references

Section 3A of the UK Copyright, Designs and Patents Act 1988 covers the protection for the structure and arrangement of a database.

The UK Copyright and Rights in Databases Regulations 1997 insert the specific provisions for databases into the 1988 Act.

Section 29A and Schedule 2(2)1D of the Copyright, Designs and Patents Act 1988 set out the text and data mining exception.

The UK Copyright, Designs and Patents Act 1988 is available at: https://www.legislation.gov.uk/ukpga/1988/48/contents

The UK Copyright and Rights in Databases Regulations 1997 are available at: https://www.legislation.gov.uk/uksi/1997/3032/contents

This regulatory space is evolving fast, nationally and internationally.

‘Data adequacy’ is an important concept under which data can flow freely between jurisdictions that offer similar levels of protection: https://ec.europa.eu/commission/presscorner/detail/ro/ip_21_3183

While the EU and UK provisions are currently treated as equivalent, post-Brexit the UK is also seeking new global data adequacy partnerships that are based on principles that depart from EU law: https://www.gov.uk/government/news/uk-unveils-post-brexit-global-data-plans-to-boost-growth-increase-trade-and-improve-healthcare

The European Commission’s proposed EU Data Act is a new regulation to establish a harmonised framework for the access to and use of data generated in the EU across all economic sectors: https://digital-strategy.ec.europa.eu/en/library/data-act-proposal-regulation-harmonised-rules-fair-access-and-use-data

Most of the original content on the Copyright User’s website is distributed under a CC-BY 3.0 licence, meaning that you can share, remix, alter, and build upon Copyright User content for any purpose, as long as you credit the author of the content. Where content on Copyright User is not distributed under a CC-BY 3.0 licence, this will be indicated clearly.
