Data Scraping & Data Mining

Author: Sheona Burrow
Illustration: Ilaria Urbinati

Data scraping and data mining are (usually technological) processes in which data (usually held in some form of database) is accessed and used in some way, sometimes as part of a wider process of data cleansing, integration, transformation or evaluation. They are common research and business techniques: a price comparison tool, for example, will usually scrape pricing data from databases. Unfortunately, working out what use is legal can be difficult because several overlapping legal regimes have to be considered. Legal use can depend on what is in the data (for example, personal information about individuals or copyright works), on whether the data is held in a legally protected database, and on whether there is some form of contract with the database owner (e.g. terms and conditions).

This section aims to provide a picture of the different layers of legal protection one needs to consider in order to make informed decisions around data scraping and data mining. Lawful access to and use of data is not a settled legal matter. There is a need to establish community-specific norms that are in tune with legitimate public interest needs as well as competition and innovation concerns.
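As a rough illustration of the layers described above, the questions can be written down as a simple checklist. The sketch below is only an aid to thinking and not a legal test: the class name, field names and the example values for a price comparison tool are all assumptions made for illustration, and the function only lists which layers deserve a closer look.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScrapingContext:
    """Hypothetical description of a planned scraping or mining task."""
    has_personal_data: bool      # personal information about individuals
    has_copyright_works: bool    # copyright works contained in the data
    in_protected_database: bool  # data sits in a legally protected database
    has_contract_terms: bool     # e.g. terms and conditions set by the owner


def layers_to_review(ctx: ScrapingContext) -> List[str]:
    """List the layers of protection to look into; this is not a legal verdict."""
    layers = []
    if ctx.has_personal_data:
        layers.append("personal information")
    if ctx.has_copyright_works:
        layers.append("copyright works")
    if ctx.in_protected_database:
        layers.append("protected database")
    if ctx.has_contract_terms:
        layers.append("contract")
    return layers


# Assumed example: a price comparison tool scraping pricing data from a
# database that is protected and subject to terms and conditions.
price_tool = ScrapingContext(
    has_personal_data=False,
    has_copyright_works=False,
    in_protected_database=True,
    has_contract_terms=True,
)
print(layers_to_review(price_tool))
```

More than one layer can apply to the same project, so the function returns a list rather than a single answer.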

Protected databases

Both copyright law and a special database law provide protection for databases which meet certain criteria. To be protected, a database must be a ‘collection of independent works, data or other materials, which are arranged in a systematic or methodical way and are individually accessible by electronic or other means.’ This is quite a broad definition and will capture most databases.

A special right called the sui generis database right (SGDR) protects data in a database from unauthorised extraction or reutilisation unless there is a basis for legal use. However, it protects those who have invested in the ‘obtaining, verification or presentation’ of the data, rather than those who created the original data.

Copyright law also protects a database if, by reason of ‘the selection or arrangement of the contents of the database’, the database ‘constitutes the author’s own intellectual creation.’ This protection targets databases where compiling the data has required some intellectual thought. By contrast, a compilation of addresses listed alphabetically is unlikely to be protected by copyright. This protection is for the structure of the database and not for the data itself.

The two protections have slightly different effects because they last for different periods of time. They can also overlap and apply to the same database, and they can even be owned by different parties. They apply in addition to all other forms of protection for the data, including contractual and competition law restrictions and confidentiality.

Choose the most relevant scenario below to read more about whether data scraping or data mining is permissible in the UK.

1. I want to scrape or use data which contains personal information about living individuals

Personal information capable of identifying living individuals is granted special protection by most legal regimes, including that of the United Kingdom.

2. I want to scrape or use data which contains copyright protected works

As well as potentially protecting any works in the data, copyright law also protects a database where the database structure or arrangement is in some way the ‘intellectual creation’ of the author.

3. I want to scrape or use a dataset which I don’t think contains any personal information or copyright protected works

Protection of a dataset is contextual in every case – the data, database and other context may dictate higher or lower risk for data scraping and data mining.

This research forms part of the CREATe Open Science series and was supported by the ESRC Urban Big Data Centre (UBDC) at the University of Glasgow.

Most of the original content on the Copyright User’s website is distributed under a CC-BY 3.0 licence, meaning that you can share, remix, alter, and build upon Copyright User content for any purpose, as long as you credit the author of the content. Where content on Copyright User is not distributed under a CC-BY 3.0 licence, this will be indicated clearly.
