Ethical and Legal Aspects

1 Ethics and Data Protection

Ethics should be considered early in any research project. Information on data storage, security, availability, and archiving should be included in the data management plan (DMP). A well-planned informed consent procedure that carefully considers ethics can help make data more shareable.

Art. 36c et seq. of the ETH Act regulate the handling of personal data for research projects at Empa. Personal data that has been anonymised in such a way that the individual is not, or is no longer, identifiable is no longer considered personal data. For data to be truly anonymised, the anonymisation must be irreversible. Personal data that has been de-identified, encrypted or pseudonymised but can still be used to re-identify a person remains personal data and falls within the scope of the data protection laws. If a research project uses non-anonymised personal data of any kind (e.g. names, physical and/or IP addresses) or other sensitive data (e.g. medical data or data that allows for military use), a declaration to Empa Legal and/or the Empa Ethics Committee should be considered before this data is made publicly available.
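The difference between pseudonymised and anonymised data can be illustrated with a minimal Python sketch. It is an illustration only, not a recommended procedure and not a legal test: the key, the names and the helper functions `pseudonymise` and `reidentify` are hypothetical. It shows that as long as someone holds the key, a pseudonym can be linked back to a person, which is why such data remains personal data.

```python
import hashlib
import hmac
from typing import List, Optional

# Hypothetical secret kept by the project. Whoever holds it can re-link records.
SECRET_KEY = b"example-key-kept-by-the-project"


def pseudonymise(name: str, key: bytes = SECRET_KEY) -> str:
    """Replace a name with a keyed hash (a pseudonym)."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def reidentify(pseudonym: str, candidates: List[str],
               key: bytes = SECRET_KEY) -> Optional[str]:
    """Recover the name behind a pseudonym by testing known candidate names."""
    for name in candidates:
        if hmac.compare_digest(pseudonymise(name, key), pseudonym):
            return name
    return None


# The stored record no longer shows the participant's name ...
record = {"participant": pseudonymise("Alice Example"), "age": 42}

# ... but the key holder can still link it back to a person.
recovered = reidentify(record["participant"], ["Bob Example", "Alice Example"])
```

In this sketch the record would only approach anonymity if the key and every other route back to the person were irreversibly destroyed, and even then other attributes in the record might still identify the individual.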

2 Confidentiality

In a research project with external partners (research and/or industrial partners) there is usually an agreement on confidentiality – be it verbal or in writing (e.g. in a Non-Disclosure Agreement). These agreements have to be respected when sharing data and/or uploading data onto an (open) data repository. Regardless of the existence of such agreements, every research project has to comply with the rules of the Empa Management Handbook (MHB). Of particular relevance for research data management are MHB 2.3.21 ("Directive for the Classification of Data at Empa") and MHB 2.3.18 ("Guidelines for Handling Data at Empa").

Furthermore, confidentiality is also important where a research project could potentially lead to patentable inventions. Sharing data (or sharing data too early) might be detrimental to the patentability of an invention. In this context, MHB 4.3.1 ("Directive concerning the rights to research results and their exploitation") has to be considered before sharing data.

Whenever confidentiality agreements, the patentability of an invention, or ethical and data protection concerns prevent data sharing, there has to be an "opt-out" for the affected data; i.e. the affected data cannot be shared or uploaded onto an (open) data repository. Funders of research projects (e.g. EU or SNSF) allow for such an opt-out, as long as it is clearly stated in the DMP, including the reasons why the data has to be omitted.

3 Intellectual Property, Licensing and Re-Use of Data

3.1 Data versus database

In any data project, there are likely to be two components. The first is the data collected, assembled, or generated: the raw content in the system. This raw data could be hourly temperature readings from a sensor, the age of individuals in a survey, recordings of individual voices, or photographs of plant specimens. The second component is the data system in which the data is stored and managed: the database. In general, raw data (e.g. temperature or humidity readings) on its own is considered fact and is thus not protected by copyright under Swiss law. However, data that is gathered together in a unique and original way, such as in a database (e.g. a climate database consisting of temperature and humidity readings over a given timespan), might automatically fall under copyright protection. Deciding what data needs to be included in a database, how to organize the data, and how to relate different data elements are all creative decisions that may receive copyright protection.

As a clear distinction between raw data and a database can be difficult, clear licensing regarding re-use is important.

3.2 Data Licensing

This section covers only the licensing of data and databases. For the licensing of software, please refer either to the Software Declaration Form (SDF) or to the Open Source Software Registration (OSSR) provided by the Empa-Eawag TT-Office. If your software includes a database, the SDF should also be used. For inventions, please use the Invention Disclosure Form (IDF).

There is increased pressure from funders and journals for researchers to release their research data. Applying appropriate licensing when data are released will help ensure proper re-use and attribution. There are many licenses available that represent the range of rights for the creator and licensee of the data. When choosing a license, the (possible) conflict of objectives between confidentiality, patents and ethics on the one hand and open data on the other hand should always be kept in mind. As mentioned above, it is possible to "opt out" if researchers are obliged to keep certain data confidential or if there might be a patentable invention. Furthermore, it has to be taken into consideration that not all data are in the public domain: a project might, for example, use copyrighted photographs; these photographs are also part of the project’s "data". Therefore, it might be necessary to differentiate between the database and its data content (e.g. images, text, films, music) for licensing. In case of queries or uncertainties, please contact the Technology Transfer Office, either your TT contact person or the administration (email: TT@empa.ch).

In order to facilitate the re-use of data, it is imperative that others know the terms of use for the database and the data content. The Open Data Commons group (ODC) has been developing three standard licenses to govern the use of data sets.

The three ODC licenses are:

  1. Public Domain Dedication and License (PDDL): This dedicates the database and its content to the public domain, free for everyone to use as they see fit.
  2. Attribution License (ODC-By): Users are free to use the database and its content in new and different ways, provided they give attribution to the source of the data and/or the database.
  3. Open Database License (ODC-ODbL): ODbL stipulates that any subsequent use of the database must provide attribution, an unrestricted version of the new product must always be accessible, and any new products made using ODbL material must be distributed using the same terms. This license applies only to the database itself and not to the data content (which has to be licensed separately, if licensing is possible). It is the most restrictive of all ODC licenses.

Creative Commons (CC) also has a library of standardized licenses, and some of them apply to data and databases. The ODC-By license, for example, is the equivalent of a Creative Commons Attribution license (CC BY). CC BY licenses, however, require copyright ownership of the underlying work (the data content), whereas the ODC-By license applies to works not protected by copyright (such as factual raw data).

The two CC licenses that are of greatest relevance to data management are:

  1. CC0 (i.e., "CC Zero"): When an owner wishes to waive the copyright and/or database rights, the CC0 mark can be used. It effectively places the database and data into the public domain. It is the functional equivalent of an ODC PDDL license.
  2. Public Domain mark (PDM): It is used to mark works that are in the public domain, and for which there are no known copyright or database restrictions. It is possible to flag factual data as PDM in a database, for example, in order to make it clear it is free to use.

3.3 Selecting a data license

There is no single right answer as to which license to assign to a database or its content. Note, however, that anything other than an ODC PDDL or CC0 license may cause serious problems for subsequent scientists and other users because of attribution stacking. It may be possible to extract data from a single data set, use it in a research project, and still keep track of the source of that data. However, it is also possible to create a data set derived from hundreds of sources, each of which requires acknowledgement. Furthermore, the data in those source databases may not have originated there, but may instead have been taken from yet other databases that also demand attribution. Rather than legally requiring that everyone provide attribution, it might be enough to have a community norm that says “if you make extensive use of data from this data set, please credit the authors.”
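Attribution stacking can be made concrete with a small Python sketch. The provenance graph below is entirely hypothetical (the data set names are invented for illustration), and the sketch assumes that every source in the chain requires attribution. It collects all direct and indirect sources that a derived data set would then have to credit.

```python
from typing import Dict, List, Set

# Hypothetical provenance graph: each data set lists the data sets it was derived from.
DERIVED_FROM: Dict[str, List[str]] = {
    "my_study": ["climate_db", "soil_db"],
    "climate_db": ["station_a", "station_b"],
    "soil_db": ["station_b", "survey_2019"],
    "station_a": [],
    "station_b": [],
    "survey_2019": [],
}


def required_attributions(dataset: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Collect every direct and indirect source that would need to be credited."""
    credits: Set[str] = set()
    stack = list(graph.get(dataset, []))
    while stack:
        source = stack.pop()
        if source in credits:
            continue  # already counted, e.g. a source shared by two databases
        credits.add(source)
        stack.extend(graph.get(source, []))
    return credits


credits = required_attributions("my_study", DERIVED_FROM)
print(sorted(credits))
# → ['climate_db', 'soil_db', 'station_a', 'station_b', 'survey_2019']
```

Even in this toy case, a study built on two databases ends up with five sources to acknowledge. With hundreds of sources the list quickly becomes hard to maintain, which is the practical argument for PDDL or CC0 combined with a community norm.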

In case of queries or uncertainties when selecting a data license, please contact the Technology Transfer Office, either your TT contact person or the administration (email: TT@empa.ch).

3.4 Re-Using Existing Data

When re-using existing data, one has to clarify ownership, obtain permissions if needed, and understand limits set by licenses. In any case, it is important to provide appropriate attribution and citation in accordance with research standards.

3.5 Data Ownership @Empa

In accordance with article 36 of the ETH Act, all rights in research results (intellectual property) that have been created by Empa staff in the exercise of their official duties, with the exception of copyrights, are the property of Empa. The author is entitled to the copyright in protected works that are created within the framework of an employment relationship (in particular scientific publications or textbooks). If a work of this kind is developed in fulfilment of a performance obligation undertaken under an employment contract (known as a service work, article 36 of the ETH Act), the assignment of the copyright and of the exploitation rights to Empa can be agreed by contract between Empa and the author. Empa has reached such an agreement with its employees in their employment contracts (excluding publications by employees). For details regarding data ownership, please refer to MHB 4.3.1 ("Directive concerning the rights to research results and their exploitation").