Template for the SNSF Data Management Plan


prepared by

         The ETH Zurich Library  

Please note

Recommendations on this page are intended to illustrate the guidelines and other information provided by the SNSF for preparing Data Management Plans. The SNSF’s guidelines are binding.

Earlier versions of this page were mandated by swissuniversities as part of the DLCM project and were prepared jointly by teams from the libraries of EPFL and ETH Zurich, with input from DLCM partners. The content exists in adapted versions for the two universities and can also be freely adapted to other institutions’ needs. The examples do not cover all disciplines. Further examples from other subject areas, as well as feedback or questions concerning ETH Zurich, are welcome at data-management@library.ethz.ch for possible inclusion in future revisions.

Version ETH Zurich 3

License Creative Commons CC BY-SA


Table of Contents

1. Data collection and documentation

2. Ethics, legal and security issues

3. Data storage and preservation

4. Data sharing and reuse

How to work with this template

A DMP for the SNSF must be entered in a webform on the mySNF-platform. This page will guide you through the process of collecting the necessary information and formulating the input. It compiles and relies on the binding guidelines from the SNSF, which have priority in any case of doubt.
The first version of your DMP can be considered as a draft. It can and must be adapted as the implementation of the project and its data management evolve.

Each section of this template contains:

  1. Section heading from SNSF
  2. Questions to consider
  3. Recommendations for completing the section
  4. Examples of inputs from different DMPs.
    Please note: these examples are only meant to give you an idea of how to state certain information. You are welcome to reuse parts of the examples for your own purposes. Nevertheless, the content must be adapted to the situation in your project. In addition, be aware that these examples do not always cover the entire section in question and may need to be completed.
  5. Contact information at ETH Zurich

SNSF Data Management Plan

Institution


ETH Zurich

Responsibilities


Principal Investigator:
(Specify name and email)

Data management plan contact person:
(Specify name and email)



1. Data collection and documentation

1.1 What data will you collect, observe, generate or re-use? 

Questions you might want to consider

  • What type, format and volume of data will you collect, observe, generate or reuse?

  • Which existing data (yours or third-party) will you reuse?

Briefly describe the data you will collect, observe or generate. Also mention any existing data that will be (re)used. The descriptions should include the type, format and content of each dataset. Furthermore, provide an estimation of the volume of the generated datasets.

This relates to the FAIR Data Principles F2, I3, R1 & R1.2

Recommendations


For each dataset in your project (including data you might re-use) mention:

  • Data type: Briefly describe categories of datasets you plan to generate or use, and their role in the project

  • Data origin: to be mentioned if you are reusing existing data (your own or third-party). Add the reference of the source if relevant.

  • Format of raw data (as created by the device used, by simulation or downloaded): open standard formats should be preferred, as they maximize reproducibility and reuse by others and in the future [see List of recommended file formats by ETH Zurich]

  • Format of curated data (if applicable): open standard formats should be preferred [see List of recommended file formats by ETH Zurich]

  • Estimation of volume of raw and curated data.
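
A rough volume estimate for sensor-type data can be derived from the number of channels, the sampling rate, the recording duration and the storage size per value. The sketch below is only an illustration: the function name and all figures (6 channels, 512 Hz, one-hour sessions, 24 sessions, 4 bytes per value) are assumptions chosen for the example, not values taken from the SNSF or ETH Zurich guidelines.

```python
def estimate_volume_bytes(n_channels, sampling_hz, duration_s, bytes_per_sample):
    """Return the approximate size in bytes of an uncompressed time series."""
    return n_channels * sampling_hz * duration_s * bytes_per_sample


# Hypothetical campaign: 6 channels sampled at 512 Hz, one hour per session,
# 24 sessions in total, each value stored as a 4-byte binary float.
per_session = estimate_volume_bytes(
    n_channels=6, sampling_hz=512, duration_s=3600, bytes_per_sample=4
)
total_bytes = per_session * 24
total_gb = total_bytes / 1e9  # decimal gigabytes

print(f"Per session: {per_session / 1e6:.1f} MB, total: {total_gb:.2f} GB")
```

Text formats such as CSV usually need more bytes per value than the binary size assumed here, so the estimate should be adjusted to the format you actually use.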


Example answers to be adapted to your research application

Example 1

The data produced in this project will fall into [3] categories:

  1. […] 
  2. […] 
  3. […]

Data in category 1 will be saved and documented in [...] format and will amount to approximately […TB]. Data in category 2 will be saved and documented in […] format and will amount to approximately […GB]. Data in category 3 will be saved and documented in […] format and will amount to approximately […MB]. 

Example 2

The data are health records auto-generated by users of the application X. They are subject to a contract with company X.

All fields contain user observations and are entered manually, except for temperature which is measured by a Bluetooth connected thermometer.

Data fields per user (anonymised by X): User identifier; Age; Weight; Size.

Data fields per user per day of observation:

  • Temperature and time at which temperature is taken,

  • Cervical fluid quality (none, sticky, creamy, egg white, watery) and quantity (little, medium, lots),

  • Cervix height (low, med, high), cervix openness (closed, med, open), cervix firmness (firm, med, soft),

  • Sexual intercourse (protected or unprotected),

  • Menstruation (light, medium, heavy), spotting, starting a new cycle,

  • Custom data (notable predetermined fields are pregnancy test results or ovulation test results).

Data will be received in CSV format and consist of the records of 2 million users. They will amount to at most 1 GB.

Example 3 (from an Eawag DMP)

There will be two categories of data: NEW data from this project and EXISTING data from the FOEN Lake Monitoring program.


The NEW data will consist of several file types, all in CSV real-number format, which are all organized along the same principle: matrices of time series with various channels, each corresponding to a sensor (the number of sensors varies from 1 to 10), and of very different lengths, as the sampling frequency varies by several orders of magnitude.

  1. 6 files of CO2, DO, PAR and temperature (24 files at a time; Figure 2), each file only 1 sensor (Delta = 10 min; continuous),
  2. Thetis profiles corresponding to time series (equivalent to depth series) of 10 sensors (Delta = 1 s; 5-10 times per day).
  3. 5 files of CO2 time series for short-term surface flux measurements (several files, one per month),
  4. meteodata file (eight sensors; continuous),
  5. T-Microstructure profiles files (6 sensors at 512 Hz; several files, once per month) and
  6. Excel files for individual chemical samples (such as alkalinity, sediment trap estimates, etc.; sporadic).

The EXISTING data is already available (CIPAIS, CIPEL) in Excel sheets with matrices for the individual samplings and a variable number of parameters (~10 to ~25). The EXISTING data will not be modified and remains with the organizations. We will keep a copy on our computers during the project. We anticipate the data produced in category 1 to amount to several hundred MB for the moored and profiled sensor files and ~100 GB for the T-microstructure profiles; the EXISTING data in category 2 is in the range of ~20 MB.

Example sentence, if other researchers’ data will be reused

Reused data from Smith (2000, DOI: […]) and Miller (2011, DOI: […]) will be merged with the newly collected data.

Example sentence, if no pre-existing data will be reused

No other pre-existing data will be reused. 

Contact for assistance – ETH Zurich

Digital Curation Office: data-management@library.ethz.ch



1.2 How will the data be collected, observed or generated?

Questions you might want to consider

  • What standards, methodologies or quality assurance processes will you use?

  • How will you organize your files and handle versioning?

Explain how the data will be collected, observed or generated. Describe how you plan to control and document the consistency and quality of the collected data: calibration processes, repeated measurements, data recording standards, usage of controlled vocabularies, data entry validation, data peer review, etc.

Discuss how the data management will be handled during the project, mentioning for example naming conventions, version control and folder structures.

This relates to the FAIR Data Principle R1

Recommendations


What standards, methodologies or quality assurance processes will you use?

For each dataset in your project (including data you might re-use) mention:

  • the use of core facility services (specify their certifications, if any),

  • whether you follow double-blind procedures (define them),

  • the use of standards or internal procedures; describe them briefly.

If you are working with persons’ data, confirm the following:

  • have the subjects of your data collection (persons) been fully informed (what data you collect, what you will do with the data, who will receive it and when it will be deleted) and have the subjects given their informed consent?

  • have the subjects of your data collection (persons) been informed about their rights to information, data deletion and data correction?

How will you organize your files and handle versioning?

Indicate and describe the tools you will use in the project.
You may rely on the following tools depending on your needs:

  • a naming convention, i.e. the structure of folders and file names you will use to organize your data.

For example: Project-Experiment-Scientist-YYYYMMDD-HHmm-Version.format (concretely: Atlantis-LakeMeasurements-Smith-20180113-0130-v3.csv)

  • a data management system, such as an Electronic Laboratory Notebook / Laboratory Information Management System (ELN/LIMS). Examples of ELN/LIMS in use within the ETH Domain: openBIS, SLims.
  • Additional ETH Zurich services:
    • The ETH Research Data Hub (ETH RDH) is an ETHZ-wide data management solution for quantitative research groups that provides the lowest entrance barrier for labs that do not need extensive customization.
    • The ETH Research Data Node (ETH RDN) is a specific data management instance for one research lab and can be customized more extensively if needed.
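
A naming convention such as the one above is easier to keep consistent when file names are generated and checked by a small script rather than typed by hand. The following is a minimal sketch, assuming the pattern Project-Experiment-Scientist-YYYYMMDD-HHmm-vN.format with no hyphens inside the individual parts; the function names `build_name` and `parse_name` are hypothetical and not part of any ETH Zurich tool.

```python
import re
from datetime import datetime

# Assumed pattern: Project-Experiment-Scientist-YYYYMMDD-HHmm-vN.ext
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)-(?P<experiment>[A-Za-z0-9]+)"
    r"-(?P<scientist>[A-Za-z0-9]+)-(?P<date>\d{8})-(?P<time>\d{4})"
    r"-v(?P<version>\d+)\.(?P<ext>[A-Za-z0-9]+)$"
)


def build_name(project, experiment, scientist, when, version, ext):
    """Compose a file name that follows the convention."""
    return f"{project}-{experiment}-{scientist}-{when:%Y%m%d-%H%M}-v{version}.{ext}"


def parse_name(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    date, time = fields["date"], fields["time"]
    try:
        # datetime() rejects impossible dates and times such as 30 February.
        fields["timestamp"] = datetime(
            int(date[:4]), int(date[4:6]), int(date[6:]),
            int(time[:2]), int(time[2:]),
        )
    except ValueError:
        return None
    fields["version"] = int(fields["version"])
    return fields


example = build_name(
    "Atlantis", "LakeMeasurements", "Smith", datetime(2018, 1, 13, 1, 30), 3, "csv"
)
print(example)
print(parse_name(example)["version"])
print(parse_name("notes_final2.csv"))
```

A check of this kind could be run over a project folder before data are archived, so that non-conforming file names are found early.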

Example answers to be adapted to your research application

Example 1

The reaction conditions will be recorded and collated using a spreadsheet application and named according to each generation of reaction as follows: YYYYMMDD_HHmm_ProjectW_ReactionX_GenerationY_ScientistZ.csv

The various experimental procedures and associated compound characterization will be written up using the [e.g., Royal Society of Chemistry (adapt to your own discipline)] standard formatting in a Word document. Each Word document will also be exported to PDF/A. The associated NMR spectra will be collated in chronological order in a PDF/A document.

Example 2

All samples on which data are collected will be prepared according to published standard protocols in the field [cite reference]. Files will be named according to a pre-agreed convention. The dataset will be accompanied by a README file which will describe the directory hierarchy.
Each directory will contain an INFO.txt file describing the experimental protocol used in that experiment. It will also record any deviations from the protocol and other useful contextual information.
The microscope captures and stores a range of metadata (field size, magnification, lens phase, zoom, gain, pinhole diameter etc.) with each image.
This should allow the data to be understood by other members of our research group and add contextual value to the dataset should it be reused in the future.

Example 3

The experimental records and observations are recorded by hand-written notes followed by digitization (scanning). The analytical data are collected by the instruments that generated them; they are processed by the native programs associated with the instruments. A periodic quality control process will be applied to remove errors and redundancies. Errors include for example incorrect handling and machine malfunction. The quality control process will be documented.

The quality of experimental records and observations will be controlled by repeating experiments.

For NMR and X-ray, the data collection is done through instrument standardised data acquisition programs. For E-chem, UV-Vis, IR, GC, GC-MS, lab-standardized protocols will be used.

Example 4, template if you are using openBIS

All files produced during this project will be stored in our Electronic Laboratory Notebook (ELN) and Laboratory Information Management System (LIMS) openBIS.
In this ELN, each scientist has a personal folder in which to organize projects and experiments. Each experiment is described in the electronic notebook and all data related to the experiment are directly attached to it, in so-called “datasets”. Each dataset is immutable, so different file versions are stored in the lab notebook in different datasets with a manually generated version number. Very large datasets (100s of TBs) are not stored directly in openBIS datasets, but are linked to the experimental description using an extension to openBIS called BigDataLink. This works similarly to the git version control software: every time changes are made to the data, they need to be committed to openBIS, which automatically keeps track of the versioning.

Contact for assistance – ETH Zurich

Digital Curation Office: data-management@library.ethz.ch

Scientific IT Services: https://sis.id.ethz.ch/


1.3 What documentation and metadata will you provide with the data?

Questions you might want to consider

  • What information is required for users (computer or human) to read and interpret the data in the future?

  • How will you generate this documentation?

  • What community standards (if any) will be used to annotate the (meta)data?

Describe all types of documentation (README files, metadata, etc.) you will provide to help secondary users to understand and reuse your data. Metadata should at least include basic details allowing other users (computer or human) to find the data. This includes at least a name and a persistent identifier for each file, the name of the person who collected or contributed to the data, the date of collection and the conditions to access the data.

Furthermore, the documentation may include details on the methodology used, information about the performed processing and analytical steps, variable definitions, data dictionary, codebook, references to vocabularies used, as well as units of measurement.

Wherever possible, the documentation should follow existing community standards and guidelines. Explain how you will prepare and share this information.

This relates to the FAIR Data Principles I1, I2, I3, R1, R1.2 & R1.3

Recommendations


Indicate all the information required to be able to read and interpret the data (context of data) in the future. General documentation of the data is often compiled into a plain text or markdown README file. These formats may be opened by any text editor and are future-proof.

In addition, for each data type:

  • Provide the metadata standard used to describe the data (for concrete examples see: https://fairsharing.org/standards/, https://bartoc.org, or Research Data Alliance Metadata Standards Directory). If no appropriate (discipline oriented) existing standard is available, you may describe the ad hoc metadata format you will use in this section. Metadata¹ may also be embedded in the data (e.g. embedded comments for code) or, for example when using the Hierarchical Data Format HDF5, included directly at any level as arbitrary machine-readable metadata.

  • Describe:

    • the software (including its version) used to produce the data and the software used to read it (they can be different)

    • the format and corresponding filename extension and its version (if possible).

 The software used should be archived along with the data (if possible, depending on the software license).

  • Describe the automatically generated metadata, if any.

  • Provide the data analysis or result together with the raw data, if possible.

Additional information that is helpful in a README file:

  • description of the software used,

  • description of the system environment used,

  • description of relevant parameters such as:

    • geographic locations involved (if applicable)

    • all relevant information regarding production of data.
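
The README items listed above can be collected in a small machine-readable record that is stored next to the data and also rendered as plain text. The sketch below is one possible approach using only the Python standard library; the field names, the dataset and the tool name `example-logger` are hypothetical placeholders and do not follow a specific metadata standard.

```python
import json
import platform


def describe_environment():
    """Collect basic facts about the system on which the data were produced."""
    return {
        "python_version": platform.python_version(),
        "operating_system": platform.system(),
    }


def build_metadata(title, creator, collected_on, file_format, software, units):
    """Assemble a minimal metadata record (field names are illustrative only)."""
    return {
        "title": title,
        "creator": creator,
        "date_collected": collected_on,
        "format": file_format,
        "software": software,
        "units": units,
        "environment": describe_environment(),
    }


record = build_metadata(
    title="Lake temperature profiles",  # hypothetical dataset
    creator="Smith, Jane",  # hypothetical person
    collected_on="2018-01-13",
    file_format="CSV (UTF-8, comma-separated)",
    software={"name": "example-logger", "version": "1.2.0"},  # hypothetical tool
    units={"temperature": "degC", "depth": "m"},
)

# Machine-readable sidecar text and a human-readable README body.
sidecar_json = json.dumps(record, indent=2, sort_keys=True)
readme_text = "\n".join(f"{key}: {value}" for key, value in record.items())
print(readme_text)
```

Where a community metadata standard exists for your discipline, its field names and vocabulary should replace the ad hoc keys used here.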


¹ Metadata refers to “data about data”, i.e., it is the information that describes the data that is being published with sufficient context or instructions to be intelligible for other users. Metadata must allow proper organization and search of, and access to, the generated information and can be used to identify and locate the data via a web browser or web-based catalogue.


Example answers to be adapted to your research application

Example 1

The data will be accompanied by the following contextual documentation, according to standard practice for synthetic methodology projects:

  1. Spreadsheet documents which detail the reaction conditions.

  2. Text files which detail the experimental procedures and compound characterization.

Files and folders will be named according to a pre-agreed convention [XYZ], which includes, for each dataset, identification of the researcher, the date, the study and the type of data (see section 1.2).

The final dataset as deposited in the chosen data repository will also be accompanied by a README file listing the contents of the other files and outlining the file-naming convention used.

Example 2

Metadata will be tagged in XML using the Data Documentation Initiative (DDI) format. The codebook will contain information on study design, sampling methodology, fieldwork, variable-level detail, and all information necessary for a secondary analyst to use the data accurately and effectively.
It will be the responsibility of:

  • each researcher to annotate data with metadata,
  • the Principal Investigator to check weekly (during the field season, monthly otherwise) with all participants to ensure that data are being properly processed, documented, and stored.

Example 3

Two types of metadata will be considered within the frame of the project X: (i) metadata corresponding to the project publications, and (ii) that corresponding to the published research data.

In the context of data management, metadata will form a subset of data documentation that will explain the purpose, origin, description, time reference, creator, access conditions and terms of use of a data collection.

The metadata that would best describe the data depend on the nature of the data. For research data generated in project X, it is difficult to establish global criteria for all data, since the nature of the initially considered datasets will differ. The metadata will therefore be based on a generalised metadata schema such as the one used in [e.g., ZENODO (gathered metadata depends on the chosen repository)], which includes elements such as:

  • Title: free text
  • Creator: Last name, first name
  • Date
  • Subject: Choice of keywords and classifications
  • Description: Text explaining the content of the data set and other contextual information needed for the correct interpretation of the data,
  • Format: Details of the file format,
  • Resource Type: data set, image, audio, etc.,
  • Identifier: DOI,
  • Access rights: closed access, embargoed access, restricted access, open access.

Additionally, a readme.txt file will be used as an established way of accounting for all the files and folders entailed in the project and explaining how all the files that make up the data set relate to each other, what their file format is or whether particular files are intended to replace other files, etc.

Example 4 (from an Eawag DMP)

For every data stream (sequences of identical data files) over the entire 2-year period of data acquisition, a README file will be generated which contains: (a) the sensors used (product, type, serial number), (b) the temporal sequence of the sensors (time and location, sampling interval), (c) the observations made during maintenance and repairs, and (d) details on the physical units, as well as the calibration procedure and format. This is a standard procedure which we have used in the past.

Example 5 (from template for the SNSF Data Management Plan for openBIS users: https://sis.id.ethz.ch/services/rdm/SNSF-DMP-openBIS-template.pdf) 

In the data management system (openBIS ELN-LIMS), metadata are provided as attributes of the respective datasets. Based on the defined metadata schema, openBIS ELN-LIMS will be configured so that the required metadata is automatically assigned to datasets and / or manually provided by the researcher. Within openBIS we will provide metadata in line with the following metadata schema: […to be added by researcher…]

Contact for assistance – ETH Zurich

Digital Curation Office: data-management@library.ethz.ch

Scientific IT Services: https://sis.id.ethz.ch/




2. Ethics, legal and security issues

2.1 How will ethical issues be addressed and handled?

Questions you might want to consider

  • What is the relevant protection standard for your data? Are you bound by a confidentiality agreement?
  • Do you have the necessary permission to obtain, process, preserve and share the data? Have the people whose data you are using been informed or did they give their consent?
  • What methods will you use to ensure the protection of personal or other sensitive data?

Ethical issues in research projects require research data management practices to be adapted, e.g. how data is stored, who can access/reuse the data and how long the data is stored. Methods to manage ethical concerns may include: anonymization of data; gaining approval by ethics committees; formal consent agreements. You should show that all ethical issues in your project have been identified and outline the corresponding data management measures. If not all ethical issues of your research project have been resolved yet, you can look for additional information or resources on the ETHics Resource Platform (https://www.ethicsrp.ethz.ch/).

If you assess that there are no ethical issues in your project, you can use the following statement: There are no ethical issues in the generation of results from this project.

This relates to the FAIR Data Principle A1


Recommendations


Description and management of ethical issues

Describe which ethical issues are involved in the research project (for example, human participants, collection/use of biological material, privacy issues (confidential/sensitive data), animal experiments, dual use technology, etc.).  


Explain how these ethical issues will be managed, for example:

  • The necessary ethical authorizations will be obtained from the competent ethics committee.

  • Informed consent procedures will be put in place.

  • Personal/sensitive data will be anonymized.

  • Access to personal/sensitive data will be restricted.

  • Personal/sensitive data will be stored in a secure and protected place.

  • Protective measures will be taken with regard to the transfer of data and sharing of data between partners.

  • Sensitive data will not be stored in cloud services (e.g. data related to individuals, data under a non-disclosure agreement, data infringing third-party rights, or (legal) expert opinions).


Please check if your project involves data relating to (in bold) one of the following ethical issues:

  • Human participants (This includes all kinds of human participation, incl. non-medical research, e.g. surveys, observations, tracking the location of people)
  • Human cells/tissues
  • Human embryonic stem cells
  • A clinical trial
  • The collection of personal/sensitive/confidential data
  • Animal experimentation
  • Developing countries (access and benefit sharing)
  • Environmental and/or health and safety issues (for example, a negative impact on the environment and/or on the health and safety of the researchers.)
  • The potential for military applications (dual-use technology).

Ethical authorizations

If your project involves human subjects, an ethical authorization from either the cantonal ethics commission or the institutional ethics commission (ETH Zurich Ethics Commission) is needed. This depends on whether your project is invasive/non-invasive and whether or not health-related data is collected/used.

  • For research involving work with human cells/tissues, a description of the types of cells/tissues used in the project needs to be provided, together with copies of the accreditation for using, processing or collecting the human cells or tissues.

  • Research which involves the collection or use of personal data needs to be reviewed by the cantonal ethics commission or the ETH Zurich Ethics Commission (depending on what kind of data is involved). For more information, see the ETH Zurich Ethics Commission’s website (German).

  • If animal experiments are conducted in the context of the research project, an authorization from the cantonal veterinary office is needed.
    (See also: ETH Zurich Animal Welfare Officer)

  • Dual-use technologies (civil and military purposes): Transfer of knowledge, software, demonstrators or prototypes could fall under the scope of the Swiss Goods Control Act (GCA) and its Ordinance (GCO), in the context of technology transfer or research proposals but also of informal personal contacts. If any US technology is involved in the research, US export control regulations must also be taken into account. Before information, research results, prototypes etc. are transmitted to a company, person or institution (even an academic one) outside of Switzerland, it must be checked whether the data/information or material to be transmitted is subject to authorization.

    For more information, see the ETH Zurich Export Control website.

  • Research that may have a negative impact on the environment, for example research with Genetically Modified Organisms (GMO), requires an authorization from the Federal Office for the Environment (FOEN). If the research project has a negative impact on the health and safety of the researchers involved (for example if the research proposal involves the use of elements that may cause harm to humans), authorizations for the processing or possession of harmful materials must be requested.
    More information can be obtained from the ETH Zurich Safety, Security, Health, Environment department (SSHE / SGU).


Example answers to be adapted to your research application

Example 1

The project does not involve human or animal subjects. Therefore, no ethical issues are expected to occur during the generation of results from this project. None of the data collected or reused in this project is subject to a confidentiality agreement.

Example 2

This project will generate data designed to study the prevalence and correlates of DSM III-R psychiatric disorders and patterns and correlates of service utilization for these disorders in a nationally representative sample of over 8000 respondents. The sensitive nature of these data will require that the data be released through a restricted use contract, to which each respondent will give explicit consent. An ethical authorization will be obtained from the cantonal ethics committee for this project.

The project respects all the constraints and requirements as laid down in the Swiss Federal Act on Data Protection.

Example 3

Research in this proposal involves the use of animals of the species mouse (Mus musculus). Animal studies will be preceded by multiple biochemical experiments in vitro and in cultured cells. Mouse experiments will only be used at advanced stages of investigations when few, specific and highly relevant questions can be addressed by a limited number of experiments.

The PI and the research team will work in conformity with all applicable rules, guidelines and principles such as the EU directive 2010/63/EU on the protection of animals used for scientific purposes, the Swiss federal law on animal protection (RS 455), the federal ordinance on animal protection (RS 455.1), and the federal ordinance on animal experimentation, production, and housing (RS 455.163). All animal experiments will only be initiated after having received the approval of the Cantonal and Federal authorities.

Details on animal usage

In performing the experiments, we strive to strictly adhere to the 3Rs principle of Replacement, Refinement, and Reduction.

  • Reduction: Each experiment will be designed to use the minimum number of mice required to obtain statistical significance. For the proposed pharmacokinetic experiments, a total number of 24 mice will be required.

  • Refinement: The animals will be housed in the animal facilities of EPFL, which meet international housing norms, and the animal health status is monitored by a certified veterinarian. To reduce stress and discomfort of the animals, all procedures will be performed only after animals are anaesthetized. After experiments, animals will be euthanized. Also, as soon as animals show signs of severe discomfort and/or tumor burden during experiments, they will be euthanized by cervical dislocation after being anaesthetized.

  • Replacement: Alternatives for mouse experiments will be considered at all stages during the project. Whenever possible, these alternatives will replace the mouse experiments.

Training

All researchers and technicians working with the animals receive proper animal welfare training in conformity with DFE Ordinance 455.109.1 on ‘Training in animal husbandry and in the handling of animals’.

Example 4

The PI and the research team will work in conformity with all applicable rules, guidelines and principles such as the EU directive 2010/63/EU on the protection of animals used for scientific purposes, the Swiss federal law on animal protection (RS 455), the federal ordinance on animal protection (RS 455.1), and the federal ordinance on animal experimentation, production, and housing (RS 455.163). All animal experiments will only be initiated after having received the approval of the Cantonal and Federal authorities.
Details on animal usage:
In performing the experiments, we strive to strictly adhere to the 3Rs principle of Replacement, Refinement, and Reduction.
Training: All researchers and technicians working with the animals receive proper animal welfare training in conformity with DFE Ordinance 455.109.1 on ‘Training in animal husbandry and in the handling of animals’.

Example 5

Dataset X was obtained from the BAFU and is subject to a confidentiality agreement to keep information about the sampling locations secret. We are allowed to share this information among researchers involved in the project. The dataset is being stored in a location to which only project members have access. Please refer to Section 2.2 for technical details about access restrictions. All project members will be informed about the sensitivity of this data and agree not to copy it to other places. This dataset and intermediate datasets containing the sampling locations will be excluded from the data package published along with the final report and replaced with instructions about how to obtain them from the BAFU.

Example 6 (anonymised data)

All data are anonymised, and as such, we are in line with the Swiss Federal Act on Data Protection as described on the page of the Federal Data Protection and Information Commissioner (FDPIC). The anonymised data will only be published in line with the consent forms signed by participants. Moreover, we will adhere to the recommendations of the selected FAIR repository regarding upload and licensing of the anonymised data.

References

ETH Zurich Guidelines on scientific integrity, RSETHZ 414 (as of 01.01.2022)

Guidelines for Research Data Management at ETH Zurich, RSETHZ 414.2 (as of 01.07.2022)

The ETH Zurich Compliance Guide

Federal Data Protection and Information Commissioner

Contact for assistance – ETH Zurich

Ethics Commission (Website or Contact: raffael.iturrizaga@sl.ethz.ch)

Website of Legal Office (e.g. for Data Protection issues)

Website of the Animal Welfare Officer (status 04.11.2022)

Website of ETH transfer

Website of Safety, Security, Health, Environment department (SSHE / SGU)




2.2 How will data access and security be managed?

Questions you might want to consider

  • What are the main concerns regarding data security, what are the levels of risk and what measures are in place to handle security risks?

  • How will you regulate data access rights/permissions to ensure the security of the data?

  • How will personal or other sensitive data be handled to ensure safe data storage and transfer?

If you work with personal data or other sensitive data, you should outline the security measures in order to protect the data. Please list formal standards which will be adopted in your study. An example is ISO 27001-Information security management. Furthermore, describe the main processes or facilities for storage and processing of personal or other sensitive data. (This relates to the FAIR Data Principle A1.)

Recommendations


The main concerns regarding data security are data availability, integrity and confidentiality. Describe in particular the levels of risk involved and the technical and organizational measures, as named in the Swiss Federal Act on Data Protection.



Define whether:

  • the level of data availability risk is: low/medium/high.
  • the level of data integrity risk is: low/medium/high.
  • the level of data confidentiality risk is: low/medium/high.

You may choose some of the following options:

Regarding anonymization / encryption:

  • All personal data will be anonymized in such a way that it will be impossible to attribute data to specific persons.
  • All personal data will be pseudonymized. The correspondence table will be encrypted and access restricted to the project leader.
  • All sensitive data will be encrypted and encryption keys will be managed only by authorized employees.
  • Sensitive data transfers will be end-to-end encrypted.
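
As an illustration of the pseudonymization option above, the following is a minimal Python sketch, not a prescribed ETH Zurich or SNSF procedure. It replaces a direct identifier with a keyed hash (HMAC-SHA256) and collects the correspondence table separately. The function names, the "P-" prefix, the field names and the sample records are all hypothetical. Encrypting the correspondence table and the key is not shown and has to be handled with a suitable tool. Note also that this only treats the direct identifier; other fields may still allow re-identification.

```python
import hashlib
import hmac
import secrets


def make_pseudonym(identifier: str, key: bytes, length: int = 12) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:length]


def pseudonymize(records, id_field, key):
    """Return (pseudonymized records, correspondence table).

    The correspondence table maps pseudonym -> original identifier. It must be
    stored separately from the data, encrypted, with restricted access.
    """
    table = {}
    pseudonymized = []
    for record in records:
        original = record[id_field]
        pseudonym = make_pseudonym(original, key)
        # Truncated digests can collide in principle; fail loudly instead of
        # silently merging two different persons.
        if table.get(pseudonym, original) != original:
            raise ValueError("pseudonym collision; increase 'length'")
        table[pseudonym] = original
        cleaned = dict(record)  # copy, so the input records stay untouched
        cleaned[id_field] = pseudonym
        pseudonymized.append(cleaned)
    return pseudonymized, table


# Hypothetical example data; field names are illustrative only.
key = secrets.token_bytes(32)  # protect this key as strictly as the table itself
participants = [
    {"name": "Alice Example", "score": 12},
    {"name": "Bob Example", "score": 7},
    {"name": "Alice Example", "score": 15},
]
pseudonymized, correspondence = pseudonymize(participants, "name", key)
```

Using a keyed hash means the same person always receives the same pseudonym within the project, while someone without the key cannot recompute pseudonyms from a list of names.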

Regarding access rights:

  • Sensitive data will be accessible only by authorized participants to the project. The list of authorized participants will be managed by…
  • Data access rules will be defined in detail before the project starts.
  • Access to the data/database will be logged, thus each access is traceable.
  • Access to laboratory and offices will be restricted to authorized persons. The list of authorized persons will be managed by…

Regarding storage and back-up:

  • All data will be backed-up on a regular basis and access to backup media will be managed according to data access rules. Backups will be stored in another location.
  • All damaged media containing sensitive data will be physically destroyed.
  • All servers will be located in a datacentre with restricted access. The datacentre is based in [country] (preferably data are stored at ETH Zurich).
  • No data will be stored on a public cloud / cloud hosted outside Switzerland.
  • No sensitive/personal data will be stored in cloud services external to ETH Zurich. (Sensitive data can be, for example, data related to individuals, data under a non-disclosure agreement, data injuring third parties’ rights or legal expertise.)
  • All computers storing or computing sensitive data will not be connected to the Internet.
  • All computers storing or computing sensitive data will have a hardened configuration (disk encryption, restricted access to privileged accounts to a small, controlled group of users, restricted or disabled remote access using privileged accounts, disabled guest or default accounts, local firewall, automatic screen lock with password protection, disabled remote out-of-band management (IPMI, Active Management Technology (AMT), etc.), disabled USB ports, removable privacy filter on screens, automatic updates via “Windows Update”, Apple’s “Software Update” or Linux “yum auto-update”, anti-virus software, Adobe’s “Flashplayer” and “Java” runtime).

Please note

The EU General Data Protection Regulation (GDPR, Regulation (EU) 2016/679) has applied since May 2018. This influences future cooperation with any EU-based partners and will be implemented in Swiss law as well.
GDPR introduces an approach of “Privacy by Design” for parties working with personal or other sensitive data, requiring projects to define their data protection measures from the beginning.

Where the GDPR applies, you must outline in a Data Protection Impact Assessment (DPIA, text or table, see an example of the ICRC) the risks to the rights of your study’s subjects and the security measures foreseen to protect the data. This is crucial for your project. The fewer risks you have, the better. The more data safeguards you can implement, the better. The earlier you implement them, the better.

(Cf. Art. 35 of the EU General Data Protection Regulation, applicable since May 2018)

DPIA-Template of the ICRC

Examples of answers to be adapted to your research application

Example 1

The data will be processed, managed, and analysed on […description of server infrastructure…], which is regularly and automatically backed up. Raw sequencing data is stored on the group storage on […e.g., the Euler cluster at ETH]. Only authorised persons (project members) will have access to the storage server via their password-protected institutional accounts. Since no personal or confidential data is produced or reused in the project, no special infrastructure or security measures will be necessary. Research data used in this project will be classified in line with the Directive on “Information Security at ETH Zurich” and marked accordingly. [See the Directive on “Information Security at ETH Zurich”; Appendix 1b in particular could be relevant and helpful for classification.]

Example 2

All input data for analysis and output data from analysis will be shared in a GitLab repository [e.g., GitLab repository hosted by ETH Zurich’s IT Services, https://gitlab.ethz.ch] available to the project members. The same holds for the metadata about input and output data together with text file documentation for code. The systems from which the company extracts these data are only available to its employees. For security reasons, we do not share data or code in any other ways (e.g., by email). Research data used in this project will be classified in line with the Directive on “Information Security at ETH Zurich” and marked accordingly.
[See the Directive on “Information Security at ETH Zurich”; Appendix 1b in particular could be relevant and helpful for classification.]

Example 3

Leonhard Med is the ETH secure scientific data and IT platform to securely store, manage and process confidential research data [e.g., sensitive personal data in biomedical research]. Sufficient security is provided by this tool, i.e. strictly restricted access to authorized users, secure isolated storage and encrypted backup, shared or dedicated compute resources, logging and monitoring of user activity, strictly restricted access to trusted external internet sites, and compliance with the Leonhard Med Acceptable Use Policy (https://rechtssammlung.sp.ethz.ch/Dokumente/438.1.pdf).
Data analysis of offline data (i.e., processing that was not performed in real time) will be performed in the Leonhard Med platform. If the analysis requires software that is not available in Leonhard Med at the moment, computed statistical maps on the individual level (individuals cannot be identified) will be transferred to a dedicated computer where the analysis can be performed. Only internal research personnel will have access to Leonhard Med and will be trained in its use. General authentication and authorization at ETH Zurich are handled by the Identity and Access Management System of the central IT Services at ETH.
Specific guidelines apply for using the Leonhard Med Cluster at ETH Zurich to securely store, manage, compute on and share confidential research data (https://ethz.ch/services/en/it-services/it-security/guidelines.html).
Because we will deal with personal, sensitive data, project members will be provided with and sign confidentiality agreements. Access to personal/sensitive data will be restricted.
Storage media and safety back-ups will require password access to prevent misuse. The applicant and the supervisor are responsible for secured access to datasets as well as for safeguarding the code keys. 
The level of the data availability risk is low, the level of data integrity risk is high, and the level of data confidentiality is high.
Regarding anonymization / encryption: All sensitive data will be encrypted at rest (e.g., on file system when not in use, in backups) and in transit and keys will be managed only by the [name of data controller]. Pseudonymized data are still sensitive personal data and there will be a concordance table stored at the data controller. All sensitive, personal data will be handled securely (concordance table will be encrypted as it is not actively used). We will follow the policy and available procedures of Leonhard Med for ensuring data security during transfer. Leonhard Med offers a specific secure data transfer process in encrypted form, via encrypted channels.
Regarding access rights: Sensitive personal data will be accessible only to the following people working in the project: [name person A] [name person B]. The grantee and the Principal Investigator of the project will be responsible for ensuring compliance with these defined access rights.
Regarding storage and back-up: All data will be backed-up on a regular basis [good practice is minimum daily] and access to backup media will be managed according to data access rules. All damaged media containing sensitive data will be physically destroyed. Research data used in this project will be classified in line with the Directive on “Information Security at ETH Zurich” and marked accordingly.
[For the directive, see https://rechtssammlung.sp.ethz.ch/Dokumente/203.25en.pdf, Appendix 1b in particular could be relevant and helpful for classification]

Example 4 (from an Eawag DMP)

The data we are generating, processing and storing in this project does not pose a particular data security risk. Day-to-day work is conducted on standard-issue workstations in the ETH environment with standard enterprise-grade access control. The ETH network is a secured system following best practices in terms of identity management, and the central storage facility is redundant, mirrored and monitored. At different stages, data will be stored in the Eawag Institutional Collection (see section 1.3). This system is accessible only from within the Eawag network and comprises several virtualized Linux systems that receive real-time security patches. Access control is handled according to recognized best practices of server administration.

“Notoriously Toxic”, NEH ODH Start-up Grant, Level 1, https://www.neh.gov/files/dmp_from_successful_grants.zip

Example 5 (EAWAG example)

Research records will be kept confidential, and access will be limited to the PI, primary research team members, and project participants. Data will be housed on a local server controlled by the PI, and will be accessible via SSH and VPN. Data containing identifiable information, or information covered by an NDA, will be held in an encrypted format (symmetric, AES256, key on local server, passphrase only known to PI and primary research team members).

Example 6 (template if you are using openBIS)

All data generated in the project will be stored in our openBIS ELN-LIMS. This operates in a client-server model, which is installed and maintained by the ETH Zurich IT services on ETH Zurich infrastructure. Researchers can access openBIS via any of the most common web browsers. openBIS requires user authentication with ETH Zurich credentials and it provides user rights management, so that different users can have different access to all or different parts of the system, as required. Below is a description of the default openBIS roles, which can be modified upon request:

  1. Instance admin. Has full admin powers. Can customize settings, create, modify and delete entities, assign user roles, create data spaces.
  2. Instance observer. Has read-only access to everything in openBIS.
  3. Space admin. Can create, modify, delete entities and assign roles only within a given data space.
  4. Space power user. Can create, modify and delete entities only within a given data space.
  5. Space user. Can create and modify entities only within a given data space.
  6. Space observer. Has read-only access limited to a given data space.

openBIS does not offer any specific option for sensitive data, but the data will be encrypted prior to upload to openBIS. Furthermore, all operations on the system (incl. which users log in and when) are logged, so that it is fully transparent who did what to the data and when.

The data stored in openBIS is physically located on a NAS (network attached storage) provided by the ETH Zurich IT Services. The access to the share’s data is governed by the latest security best practices and only a limited number of employees of the ETH Zurich IT services have access to that share.

References

ETH Zurich Guidelines on scientific integrity, RSETHZ 414 (as of 01.01.2022)

Guidelines for Research Data Management at ETH Zurich, RSETHZ 414.2 (as of 01.07.2022)

The ETH Zurich Compliance Guide


Contact for assistance – ETH Zurich

Digital Curation Office: data-management@library.ethz.ch

Website of the Legal Office (e.g. for Data Protection issues)

IT Support Groups in the Departments

Website of the Scientific IT Services




2.3 How will you handle copyright and Intellectual Property Rights issues?

Questions you might want to consider

  • Who will be the owner of the data?

  • Which licenses will be applied to the data?

  • What restrictions apply to the reuse of third-party data?

Outline the owners of the copyright and Intellectual Property Rights (IPR) of all data that will be collected and generated, including the licence(s). For consortia, an IPR ownership agreement might be necessary. You should comply with relevant funder, institutional, departmental or group policies on copyright or IPR. Furthermore, clarify what permissions are required should third-party data be re-used.

This relates to the FAIR Data Principles I3 & R1.1


Recommendations


Attaching a clear license to a publicly accessible data set allows others to know what can legally be done with its content. When copyright is applicable, Creative Commons licenses are recommended. However, Creative Commons licenses are not recommended for software.

Amongst all Creative Commons licenses, CC0 “No Rights Reserved” is recommended for scientific data, as it allows other researchers to build new knowledge on top of a data set without restriction. It specifically allows aggregation of several data sets for secondary analysis. Several data repositories impose the CC0 license to facilitate reuse of their content.

To enable a data set to be cited, and thereby gain recognition for its release, it is recommended to attach a CC BY “Attribution” license to the record, usually a description of the dataset (metadata). Data sets can be cited directly. However, to increase their visibility and reusability, it is recommended to describe them in a separate document licensed under CC BY “Attribution”, such as a data paper or an entry in the institutional repository.

When the data have the potential to be used commercially as they are, and you intend to do so, the CC BY-NC license allows you to retain exclusive commercial use.

Reuse of third-party data may be restricted. If authorised, the data must be shared according to the third party’s original requirement or license. 

For licensing of software at ETH Zurich, please see https://documentation.library.ethz.ch/pages/viewpage.action?pageId=9208031

Examples of answers to be adapted to your research application

Example 1

The collected data are suitable for sharing. They are observational [or: experimental] data and could be used for other analyses or for comparison in future studies. Reuse opportunities are vast. For this reason, the project participants aim to allow the widest reuse of the data and will release them under a Creative Commons public domain dedication (CC0) [alternative: Creative Commons CC BY licence]. With regard to data sharing, there are no restrictions due to copyright or intellectual property rights.

Example 2 (some data confidential or strictly confidential due to contracts)

All data used in the project is owned by the collaborating company. A contract has been signed by both parties stating the project’s aims, methods, ownership rights, and how the project outcomes will be used.

Example 3 (industrial collaboration)

This project is being carried out in collaboration with an industrial partner. The intellectual property rights are set out in the collaboration agreement. The intellectual property generated from this project will be fully exploited with help from the institutional Technology Transfer Office. The aim is to patent the final procedure and then publish the work in a research journal and to publish the supporting data under an open Creative Commons Attribution (CC BY) license.

Example 4

The research is not expected to lead to patents. Other Intellectual Property Rights (IPR) issues will be dealt with in line with the institutional recommendation. As the data is not subject to a contract and will not be patented, it will be released as open data under Creative Commons CC0 Public Domain Dedication.

Example 5

The source code for analysis will most likely utilize the GNU Scientific Library (GSL), which is licensed under the GNU General Public License (GPL). Therefore we will make our analysis software available under the GPL as well.

References

ETH Zurich Guidelines on scientific integrity, RSETHZ 414 (as of 01.01.2022)

Guidelines for Research Data Management at ETH Zurich, RSETHZ 414.2 (as of 01.07.2022)

The ETH Zurich Compliance Guide

Contact for assistance – ETH Zurich

Digital Curation Office: data-management@library.ethz.ch

Website of ETH transfer (e.g. for research contracts)



3. Data storage and preservation

3.1 How will your data be stored and backed-up during the research?

Questions you might want to consider

  • What is your storage capacity and where will the data be stored?

  • What are the back-up procedures?

Please mention what the needs are in terms of data storage and where the data will be stored.

Please consider that data storage on laptops or hard drives, for example, is risky. Storage through IT teams is safer. If external services are asked for, it is important that this does not conflict with the policy of each entity involved in the project, especially concerning the issue of sensitive data.

Please specify your back-up procedure (frequency of updates, responsibilities, automatic/manual process, security measures, etc.)


Recommendations


Institutional storage solutions:

For ETH Zurich, see storage options here and consult the IT Support Group of your Department.


Examples of answers to be adapted to your research application

Example 1

All data are stored on the standard departmental ETH server. If the data sets exceed a reasonable size, they are moved to the Network Attached Storage (NAS), also hosted by ETH. Both the standard ETH server and the NAS include automatic daily backups and are maintained by ETH Zurich’s IT Services.

Example 2 (supplement to Example 1 for GitLab users)

[Example 1 +] All data will also be uploaded to and stored on GitLab for version control [e.g., GitLab repository hosted by ETH Zurich’s IT Services, https://gitlab.ethz.ch]. This holds for both the raw input data that will be processed for analysis and any output data from the analysis.

Example 3 (for project with personal, sensitive data)

The data will be stored using the following resources:

  • For general storage, ETH’s server infrastructure (for aggregated, non-personal and non-sensitive data) and Leonhard Med (for personal, sensitive data) will be used, with standardized, daily backup procedures.
  • A copy of the non-personal and non-sensitive data will also be stored on local hard-drives.
  • For code storage and version control, we will use GitLab. (https://gitlab.ethz.ch/).

Data stored on the ETH server infrastructure and Leonhard Med will be automatically backed up daily, and data stored on local hard drives will be backed up weekly. Insertions and changes made in GitLab are tracked and versions are kept automatically. The team will be instructed to follow a checklist for storing and backing up data, standardizing the procedure.

Example 4 (Electronic Laboratory Notebook)

All our data will be uploaded to our Electronic Laboratory Notebook. The data is stored on institutional storage facilities and it is set up by our IT support to be automatically backed up daily.

Example 5

The ETH centralized file storage service follows best practices and standards regarding storage, for instance high availability, multiple levels of data protection, and partnership with providers for support. The service is managed centrally by ETH Zurich’s IT Services and ensures security, coherence, pertinence, integrity and high availability.

Two distinct storage locations can be found on the ETH campus with replication between the two. (Please note that these two different storage options correspond to different payments.) Physical servers’ pairing and clustering guarantees local redundancy of data. Moreover, volume mirroring protects data in case of disaster on the primary site. The copy is asynchronous and automatic and runs every two hours.

The file servers are virtualized to separate logical data from physical storage. RAID groups ensure physical storage protection: data is split into chunks written to many disks with double parity. Moreover, volume snapshots are used and allow users to restore previous versions if need be. For specific needs, optional backup on tape is also possible.

Access to the data is managed by the owner of the volumes through the identity management system of ETH. Any person who needs access to data has therefore to be a registered and verified user in the identity management system.

Example 6 (template* if you are using openBIS)

openBIS uses a PostgreSQL database that stores all metadata. This database is backed up (“pg_dump”) every night with a 7-day retention of the dumps and fully backed up twice a week with a backup retention of 20 days. The full backup procedure includes a point-in-time recovery that allows a finer granularity (up to minutes) of data recovery in case of a disaster. The database backup is stored on the NAS (network attached storage) provided by the ETH Zurich IT services. The same NAS is used to store the data uploaded to openBIS. A snapshot of this NAS is taken every night with a 7-day retention, and data is backed up on a proprietary tape library with a retention of 90 days.

Data which is no longer actively needed is moved to the long term storage (i.e. tapes). The tape library where openBIS moves the data has a read-only replica in a different geographical location in order to minimize any data loss.

*For data linked to openBIS with the BigDataLink tool, please provide details of the data location and back-up.
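
To make the retention windows named in the openBIS template above more concrete (for example, nightly dumps kept for 7 days), here is a small Python sketch of how a retention rule separates backups that are kept from those that have expired. It is an illustration only and does not describe how the ETH Zurich IT services implement their backups. The function name, the reference date and the choice to treat a backup exactly `retention_days` old as expired are assumptions.

```python
from datetime import date, timedelta


def split_by_retention(backup_dates, today, retention_days):
    """Split backup dates into (kept, expired) for a retention window in days.

    Assumption: a backup that is exactly `retention_days` old counts as expired.
    """
    cutoff = today - timedelta(days=retention_days)
    kept = sorted(d for d in backup_dates if d > cutoff)
    expired = sorted(d for d in backup_dates if d <= cutoff)
    return kept, expired


today = date(2024, 3, 15)  # hypothetical reference date
# One dump per night for the last ten nights, including today.
nightly_dumps = [today - timedelta(days=n) for n in range(10)]
kept, expired = split_by_retention(nightly_dumps, today, retention_days=7)
```

When describing your own back-up procedure, stating the frequency and the retention period in this explicit form makes it easy to see how far back a restore is possible.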

Example 7

Original notebooks and hardcopies of all NMR and mass spectra are stored in the PI’s laboratory. Additional electronic data will be stored on the PI’s computer, which is backed up daily. Additionally, the laboratory will make use of the PI’s lab server space at the institution’s storage facility as a second data storage location. The PI’s lab has access to up to 1 terabyte of storage, which can be expanded if needed.

All the project data will be stored using the institution’s collaborative storage, which is backed-up on a daily basis.

Contact for assistance

Digital Curation Office: data-management@library.ethz.ch

Website of the Scientific IT Services




3.2 What is your data preservation plan?

Questions you might want to consider

  • What procedures would be used to select data to be preserved?

  • What file formats will be used for preservation?

Please specify which data will be retained, shared and archived after the completion of the project and the corresponding data selection procedure (e.g. long-term value, potential value for re-use, obligations to destroy some data, etc.). Please outline a long-term preservation plan for the datasets beyond the lifetime of the project.

In particular, comment on the choice of file formats and the use of community standards.


Recommendations


Describe the procedure (appraisal methods, selection criteria, …) used to select data to be preserved. Note that preservation does not necessarily mean publication (e.g. personal sensitive data may be preserved but never published), but publication generally implies preservation.

This section should answer the following questions:

  • What data will be preserved in the long term - selection criteria, in particular:

  • Reusability of the data: quality of metadata, integrity and accessibility of data, license allowing reuse, readability of data (chosen file formats),

  • Value of the data: indispensable data, completeness of the data or data set, uniqueness, possibility to reproduce the data in the same conditions and at what cost, interest of the data, potential of reuse

  • Ethical considerations

  • Stakeholders’ requirements

  • Costs: additional costs that come for depositing data in a repository or data archive of your choice (costs anticipation and budgeting)

Selection basically has to be done together with or by the data producer or someone else with deep specialist knowledge.

  • What data curation process(es) will be applied, i.e.: anonymization (if necessary), metadata improvement, format migration, integrity check, measures to ensure accessibility.

  • Data retention period (0, 5, 10, 20 years or unlimited)

  • Decision to make the data public

  • Use of sensitive data (i.e. privacy issues, ethics, or intellectual property laws)

  • Definition of the responsible person for data (during the process of selection and after the end of the project)

For more information on useful criteria, see also Beagrie, Neil (2019). What to Keep: A Jisc research data study. Joint Information Systems Committee (JISC). Available at: https://repository.jisc.ac.uk/7262/1/JR0100_WHAT_RESEARCH_DATA_TO_KEEP_FEB2019_v5_WEB.pdf

In addition, select appropriate preservation formats (see section 1.1) and data description or metadata (see section 1.3).
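
The integrity check mentioned among the curation processes above can be illustrated with a checksum manifest. The following Python sketch is one possible approach under stated assumptions, not a procedure required by the SNSF or the ETH Data Archive: it records a SHA-256 checksum for every file in a data package and later reports files that are missing or have changed. The function names and the sample files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file path (relative to root, POSIX style) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or whose content changed."""
    current = build_manifest(root)
    return sorted(
        name for name, checksum in manifest.items() if current.get(name) != checksum
    )


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "readme.txt").write_text("Dataset description\n", encoding="utf-8")
    (root / "data.csv").write_text("id,value\n1,3.2\n", encoding="utf-8")
    manifest = build_manifest(root)
    unchanged = verify(root, manifest)  # nothing modified yet
    (root / "data.csv").write_text("id,value\n1,9.9\n", encoding="utf-8")
    changed = verify(root, manifest)  # data.csv no longer matches
```

Storing such a manifest alongside the preserved files lets anyone re-run the check after a transfer or format migration.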


Examples of answers to be adapted to your research application

Example 1

We will preserve the data for 10 years on ETH’s servers and also deposit it in an appropriate data archive at the end of the project [e.g., disciplinary data repository/archive, ETH Research Collection (with long-term preservation in the ETH Data Archive), or Zenodo; see examples in section 4.1 below]. Where possible, we will store files in open archival formats, for example, Word files converted to PDF/A or simple text files encoded in UTF-8 and Excel files converted to CSV. In case this is not possible, we will include information on the software used and its version number.

Example 2 (some data confidential due to contracts)

This research project is an industrial collaboration. The data are owned by the collaborating company and constitute a valuable resource in the highly competitive industry in which they operate. For this reason, the original data cannot be preserved in a public data archive. Instead, the original raw input data owned by the industrial partner is preserved by the collaborating company. All aggregated, non-confidential parts of the data will be submitted to the [name of the archive, e.g., disciplinary data repository/archive, ETH Research Collection (with long-term preservation in ETH Data Archive), or Zenodo] to be kept for a minimum of 10 years. Where possible, we will store these files in open archival formats. For example, Word files will be converted to PDF/A or simple text files encoded in UTF-8 and Excel files will be converted to CSV. In case this is not possible, we will include information on the software used and its version number.

Example 3 (for project with personal, sensitive data)

The data will only be stored in an appropriate archive once they are fully anonymized and only in line with the consent forms signed by participants. Moreover, we will adhere to the recommendations of the selected FAIR repository regarding upload and licensing of the anonymized data.

Example 4

Text documents and scans will be stored as PDF/A files or unformatted ASCII files with file name extension .txt. Excel files will be stored in CSV format. Files from the statistics software R will be stored as ASCII files. All these formats are recommended for long-term archiving. The ETH Data Archive assesses the long-term readability of these file formats as high and takes measures for long-term usability.

Contact for assistance – ETH Zurich

Digital Curation Office: data-management@library.ethz.ch

IT Support Groups in the Departments



4. Data sharing and reuse

4.1 How and where will the data be shared?

Questions you might want to consider

  • On which repository do you plan to share your data?

  • How will potential users find out about your data?

Consider how and on which repository the data will be made available. The methods applied to data sharing will depend on several factors such as the type, size, complexity and sensitivity of data.

Please also consider how the reuse of your data will be valued and acknowledged by other researchers.

This relates to the FAIR Data Principles F1, F3, F4, A1, A1.1, A1.2 & A2


Recommendations


It is recommended to publish data in well established (or even certified) domain specific repositories, if available:

  • A list of repositories recommended by the SNSF can be found on its webpage.
  • re3data is a repository directory that lets you select repositories by subject and level of trust (e.g. certifications)
  • ETH Zurich researchers are encouraged to publish data in ETH’s own Research Collection repository to ensure full compliance with ETH regulations.

In domains for which no suitable subject repositories are available, generalist repositories are available.

Among the most commonly used:

  • Zenodo (free, maximum 50GB/dataset, hosted by CERN)

  • Dryad (120$ for the first 20GB and 50$ for additional GB, Non-profit organization)

  • Figshare (free upload, maximum 5GB / dataset, commercial company)

Note

SNSF does not pay for storage in commercial data repositories (even though data preparation costs are eligible). Check the SNSF’s criteria for non-commercial repositories here (section 5.2). If you choose a commercial repository, read the terms of service carefully to check that they meet your needs and comply with your institution’s requirements and its (data) policy.

In order to make your data findable by other users, it is important that

  • each data packet and publication has a DOI (or similar persistent identifier) assigned,

  • they are deposited Open Access in a repository harvested by the main data services (e.g., OpenAIRE, EUDAT, …).


Examples of answers to be adapted to your research application

Example 1

Data collected in this project will be released under a Creative Commons public domain dedication (CC0) [alternative: Creative Commons CC BY licence] in the ETH Research Collection [alternatives: appropriate subject specific repository XY; Zenodo; others…] as a FAIR data repository. Data in the repository will be stored in accordance with the SNSF's data policies. Data underlying publications will be shared at the point of publication of a journal article or book chapter, while all remaining data will be made available at the end of the project period.

Datasets deposited in the Research Collection will be given a Digital Object Identifier (DOI). Data deposited in the Research Collection with individual file sizes of less than 10 GB are also archived in the ETH Data Archive for long-term preservation. The DOI issued to datasets in the repository can be included as part of a data citation in publications, allowing the datasets underpinning a publication to be identified and accessed. Metadata about datasets held in the Research Collection will be publicly searchable and discoverable and will indicate how and on what terms the dataset can be accessed.

Example 2 (some data confidential due to contracts)

This research project is an industrial collaboration. The data are owned by the collaborating company and constitute a valuable resource in the highly competitive industry in which it operates. For this reason, the original data cannot be made publicly available. As far as contractual obligations with the industry partner permit, metadata-only entries that describe the datasets will be made available in the ETH Research Collection [alternatives: appropriate subject specific repository 'XY'; Zenodo; others…] as a FAIR data repository. In that way, other researchers can find the dataset and get information about it without having direct access to the protected data.

Example 3 (data are sensitive, personal data that can be anonymized)

Patients have been informed and provided consent with a signed consent form regarding the publication of their anonymized data in a public repository. The collected patient data will be fully anonymized before publication. The anonymized data will be released under the standard usage licence (rightsstatements.org/page/InC-NC/1.0/) in the ETH Research Collection [alternatives: appropriate subject specific repository XY; Zenodo; others…] as a FAIR data repository. Anonymized data underlying publications will be shared at the point of publication of a journal article or book chapter, while all remaining data will be made available at the end of the project period.

Example 4 (data are sensitive, personal data that cannot be fully anonymized)

The collected data contains genetic information that could easily identify individual persons. Therefore, the respective data must be protected and cannot be published in a repository. As far as the metadata do not contain any confidential or personal information, metadata-only entries that describe the datasets will be made available in the ETH Research Collection [alternatives: appropriate subject specific repository XY; Zenodo; others…] as a FAIR data repository. In that way, other researchers can find the dataset and get information about it without having direct access to the personal data.

Example 5 (example with GitHub and GitHub-Zenodo connection)

Some of the ongoing data will be shared on [Researcher1]’s GitHub repository (results and code from the project, data from Twitter searches). Major revisions of this page will be backed up using the GitHub-Zenodo connection (see: https://guides.github.com/activities/citable-code/). All other data will be published on Zenodo under the CC0 Public Domain Dedication.

We chose Zenodo because it supports the FAIR principles (http://about.zenodo.org/principles/). Zenodo implements long-term preservation features, notably bitstream preservation.

Example 6

For this project, the National Geoscience Data Centre (NGDC) (see http://www.bgs.ac.uk/services/ngdc/home.html) is the most suited repository. As it is adapted to geodata, it facilitates storage and allows interactive geographical search. In addition, many other researchers in our field are familiar with it.

This repository requires deposition under the Open Government Licence (see: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/), which demands attribution when the data is reused (our dataset must be cited, similarly to the CC BY license).

References

SNSF’s criteria for non-commercial repositories

Contact for assistance – ETH Zurich

Digital Curation Office: data-management@library.ethz.ch

Website of the Research Collection




4.2 Are there any necessary limitations to protect sensitive data?

Questions you might want to consider

  • Under which conditions will the data be made available (timing of data release, reason for delay if applicable)?

Data have to be shared as soon as possible, but at the latest at the time of publication of the respective scientific output.

Restrictions may only be due to legal, ethical, copyright, confidentiality or other clauses. Describe your restrictions for data sharing due to ethical or legal constraints, preparation for patent application, security constraints, contractual obligations, intended commercial purposes and copyright issues as outlined in the Guidelines for Research Data Management at ETH Zurich. Be aware that confidential and/or person-related research data (as defined in footnote 4) can only be published in completely anonymised form and in line with consent obtained from study participants. This purpose should already be considered when preparing consent forms for study participants.

Sensitive or confidential data are usually information relating to an identifiable person. Data can also be confidential, e.g., because they have to be protected from third-party access due to contractual agreements. If such data are used in a research project, the data management practice has to be adapted to deal with sensitive or confidential data in an appropriate way. Sensitive data are data that might cause serious harm when they fall into the hands of unauthorized persons. Regarding research data, sensitive data include but are not limited to sensitive personal data, research plans, contract agreements or geolocation data. Sensitive data must usually be classified as CONFIDENTIAL or STRICTLY CONFIDENTIAL[i] at ETH Zurich.

In case of business interests or similar restrictions, consider whether a non-disclosure agreement would give sufficient protection for confidential data.

This relates to the FAIR Data Principles A1 & R1.1

[i] Directive on “Information Security at ETH Zurich” (RSETHZ 203.25) https://rechtssammlung.sp.ethz.ch/Dokumente/203.25en.pdf (as of 1 August 2021)

Recommendations


You may mention specifically the conditions under which the data will be made available:

  • there are no sensitive data

  • the data are not available at the time of publication

  • the data are not available before publication

  • the data are available after the embargo of …

  • the data are not available because of the patent of … for a period of…


Examples of answers to be adapted to your research application

Example 1

The project does not involve usage of any sensitive data. Therefore, no special limitations to data use or reuse are necessary.

Example 2 (in case of strictly confidential data or confidential data owned by, e.g., a company)

The data used in this project will be handled in line with the respective classification level [strictly confidential and/or confidential, see DMP section 2.2.] that has been selected in accordance with the Directive on “Information Security at ETH Zurich”. All data is aggregated, anonymized and processed to be compliant with data privacy laws. As described, the original data will nonetheless not be made available due to strict confidentiality. The data ownership lies with the partner company.

Example 3 (in case of personal, sensitive data, i.e. strictly confidential data)

The data used in this project will be handled in line with the respective classification level [strictly confidential and/or confidential, see DMP section 2.2.] that has been selected in accordance with the Directive on “Information Security at ETH Zurich”. All data is aggregated, anonymized and processed to be compliant with data privacy laws. [Data that cannot be fully anonymized cannot be published. Anonymized data can only be published to the extent covered by informed consent.]

[Optional addition:] Personal data will be anonymized before dissemination based on the recommendations from the Federal Act on Data Protection (FADP) (https://www.edoeb.admin.ch/edoeb/en/home/latest-news/aktuell_news.html#-2053438021, accessed 01.11.2022). The R package sdcMicro (https://cran.r-project.org/package=sdcMicro) will be used to assess the risk of identification: we will make sure that each data set has a k-anonymity of at least 3.
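
To illustrate what a k-anonymity of at least 3 means in practice, here is a minimal Python sketch. It is not the sdcMicro API and does not replace a proper disclosure risk assessment. The function name, the column names and the toy records are hypothetical and serve only as an illustration. A dataset is k-anonymous if every combination of quasi-identifier values (attributes that could identify a person when combined) is shared by at least k records.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same combination of quasi-identifier values."""
    if not records:
        raise ValueError("records must not be empty")
    # Count how many records fall into each quasi-identifier combination.
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Hypothetical toy data: 'age_group' and 'canton' are assumed to be the
# quasi-identifiers; 'diagnosis' is the sensitive attribute.
records = [
    {"age_group": "30-39", "canton": "ZH", "diagnosis": "A"},
    {"age_group": "30-39", "canton": "ZH", "diagnosis": "B"},
    {"age_group": "30-39", "canton": "ZH", "diagnosis": "A"},
    {"age_group": "40-49", "canton": "BE", "diagnosis": "C"},
    {"age_group": "40-49", "canton": "BE", "diagnosis": "A"},
]

k = k_anonymity(records, ["age_group", "canton"])
meets_threshold = k >= 3  # threshold taken from the example text above
```

In this toy dataset the group (40-49, BE) contains only two records, so the threshold of 3 is not met. The data would need further generalization or suppression (for example, broader age groups) before publication.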

Example 4 (Eawag example)

The extensive household survey about water-borne diseases poses severe challenges with regard to anonymization. Pseudonymization is insufficient to guard against the identification of individual households by an inference that uses other available information.

Therefore, we will only be able to publish summary statistics together with the associated article. If a sufficiently anonymized dataset turns out to still hold scientific value, we will publish it no later than one year after completion of the project.

References

ETH Zurich Guidelines on scientific integrity, RSETHZ 414 (as of 01.01.2022)

Guidelines for Research Data Management at ETH Zurich, RSETHZ 414.2 (as of 01.07.2022)

The ETH Zurich Compliance Guide


Contact for assistance – ETH Zurich

Digital Curation Office: data-management@library.ethz.ch

Ethics Commission (Website or Contact: ethics@sl.ethz.ch)




4.3 All digital repositories I will choose are conform to the FAIR Data Principles 

[CHECK BOX]


Recommendations


The SNSF requires that repositories used for data sharing conform to the FAIR Data Principles. For more information, please refer to the SNSF’s explanation of the FAIR Data Principles.

You can find certified repositories on re3data.org, a comprehensive registry of data repositories.

ETH Zurich’s Research Collection also complies with the FAIR Principles.





4.4 I will choose digital repositories maintained by a non-profit organisation

[RADIO BUTTON yes/no]

Recommendations


If you do not choose a repository maintained by a non-profit organization, you have to provide reasons for that.

One possible reason would be to ensure the visibility of your research, for example, if your research community routinely publishes data in a well-established but commercial digital repository.

Please note that the SNSF supports only the use of non-commercial repositories for data sharing. Costs related to data upload are only covered for non-commercial repositories. Check the SNSF’s criteria for non-commercial repositories (section 5.2).





External useful resources

Digital Curation Centre glossary (alternatively, glossary in this Wiki: Glossary - Research Data Management)

Casrai dictionary

List of useful tools prepared by the Swiss DLCM project

