This is the collection of the most frequently used RDM-related terms from the ETH Library. These terms refer to the conditions at ETH Zurich and are aligned with Swiss law. A universal and more comprehensive RDM terminology is provided by CODATA.
Accessibility
In the context of research data, accessibility is part of the FAIR principles. It requires “that data are archived […] and can be made available using standard technical procedures. This does not mean that the data have to be openly available for everyone, but information on how the data could be retrieved (or not) has to be available. […] Ideally, though, the information about data accessibility can also be read by machines, e.g., by way of machine-readable standard licences.”1
Archive
“An archive is a system which makes the organised storage and retrieval of historical data, documents and objects possible. The way its contents are organised depends on the underlying policy. Archives can be provided as a service or set up and operated independently. For long-term preservation of 10 years and more, special archiving systems are required. A particular form of archive is a repository.”2
Anonymisation
Anonymisation means that “all items which, when combined, would enable the data subject to be identified without disproportionate effort, must be irreversibly masked or deleted”.3 Anonymisation is different from pseudonymisation.
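The difference between the two terms can be illustrated with a minimal Python sketch. The function names `pseudonymise` and `drop_identifier` and the sample records are hypothetical and not taken from any ETH Zurich tool. Note the assumption: removing a single direct identifier is only a first step towards anonymisation, because the remaining items may still identify a person when combined.

```python
import secrets


def pseudonymise(records, id_field):
    """Replace the identifier with a random pseudonym.

    Returns the pseudonymised records and the key table. The key table is
    the "additional information" that allows re-identification, so it has
    to be stored separately from the data and protected.
    """
    key_table = {}  # original identifier -> pseudonym
    result = []
    for record in records:
        original = record[id_field]
        if original not in key_table:
            key_table[original] = "P-" + secrets.token_hex(8)
        result.append({**record, id_field: key_table[original]})
    return result, key_table


def drop_identifier(records, id_field):
    """Remove the identifier entirely; no key table exists, so there is no way back."""
    return [{k: v for k, v in r.items() if k != id_field} for r in records]


if __name__ == "__main__":
    sample = [
        {"name": "A. Muster", "score": 3},
        {"name": "B. Beispiel", "score": 5},
        {"name": "A. Muster", "score": 4},
    ]
    pseudonymised, key_table = pseudonymise(sample, "name")
    print(pseudonymised)                     # same person -> same pseudonym
    print(drop_identifier(sample, "name"))   # identifier removed for good
```

With pseudonymisation, whoever holds the key table can map the pseudonyms back to persons. With anonymisation, that link must no longer exist.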
Backup
A backup is a copy of files and folders that is stored on a different medium and in a separate location. Backups facilitate data recovery if necessary, for instance, when original data are lost or destroyed.
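As a minimal sketch of this idea in Python, the following copies a file to a backup location and checks that the copy is intact by comparing checksums. The names `back_up` and `sha256_of` are hypothetical, and the sketch assumes that `backup_dir` stands in for a different medium in a separate location (for example a mounted network share). A directory on the same disk would not be a real backup.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` and verify the copy via checksum."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} is corrupt")
    return target


if __name__ == "__main__":
    # Demo with temporary directories standing in for real storage locations.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "measurements.csv"
        original.write_text("sample,value\n1,0.42\n")
        copy = back_up(original, Path(tmp) / "backup")
        print("Backed up to", copy)
```

Verifying the checksum after copying is one way to make sure the backup can actually be used for recovery later.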
Community Standards
“Community Standards are understood as both explicit, formalised standards and informal, but well established and widely accepted best practices within a community. A community can, e.g. comprise researchers sharing an interest in the same object of research or working with the same methods. Common community standards address, e.g. identification, citation and reporting of data and metadata. They reflect a community’s consensus, at a certain point in time, on how reproducible and reusable research should be implemented.”4
Copyright
Copyright gives authors exclusive rights to decide how to deal with their works and copyright law protects authors from abuse of their work. “Works are literary and artistic intellectual creations with individual character, irrespective of their value or purpose.”5 According to Swiss law, scientific work and computer programmes are included.
Creative Commons License
Creative Commons licenses offer individuals and institutions “a standardized way to grant the public permission to use their creative work under copyright law. From the reuser’s perspective, the presence of a Creative Commons license on a copyrighted work answers the question, ‘What can I do with this work?’”6
Data Availability Statement
“Data Availability Statements provide a statement about where data necessary for validating the research findings reported in a result publication can be found - including, where applicable, hyperlinks to publicly archived additional datasets analysed or generated during the research activity.”7
Recommendations and example formulations for such statements can be found here
Data Curation
“Data Curation describes the management activities necessary to maintain research data long-term so that they are available for conservation and reuse. In the broadest sense curation is a compilation of processes and actions performed to create, manage, maintain and validate a component. Therefore, it is the active and ongoing management of data during its life cycle. Data curation facilitates the search, discovery, and availability of data as well as quality control, value, and reuse over time.”8
Data Life Cycle (DLC)
In the course of a project, research data pass through various stages, from generation and collection of the data until final storage in a repository or archive for the purpose of long-term reuse or deletion if required. All these stages are combined in the so-called life cycle of your research data. It is recognised that the DLC represents a simplified model of data stages in a research project.9
Data Management Plan (DMP)
A plan that defines the “roadmap” for researchers during their research project and documents the long-term handling of the research data collected. The DMP usually reflects every step of the Data Life Cycle.
This wiki page provides further details about DMPs, including checklists and guides for meeting the requirements of the respective funding agency.
Data Minimisation
Data Minimisation is a principle when working with personal data. It requires entities to process only adequate, relevant personal data limited to what is necessary for the purpose. Additional personal data must not be collected or kept when it is not required for the intended purpose.10
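In practice, this principle can be applied at the point of collection or before storage. The following minimal Python sketch keeps only the fields required for a stated purpose. The field names, the `REQUIRED_FIELDS` set and the `minimise` function are hypothetical and only serve as an illustration.

```python
# Fields assumed to be necessary for the (hypothetical) study purpose.
REQUIRED_FIELDS = {"participant_id", "age_group", "response"}


def minimise(record: dict, required: set = REQUIRED_FIELDS) -> dict:
    """Return a copy of the record that contains only the required fields."""
    return {key: value for key, value in record.items() if key in required}


if __name__ == "__main__":
    raw = {
        "participant_id": "P-017",
        "age_group": "30-39",
        "response": "agree",
        "home_address": "Example Street 1",  # not needed for the purpose
        "phone": "000 000 00 00",            # not needed for the purpose
    }
    print(minimise(raw))
```

Which fields count as necessary depends on the purpose of the project and should be decided before data collection starts.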
Data Repository
“Long-term preservation and the principle of open access to research data offer broad opportunities for the scientific community. In the last decade, more and more universities and research centres established research data repositories allowing permanent access to data sets in a trustworthy environment. Due to disciplinary requirements, the landscape of data repositories is very heterogeneous.”11
The institutional repository at ETH Zurich is the ETH Research Collection.
Data Steward
“A person responsible for keeping the quality, integrity, and access arrangements of data and metadata in a manner that is consistent with applicable law, institutional policy, and individual permissions. A data steward aims at guaranteeing that data is appropriately treated at all stages of the research cycle (i.e., design, collection, processing, analysis, preservation, data sharing and reuse).”12
Digital Long-Term Preservation
Digital long-term preservation encompasses all measures to perpetuate reusability of digital information in the future.
Digital Object Identifier (DOI)
“A Digital Object Identifier (DOI) is an alphanumeric string assigned to uniquely identify an object. It is tied to a metadata description of the object as well as to a digital location, such as a URL, where all the details about the object are accessible.”13 A DOI is one type of persistent identifier.
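The link between a DOI and a digital location can be sketched in a few lines of Python. The function name `doi_to_url` is hypothetical. The sketch assumes the public resolver at https://doi.org/ and uses the DOI of the FAIR principles article cited in this glossary as an example.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a resolvable https://doi.org/ link."""
    cleaned = doi.strip()
    # Accept common notations such as "doi:10.x/y" or a full resolver URL.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10." and has a "/".
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"Not a DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return "https://doi.org/" + quote(cleaned, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1038/sdata.2016.18"))
    # prints: https://doi.org/10.1038/sdata.2016.18
```

The resolver redirects to the current location of the object, so the link stays valid even if that location changes.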
Electronic Lab Notebook (ELN)
An application that, in its simplest form, supports digital note-taking in the lab or in the field. Besides essential advantages such as searchability and backups, enhanced functionality allows, e.g., linking to additional digital resources. You can use the Electronic Lab Notebook Finder to search for an appropriate tool for your research.
Embargo
In the context of publishing research data “a (temporary) embargo is a timespan in which only descriptions of the research data, meaning descriptive metadata, is accessible, for example in repositories. The corresponding data publication is not available. An embargo can be used if the publication of research data is supposed to be delayed (e.g., during a peer-review-process).”14
ETH Data Archive
The ETH Data Archive represents ETH Zurich’s long-term preservation solution for digital data. Direct access to the ETH Data Archive is offered exclusively for archiving software and source code within the scope of the ETH transfer software registration process. All other data or scientific publications are first published in the ETH Research Collection and then automatically exported to the ETH Data Archive where they will be preserved in the long term (minimum 10 years).
ETH Research Collection
The ETH Research Collection is the institutional data repository of ETH Zurich, which follows the FAIR principles. ETH Zurich researchers can publish scientific publications and research data via the repository. The Research Collection also serves as ETH Zurich’s bibliography for the Annual Academic Achievements reporting.
FAIR Data and FAIR Principles
The FAIR principles apply to research data. A large part of the more technical requirements for making data FAIR can be met by uploading data to a FAIR data repository. Researchers themselves, however, must still provide the relevant scientific context information.
Findability
In the context of research data, findability is part of the FAIR principles. It “means that the data can be discovered by both humans and machines, for instance by exposing meaningful machine-actionable metadata and keywords to search engines and research data catalogues. The data are referenced with unique and persistent identifiers and the metadata include the identifier of the data they describe.”16
Good Scientific Practice
“Good scientific practice is defined as discipline-specific concretisations in the form of standards derived from the Basic Principles. These can concern, among other things, specifications regarding study design, source references or authorship of publications. Corresponding standards are formulated by professional societies, academies, research funding organisations, publishers and universities and are to be followed by the scientists interacting with these organisations or belonging to these organisations.”17
Guidelines for Research Data Management at ETH Zurich
“The Guidelines for Research Data Management […] specify further details of Research Data Management […]. […]. They serve to establish minimal requirements for all ETH Zurich members involved in scientific research and to define responsibilities. They can be complemented by departmental regulations which foster recognised Community Standards pertaining to Research Data Management and Open Research Data by design.”18
Informed Consent
“Persons may only be involved in a research project if they have given their informed consent. […]. The persons concerned must receive comprehensible oral and written information on: The nature, purpose and duration of, and procedure for, the research project; the foreseeable risks and burdens; the expected benefits of the research project, in particular for themselves or for other people; the measures taken to protect the personal data collected; their rights. Before a decision on consent is made by the persons concerned, they must be allowed an appropriate period for reflection.”19
“If the intention exists to make further use for research of biological material sampled or health-related personal data collected, the consent of the persons concerned must be obtained at the time of such sampling or collection, or they must be informed of their right to dissent.”20 “The persons concerned may withhold or revoke their consent at any time, without stating their reasons.”21
Interoperability
In the context of research data, interoperability is part of the FAIR principles. It “means that the data can be exchanged and used across different applications and systems – also in the future, for example, by using open file formats. It also means that the data can be integrated with other data from the same research field or data from other research fields. This is made possible by using metadata standards, standard ontologies, and controlled vocabularies as well as meaningful links between the data and related digital research objects.”22
Metadata
“Metadata are data that provide information about data.”23 The term “metadata” comprises all auxiliary information that describes the characteristics of a set of data (i.e., data about the actual dataset). “Metadata are stored either independently of or together with the data they describe. An exact definition of metadata is difficult since the term is being used in different contexts and distinctions can vary according to perspective. Usually there is a distinction between discipline-specific and technical/administrative metadata.”24
Metadata Standards
“For interoperability, i.e., the linking and common processing of metadata, metadata standards for specific purposes were set up. Metadata standards aim at a uniform description of similar data, both in terms of content and structure. A metadata standard as such can often provide a so-called mapping to another metadata standard.”25 Please check Community Standards in our glossary for a more general view on standards.
Non-proprietary File Format
Non-proprietary file formats are often referred to as “open” file formats. They can be opened and edited with a variety of software and are free for everybody to use. In a non-proprietary file format, data are ordered and stored according to an open and well-documented encoding scheme. In contrast, proprietary file formats can be opened with guaranteed full functionality only in the specific application of the commercial producer, which must be used under a licensing agreement. For that reason, non-proprietary file formats must be chosen deliberately to keep data reusable in the long run.
Open Access (OA)
Open Access publications mean “literature […] which scholars give to the world without expectation of payment. […]. By ’open access’ to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself.”26
Open Research Data (ORD)
“Open Research Data (ORD) are Research Data that are FAIR and, in addition, publicly available, accessible and reusable at no cost. […] ORD [can be understood] as an ideal to be strived for that might not be achievable in all cases”27
Open Researcher and Contributor Identifier (ORCID)
The “ORCID iD is a unique, open digital identifier that distinguishes you from every other researcher with the same or a similar name to you.
Anyone who participates in research, scholarship, or innovation can register an ORCID iD for themselves free of charge, and you can use the same iD throughout your whole career -- even if your name changes or you move to a different organization, discipline, or country.”28
Persistent Identifier (PID)
“A persistent identifier is a long-lasting reference to a digital resource.
An identifier is a label which gives a unique name to an entity: a person, place, or thing. Unlike URLs, which may break, a persistent identifier reliably points to a digital entity.”29 For example, an ORCID iD and a DOI are persistent identifiers for a person and for a digital resource, respectively.
Personal Data
Personal data is defined as “all information relating to an identified or identifiable natural person”.30 “A person is identifiable if a third-party having access to the data of the person is able to identify such person with reasonable effort.”31 Such natural persons “can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”32
Programming Code
In the context of research data management “programming code signifies machine-readable instructions created in the context of a research project, for example, to analyse research data, to reproduce research findings from a given data set or to perform experiments.”33
Pseudonymisation
“Pseudonymisation means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”34 Pseudonymisation is different from anonymisation.
Public Domain Dedication (CC0)
A Public Domain Dedication can be expressed as CC0, a special statement within the Creative Commons System: “The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighbouring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.”35
Research Data
Research data comprise every data type that arises during planning, performance and documentation of scientific work and form the basis for new findings and conclusions. Types of research data can vary greatly and depend on the research field. Note, however, that guidelines like the Guidelines for Research Data Management at ETH Zurich36 may refer specifically to only part of such data, e.g., only to data that are relevant for reproducing certain results.
Research Data Management (RDM)
“Research Data Management […] is the systematic, transparent, and adequate handling of data across the complete data lifecycle using recognised services and infrastructures. Research Data Management planning includes measures to be undertaken at every step of the data lifecycle to ensure transparency and reproducibility of research. Data management planning starts from the creation and/or collection of data, followed by their analysis, publication, storage, sharing, preservation and reuse.”37
Research Data Management Strategy
The conventions and common practices of research data management in a research group, or across several groups, can be compiled in a research data management strategy. The rules and practices specified therein coordinate the activities involved and facilitate collaboration among members, which ultimately advances research.
You can find instructions and an exemplary template for drafting your data management strategy for your research group and beyond here.
Reusability
In the context of research data, reusability is part of the FAIR principles. It “means that the data are well documented and curated and provide rich information about the context of data creation. The data should conform to community standards and include clear terms and conditions on how the data may be accessed and reused, preferably by applying machine-readable standard licences. This allows others either to assess and validate the results of the original study, thus ensuring data reproducibility, or to design new projects based on the original results, in other words data reuse in the stricter sense.”38
Sensitive Data
Sensitive data are data that might cause serious harm if they fall into the hands of unauthorized persons. Regarding research data, sensitive data include but are not limited to sensitive personal data (e.g., biomedical patient data), research plans, contract agreements or geolocation data. Sensitive data must usually be classified as CONFIDENTIAL or STRICTLY CONFIDENTIAL at ETH Zurich.39 Sensitive data require special cybersecurity measures to protect the confidentiality, integrity and availability of data and to protect the privacy of individuals. ETH Zurich offers the Leonhard Med Secure Scientific IT Platform for handling sensitive data.40
Sensitive Personal Data
Sensitive personal data include data about: religious, ideological, political or trade union-related views or activities, health, the intimate sphere or the racial origin, social security measures, administrative or criminal proceedings and sanctions that can be related to an identified or identifiable natural person.41
Software
With regard to research data the terms “software” and “programming code” are used similarly. However, there is a tendency to use “software” when a level of intellectual property is involved that might justify more explicit licensing and/or a registration with your institution’s technology transfer office. Moreover, software typically refers to an end product that is explicitly released to a wider audience.
Indication of source
2 Adapted from: https://forschungsdaten.info/praxis-kompakt/english-pages/glossary/ (accessed October 2022)
3 Ordinance on Human Research with the Exception of Clinical Trials (810.301), Article 25, https://www.fedlex.admin.ch/eli/cc/2013/642/en (as of 26 May 2021)
4 Guidelines for Research Data Management at ETH Zurich (RDM Guidelines, RSETHZ 414.2), https://rechtssammlung.sp.ethz.ch/Dokumente/414.2en.pdf (as of 1 July 2022)
5 Federal Act on Copyright and Related Rights (231.1), Article 3, https://www.fedlex.admin.ch/eli/cc/1993/1798_1798_1798/en (as of 1 January 2022)
9 Dierkes, J. 2021. 4.1 Planung, Beschreibung und Dokumentation von Forschungsdaten. In: Putnings, M., Neuroth, H. and Neumann, J. ed. Praxishandbuch Forschungsdatenmanagement. Berlin, Boston: De Gruyter Saur, pp. 303-326. https://doi.org/10.1515/9783110657807-018
10 Adapted from: https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/principles-gdpr/how-much-data-can-be-collected_en (accessed 27 October 2022)
12 Mijke Jetten, Marjan Grootveld, Annemie Mordant, Mascha Jansen, Margreet Bloemers, Margriet Miedema, & Celia W.G. van Gelder. (2021). Professionalising data stewardship in the Netherlands. Competences, training and education. Dutch roadmap towards national implementation of FAIR data stewardship (1.1). Zenodo. https://doi.org/10.5281/zenodo.4623713
15 Wilkinson, Mark et al. 2016. “The FAIR Guiding Principles for scientific data management and stewardship.” Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18. See also http://www.go-fair.org/fair-principles and, for a more comprehensive description, https://howtofair.dk/what-is-fair.
17 ETH Zurich Guidelines on scientific integrity (Integrity Guidelines, RSETHZ 414), https://rechtssammlung.sp.ethz.ch/Dokumente/414en.pdf (as of 1 January 2022)
26 Adapted from: https://oa100.snf.ch/en/context/open-access/definition/ (accessed 15 September 2022)
28 Adapted from: https://support.orcid.org/hc/en-us/articles/360006897334-What-is-my-ORCID-iD-and-how-should-I-use-it- (accessed 15 September 2022)
29 Adapted from: https://support.orcid.org/hc/en-us/articles/360006971013-What-are-Persistent-identifiers-PIDs- (accessed 15 September 2022). See also https://ardc.edu.au/resources/working-with-data/citation-identifiers/ (accessed 15 September 2022) for additional information about the different types of PIDs.
31 ETH Zurich Legal Department, Factsheet “Data Protection in Research Projects”, https://ethz.ch/content/dam/ethz/associates/services/organisation/Schulleitung/Generalsekretariat/dokumente_rechtsdienst/Dataprotection_Research_Final.pdf (as of December 2019)
35 Adapted from: https://creativecommons.org/publicdomain/zero/1.0/ (accessed 22 September 2022)
39 Directive on “Information Security at ETH Zurich” (RSETHZ 203.25), https://rechtssammlung.sp.ethz.ch/Dokumente/203.25en.pdf (as of 1 August 2021)
40 For guidelines on Leonhard Med, please see “The Acceptable Use Policy (AUP) of the Leonhard Med Secure Scientific IT Platform”, https://rechtssammlung.sp.ethz.ch/Dokumente/438.1.pdf (as of 1 August 2022)