
Dr. Domenico Giusti
Paläoanthropologie, Senckenberg Centre for Human Evolution and Palaeoenvironment
Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)
"In 2009 open data started to become visible in the mainstream, with various governments (such as the USA, UK, Canada) announcing new initiatives towards opening up their public information." The Open Data Handbook
EU countries (e.g. DE, IT) followed soon after by opening their public data.
"The European data strategy aims to make the EU a leader in a data-driven society. Creating a single market for data will allow it to flow freely within the EU and across sectors for the benefit of businesses, researchers and public administrations".
Access to data and the ability to use it are essential for innovation and growth.
"Open research data is data that can be freely accessed, reused, remixed and redistributed, for academic research and teaching purposes and beyond. Ideally, open data have no restrictions on reuse or redistribution, and are appropriately licensed as such. In exceptional cases, e.g. to protect the identity of human subjects, special or limited restrictions of access are set. Openly sharing data exposes it to inspection, forming the basis for research verification and reproducibility, and opens up a pathway to wider collaboration".
NOTE: (meta)data is used to refer to both metadata and data.
Metadata provide a basic description of the data, often including authorship, dates, title, abstract, keywords, and license information. They primarily support the findability of data, for example through fields such as creator, time period, and geographic location.
Data outlive their original context - Limitations of data may be obvious within their original context, such as a library catalog, but may not be evident once the data are divorced from the application they were created for.
Data cannot stand alone - Information about the context and provenance of the data (how and why they were created, what real-world objects and concepts they represent, the constraints on values) is necessary to help consumers interpret them responsibly.
Structuring metadata about datasets in a standard, machine-readable way encourages the promotion, shareability, and reuse of data.
DOI - "The Digital Object Identifier (DOI) system provides a technical and social infrastructure for the registration and use of persistent interoperable identifiers, called DOIs, for use on digital networks. [...] Although originating in text publishing, the DOI was conceived as a generic framework for managing identification of content over digital networks, recognising the trend towards digital convergence and multimedia availability." DOI Handbook
Dublin Core - A set of fifteen "core" elements (properties) for describing digital resources (video, images, web pages, etc.) as well as physical resources such as books or works of art.
Darwin Core - An extension of Dublin Core for biodiversity informatics
ISO 19136-1:2020 - The Geography Markup Language (GML), defined by the Open Geospatial Consortium, is an XML-based language for expressing geographic information.
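As a rough illustration of machine-readable metadata, the sketch below builds a record restricted to the fifteen Dublin Core elements and attaches a DOI in its resolvable form. The helper names `make_record` and `doi_url`, the `dc:` key prefix, and every value in the record (including the DOI `10.1234/example`) are assumptions made up for this example, not part of any standard library or real dataset.

```python
import json

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def doi_url(doi: str) -> str:
    """Turn a bare DOI into its resolvable https://doi.org/ form."""
    return f"https://doi.org/{doi}"


def make_record(**fields) -> dict:
    """Build a Dublin Core style record, rejecting unknown element names."""
    unknown = sorted(set(fields) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    return {f"dc:{name}": value for name, value in fields.items()}


# All values below are invented for illustration (hypothetical dataset and DOI).
record = make_record(
    title="Example excavation dataset",
    creator=["Doe, Jane"],
    date="2020-01-01",
    type="Dataset",
    format="text/csv",
    identifier=doi_url("10.1234/example"),
    rights="https://creativecommons.org/licenses/by/4.0/",
)

print(json.dumps(record, indent=2))
```

Restricting the keys to a shared vocabulary is what lets a catalogue or harvester interpret the record without knowing anything about the project that produced it.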
★ | make your stuff available on the web (whatever format) under an open license |
★★ | make it available as structured data (e.g. Excel instead of image scan of a table) |
★★★ | make it available in a non-proprietary open format (e.g. CSV instead of Excel) |
★★★★ | use URIs to identify things, so that people can point at your stuff |
★★★★★ | link your data to other people’s data to provide context |
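The upper steps of this scale can be sketched in a few lines: the same small table is first written as CSV (three stars), then each row is given a URI (four stars) and linked to a resource published by someone else (five stars). The rows, the `https://example.org/...` URIs, the vocabulary terms, and the function names are all hypothetical placeholders; a real project would use an established vocabulary and an RDF library rather than plain tuples.

```python
import csv
import io

# A tiny invented table of finds.
rows = [
    {"id": "find-001", "material": "flint", "site": "site-A"},
    {"id": "find-002", "material": "bone", "site": "site-A"},
]


def to_csv(rows):
    """Three stars: serialise the table in a non-proprietary format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical URI space for our own data and for an external gazetteer.
BASE = "https://example.org/finds/"
VOCAB = "https://example.org/vocab/"
EXTERNAL_SITES = {"site-A": "https://example.org/gazetteer/site-A"}


def to_triples(rows):
    """Four and five stars: name each find with a URI and link it outward."""
    triples = []
    for row in rows:
        subject = BASE + row["id"]
        triples.append((subject, VOCAB + "material", row["material"]))
        triples.append((subject, VOCAB + "foundAt", EXTERNAL_SITES[row["site"]]))
    return triples


csv_text = to_csv(rows)
triples = to_triples(rows)
print(csv_text)
for triple in triples:
    print(triple)
```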
The Semantic Web is an extension of the World Wide Web that aims to make published information and data machine-readable. The ultimate goal is to enable computers to better manipulate information and make meaningful interpretations. For the Semantic Web to function, computers must have access to Linked Open Data (LOD): structured data modeled as a graph and published in a way that allows interlinking across servers.
P.S. Semantic Web technologies are widely used in the development of knowledge graphs in different domains, science included.
Sensitive data - Many scientific disciplines involve working with sensitive personal data. Its management is regulated by data protection legislation (in Europe, through national implementations of the General Data Protection Regulation) and by the ethics procedures established in most research institutions. A related category is sensitive cultural heritage data.
Intellectual Property (IP) - "a legal term that refers to creations of the mind. Examples of intellectual property include music, literature, and other artistic works; discoveries and inventions; and phrases, symbols, and designs". Open Research Glossary
Intellectual Property Rights (IPR) - "the rights given to the owners of intellectual property. IPR is protected either automatically (eg copyright, design rights) or by registering or applying for it (eg trademarks, patents). Protecting your intellectual property makes it easier to take legal action against anyone who steals or copies it. IPR can be legally sold, assigned or licenced by the creator to other parties, or joint-owned". Open Research Glossary
With an appropriate data management plan much sensitive and proprietary data can be FAIRly shared and reused.
The metadata can almost always be shared.
Research data are often the most valuable output of a research project: they are the primary sources that underpin scientific research and enable the derivation of theoretical or applied findings.
"In order to make findings/studies replicable, or at least reproducible or reusable in any other way, the best practice recommendation for research data is to be as open and FAIR as possible, while accounting for ethical, commercial and privacy constraints with sensitive data or proprietary data." The Open Science Training Handbook
This image (and the previous one) was created by Scriberia for The Turing Way community and is used under a CC-BY licence
PRO: Minor effort for the individual researcher. CONS: Supplementary material is available only to subscribers of the journal or, if the article or journal is open access, to everyone. Publishers own the data.
PRO: Even less effort. CONS: Data availability is tied to the lifetime of the website.
In order of preference:
Develop a trusted, virtual, federated environment that cuts across borders and scientific disciplines to store, share, process and re-use research digital objects (like publications, data, and software) following FAIR principles
"Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse" Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014
Data citation PRINCIPLES: Importance, Credit and Attribution, Evidence, Unique Identification, Access, Persistence, Specificity and Verifiability, Interoperability and Flexibility.
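Several of these principles (credit, unique identification, persistence, specificity) show up directly in the shape of a citation string. The sketch below assembles one from a handful of fields, following a common creator, year, title, version, repository, identifier pattern. The `Dataset` class, the `cite` function, and all example values are hypothetical; a given journal or repository may prescribe a different format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    creators: tuple   # credit and attribution
    year: int
    title: str
    version: str      # specificity: which exact version was used
    repository: str   # where the data are persistently held
    doi: str          # unique, persistent identifier


def cite(ds: Dataset) -> str:
    """Format a dataset citation; the layout is one common pattern, not a standard."""
    authors = "; ".join(ds.creators)
    return (
        f"{authors} ({ds.year}). {ds.title} (Version {ds.version}) [Data set]. "
        f"{ds.repository}. https://doi.org/{ds.doi}"
    )


# Invented example values, including the DOI.
example = Dataset(
    creators=("Doe, J.", "Roe, R."),
    year=2020,
    title="Example excavation dataset",
    version="1.0",
    repository="Example Repository",
    doi="10.1234/example",
)
citation = cite(example)
print(citation)
```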
The data citation advantage is a tangible benefit to researchers who share data with publications, although the magnitude of the advantage varies greatly in different research areas.
Data sharing is unfunded, unrewarded, and only rarely required.
Is it sufficient to make my data openly available?
No—openness is a necessary but not sufficient condition for maximum reuse. Data have to be FAIR in addition to open.
What do the FAIR principles mean/imply for different stakeholders/audiences?
Researchers may be reluctant to share their data because they are afraid that others will reuse them before they have made full use of them themselves, or that others might not fully understand the data and therefore misuse them. You may publish your data with metadata that make them findable, but set an embargo period on the data themselves so that you can publish your own article(s) first.
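The embargo idea can be pictured as a simple rule: metadata are always visible, while the data become accessible only once the embargo date has passed. The following is a minimal sketch under that assumption; the `Deposit` class, the `public_view` function, the file name, and the dates are hypothetical and do not reflect how any particular repository implements embargoes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Deposit:
    metadata: dict
    data_path: str
    embargo_until: Optional[date] = None  # None means no embargo


def public_view(deposit: Deposit, today: date) -> dict:
    """Return what an outside visitor can see on a given day."""
    view = {"metadata": deposit.metadata}  # metadata are always findable
    under_embargo = (
        deposit.embargo_until is not None and today < deposit.embargo_until
    )
    if under_embargo:
        view["data"] = None
        view["available_from"] = deposit.embargo_until.isoformat()
    else:
        view["data"] = deposit.data_path
    return view


# Invented deposit with a one-year embargo.
deposit = Deposit(
    metadata={"title": "Example excavation dataset"},
    data_path="finds.csv",
    embargo_until=date(2021, 1, 1),
)
during = public_view(deposit, date(2020, 6, 1))
after = public_view(deposit, date(2021, 1, 1))
print(during)
print(after)
```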
Is making my data FAIR a lot of extra work?
Not necessarily! Making data FAIR is the responsibility not only of the individual researcher but of the whole group. The best way to ensure that your data are FAIR is to create a Data Management Plan and plan everything beforehand. During data collection and processing, follow the standards of your discipline and the measures recommended by a repository.
I want to share my data. How should I license them?
First of all, think about who owns the data: it may be a research funder or the institution you work for. Then think about authorship. Applying a suitable license to your data is crucial to making them reusable.
I cannot make my data directly available—they are too large to share conveniently / have restrictions related to privacy issues. What should I do?
You should talk to experts at domain-specific repositories about how to provide sufficient instructions to make your data findable and accessible.
"Similarly, some archaeologists may fear the limitations to publication potential that could result from others using their open data and code, the possibility that their materials may be used without citation, and the risk that competitors may gain an advantage. Our view is that these risks have always been present in the traditional research practices of scholarly communication and peer review, and that open science licensing and citation practices effectively mitigate them. Moreover, because sharing of data and code enables and encourages collaborative research, more open science practices can even increase the potential for new research (and publications) with extant data — an important benefit to junior researchers in particular". Marwick et al. 2017
Have you ever requested, for your own research, access to some data and received an unjustified negative answer?