The ODISSEI Portal combines metadata from a wide variety of research data repositories into a single interface, allows advanced semantic queries to support findability, and facilitates data access.
The ODISSEI Portal’s metadata catalogue will extend the coverage of available data by including key research datasets that are currently not findable via NARCIS (the main national portal for information about researchers and their work). The project will extend the current catalogue with the metadata of (a) all datasets of Statistics Netherlands, including the metadata of the microdata catalogue, (b) all datasets developed in the ODISSEI Laboratory (LISS), and (c) all datasets developed in the ODISSEI Observatory (EVS, GGP, SHARE, ESS, NTR, HSN).
This dataset extension task is a joint effort of trained data stewards at DANS, the ODISSEI team of Data Scouts at the Observatory, and the aforementioned data repositories. In collaboration with these partners and experts at VU Amsterdam, this task will also develop a metadata ingestion pipeline to make sure that the Portal can be maintained and kept up to date. This pipeline will be used by the hosting party to add new datasets during and after the end of the project. All new datasets that will become available via the ODISSEI Portal will also be added to the national NARCIS research dataset catalogue.
Existing tools for findability in the social sciences are limited in that they only match specific terms or synonyms (e.g. the United Kingdom question bank or the Question Variable Database). The ODISSEI Portal will extend and improve search functionality by using semantic queries, which enable broader probabilistic matching and link functions over an enriched knowledge graph representation of the FAIR metadata catalogue. This incorporates the context of the specific terms that are crucial in social research. By using rich and extensible data structures developed within the linked data community, ODISSEI will evolve the relatively flat metadata catalogues in use today into the highly interlinked, graph-based structures needed to conduct advanced semantic searches. The richness of the metadata catalogue does not rely solely on the high manual documentation and curation standards, such as DDI, that already exist across ODISSEI-associated data. The social sciences are fortunate to have an advanced automated metadata capture system that documents the data collection process, principally through survey software. In addition to the manual documentation and curation standards, the catalogue takes advantage of automatic and semi-automatic metadata enrichment and entity linking to enrich the available curated information. Where relevant, there will be alignment with the standards used in CESSDA (CESSDA Metadata Model) at the European level and in the Open Science Framework.
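To illustrate the difference between matching exact terms and searching over linked concepts, the following is a minimal sketch in Python. It is not the Portal's implementation: the concept links, dataset names, and the functions `expand` and `search` are hypothetical, and it uses simple deterministic graph expansion rather than the probabilistic matching described above.

```python
from collections import deque

# Hypothetical concept graph: each concept points to related concepts.
RELATED = {
    "unemployment": {"labour market", "job loss"},
    "labour market": {"unemployment", "employment"},
    "job loss": {"unemployment"},
}

# Hypothetical datasets, each tagged with the concepts it covers.
DATASETS = {
    "dataset-a": {"job loss"},
    "dataset-b": {"employment"},
    "dataset-c": {"religion"},
}


def expand(term, max_hops=1):
    """Return the term plus all concepts reachable within max_hops links."""
    seen = {term}
    frontier = deque([(term, 0)])
    while frontier:
        concept, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for neighbour in RELATED.get(concept, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return seen


def search(term, max_hops=1):
    """Return the names of datasets tagged with the term or a linked concept."""
    concepts = expand(term, max_hops)
    return sorted(name for name, tags in DATASETS.items() if tags & concepts)
```

With `max_hops=0` the search behaves like exact term matching and finds nothing for "unemployment" in this toy data, whereas `max_hops=1` also returns the dataset tagged "job loss".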
The Portal also facilitates automatic and semi-automatic data access policy management between the producers and users of research datasets. Unclear data licensing or access policies are currently an obstacle to open science and the application of the FAIR principles, even for research datasets that are available as open data. ODISSEI will enrich its research data catalogue with explicit and as detailed as possible information on licensing and access policies, preferably in a machine-readable format. The owners of each dataset will be able to provide the Portal with metadata describing what the policy for obtaining access entails. The access process varies between data providers: Statistics Netherlands requires that the user be affiliated with an authorised research institute, and using its data involves formalities and costs, whereas other research data are often freely available for download to anyone around the globe. For datasets with machine-readable access policy metadata, the ODISSEI Data Node, an automated system that is closely connected to the Portal, will be able to assist the researcher, for example by sending a data access request to the data owner, by initiating a federated authentication session, or by redirecting the researcher to the landing page of the open dataset. If a dataset does not yet have fully machine-readable access policy metadata, the ODISSEI Data Steward based at EUR will help the data owner and researcher with the access process.
Once the data owner reaches an agreement with the researchers, the owner allows the ODISSEI Data Node to transfer the data to the designated analysis environment, typically the ODISSEI Secure Supercomputer (in case of large, complex or sensitive data) or the computer of the researcher (in case of small and/or open data).
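The routing on machine-readable access policy metadata described above can be sketched as follows. This is an illustrative Python sketch only: the `AccessType` values, the `AccessPolicy` fields, and the `next_step` function are assumptions made for this example and do not reflect the actual metadata schema or interface of the ODISSEI Data Node.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AccessType(Enum):
    """Hypothetical access categories, based on the three examples in the text."""
    OPEN = "open"              # freely downloadable
    FEDERATED = "federated"    # access after federated authentication
    ON_REQUEST = "on_request"  # the data owner must approve a request


@dataclass
class AccessPolicy:
    """Hypothetical machine-readable access policy for one dataset."""
    access_type: AccessType
    landing_page: Optional[str] = None
    owner_contact: Optional[str] = None


def next_step(policy: Optional[AccessPolicy]) -> str:
    """Decide how a researcher is helped, given a dataset's policy metadata."""
    if policy is None:
        # No machine-readable policy yet: fall back to human support.
        return "refer to data steward"
    if policy.access_type is AccessType.OPEN:
        return f"redirect to {policy.landing_page}"
    if policy.access_type is AccessType.FEDERATED:
        return "start federated authentication session"
    if policy.access_type is AccessType.ON_REQUEST:
        return f"send access request to {policy.owner_contact}"
    return "refer to data steward"
```

For example, `next_step(None)` returns the data steward fallback, which mirrors the case where a dataset has no fully machine-readable access policy metadata.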
Status: In September 2022, a first version of the Portal was launched. It contains the metadata of over 4000 datasets, including those from the CBS Microdata catalogue, the LISS panel from Centerdata, and social science data deposited at DANS. The Portal will be further developed until 2024 by enriching the data with existing word lists and linked open data to improve findability. We will also add metadata from a large number of providers, with the goal of making all Dutch social science metadata available in one interface. In addition, the data access broker will be added, which allows researchers to request data access from multiple data owners at once.
Project team ODISSEI Portal:
- VU: Jacco van Ossenbruggen (Task Leader), Elena Beretta, Margherita Martorana, Ronald Siebes
- DANS: Ricarda Braukmann, Wim Hugo, Thomas van Erven, Fjodor van Rijsselberg, Michiel Zuidema, Slava Tykhonov, Eko Indarto, Laura Huis in ’t Veld
- SURF: Jorik Van Kemenade, Freek Dijkstra
Questions regarding the ODISSEI Portal? Contact Lucas van der Meer (ODISSEI Coordination Team).