I love PIDs - and so should you! - ODISSEI – Open Data Infrastructure for Social Science and Economic Innovations

By Ricarda Braukmann (DANS & ODISSEI FAIR Support team)

Imagine you are reading an interesting article online and the article refers to a paper, written a few years ago, that sounds very relevant for your current work. Wanting to know more, you click on the link to the paper, only to see “404 Not Found” appear on your screen. This is not only super annoying, it is also a waste of resources, as the information in the paper may now be lost for good. Persistent Identifiers, or PIDs, aim to solve this and other problems with identifying and accessing digital resources.

What are PIDs?

A Persistent Identifier or PID is “a long-lasting reference to a document, file, web page, or other object” (Wikipedia, 2021). PIDs are typically used for digital objects, and in this post we discuss them in the context of scientific resources. In addition to being long-lasting, a PID should uniquely identify a given (digital) object, and it should be actionable. This means that you can plug the PID into a web browser and it will take you to the content the PID refers to.

The best-known example of a PID is the DOI (Digital Object Identifier), which is, for instance, assigned to your scientific paper when it is published in a scientific journal. The video below, created by the FREYA project (an EC-funded project that worked on developing the PID infrastructure in Europe), explains the concept of PIDs in a bit more detail.

https://doi.org/10.5281/zenodo.3958881

PIDs are powerful

As the video illustrates, PIDs can be used to uniquely identify different entities in the scientific world, and by linking PIDs to each other, a lot of information can be connected. This allows you, for instance, to find a publication by a specific author and get, for free, information about which grant funded it and which collaborators were involved.
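To make the idea of linked PIDs more concrete, here is a minimal Python sketch that models a tiny "PID graph" as an adjacency list and collects everything connected to one publication. All identifiers and relations are hypothetical placeholders, not real PIDs; in practice such links come from the metadata registered with PID providers.

```python
from collections import deque

# Hypothetical PID graph: each key is a placeholder PID, each value a list of
# (relation, target PID) pairs. None of these identifiers are real.
PID_GRAPH: dict[str, list[tuple[str, str]]] = {
    "publication:P1": [("author", "person:A1"), ("author", "person:A2"),
                       ("funded_by", "grant:G1")],
    "person:A1": [("affiliation", "org:O1")],
    "person:A2": [("affiliation", "org:O2")],
    "grant:G1": [("awarded_by", "org:F1")],
}


def connected_pids(start: str, graph: dict[str, list[tuple[str, str]]]) -> set[str]:
    """Return every PID reachable from start by following links (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for _relation, target in graph.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen - {start}


def direct_links(pid: str, relation: str,
                 graph: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Return the PIDs linked to pid by one specific relation."""
    return [target for rel, target in graph.get(pid, []) if rel == relation]


print(sorted(connected_pids("publication:P1", PID_GRAPH)))
# ['grant:G1', 'org:F1', 'org:O1', 'org:O2', 'person:A1', 'person:A2']
print(direct_links("publication:P1", "funded_by", PID_GRAPH))  # ['grant:G1']
```

Starting from a single publication PID, the traversal reaches the authors, their organisations, the grant and the funder, which is exactly the kind of connected information described above.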

With great power comes great responsibility

PIDs thus have huge potential and play a crucial role in making science FAIR: Findable, Accessible, Interoperable and Reusable (Wilkinson et al., 2016). However, for PIDs to actually work as intended, technical infrastructure and policies need to be in place. For example, an institution that assigns a PID to its dataset needs to make sure that, if the dataset is ever moved to a different domain, its new location is updated with the PID provider that manages the resolution of the PID. PID providers, on the other hand, need sustainable business models so that their services remain usable in the future. This can be challenging in a world where short-term, project-based funding is common, often at the expense of maintaining existing infrastructure.
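Conceptually, a resolver keeps a mapping from each PID to the current location of the object, which is why keeping that location up to date matters. The toy sketch below assumes an in-memory dictionary in place of a real PID provider's infrastructure; the class, its methods, the identifier and the URLs are all hypothetical.

```python
class ToyResolver:
    """Hypothetical in-memory stand-in for a PID provider's resolver."""

    def __init__(self) -> None:
        # Maps each PID to the URL where the object currently lives.
        self._locations: dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = url

    def update_location(self, pid: str, new_url: str) -> None:
        # The PID itself never changes; only the URL it points to does.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self._locations[pid]


resolver = ToyResolver()
resolver.register("example-prefix/dataset-1", "https://old.example.org/data/1")
# The dataset moves to a new domain; the institution updates the resolver.
resolver.update_location("example-prefix/dataset-1", "https://new.example.org/data/1")
print(resolver.resolve("example-prefix/dataset-1"))  # https://new.example.org/data/1
```

Anyone who cited the PID still reaches the dataset after the move; if the update is skipped, the PID resolves to a dead link, just like a copied URL would.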

Researchers and other users also play a crucial role in the success of PIDs. A researcher who registers with ORCID can, for instance, massively increase the value of this PID by regularly updating their profile, by allowing it to be linked with other PIDs, or by manually adding links that are missing. PIDs also need to actually be used when we refer to digital information. While it is very tempting to simply copy the URL of your paper from the browser (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7815961/), that link may be broken in a few years, whereas the DOI of the same paper (e.g. https://doi.org/10.1016/j.patter.2020.100180) should still lead to the actual paper.
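DOI links are sometimes percent-encoded, as in https://doi.org/10.1016%2Fj.patter.2020.100180, where %2F stands for the slash in the DOI. If you collect DOIs from links or reference lists, a small normalisation step helps you store the bare DOI consistently. Below is a minimal sketch using only Python's standard library; the function name normalise_doi and the list of resolver prefixes it strips are our own assumptions, not part of any DOI specification.

```python
from urllib.parse import unquote

# Resolver prefixes this sketch strips (an assumption, not an exhaustive list).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(text: str) -> str:
    """Turn a DOI link or 'doi:' string into a bare DOI such as '10.1016/...'."""
    doi = text.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Undo percent-encoding, e.g. '%2F' -> '/'.
    return unquote(doi)


print(normalise_doi("https://doi.org/10.1016%2Fj.patter.2020.100180"))
# 10.1016/j.patter.2020.100180
```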

Which PIDs should I know?

As discussed above, there are many different PIDs and PID providers, and it can be difficult to understand which PIDs are relevant for your own work. Below, we give an overview of some key entities that are important in the research landscape and the PIDs that exist or are being developed for them. A more comprehensive overview is given in a paper by Cousijn and colleagues (2021), who describe PIDs for various scientific entities and discuss how mature their development is.
It is good to realise that there are many services you can use to get PIDs for your own scientific output. Publishers assign PIDs to your publications, and data archives such as DANS assign PIDs to your datasets. Generic repositories like Zenodo let you get a PID for your presentations and supporting research documentation. Our FAIR Support website helps you find the right place for your different scientific outputs.

Publications

Scientific publications typically receive a DOI (Digital Object Identifier). DOIs are based on the handle system. Most publishers are members of Crossref, one of the PID providers that allows publishers to register DOIs for their content.

A DOI always has the same format, consisting of a prefix and a suffix. “The prefix identifies the registrant of the identifier and the suffix is chosen by the registrant and identifies the specific object associated with that DOI.” (Wikipedia, 2022)

Here are two example DOIs:

10.5281/zenodo.3958881 (prefix: 10.5281, suffix: zenodo.3958881)
10.17026/dans-28d-rgjb (prefix: 10.17026, suffix: dans-28d-rgjb)
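Because prefix and suffix are separated by the first slash, splitting a DOI into its two parts is straightforward. A minimal sketch, assuming the input is already a bare DOI with no resolver URL in front; the function name split_doi is hypothetical. Only the first slash is used, since a suffix may itself contain slashes, and the check that the prefix starts with "10." reflects the fact that all DOI prefixes begin that way.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a bare DOI into (prefix, suffix) at the first '/'."""
    prefix, sep, suffix = doi.partition("/")
    # Every DOI prefix starts with '10.' and both parts must be non-empty.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a valid DOI: {doi!r}")
    return prefix, suffix


print(split_doi("10.17026/dans-28d-rgjb"))  # ('10.17026', 'dans-28d-rgjb')
```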

All DOIs can be resolved through the resolver at https://www.doi.org/: adding https://doi.org/ in front of a DOI creates a resolvable link that takes you to the content.

https://doi.org/10.5281/zenodo.3958881
https://doi.org/10.17026/dans-28d-rgjb
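Building a resolvable link is mostly string concatenation, but characters such as # or ? that can appear in a DOI suffix have a special meaning in URLs. The sketch below therefore percent-encodes the DOI while keeping the prefix/suffix slash readable. It assumes a bare DOI as input, and the helper name doi_to_url is hypothetical.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Prefix a bare DOI with the doi.org resolver, encoding unsafe characters."""
    # safe="/" keeps the separator as-is; letters, digits and '_.-~'
    # are never encoded by quote().
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("10.5281/zenodo.3958881"))  # https://doi.org/10.5281/zenodo.3958881
```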

Datasets

As for publications, DOIs are the most common PIDs for datasets. Many repositories, such as DANS, are members of DataCite, a PID provider that allows repositories to register DOIs for datasets.

Some organisations, such as SURF, use other PIDs for their datasets. A common choice is handles, provided for instance through the European Persistent Identifier Consortium (ePIC).

Handles also consist of a prefix and a suffix and can be resolved through the resolver at http://hdl.handle.net/:

http://hdl.handle.net/11304/9fb5e092-7018-11e4-ac7e-860aa0063d1f
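Since DOIs are built on the handle system and their prefixes always start with "10.", a single helper can turn either kind of bare identifier into a resolvable link. A minimal sketch under that assumption; the function name pid_to_url is hypothetical, and routing everything that is not a DOI to the handle resolver is a simplification for illustration.

```python
def pid_to_url(pid: str) -> str:
    """Return a resolvable link for a bare DOI or handle (PREFIX/SUFFIX)."""
    prefix, sep, suffix = pid.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"expected PREFIX/SUFFIX, got {pid!r}")
    # DOI prefixes always start with '10.'; send those to doi.org.
    if prefix.startswith("10."):
        return "https://doi.org/" + pid
    # Anything else is treated as a plain handle (an assumption of this sketch).
    return "http://hdl.handle.net/" + pid


print(pid_to_url("11304/9fb5e092-7018-11e4-ac7e-860aa0063d1f"))
# http://hdl.handle.net/11304/9fb5e092-7018-11e4-ac7e-860aa0063d1f
```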

People

Various identifiers have been used to identify people, but ORCID has become the most common PID for researchers. Its use has grown steadily in recent years, and it is included in NWO's PID strategy to foster further implementation (Cruz & Tatum, 2021).

In contrast to the DOIs and handles discussed above, which are assigned to publications and datasets by others, researchers register with ORCID themselves. If you don’t have an ORCID yet, we highly recommend that you register and start using it to link all your data and publications!

This video gives you more information about ORCID and explains how to register and update your profile.

An ORCID iD is a string of 16 digits in four blocks of four (the last character can also be an X), for example:

0000-0001-6383-7148

It can be resolved through the resolver at https://orcid.org/:

https://orcid.org/0000-0001-6383-7148
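The last character of an ORCID iD is a check digit, computed with the ISO 7064 MOD 11-2 algorithm that ORCID documents for its identifiers, which lets you catch typos before resolving an iD. A minimal Python sketch; the function names are our own, and passing the check only means the iD is well-formed, not that it is actually registered.

```python
import re

# Four blocks of digits separated by hyphens; the final character may be 'X'.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and the check digit of an ORCID iD."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0001-6383-7148"))  # True
print(is_valid_orcid("0000-0001-6383-7149"))  # False (typo in the last digit)
```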

Organisations

Several identifiers for organisations are in use, with ROR (Research Organization Registry) being a relatively new, not-for-profit PID registry (Cruz & Tatum, 2021). Many organisations, including ODISSEI itself, have been assigned a ROR ID. Maybe you noticed that the links to organisations mentioned in this text don’t point directly to their websites but rather to their entries in ROR. However, ROR is still under development and not yet as well established as DOIs for publications and datasets.

Grants

Grant identifiers are another emerging type of PID. Crossref has been working on the development of Crossref Grant DOIs, and these PIDs are listed in the NWO PID Strategy as something to be further implemented for the Dutch scientific community. Having a PID for grants will make it much easier to provide your funder with an overview of a project’s results and their impact.

Want to know more?

If you would like more information about PIDs and their use cases, you can check out the PID Forum at pidforum.org. The Knowledge Hub thread on the PID Forum lists learning materials, and many other threads on the forum go deeper into the topic of PIDs.

Reach out to the ODISSEI FAIR Data Support Team for more information.

References

Cousijn, H., Braukmann, R., Fenner, M., Ferguson, C., van Horik, R., Lammey, R., Meadows, A., & Lambert, S. (2021). Connected Research: The Potential of the PID Graph. Patterns, 2(1), 100180. https://doi.org/10.1016/j.patter.2020.100180 (PMID: 33511369; PMCID: PMC7815961)

Cruz, M., & Tatum, C. (2021). NWO Persistent Identifier Strategy. Zenodo. https://doi.org/10.5281/zenodo.4674513

Wikipedia (2021). Persistent Identifier. https://en.wikipedia.org/wiki/Persistent_identifier

Wikipedia (2022). Digital Object Identifier. https://en.wikipedia.org/wiki/Digital_object_identifier

Wilkinson, M., Dumontier, M., Aalbersberg, I., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18

Photo by Alex Shuper