Collecting vocabularies in the social sciences: the Awesome Ontologies for the Social Sciences - ODISSEI – Open Data Infrastructure for Social Science and Economic Innovations

Written by Angelica Maineri (FAIR Expertise Hub, ODISSEI FAIR support team) and Emilio Cammarata (ODISSEI FAIR support team).

Introduction

The “I” in FAIR stands for Interoperability and refers to the ability to integrate data with other data and with applications. To this end, the FAIR principles, in particular sub-principle I2, encourage the use of semantic artefacts or vocabularies, namely curated collections of terms and the relationships among them. When data is described using vocabularies, not only a human but also a machine should be able to gain a basic understanding of what the data represents. There are different flavours of such vocabularies: in a previous short article from the ODISSEI FAIR support series [1], we went through the concepts of controlled vocabularies, taxonomies and thesauri. In those vocabularies, the relationships between terms are quite basic (e.g. “is equivalent to”, “is broader than”), but there are situations in which more complex relationships between entities need to be specified: for this, ontologies can be used (for a more extensive discussion, see [2]).
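To make the idea of basic relationships concrete, here is a minimal Python sketch of a tiny thesaurus in which each term points to its directly broader term. The terms and the helper `broader_terms` are purely illustrative assumptions and are not taken from any published vocabulary.

```python
# Hypothetical mini-thesaurus: each term maps to its directly broader term.
# The terms are invented for illustration only.
BROADER = {
    "part-time employment": "employment",
    "employment": "labour market",
    "unemployment": "labour market",
}


def broader_terms(term):
    """Follow the "is broader than" links upwards and return the chain of terms."""
    chain = []
    seen = {term}  # guards against accidental cycles in the vocabulary
    while term in BROADER:
        term = BROADER[term]
        if term in seen:
            break
        seen.add(term)
        chain.append(term)
    return chain


print(broader_terms("part-time employment"))  # ['employment', 'labour market']
```

Because the relationship is stored explicitly, a machine can work out that data labelled “part-time employment” is also about “employment”, without a human spelling it out.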
Arguably, the social sciences have been slower than other fields to create vocabularies. There are several reasons for this, including a slow uptake of Linked Data approaches and the inherent complexity of describing human behaviour with finite sets of terms and relationships between them. Even when vocabularies exist, they are often difficult to find and reuse. Inspired by CLARIAH’s Awesome Ontologies for Digital Humanities, the FAIR Expertise Hub, in collaboration with the ODISSEI FAIR support desk, started compiling a list of vocabulary resources, the Awesome Ontologies for the Social Sciences, and made it openly available via a GitHub repository. The aim of the list is to offer a starter kit to anyone interested in (re)using semantic artefacts (i.e. ontologies, controlled vocabularies, thesauri, and taxonomies) in a social science context, and hopefully to inspire them to share any additional resources they deem useful with the rest of the community. In this short article, we explain what “Awesome” means, describe the resource, and outline the different options for providing feedback and contributing to the list.

What is an awesome list?

Before describing the repository, it may be helpful to explain why we called it “awesome”. Awesome lists are a widespread format for sharing collaborative lists of resources on pretty much anything! Check out, for instance, the Awesome Computational Social Science for resources (e.g. books, conferences, research groups) on computational social science, the Awesome Digital Humanities for tools, resources, and services supporting scholars in the field of Digital Humanities, but also Awesome-SciFi for science fiction novels and TV series, and the Awesome Slack for Slack resources and tools.

The format was started by Sindre Sorhus. An awesome list is a collaborative, curated list of things that complies with the following stylistic guidelines (see more details here):

The list includes comments on why something is awesome – that is, why the item is “worthy” of being in the list;
It is clear what the list is about;
The list is regularly checked for grammar mistakes/typos;
A licence (ideally, an open licence) is attached to the repository, detailing what can be done with the list; remember: no licence means others are not allowed to reproduce, distribute or create derivative works;
The project includes contribution guidelines;
The list is styled properly (e.g. headers, lists, etc.);
The maintainers are open to other people’s opinions on the list.

The Awesome Ontologies for the Social Sciences

The Awesome Ontologies for the Social Sciences, which can be found here, is a list in a GitHub repository enumerating ontologies, vocabularies and thesauri that are relevant to the social sciences. The resources are organised under headers. Under “Concepts and variables” you can, for instance, find vocabularies that represent entities such as social science concepts (e.g. employment, discrimination, trust, or family relationships). Under “International standards” we collect controlled lists of socio-demographic, socio-economic and geographic indicators. Each listed resource includes a link, a brief description and, whenever possible, a DOI to the resource on FAIRsharing (see [3] and [4]). One of the sections is dedicated to look-up services, which can help you retrieve even more structured vocabularies.
The list is meant to be a living document that changes over time: more resources can be added, categories can be refined, and outdated resources should be removed. However, we don’t plan to do this on our own: the repository is publicly accessible and amendable, allowing everyone not only to use it but also to contribute to it (see contribution guidelines) and keep it up to date. The list is available under a CC0 (public domain) licence, meaning anyone is free to reuse, redistribute and modify the list, without any obligation to cite the source. However, please be aware that other licences may apply to the listed resources, and you should always check the licence at the source (we add this information to the description when it is readily available).

Why use the list?

There are many reasons to consult the list, but let us give two simple examples. First, a researcher who is cleaning and preparing a dataset for publication may want to choose codes and labels for the variable “countries” in a way that makes it easy to match with other data sources and prevents spelling mistakes. In this case, the Awesome Ontologies for the Social Sciences list may help them discover controlled lists of countries, such as this one maintained by CESSDA and based on ISO 3166. A second use case could be a data steward working on a data model, i.e. a blueprint for the structure of some data, in the field of family sociology. They might find “PersonLink”, a multilingual ontology of family relationships, useful to specify the possible links among people in a household.
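The country example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the dictionary holds only a small excerpt of ISO 3166-1 alpha-2 codes typed in by hand, and the helper names `to_country_code` and `recode_countries` are hypothetical. In practice you would load the full controlled list from its published source.

```python
# Small illustrative excerpt of ISO 3166-1 alpha-2 codes; a real project would
# load the complete controlled list from its published source instead.
COUNTRY_CODES = {
    "NL": "Netherlands",
    "DE": "Germany",
    "IT": "Italy",
    "FR": "France",
}

# Reverse lookup from lower-cased label to code.
LABEL_TO_CODE = {label.lower(): code for code, label in COUNTRY_CODES.items()}


def to_country_code(raw):
    """Return the alpha-2 code for a raw value, or None if it is not in the list."""
    value = raw.strip()
    if value.upper() in COUNTRY_CODES:
        return value.upper()
    return LABEL_TO_CODE.get(value.lower())


def recode_countries(raw_values):
    """Split raw values into recoded entries and entries needing manual review."""
    recoded, unmatched = [], []
    for raw in raw_values:
        code = to_country_code(raw)
        if code is None:
            unmatched.append(raw)
        else:
            recoded.append(code)
    return recoded, unmatched


recoded, unmatched = recode_countries(["Netherlands", "nl", " Germany ", "Itly"])
print(recoded)    # ['NL', 'NL', 'DE']
print(unmatched)  # ['Itly']
```

Values that do not match the controlled list, such as the misspelt “Itly”, are set aside for manual review rather than silently kept, which is how a controlled list helps catch spelling mistakes before publication.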

How to use

You can use the list simply by checking the README file in the GitHub repository; you do not need a GitHub account to do this. The table of contents is meant to facilitate navigation across the headers, and your browser’s “Find” function can be useful if you have specific keywords or features in mind (e.g. ‘multilingual’ or ‘thesaurus’). If you find any problems or have suggestions to improve the usability of the list, please check the “How to contribute” section below.
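If you prefer to search a local copy of the README programmatically, the same keyword lookup can be sketched in Python. The sketch assumes the README uses Markdown headers (lines starting with “#”) and bullet items (lines starting with “-” or “*”); the sample text below is a made-up excerpt, not the real file, and `find_entries` is a hypothetical helper.

```python
def find_entries(readme_text, keyword):
    """Return (header, entry) pairs for list items that mention the keyword."""
    matches = []
    current_header = None
    for line in readme_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Remember the most recent header so each match can be located.
            current_header = stripped.lstrip("#").strip()
        elif stripped.startswith(("-", "*")) and keyword.lower() in stripped.lower():
            matches.append((current_header, stripped.lstrip("-* ").strip()))
    return matches


# Made-up excerpt standing in for the real README (assumed Markdown layout).
SAMPLE_README = """\
## Concepts and variables
- PersonLink - a multilingual ontology of family relationships.
- Example Thesaurus - a hypothetical thesaurus entry.

## International standards
- Example Country List - a hypothetical controlled list of countries.
"""

for header, entry in find_entries(SAMPLE_README, "multilingual"):
    print(f"{header}: {entry}")
```

The search is case-insensitive and reports the header under which each match sits, which mirrors how you would combine the table of contents with the browser’s “Find” function.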

How to contribute

For those who wish to contribute to the list, for instance by adding a new item, proposing a different header, fixing a typo, or removing an obsolete resource, the Contribution Guidelines outline three different options.

If you have an account on GitHub and are familiar with its functioning, it is possible to submit a pull request following some simple formatting rules (e.g. using British spelling and adding the FAIRsharing DOI at the end of an item, if available).
If you have an account on GitHub, it is possible to open an issue (or comment under an existing one) and submit a request for an addition or change. A template is available for issues that propose a new resource to be added to the list.
If you don’t have a GitHub account but want to contribute, you can also send an email to the creator of the repository (namely, Angelica Maineri).

Once you have contributed to the list, feel free to add your name to the list of contributors (or indicate in the email whether you wish your effort to be recognised there). Note that we have adopted the Contributor Covenant Code of Conduct: we therefore ask you to be respectful of other people’s opinions and constructive in your feedback and criticism.

References

[1] Maineri, Angelica Maria. (2022). Controlled vocabularies for the social sciences: what they are, and why we need them. Zenodo. https://doi.org/10.5281/zenodo.7157800

[2] Guizzardi, Giancarlo. (2020). Ontology, Ontologies and the “I” of FAIR. Data Intelligence, 2(1–2), 181–191. https://doi.org/10.1162/dint_a_00040

[3] Braukmann, Ricarda. (2022). I love PIDs – and so should you. Zenodo. https://doi.org/10.5281/zenodo.7304703

[4] Morselli, Francesca. (2023). FAIRsharing for the social sciences. What’s in it for me? Zenodo. https://doi.org/10.5281/zenodo.7598374

Photo by Edho Pratama on Unsplash with Awesome logo.