Written by Lieke de Boer (eScience Center), Emilio Cammarata (ODISSEI FAIR Support), Carlos Martinez Ortiz (eScience Center), Angelica Maineri (ODISSEI FAIR Support)
Introduction
In 2010, Carmen Reinhart and Kenneth Rogoff published a paper [1] showing that high levels of government debt, relative to Gross Domestic Product (GDP), significantly reduce a country’s economic growth. The paper had an impact on financial policy decisions around the world. For example, UK Chancellor George Osborne argued in favour of austerity measures in response to the perceived risk that high levels of debt would pose to growth. But a few years later, as reported by news outlets, a graduate student found an error in the analysis that Reinhart and Rogoff had done: a faulty Excel formula led to an underestimation of the average growth rate of countries with high debt-to-GDP ratios. Once the error was corrected, the results of the paper were found to be less compelling: instead of shrinking by 0.1%, the economies of these countries actually seemed to grow by 2.2%, putting them much less far behind countries with lower debt-to-GDP ratios. The discovery of this mistake led some policymakers to question austerity measures that had been implemented based on the paper’s findings.
The Reinhart and Rogoff incident is an example of how errors in data analysis can have real-world consequences. Reinhart and Rogoff did not openly share the software they used for their analysis, nor did they use software that was easy to reuse. Software is an integral part of the research pipeline, and it requires appropriate management to enable reproducibility and reuse, just like data. In this short article, we describe the FAIR (Findability, Accessibility, Interoperability, Reusability) principles for research software and provide practical recommendations for social scientists.
Research Software in the Social Sciences
When we say “software”, we mean the complete set of instructions that control the operations of a computer. It is the variable part of a computer, while the hardware is the stable (and physical) part. In this context, we can see data as the third component of an information system: it is the information a computer processes, a relatively stable but not physical part of the system.
Research software is the set of operations that describes the steps taken to collect raw data and turn it into an end result, written in a programming language such as Python, R, Stata or many others (see here for a non-exhaustive list). This includes source code files, workflows, scripts and notebooks [2]. In the case of Reinhart and Rogoff, their research software was the Excel formula that performed calculations on different cells of the spreadsheet.
In research, software is used at different stages of the research process: data collection, processing (management of data), analysis and visualisation. Nowadays, many social scientists no longer rely on Excel formulas alone: through the use of software packages and scripting languages, researchers in the social sciences have been able to handle complex analyses on large volumes of data. Just as research software in astronomy can be used to process large volumes of data coming from telescopes, in the social sciences it can help to process large volumes of data concerning societal, behavioural or sociological phenomena.
FAIR software
The example of Reinhart and Rogoff above illustrates how important it is to be able to retrace and reproduce the steps undertaken to conduct research. In support of this vision, the Findability, Accessibility, Interoperability and Reusability (FAIR) principles have been proposed as a guide for good data stewardship and as a means to improve transparency and trust in scientific results [3]. While the principles were meant to be broadly applicable to all digital research objects (e.g. data, software, workflows, etc.), their wording initially put the emphasis mostly on data. Starting in 2019, the FAIR for Research Software Working Group has worked on formalising the FAIR principles for Research Software (FAIR4RS). The FAIR4RS overarching principles are described in [4] and [5].
Let’s imagine how Reinhart and Rogoff would conduct their study in 2023, following the FAIR4RS principles and using current best practices. Excel files with macros pose problems for reproducibility: they require a Microsoft Office licence and mix the data and the analysis in the same file. For the sake of the following examples, we will assume that Reinhart and Rogoff would separate their data from their research software, using a CSV file for their data and an R script for their analysis.
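To make the idea of separating data from analysis concrete, here is a minimal sketch of what such a script could look like. It is written in Python rather than R purely for illustration, and everything in it is an assumption: the column names, the function name `mean_growth_by_category` and the toy values are hypothetical and are not taken from the Reinhart and Rogoff dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical toy data for illustration only.
# These are NOT the values from the Reinhart and Rogoff study.
TOY_CSV = """country,debt_category,growth_pct
A,high,2.0
B,high,3.0
C,high,-1.0
D,low,4.0
E,low,3.0
"""


def mean_growth_by_category(csv_file):
    """Return the mean growth rate (in percent) per debt category.

    The function loops over every row of the file, so the set of
    countries included in each average is defined by the data itself
    and not by a hand-typed cell range.
    """
    growth = defaultdict(list)
    for row in csv.DictReader(csv_file):
        growth[row["debt_category"]].append(float(row["growth_pct"]))
    return {category: mean(values) for category, values in growth.items()}


# The data lives in a CSV source; the analysis lives in the function above.
result = mean_growth_by_category(io.StringIO(TOY_CSV))
print(result)
```

Because the analysis only expects a CSV source with these columns, the same function could be pointed at a real file opened with `open(...)`, or at another researcher's data set, without changing the code.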
F: Software, and its associated metadata, is easy for both humans and machines to find.
To make their software findable, Reinhart and Rogoff could have shared the R scripts used to clean and analyse the data, and documented the different versions and updates on GitHub or another code-sharing platform.
A: Software, and its metadata, is retrievable via standardised protocols.
By writing their analysis in an open source programming language and sharing their code on a code-sharing platform, Reinhart and Rogoff would already make their software accessible. By documenting the code properly (including dependencies, authorship information, licence, etc.), they would be including the required metadata.
I: Software interoperates with other software by exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs), described through standards.
In FAIR4RS, interoperability is defined as the ability for two independent pieces of software to exchange data. By providing their data as CSV (a format that can be read by many analysis tools) and recording their analysis scripts in, for example, R, Reinhart and Rogoff would already make their software interoperable: the same scripts could easily be reused by another researcher to analyse a different data set or to extend the analysis.
R: Software is both usable (can be executed) and reusable (can be understood, modified, built upon, or incorporated into other software).
To ensure their software is reusable, Reinhart and Rogoff could have described each step of the data processing and analysis with rich documentation to make it understandable, and assigned a clear licence which specifies what can and cannot be done with the software.
To summarise
Some advantages of making research software FAIR include:
- Acceleration of scientific discovery: making software FAIR helps future researchers (but also your future self!) to build on each other’s work rather than reinventing the wheel.
- Increased impact: preparing software for reuse often includes making it available via a suitable repository and, as a consequence, making it a reusable and citable output of your work.
- Increased transparency: even if you mostly work with sensitive data, it can be worth making sure that your software adheres to the FAIR principles, because others with similar data or peer reviewers may want to inspect your analysis to see exactly how you came to your conclusions.
- Long-term sustainability: by making your software FAIR, you make it easier for others to interact with your code and keep it updated and functioning in the long run.
Recommendations
To start FAIRifying your research software, there are some very specific things you can incorporate in your research workflow:
- Write a Software Management Plan (SMP) to incorporate FAIR principles and good software management practices early in the research process, perhaps with the help of an expert who can guide you through the process (see [2] for more guidance and templates).
- Make your software citable, for instance by linking your GitHub repository to a public repository on OSF or Zenodo to assign a DOI, and then creating a CITATION.cff file in your repository (you can read more here).
- Make your software available! There are many repositories available, depending on your needs and requirements, such as Zenodo, OSF, or the Research Software Directory. Even if the underlying data cannot be accessed, it is still good practice to publish the code: see a tutorial here.
- In situations in which the underlying data can be shared, materials can be submitted to Reprohack – a platform where participants run each other’s code to ensure full reproducibility. Reprohack can help to make sure that the software is executed smoothly and that the results are reproduced.
- Add rich comments and explanations to describe the steps of your piece of code and help others reuse it or even adapt it to their needs.
- Assign a licence specifying what can(not) be done with your software. If you need guidance, check out choosealicense.com.
- Check the relevant links below for more resources!
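As an illustration of the citation recommendation above, the sketch below generates a minimal CITATION.cff file. A CITATION.cff is a plain YAML text file that is normally written by hand; Python is used here only to show its basic structure. The helper name `build_citation_cff` and all metadata values (title, author, version, date) are hypothetical placeholders, and the helper does not escape quotation marks inside values.

```python
def build_citation_cff(title, authors, version, date_released):
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of (family_name, given_name) tuples.
    The keys used here follow the Citation File Format 1.2.0 schema.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        "authors:",
    ]
    for family_name, given_name in authors:
        lines.append(f'  - family-names: "{family_name}"')
        lines.append(f'    given-names: "{given_name}"')
    lines.append(f'version: "{version}"')
    lines.append(f"date-released: {date_released}")
    return "\n".join(lines) + "\n"


# Hypothetical placeholder metadata, to be replaced with your own.
cff_text = build_citation_cff(
    title="Debt and growth analysis scripts",
    authors=[("Doe", "Jane")],
    version="0.1.0",
    date_released="2023-01-01",
)
print(cff_text)
```

The printed text would be saved as a file named CITATION.cff in the root of the repository. Once a DOI has been assigned, it can be added to the file as well.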
The journey towards FAIR software can be overwhelming, but do not worry: there is help along the way. The ODISSEI SoDa team is available for consultation, and once a year the ODISSEI-eScience grant call is published to provide in-kind support to execute innovative social science research projects with the help of research software engineers.
References
[1] Reinhart, C. M., & Rogoff, K. S. (2010). Growth in a Time of Debt. American Economic Review, 100(2), 573–578. https://doi.org/10.1257/aer.100.2.573
[2] Martinez-Ortiz, Carlos, Martinez Lavanchy, Paula, Sesink, Laurents, Olivier, Brett G., Meakin, James, de Jong, Maaike, & Cruz, Maria. (2023). Practical guide to Software Management Plans (1.1). Zenodo. https://doi.org/10.5281/zenodo.7589725
[3] Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. https://doi.org/10.1038/sdata.2016.18
[4] Barker, M., Chue Hong, N. P., Katz, D. S., Lamprecht, A.-L., Martinez-Ortiz, C., Psomopoulos, F., Harrow, J., Castro, L. J., Gruenpeter, M., Martinez, P. A., & Honeyman, T. (2022). Introducing the FAIR Principles for research software. Scientific Data, 9(1), 622. https://doi.org/10.1038/s41597-022-01710-x
[5] Chue Hong, Neil P., Katz, Daniel S., Barker, Michelle, Lamprecht, Anna-Lena, Martinez, Carlos, Psomopoulos, Fotis E., Harrow, Jen, Castro, Leyla Jael, Gruenpeter, Morane, Martinez, Paula Andrea, Honeyman, Tom, Struck, Alexander, Lee, Allen, Loewe, Axel, van Werkhoven, Ben, Jones, Catherine, Garijo, Daniel, Plomp, Esther, Genova, Francoise, … RDA FAIR4RS WG. (2022). FAIR Principles for Research Software (FAIR4RS Principles) (1.0). https://doi.org/10.15497/RDA00068
Relevant Links
Photo by Ilya Pavlov on Unsplash