By Elizaveta Sivak (University of Groningen) and Gert Stulp (University of Groningen)
Fertility outcomes, or the number and timing of children, influence numerous aspects of individual lives and the development of societies. While researchers have extensively studied how specific factors – such as age, socioeconomic status, or network characteristics – affect fertility, relatively little attention has been paid to the predictability of fertility outcomes. How accurately can we predict fertility given the available methods, theories, data, and computational resources?
Why study the predictability of fertility outcomes?
Examining the predictability of fertility outcomes is important for several reasons:
- Insights for fertility forecasts. Accurate fertility forecasts are vital for planning in areas such as infrastructure, healthcare, and education. Yet fertility outcomes are very difficult to predict: long-term forecasts are uncertain, and changes in fertility are often not foreseen. Fertility forecasts typically rely only on historical fertility rates and age structures. There is evidence that external data can improve predictive accuracy – for example, Google search data helped to predict decreases in fertility after the onset of the Covid-19 pandemic. A better understanding of the factors that strongly predict individual-level fertility may therefore help improve forecasts, which shifts the question to how well we can predict fertility for individuals. Studying predictability also helps clarify the extent to which these forecasts can be improved.
- Feasibility of targeted policies. Different social policies – for example, aimed at reducing involuntary childlessness – could target specific groups for which these risks are higher. Predictability research can assess whether identifying these groups is feasible.
- Advancing fertility research. Comparing different methods for predicting fertility outcomes can provide new insights into research on fertility behaviour, identifying areas where current theories succeed or fall short.
- Implications for research on predictability of life outcomes. Measuring the predictability of specific fertility outcomes contributes to the broader field of research on the predictability of life outcomes (Lundberg et al. 2024), helping understand why certain outcomes are more predictable than others.
The role of benchmarking in studying predictability
Benchmarking is a process in which researchers work on a common task: predicting a particular outcome in a holdout dataset as accurately as possible, using a common training dataset and a pre-defined metric of out-of-sample predictive ability. Benchmarks, or data challenges, help quantify how predictable an outcome is given the available data and methods. When many researchers with various backgrounds participate in a data challenge, the final result likely reflects not just the limits of a particular method or the skills of individual researchers, but the current limits of predictability for a given dataset.
Benchmarks can accelerate scientific progress by allowing us to compare different methods and through this comparison gain insights about what constrains predictability.
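To make the idea of a common task concrete, here is a minimal sketch of how a benchmark might score and rank submissions against the same holdout labels with one pre-defined metric. It uses the F1 score as the metric; the team names, labels, and the `rank_submissions` helper are hypothetical and serve only as an illustration, not as the scoring code used in any actual challenge.

```python
from typing import Dict, List, Tuple


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 score for the positive class (1 = outcome occurred)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_submissions(
    holdout_labels: List[int], submissions: Dict[str, List[int]]
) -> List[Tuple[str, float]]:
    """Score every submission on the same holdout labels, best first."""
    scores = {
        name: f1_score(holdout_labels, preds)
        for name, preds in submissions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy holdout labels and two hypothetical submissions (illustrative only).
holdout = [1, 0, 0, 1, 0, 1, 0, 0]
submissions = {
    "team_a": [1, 0, 0, 1, 0, 0, 0, 0],  # cautious: misses one positive
    "team_b": [1, 1, 1, 1, 0, 1, 0, 1],  # eager: many false positives
}
leaderboard = rank_submissions(holdout, submissions)
for name, score in leaderboard:
    print(f"{name}: F1 = {score:.3f}")
```

Because every submission is scored on the same held-out labels with the same metric, differences on the leaderboard can be attributed to the methods rather than to differences in data or evaluation.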
PreFer data challenge
To study the predictability of fertility, we conducted the data challenge for Predicting Fertility outcomes in the Netherlands (PreFer) in April-October 2024. It is a collaboration between the Department of Sociology at the University of Groningen, ODISSEI, Eyra, and Centerdata. This project would also not have been possible without help from the Netherlands eScience Center, SURF, and Statistics Netherlands.
Participants of PreFer were tasked with predicting whether 18-45-year-old individuals would have a child within the next three years based on the data from previous years. More than 130 people from different countries and scientific backgrounds participated in the data challenge, developing a wide range of approaches to predicting fertility outcomes, from logistic regression models to fine-tuning a large language model.
To study the potential constraints on predictability of fertility outcomes, we used two sources of data:
- Register data. With a large sample size and longitudinal structure, the data from the Dutch population and other registers provide detailed demographic and socio-economic information about people’s life courses.
- Survey data from the LISS panel. This survey data is rich in ‘subjective’ variables such as attitudes and intentions which are not directly measured in the register data.
Several elements of research infrastructure were essential for conducting PreFer:
- The ODISSEI Secure Supercomputer (OSSC), part of the Dutch supercomputer Snellius, provided the secure computing environment crucial for handling the computational demands of developing machine learning models on large-scale register data. It enabled participants to train models on millions of observations, facilitating the development of state-of-the-art machine learning approaches to test the upper limits of the current predictability of fertility outcomes.
- A submission system designed by Eyra to enhance reproducibility while keeping the data secure. Inspired by lessons from the Fragile Families Challenge (Liu & Salganik, 2019), we asked participants of the first part of the challenge (based on survey data) to submit their code and models rather than predictions. The submissions were automatically tested on synthetic “fake” data to identify errors, which allowed debugging and resubmission before the final evaluation on the holdout data.
This approach required additional effort from the participants – and we are deeply grateful for their commitment – but it resulted in most submissions being fully reproducible. Some reproducibility issues persist due to differences in hardware or missing software version details, but the system has already proven its value at this early stage of analysing the results of the data challenge. For instance, thanks to the longitudinal nature of the datasets, it enables retraining models on earlier versions of the data to assess whether predictability changes across different three-year intervals. Furthermore, we can test the submitted models on future data variants to evaluate the impact of adding new variables, providing deeper insights into fertility predictability.
- The ODISSEI portal, which helped to identify relevant datasets and provided metadata about the Dutch register datasets.
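The idea of testing submitted code on synthetic data before the final evaluation can be illustrated with a small sketch. This is not Eyra's actual implementation: the column names (`age`, `n_children`), the `check_submission` helper, and the example submissions are all assumptions made for illustration. The point is only that a submitted prediction function can be run on fake rows to catch crashes and malformed output without exposing the real data.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]
Predictor = Callable[[List[Row]], List[int]]


def make_synthetic_rows(n: int, seed: int = 0) -> List[Row]:
    """Generate fake rows with hypothetical columns; no real data involved."""
    rng = random.Random(seed)
    return [
        {"age": float(rng.randint(18, 45)),
         "n_children": float(rng.randint(0, 3))}
        for _ in range(n)
    ]


def check_submission(predict: Predictor, n_rows: int = 50) -> List[str]:
    """Run a submitted predict function on synthetic rows; list any problems."""
    rows = make_synthetic_rows(n_rows)
    try:
        preds = list(predict(rows))
    except Exception as exc:  # any crash is reported back to the participant
        return [f"predict raised {type(exc).__name__}: {exc}"]
    problems = []
    if len(preds) != len(rows):
        problems.append(f"expected {len(rows)} predictions, got {len(preds)}")
    if any(p not in (0, 1) for p in preds):
        problems.append("predictions must be 0 or 1")
    return problems


def good_submission(rows: List[Row]) -> List[int]:
    """Hypothetical rule-based model that returns one 0/1 label per row."""
    return [1 if r["age"] < 35 and r["n_children"] < 2 else 0 for r in rows]


def broken_submission(rows: List[Row]) -> List[int]:
    """Hypothetical buggy model: relies on a column that does not exist."""
    return [1 if r["income"] > 0 else 0 for r in rows]


print(check_submission(good_submission))    # no problems found
print(check_submission(broken_submission))  # reports the KeyError
```

A check like this only verifies that the code runs and returns output of the right shape; predictive performance is still measured separately on the holdout data.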
We are currently analysing the results of the data challenge, but we are excited to already announce and congratulate the winners. Three teams won the challenge in different nominations:
- Most accurate predictive model. Emily Cantrell (Princeton University) and Hanzhang Ren (Stanford University) developed the most accurate predictive models for both the LISS and register data, as measured by the F1 score. In the LISS part of the challenge, they employed a creative approach that leveraged the longitudinal nature of the data to expand the training sample size.
- Contributions to Fertility Research. Alessio Piraccini, Gianluca Tori, and Simone Meneghello (University of Padua) won this nomination for the application of various methods to interpret their predictive model, improving our understanding of fertility behaviour.
- Most Innovative Approach. Emily Cantrell (Princeton University), Flavio Hafner (Netherlands eScience Center), Sayash Kapoor (Princeton University), Malte Lüken (Netherlands eScience Center), Tiffany Liu (Princeton University), Lydia Liu (Princeton University), Arvind Narayanan (Princeton University), Juan Carlos Perdomo (Harvard University), Hanzhang Ren (Stanford University), Matt Salganik (Princeton University), Varun Satish (Princeton University), Benedikt Ströbel (Princeton University), Keyon Vafa (Harvard University), Mark Verhagen (Amsterdam Health & Technology Institute) developed a groundbreaking approach based on fine-tuning a large language model on the register data to predict the outcome.
Conclusion
This blog post highlights the importance of studying the predictability of fertility and the role of benchmarking in this task. While organising a benchmark is very challenging, requiring significant effort from participants and placing demands on existing infrastructure, it offers robust evidence about predictability. By organising the Predicting Fertility data challenge (PreFer), we aim to explore the limits of predictability of fertility outcomes, encourage reproducibility, and stimulate innovation in fertility research. Using both survey data and the Dutch register data allows us to study constraints on the predictability of fertility outcomes.
Photo by Marisa Howenstine on Unsplash