PreFer data challenge: Predicting Fertility outcomes in the Netherlands

27 August 2024

A data challenge is a competition in which many researchers work on the same dataset to make the best predictions of a well-defined outcome. Data challenges have led to major progress in data science by fostering the development of new algorithms with increasing predictive ability. They can also accelerate scientific progress, because comparing different models gives insight into the research problem at hand. In the social sciences, however, data challenges are still rare.

The goal of PreFer is to assess how accurately we can predict a specific fertility outcome: who will have a child between 2021 and 2023. The findings from PreFer could provide valuable insights into fertility behavior and potentially improve the accuracy of fertility projections. For this challenge, hundreds of researchers will use data from the LISS survey to make predictions. Subsequently, several selected teams will work with Dutch register data, which contain longitudinal records on the entire Dutch population and cover hundreds of relevant characteristics such as education, income, partnerships, and social ties with family, colleagues, neighbors, and (former) classmates.

This extensive dataset makes it possible to capture complex patterns in fertility behavior. Typically, researchers access register data through the Remote Access (RA) environment provided by Statistics Netherlands. However, applying advanced predictive algorithms to such a comprehensive dataset requires more computational power than the RA environment can offer: training these models within RA would be extremely slow, if not impossible. The ODISSEI Secure Supercomputer (OSSC) offers the necessary computational resources, enabling researchers to apply complex algorithms and thus push the boundaries of how well fertility outcomes can be predicted.