Benchmarking

5 November 2024

Benchmarking means setting up a common framework in which different approaches to the same task can be compared, so that the strengths and weaknesses of each modelling approach can be analysed and new insights gained. In the social sciences, benchmarking can help establish which methods and models are most suitable for answering specific research questions.

In a benchmarking challenge, teams of participants are set the task of predicting a particular outcome. At the end of the challenge, the teams’ performances are evaluated against a predefined set of metrics and evaluation criteria. Participants are provided with a training dataset that includes an outcome variable (dependent variable) and a range of predictor variables (independent variables). They use this dataset to train a model that predicts the outcome variable from the values of the predictors. The resulting model is then evaluated on its predictions for a holdout dataset, which typically contains around 20% of the observations and which the participants have not had access to. The winner of the challenge is the team whose model most accurately predicts the outcome variable in the holdout data.
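The train-and-holdout workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the challenge's actual pipeline: the data are synthetic, the single predictor and the threshold "model" are deliberately simplistic, the function names (split_holdout, fit_threshold, f1_score) are hypothetical, and the F1 score is assumed here as one possible evaluation metric.

```python
import random


def split_holdout(rows, holdout_fraction=0.2, seed=0):
    """Shuffle the rows and set aside a fraction of them as the holdout set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def fit_threshold(train):
    """Toy 'model': choose the cut-off on one predictor that maximises
    accuracy on the training data (predict 1 when x >= cut-off)."""
    best_cut, best_acc = None, -1.0
    for cut in sorted({x for x, _ in train}):
        acc = sum((x >= cut) == bool(y) for x, y in train) / len(train)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


def f1_score(y_true, y_pred):
    """F1 score for a binary outcome (assumed metric, for illustration)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic data: one predictor x and a noisy binary outcome y.
data_rng = random.Random(42)
rows = []
for _ in range(500):
    x = data_rng.random()
    y = 1 if x + data_rng.gauss(0, 0.2) > 0.6 else 0
    rows.append((x, y))

# Participants only ever see `train`; `holdout` stays with the organisers.
train, holdout = split_holdout(rows, holdout_fraction=0.2)
cut = fit_threshold(train)

# Organisers score the submitted model on the unseen holdout data.
y_true = [y for _, y in holdout]
y_pred = [int(x >= cut) for x, _ in holdout]
score = f1_score(y_true, y_pred)
print(f"train={len(train)} holdout={len(holdout)} F1 on holdout={score:.2f}")
```

In a real challenge each team would submit its own model in place of fit_threshold, and all submissions would be scored on the same holdout data so that their results are directly comparable.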

PreFer benchmarking challenge

The aim of the PreFer challenge is to measure how predictable fertility outcomes in the Netherlands currently are, in order to advance our understanding of fertility.

Why predict fertility outcomes in a data challenge?

Fertility is widely studied across disciplines because of its importance to individuals and societies. Many factors related to fertility outcomes have been identified. Yet these factors explain only a fraction of the variation in fertility outcomes, and we are unable to explain even short-term changes in those outcomes. What are we missing?

This data challenge can potentially advance our understanding of fertility behaviour and improve social policies and family planning in several ways. Measuring how well different factors and models predict fertility outcomes for new cases will show which factors matter most. This can narrow down the scope for potential interventions and help people reach their desired family size. Comparing and interpreting the different models submitted to the challenge (e.g. theory-driven and data-driven ones) can identify factors currently overlooked by theories of fertility and highlight gaps in current knowledge (e.g. important interactions or non-linear effects).