
Modelling life outcomes through foundational machine learning models

27 August 2024

This project asks how well individuals' life outcomes can be determined ex ante. Traditional low-dimensional models employed in the social sciences have proven of limited effectiveness for this objective. We seek to improve on the performance of such models by applying modern machine learning methods to CBS microdata. Neural network models for natural language processing (e.g. BERT and GPT-3) have proven extraordinarily powerful when trained on large language corpora. By analogy, we believe that models trained on large, rich datasets covering the life courses of millions of individuals could have explanatory power for various social outcomes that exceeds that of standard social science models. In this project we focus on testing the limits of prediction along two dimensions: 1) who individuals' future spouses will be, and how far in the future; 2) how well students' grades can be determined.

The target population, used to train our neural network and graph embedding models, consists of all individuals living in the Netherlands. The OSSC is needed to run the complex models we are interested in.