ODISSEI Conference 2024

Registration is now open! Follow the link below to register for the ODISSEI Conference 2024, hosted at Media Plaza, Jaarbeurs, Utrecht, on the 10th of December 2024.

ODISSEI, the research infrastructure for social science in the Netherlands, connects researchers with the necessary data, expertise and resources to conduct ground-breaking research and embrace the computational turn in social enquiry. This conference seeks to bring together a community of computational social scientists to discuss data, methods, infrastructure, ethics and theoretical work related to digital and computational approaches in social science research.

Registration

The registration for the conference is currently open. Please follow this link to sign up.

Registration is free, but as there is a limit to the location’s capacity, please let us know as soon as possible if you have to cancel your registration via communications@odissei-data.nl.

Download program

Date 10 December 2024

Time09.00-18.00

LocationSupernova (Jaarbeurs), Jaarbeursplein, Utrecht, Netherlands

Contactevgeniia@odissei-data.nl

Programme

Below is the preliminary programme and abstracts.

8.30-9.00 – Registration

Walk-in and registration with coffee and tea

9.00-10.15 – Infrastructure for policy research and ODISSEI Facilities Flashtalks

Room: Progress

This panel session will explore the potential of the ODISSEI infrastructure for policy research and what must be realised to achieve that.

Panellists: Claes de Vreese (Universiteit van Amsterdam/LBDI), Susan van den Braak (WODC), Valérie Pattyn (Leiden University), Verena Seibel (Utrecht University), Jacqueline Mout (OCW).
Moderator: Pearl Dyktra (EUR & ODISSEI)

ODISSEI Facilities Flashtalks

Explore the key ODISSEI Facilities in 5 minutes.

Secure ANalysis Network (SANE) – Ahmad Hesam, Project Manager, SURF
The ODISSEI Portal – Ricarda Braukmann, Data Station Manager Social Sciences, DANS
The ODISSEI SoDa Team – Erik-Jan van Kesteren, Assistant Professor, Utrecht University
FIRMBACKBONE Infrastructure – Wolter Hassink, Full Professor, Utrecht University
ODISSEI Grant Support and the Introduction to the Marketplace – Kasia Karpinska, Scientific Manager, ODISSEI

10.15-10.45 – Coffee break and poster presentations

10.45-12.00 – Parallel Session 1

Transitions in information knowledge

Chair: Andre Valdestilhas

Room: Mission 1

Slava Tykhonov (Data Archiving and Networked Services (DANS)), Fjodor van Rijsselberg (Data Archiving and Networked Services (DANS)), Eko Indarto (Data Archiving and Networked Services (DANS)). The Next Generation of Data Management with Artificial Intelligence
Xander Wilcke (VU Amsterdam – Faculty of Science), Rick Mourits (International Institute of Social History (IISG)), Auke Rijpma (Utrecht University – Faculty of Social Sciences). Data-Driven Hypothesis Discovery on Social-Historical Knowledge Graphs
Flavio Hafner (Netherlands eScience Center (NLeSC)), Chang Sun (Maastricht University – Faculty of Science and Engineering). Empirically testing privacy in machine learning

Abstracts

Slava Tykhonov (Data Archiving and Networked Services (DANS)), Fjodor van Rijsselberg (Data Archiving and Networked Services (DANS)), Eko Indarto (Data Archiving and Networked Services (DANS)), The Next Generation of Data Management with Artificial Intelligence

Building a sustainable and distributed research data infrastructure for the future, utilizing automated metadata translation and enrichment through the integration of knowledge graphs with Large Language Models in the ODISSEI project. An important obstacle with data-driven research lies with the difficulty of knowing what questions the data might have the answers to. Even if rich metadata is provided, these alone are often not sufficient to guide researchers towards new lines of research, which may leave many potentially interesting and novel insights to remain undiscovered. This problem has only exacerbated with the recent digitalization efforts of paper collections and libraries, placing vast amounts of cross-domain heterogeneous data at the researcher’s fingertips. Many of these datasets are now also being published as knowledge graphs, following Linked Data principles, thereby integrating multiple relevant sources of data that have never been together before. This provides unique opportunities for pattern detection methods, which can now leverage the graphs’ deeper relational dependencies and semantics to discover and highlight potentially interesting correlations in the data. Researchers can then inspect these correlations, and, if deemed relevant, use them as starting points for new research directions or as support for existing lines of research.

Xander Wilcke (VU Amsterdam – Faculty of Science), Rick Mourits (International Institute of Social History (IISG)), Auke Rijpma (Utrecht University – Faculty of Social Sciences), Data-Driven Hypothesis Discovery on Social-Historical Knowledge Graphs

To support researchers during this scientific workflow, we introduce an anytime algorithm for the bottom-up discovery of generalized multimodal graph patterns in knowledge graphs. Each pattern is an interconnected set of triples with special variables for classes, datatypes, and value patterns. Upon discovery, the patterns are automatically converted to SPARQL queries and presented in an interactive facet browser together with metadata, graph visualizations, and provenance information, enabling researchers to explore and analyse the patterns as well as reproduce and share relevant data selections. We evaluate our method both quantitatively and qualitatively, on social historical data from The Netherlands, and with the help of domain experts in social science and the humanities.

Flavio Hafner (Netherlands eScience Center (NLeSC)), Chang Sun (Maastricht University – Faculty of Science and Engineering), Empirically testing privacy in machine learning

How reliably do synthetic data from machine learning generators protect the privacy of the original data? We first review the empirical testing of privacy in machine learning, focusing on the intuitions for and assumptions behind such tests. We then discuss whether and how these tests could be used when evaluating generative algorithms for deployment at statistical agencies and in the health care system. Privacy audits assume strong adversaries and can identify bugs in the training algorithm. They could thus become standard in ensuring the correct implementation of a given algorithm. Membership inference attacks assume realistic threat scenarios, but they do not scale in the size of the training data. Their ability to consider the specific context of releasing privacy-sensitive models could make them informative for practitioners—if the scaling challenge is solved. We suggest some ideas for future work and discussion in this direction.

A path to a sustainable future – challenges and applications

Chair: Paulette Flore

Room: Quest

Linde Kattenberg (Maastricht University – School of Business and Economics), Juan Palacios (Maastricht University – School of Business and Economics), Erez Yoeli (MIT). Energy efficiency adoption in social housing
Erkinai Derkenbaeva (Wageningen University & Research – Wageningen Social Science Group), Caroline Christandi Loe (Wageningen University & Research – Wageningen Social Science Group). Agent-based model on solar panels diffusion in urban context
Toon Zijlstra (The Netherlands Institute of Transport Policy Analysis), Gabrielle Uitbeijerse (The Netherlands Institute for Transport Policy Analysis). Does Climate Awareness Result in Less Air Travel?
Hsin-Chueh (Lily) Chen (University of Twente, Behavioural, Management and Social Sciences), A.H. (Hannie) Gijlers (University of Twente, Behavioural, Management and Social Sciences), Pantelis Papadopoulos (University of Twente, Behavioural, Management and Social Sciences). Enhancing Climate Change Literacy Through Agent-Facilitated Collaborative Learning: A Cross-Cultural Comparison

Abstracts

Linde Kattenberg (Maastricht University – School of Business and Economics), Juan Palacios (Maastricht University – School of Business and Economics), Erez Yoeli (MIT), Energy efficiency adoption in social housing

Energy efficiency represents an important means to reduce the financial burden of energy for households, which has been exacerbated by a record-high over the past years due to the surge in the energy prices. There is extensive literature in energy and behavioral economics that aims to uncover the drivers and barriers of households to conserving energy. There is significant discussion on the energy efficiency gap, the phenomenon of households underinvesting in energy efficiency despite its profitability. Research on governmental programs promoting energy efficient technologies concludes that non-monetary barriers play a major role in this gap, though the exact mechanisms are still unknown. To better understand these barriers, we study a context where only non-monetary barriers can hinder energy efficiency adoption.
We survey tenants in social housing across three major housing corporations in the Netherlands. Their landlord will pay for the retrofit, but needs permission to do so from the tenant. It is also the segment where energy poverty is most prominent. We survey tenants at different stages of the retrofitting process and link the survey results to administrative data on household characteristics and energy consumption. Our results show that environmental beliefs and social ties in the neighborhood predict support for sustainable retrofits.
Those household that were opposed the retrofit, but received one because of majority voting on the block level are those that benefit least from the retrofit. Their reported benefits as well as their realized energy savings are significantly lower than those of their neighbors. The results highlight the importance of the design of the permission process of large scale retrofitting projects. Accommodating the needs of different groups in the social housing segment can be important in accelerating the energy transition, as well as safeguarding the position of the most vulnerable tenants.

Erkinai Derkenbaeva (Wageningen University & Research – Wageningen Social Science Group), Caroline Christandi Loe (Wageningen University & Research – Wageningen Social Science Group), Agent-based model on solar panels diffusion in urban context

This study aims to explore the diffusion of solar panels in Amsterdam using an agent-based modeling (ABM) approach. The adoption of photovoltaic (PV) systems in urban settings is critical for advancing sustainable energy solutions. Our ongoing empirical research reveals that despite a rapid PV uptake in Amsterdam, there is still a large unused residential and non-residential rooftop potential that could be utilized. This study seeks to understand the spatial and social dimensions of PV diffusion, examining how neighborhood characteristics and social networks contribute to adoption rates. The ABM will simulate PV adoption dynamics in the urban context based on the comprehensive conceptual framework considering the social influence theory and empirically evident significant spatial, temporal, and behavioral aspects. Central to our ABM is the inclusion of diverse motivations, perceptions, and cultural factors affecting decision-making processes within households and other key stakeholders in the energy system. By capturing the heterogeneity of these actors, the model aims to reflect the complexity of real-world energy networks. Additionally, the inclusion of different agent types, such as energy companies, policymakers, and community organisations, ensures a holistic representation of the urban energy ecosystem. The ABM will simulate PV adoption dynamics in Amsterdam, utilizing empirical data to validate the model and explore various policy scenarios. Key areas of focus include the impact of financial incentives, informational campaigns, and regulatory changes on PV uptake rates. The model will also investigate the interplay between individual and collective behaviors, assessing how localized adoption can influence broader urban diffusion patterns. By incorporating these elements, the ABM aims to provide insights into effective strategies for accelerating PV adoption in urban contexts. This research contributes to the growing body of knowledge on renewable energy adoption, offering valuable policy recommendations for enhancing the penetration of solar technologies in cities.

Toon Zijlstra (The Netherlands Institute of Transport Policy Analysis), Gabrielle Uitbeijerse (The Netherlands Institute for Transport Policy Analysis), Does Climate Awareness Result in Less Air Travel?

People with strong climate awareness intend to fly less often for non-business travel in the coming years. This conclusion stems from our analysis of the 2022 flight behaviour survey, within the Netherlands Mobility Panel (MPN). While knowledge of climate change is a necessary condition for reducing air travel, it is not sufficient on its own. Countervailing forces, particularly social norms and traveller identity, play a significant role. Individuals who identify as travellers do not see any option to reduce their air travel. Additionally, the perceived ability to act on climate awareness is hindered when individuals are surrounded by frequent flyers in their social circles.
As of 2024, we have the opportunity to examine if people’s intentions to fly less have translated into actual behaviour changes. We analyse data from a new questionnaire conducted in June 2024, which includes responses from about 800 participants who also completed the earlier air travel survey. This allows us to measure the extent to which intentions have turned into concrete actions.The previous survey took place during the COVID-19 pandemic, so we compare the air travel frequency with pre-2020 levels. We employ a structural equation model, conditioning for education level as an important confounder. Given the modest impact of climate awareness on flight propensity from the earlier analysis, we do not expect to see a strong effect now. This is particularly true since most people only fly sporadically for private purposes.

Hsin-Chueh (Lily) Chen (University of Twente, Behavioural, Management and Social Sciences), Hannie Gijlers (University of Twente, Behavioural, Management and Social Sciences), Pantelis Papadopoulos (University of Twente, Behavioural, Management and Social Sciences), Enhancing Climate Change Literacy Through Agent-Facilitated Collaborative Learning: A Cross-Cultural Comparison

Climate change is a critical global issue. We designed a literacy-oriented climate change curriculum implemented in a computer-supported collaborative learning environment. The curriculum, focusing on “”the impact of human activities on climate change,”” included dialogue-based activities with the conversational agent Clair (i.e., Collaborative Learning Agent for Interactive Reasoning). Clair monitors student-student chat discussions, and intervenes as a teacher would to ask questions and prompt the students to engage in a more productive dialogue.
The study involved 176 students aged 14 to 16 from Taiwan (96), Germany (29), and the Netherlands (51) and conducted a cross-cultural comparative analysis. Analysis showed that Clair intervened less frequently in Taiwanese dialogues, averaging 1-2 times, compared to 3-4 times in Germany and the Netherlands. This difference likely stems from cultural distinctions in education; Taiwanese students maintain a high degree of formality and engage less in open interactions, whereas European students’ more open classroom interactions led to more frequent agent interventions.
Preliminary data from Germany indicate that Clair’s interventions not only improved teamwork and collaboration but also enhanced students’ ability to identify reliable climate change sources and understand its complex impacts, leading to more comprehensive solutions. However, post-intervention findings suggest an overly optimistic view of nature’s resilience among students, deviating from scientific consensus. These insights help the educational community better understand and address the impact of cultural differences on instructional effectiveness, aiming for a more personalized and inclusive educational environment.

Predicting Fertility – benchmarking and prediction in social sciences

Chair: Elizaveta Sivak

Room: Progress

Elizaveta Sivak (University of Groningen – Faculty of Behavioural and Social Sciences), Gert Stulp (University of Groningen – Faculty of Behavioural and Social Sciences). The Predicting Fertility (PreFer) data challenge
Hanzhang Ren (Stanford University – Department of Sociology) and Emily M. Cantrell (Princeton University – Department of Sociology) (equal contributions). How to Expect When They’re Expecting: Strategies for Improving Fertility Predictions in Survey and Register Data
Alessio Piraccini (University of Padua), Gianluca Tori (University of Padua), Simone Meneghello (University of Padua). Understanding fertility: predicting parenthood with LISS panel
Matthew Salganik (Princeton University – Department of Sociology). Breaking out of the X matrix: Language models as foundation models for social science

Abstracts

Gert Stulp (University of Groningen – Faculty of Behavioural and Social Sciences), Elizaveta Sivak (University of Groningen – Faculty of Behavioural and Social Sciences). Prediction in the social sciences: lessons from PreFer (the data challenge for Predicting Fertility in the Netherlands)

A focus on prediction is believed to advance the social sciences by providing novel ways of learning from the data, increasing reproducibility, and improving forecasts. Yet studies with a focus on prediction are relatively rare, and the methodology of using predictive models and generative AI for advancing social sciences is still developing. PreFer (the data challenge for Predicting Fertility outcomes in the Netherlands) allowed us to evaluate the performance of cutting-edge deep learning and generative AI models against more traditional ML approaches and theory-based models, and learn more about their potential for social sciences. In this session, we will discuss the implications of PreFer for demography and other social sciences. We will discuss what we learned about the predictability of fertility and what it means for the theories of fertility, social policies, and demographic forecasting. We will discuss what new knowledge we have gained about fertility behaviour and more generally about the predictability of life outcomes.

Winners of the challenge (TBA)

Opportunities through Data Donation

Chair: Marcel Das

Room: Royal Lobby

Laura Boeschoten (Utrecht University – Faculty of Social Sciences), Niek De Schipper (University of Amsterdam – Faculty of Social and Behavioural Sciences), Adrienne Mendrik (Eyra). Digital trace data collection through data donation
Felicia Loecherbach (University of Amsterdam – Faculty of Social and Behavioural Sciences), Linda Bos (University of Amsterdam – Faculty of Social and Behavioural Sciences), Jessica Piotrowski (University of Amsterdam – Faculty of Social and Behavioural Sciences). Dissecting Digital Natives’ Political Preferences: Election Campaigns and YouTube Media Exposure
Bella Struminskaya (Utrecht University – Faculty of Social Sciences), Florian Keusch (University of Mannheim). Using Data Donation to Measure Physical Activity in Older Adults
Ilya Fominykh (Utrecht University – Faculty of Social Sciences), Bella Struminskaya (Utrecht University – Faculty of Social Sciences), Thijs Carrière (Utrecht University – Faculty of Social Sciences), Rense Corten (Utrecht University – Faculty of Social Sciences). Selectivity and Nondonation Bias in Donated Digital Trace Data: Exploration Using Linkage Analysis

Abstracts

Laura Boeschoten (Utrecht University – Faculty of Social Sciences), Niek De Schipper (University of Amsterdam – Faculty of Social and Behavioural Sciences), Adrienne Mendrik (Eyra), Digital trace data collection through data donation

In our everyday lives, we leave many traces behind on digital platforms: for example, by liking a social media post or sending a message. The promise of computational social science is that researchers can utilize these digital traces to study human behavior and social interaction at an unprecedented level of detail. However, access for researchers to these traces is increasingly being restricted. An approach to gain access to digital traces is enabled thanks to the GDPR. Thanks to this legislation, all data processing entities are required to provide citizens a digital copy of their personal data upon request (a Data Download Package, DDP). This legislation allows researchers to invite participants to share their DDPs. A major challenge is, however, that DDPs potentially contain sensitive data. Conversely, often not all data is needed to answer a specific research question.
To tackle these challenges, an alternative workflow has been developed: First, the participant requests their DDP at the platform of interest. Second, they download it onto their personal device. Third, by means of local processing, only the features of interest to the researcher are extracted from that DDP. Fourth, the participant inspects the extracted features after which they choose to donate these. Only after clicking the ‘donate’ button, the data is sent to a storage location and can be accessed by the researcher.
Port is an open-source software tool that allows researchers to fully configure their data donation study design using this workflow. Researchers can decide which digital platform to investigate, which digital traces to collect, how to present the data and what to communicate throughout the data donation process. In this presentation, we present the latest version of Port and discuss how Port can be used within the ODISSEI research infrastructure.

Felicia Loecherbach (University of Amsterdam – Faculty of Social and Behavioural Sciences), Linda Bos (University of Amsterdam – Faculty of Social and Behavioural Sciences), Jessica Piotrowski (University of Amsterdam – Faculty of Social and Behavioural Sciences), Dissecting Digital Natives’ Political Preferences: Election Campaigns and YouTube Media Exposure

Political election campaigns are characterized by heightened attention to political news and information, providing an opportunity for all voters to gain in knowledge and political engagement (Hansen & Pedersen, 2014; Van Aelst & De Swert, 2009) to an extent that is unmatched in routine periods. Yet, not only are voters more attentive to content relevant to the election, the sheer amount of political information, especially in the digital information environment (Dommett et al., 2023), is much higher than between election campaigns. Yet, while knowledge on campaign effects among the broader audience is abundant, we know relatively little about the preference formation of digital natives. This is not in the least a consequence of the fact that existing research relies on self-reported media use measures in an information context that is vastly different from the current one. Importantly, while several studies allude to it, there is little knowledge about intra-generational differences in youth campaign effects, even though it is likely that polarized online information environments have different consequences on distinct audiences (Belchior & Teixeira, 2023; Ohme, 2019). In a first step to enhance our understanding of youth campaign effects, we dive deeper into the type of election campaign content different generations were exposed to during the most recent Dutch national elections. In our study, we extend the Dutch Parliamentary Election Study with data donation of media exposure patterns from YouTube. Using the open-source software PORT, we collected YouTube histories from 306 participants (over 2.5 million videos) including advertisements, comments, and searches, allowing us to build detailed measures of content exposure to politics and news for several months per user.

Bella Struminskaya (Utrecht University – Faculty of Social Sciences), Florian Keusch (University of Mannheim), Using Data Donation to Measure Physical Activity in Older Adults

Physical activity is a key predictor of many health outcomes, especially for aging populations. The accurate measurement of PA is key to identifying determinants of health and developing appropriate interventions. To measure PA, most population studies use self-report. However, self-reports are usually limited to global measures of PA (e.g., average daily hours of moderate/vigorous activity) and suffer from misclassification (e.g., walking the dog not considered PA). More fine-grained day-reconstruction methods are burdensome for respondents and prone to recall error. As an alternative researchers are providing study participants with wearable devices that passively track PA, which reduces reactivity and recall error. However, participants’ non-compliance and high device costs are problematic. Many older adults now have smartphones that track physical activity and individuals can share these passively collected physical activity data with researchers. We use the data donation approach: individuals thanks to the General Data Protection Regulation can request the data that companies and digital platforms collect about them. These data are provided in a machine-readable format in Data Download Packages. We use a privacy-preserving data donation tool (Boeschoten et al. 2023) integrated with a probability-based online panel of the general population. We asked panel members who own smartphones to donate the data on their PA from health apps and Google Semantic Location History. We investigate determinants of consent and selection bias in PA data donation among ca. 2,000 individuals aged 50 and older in the Netherlands. Using the rich data available about panelists (last 15 years) linked to donated data, we assess the quality of the donated physical activity data, and evaluate how well multi-source physical activity data can predict health outcomes. Our study contributes to the development of future-proof methods for collecting high-quality physical activity data and innovations in surveys.

Ilya Fominykh (Utrecht University – Faculty of Social Sciences), Bella Struminskaya (Utrecht University – Faculty of Social Sciences), Thijs Carrière (Utrecht University – Faculty of Social Sciences), Rense Corten (Utrecht University – Faculty of Social Sciences), Selectivity and Nondonation Bias in Donated Digital Trace Data: Exploration Using Linkage Analysis

Integrating survey data and donated digital trace data (DTD) has shown great potential in modern science. However, the quality of the donated data often suffers from selectivity, as specific social groups are less likely to share it. We identify mechanisms of (non)donation which lead to selectivity and show how they contribute to nondonation bias. We distinguish between 4 levels of willingness to share DTD: non-response, non-compliance, non-contribution, and actual donation. We utilize data from the Dutch online general population panel, the LISS Panel, and WhatsApp Data Donation project, to explore sources on selectivity, depending on willingness to share. We find evidence that while some mechanisms are important at every stage of the data donation process (e.g., tech savviness, generalized trust, privacy concerns), others work only on the particular stages (e.g., warm glow giving effect, some personality traits). We then explore bias in donated DTD using survey data as a benchmark. Based on the results, we provide recommendations on how to use DTD combined with survey data in their research.

Labour Market Dynamics

Chair: Anne-Sophie Halbertsma

Room: Mission 2

Christoph Janietz (University of Groningen – Faculty of Behavioural and Social Sciences), Zoltán Lippényi (University of Groningen – Faculty of Behavioural and Social Sciences), Netherlands. Integrated Data Infrastructure of Inequality in Organizations (NIDIO)
Huyen Nguyen (Utrecht University – Faculty of Social Sciences), Frank van Tubergen (Netherlands Interdisciplinary Demographic Institute (NIDI)). Demographic differences in Self-Evaluation and Self-Presentation Strategies in Human vs. AI Recruitment
Pim Kastelein (Netherlands Bureau for Economic Policy Analysis), Emile Cammeraat (Netherlands Bureau for Economic Policy Analysis), Brinn Hekkelman (Netherlands Bureau for Economic Policy Analysis), Suzanne Vissers (CPB). Predictability and (Co-)Incidence of Labor and Health Shocks

Abstracts

Christoph Janietz (University of Groningen – Faculty of Behavioural and Social Sciences), Zoltán Lippényi (University of Groningen – Faculty of Behavioural and Social Sciences), Netherlands Integrated Data Infrastructure of Inequality in Organizations (NIDIO)

The use of linked employer-employee register microdata is gaining popularity in social science research. However, there are few expert resources to guide academic users how to handle and analyze these data. We introduce NIDIO, an open-source infrastructure of data-processing code and guidelines that help with the construction and use of linked employer-employee datasets. NIDIO’s overall aim is two-fold: (1) enhance the accessibility of linked employer-employee register microdata for social science research and (2) contribute to open and replicable science by improving the transparency of working with linked employer-employee register data.
This presentation addresses three common challenges that are encountered while working with linked employer-employee register microdata and illustrates how NIDIO tackles them. First, creating linkages between administrative data sources involves several decision-making steps which often remain undisclosed in published research. NIDIO provides transparent data processing routines that improve reproducibility. Second, translating administrative data and measures into social science concepts and variables is non-trivial. NIDIO provides guidelines and best practices how to bridge the gap between administrative and scientific data. Third, users with limited prior experience of working with linked employer-employee register data face high startup costs during project setup. NIDIO eases time- and labor-intensive data setup by providing installable programs and packages.
NIDIO draws on Dutch Bureau of Statistics (CBS) administrative register data on the organizational and employee population. We integrate administrative datasets into a harmonised three-level data structure (individuals; jobs or functionary appointments; organisations). NIDIO reconstructs workers’ demographic profiles and employment outcomes (wages, managerial and directorate positions, occupations) and links them to organizational characteristics, policies, and inter-organizational networks. Thereby, NIDIO allows the integrated study of inequality within and between Dutch organisations.

Huyen Nguyen (Utrecht University – Faculty of Social Sciences), Frank van Tubergen (Netherlands Interdisciplinary Demographic Institute (NIDI)), Demographic differences in Self-Evaluation and Self-Presentation Strategies in Human vs. AI Recruitment

Do people across demographics self-present differently when facing automated (AI) recruitment? Using a vignette survey experiment on 2946 working adults across industries in the US and the UK, this research investigates self-presentation strategies and self-confidence beliefs when facing human vs. automated recruiters, taking into account self-evaluation beliefs of success chances and perceptions of personal and group discrimination in the labor market. We found that, overall, self-evaluation beliefs about their interview answers were significantly lower in the AI treatment than in the human recruiter treatment. Linguistically, interview answers with a higher proportion of complex words and analytic style are more likely to be in the AI treatment than the human recruiter treatment. When considering education levels, while interview texts from AI treatment for both higher and lower-educated participants are more likely to have complex words, texts from lower-educated participants are significantly more likely to have an authentic style. Psychometrically, interview answer texts from US participants show no significant difference between the AI and human recruiter treatments. Importantly, even though women and ethnic minorities are notably more optimistic about the use of AI in tackling hiring discrimination, they believe that they have a significantly better chance of getting an interview if their answers are evaluated by human recruiters instead of AI. Our findings provide novel insights to practitioners and job seekers to understand the behavioral aspects of the labor supply side, especially for organisations that are using or considering embracing AI recruitment in their hiring processes.

Pim Kastelein (Netherlands Bureau for Economic Policy Analysis), Emile Cammeraat (Netherlands Bureau for Economic Policy Analysis), Brinn Hekkelman (Netherlands Bureau for Economic Policy Analysis), Suzanne Vissers (CPB), Predictability and (Co-)Incidence of Labor and Health Shocks

Life is inherently marked by challenges, and the ability to recover from adverse life events varies among individuals. This study investigates the predictability of recovering from setbacks in the domains of labor and health using rich administrative data on the entire Dutch population. Employing machine learning techniques, we estimate the likelihood of individuals overcoming adverse shocks within a foreseeable time frame. The results demonstrate that recovery from a labor and health shock is to a large extent predictable, especially in the labor domain. It is not only possible to accurately forecast the recovery probability, but also the specific recovery path (such as the route via various welfare benefits before work resumption or the extent to which health care costs decline after an initial spike). Furthermore, the estimated recovery probability distributions highlight that there is a lot of inequality across the population in chances of recovery. A distinct group of individuals has a near-zero probability of recovering from a setback within a year, while another group of individuals is virtually guaranteed to recover. The fact that this heterogeneity is forecastable implies that policy can be tailored towards this, for example by means of targeted prevention policies. We supplement these findings with ex-ante shock probabilities from the study of Cammeraat et al. (2024), who investigate the predictability and concurrence of risks in the domains of labor and health for the same sample. It turns out that individuals with the highest ex-ante likelihood of facing setbacks also encounter greater challenges in the recovery process. This insight underscores the critical interplay between pre-existing risk factors and the difficulties individuals face in bouncing back from setbacks. There is a distinct vulnerable group that faces predictably high ex-ante susceptibility and low ex-ante resilience, again suggesting that targeted prevention policies could be useful supporting policies.

NLP and Text analysis insights

Chair: Jessica Piotrowski

Room: Expedition

Modhurita Mitra (Utrecht University: Research and Data Management Services), Martine de Vos (Utrecht University: Research and Data Management Services), Nicola Cortinovis (Utrecht University – Faculty of Geosciences). Generative AI for research data processing: Lessons learnt from three use cases
Riccardo Loconte (Tilburg University, Tilburg School of Social and Behavioural Sciences), Bennett Kleinberg (Tilburg University, Tilburg School of Social and Behavioural Sciences), Moving the dial in detecting embedded lies using computational analysis
Justin Chun-ting Ho (University of Amsterdam – Faculty of Social and Behavioural Sciences), A. Marthe Möller (University of Amsterdam – Faculty of Social and Behavioural Sciences), Joanna Strycharz (University of Amsterdam – Faculty of Social and Behavioural Sciences). Comparative Media Dataset from Web Crawl Data
Kristina Thompson (Wageningen University & Research – Wageningen Social Science Group), Qixiang Fang (Utrecht University – Faculty of Social Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences). A historical disease database of the Netherlands based on newspapers using natural language processing techniques

Abstracts

Modhurita Mitra (Utrecht University: Research and Data Management Services), Martine de Vos (Utrecht University: Research and Data Management Services), Nicola Cortinovis (Utrecht University – Faculty of Geosciences), Generative AI for research data processing: Lessons learnt from three use cases

Generative AI has generated enormous interest since ChatGPT was introduced in 2022. However, the adoption of this new technology in research has been limited due to concerns about the accuracy and consistency of the outputs produced by generative AI.
In an exploratory study, we used generative AI to perform research data processing tasks for which rule-based or traditional machine learning methods were unsuitable. We performed the following complex data processing tasks relevant to the social sciences and humanities:
1. Information extraction (of plant species names) from botanical catalogues.
2. Natural language understanding (of Health Technology Assessment documents) to extract data points of interest.
3. Text classification (of projects on the crowdfunding website Kickstarter) to assign appropriate industry codes.
We present the lessons learnt from this study:
1. How to assess if generative AI is a suitable tool for a particular use case, and
2. Strategies for enhancing the accuracy and consistency of the outputs produced by generative AI.

Riccardo Loconte (Tilburg University, Tilburg School of Social and Behavioural Sciences), Bennett Kleinberg (Tilburg University, Tilburg School of Social and Behavioural Sciences), Moving the dial in detecting embedded lies using computational analysis

Verbal deception detection research often relies on written accounts or transcripts of conversations and typically assumes statements to be either truthful or deceptive. However, a more nuanced and realistic perspective acknowledges that the veracity of statements exists on a continuum where truthful and deceptive parts are embedded within the same statement. The detection of embedded lies has been hampered by lack of suitable datasets and sufficiently granular analytical methods.
To move the dial on embedded deception detection, we developed a new dataset of truthful and deceptive statements in which participants explicitly indicated the embedded lies within their texts. Using a within-subjects design, participants provided a truthful account of a past autobiographical event. They were then instructed to re-write the same event in a deceptive manner. Importantly, each participant then selected phrases from the deceptive statements that constituted embedded lies (i.e., lies within an otherwise truthful account). Each embedded lie was rated regarding its centrality, deceptiveness and source by the participant. We further collected demographic variables and measured lying profiles (with the Lying Profile Questionnaire) after controlling for social desirability (using the Balanced Inventory of Desirable Responding).
We present the dataset as a novel resource for verbal deception detection and present findings about embedded lies from a psychological perspective on the subject level. Furthermore, we present the results of deception detection using computational techniques from Natural Language Processing and machine learning. These results will be discussed in terms of challenging the established scientific knowledge on deception detection by moving the dial on embedded lies.

Justin Chun-ting Ho (University of Amsterdam – Faculty of Social and Behavioural Sciences), A. Marthe Möller (University of Amsterdam – Faculty of Social and Behavioural Sciences), Joanna Strycharz (University of Amsterdam – Faculty of Social and Behavioural Sciences), Comparative Media Dataset from Web Crawl Data

The analysis of textual content is at the heart of communication science. Traditionally, news articles can be accessed through subscription-based databases, such as LexisNexis. Yet, these databases often prohibit batch downloading with automated means, which makes obtaining media content at a large scale a tedious task. Although harvesting articles directly from news outlets is possible, a huge amount of time and effort is needed to maintain web scrapers for each individual outlet. Recently, some efforts were made to curate large scale comparative news dataset, such as INCA and the Comparative Agendas Project. However, these projects tend to focus on content produced in Europe and the US which poses serious limitations on our ability to conduct comparative research, especially with cases in non-Western contexts. Due to the tremendous effort needed to collect news content from multiple countries, large scale comparative media research on data that are not only from Europe or the US is rare. The goal of this project is to create an annotated database that provides communication scholars access to newspapers representative of peoples and cultures worldwide.
In this project, we aim to curate a dataset of articles from the top outlets of over 20 participating countries in the European Values Survey and World Values Survey. To bypass the limitation on disseminating copyrighted materials, we will use an approach similar to the standard practice for sharing Twitter data: the dataset will not contain the actual content of the articles, but links to Common Crawl, an open access repository of web crawl data. A software package will be developed to extract the texts from the Common Crawl data files. Additionally, we will release the numerical representation (embeddings) of all articles, making it possible to do computational analysis without obtaining the full text.

Kristina Thompson (Wageningen University & Research – Wageningen Social Science Group), Qixiang Fang (Utrecht University – Faculty of Social Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences). A historical disease database of the Netherlands based on newspapers using natural language processing techniques

The nineteenth and early twentieth century Netherlands had a very high burden of infectious diseases. Outbreaks of diseases such as tuberculosis and smallpox were common. To study health and living standards in this period, capturing the disease burden is critical.
While today we are able to measure the disease burden with infection rates, this is not possible for the past. In the Netherlands, there are no available measures of the disease burden prior to the mid-twentieth century, aside from mortality rates.
Mortality rates are also not perfect proxies of disease: they fail to capture diseases that negatively impacted an individual throughout their life, but did not kill them. This prevents researchers from, for instance, studying how exposure to infectious diseases in early life might impact someone’s health in later life. An indicator of the infectious disease burden that is not based on mortality information would enable researchers to do so.
In the nineteenth and twentieth centuries, Dutch newspapers reported on the number of new cases and deaths from specific infectious diseases. A large share of these newspapers are digitized and publicly available via the Royal Library’s Delpher.nl. This project aims to convert these newspaper mentions into a quantitative indicator of the disease burden.
To do so, we leverage natural language processing methods to: process newspaper mentions of a disease, connect them to a geographical region, map them onto numeric scales, and assess their performance as a proxy of the disease burden in a given municipality and year. The result will be an open-source disease database and historical map. Further, we show how this disease database can be used by researchers, and how to account for measurement uncertainties in the disease database

12.00-13.00 – Lunch break and poster presentations

Poster presentations

1. Jack Fitzgerald (VU Amsterdam – School of Business and Economics), The Need for Equivalence Testing in Economics
2. Abhishta Abhishta (Twente University – Faculty of Behavioural, Management and Social Sciences), Exploring the role of Internet measurements for economic decision making
3. Jérôme Francisco Conceicao (TU Delft – Faculty of Architecture and the Built Environment), Ana Petrović (TU Delft – Faculty of Architecture and the Built Environment), Maarten Van Ham (TU Delft – Faculty of Architecture and the Built Environment), The operationalisation of contextual poverty from individual and household perspectives: Does it matter how we measure income when estimating neighbourhood effects?
4. Oskar Veerhoek (Radboud University Nijmegen – Faculty of Management Sciences), The Springboarding Organization
5. Brinn Hekkelman (Netherlands Bureau for Economic Policy Analysis), Pim Kastelein (Netherlands Bureau for Economic Policy Analysis), Suzanne Vissers (Netherlands Bureau for Economic Policy Analysis), Predicting Persistence of Labor and Health Shocks
6. Pim Koopmans (Leiden University – Faculty of Law), Max van Lent (Leiden University – Faculty of Law), Marike Knoef (Tilburg University, Tilburg School of Economics and Management), The Impact of Retirement on Household Finances: Causal
Evidence from Transaction Data
7. Sander Kraaij (University of Cologne), Jan Kabátek (University of Melbourne), Sacha Kapoor (Erasmus University Rotterdam, School of Economics), Systemic Discrimination in Firing
8. Jordy Meekes (Leiden University – Faculty of Law), Maddalena Ronchi (Northwestern University), Mind the Cap: The Effects of Regulating Bankers’ Pay
9. Samin Nikkhah Bahrami (Utrecht University – Faculty of Law, Economics and Governance), Karlijn Morsink (Utrecht University – Faculty of Law, Economics and Governance), Chris Barret (Cornell University- Department of Economics ), On targeting; Predictors of Expected Consumer Welfare from Catastrophic Drought Insurance
10. Ceciel Pauls (VU Amsterdam – Faculty of Science), Michel Klein (VU Amsterdam – Faculty of Science), Stef Bouwhuis (VU Amsterdam – Faculty of Social Sciences), Objective or subjective employment precariousness? Comparing definitions to a topic model based on user-generated data.
11. Dimitris Pavlopoulos (VU Amsterdam – Faculty of Social Sciences), Roberta Varriale (Sapienza University), Mauricio Garnier-Villarreal (VU Amsterdam – Faculty of Social Sciences), Cross-country differences in employment mobility in the presence of measurement error. A multiple-group hidden Markov model using linked administrative and survey data
12. Koen Steenks (VU Amsterdam – Faculty of Social Sciences), Stef Bouwhuis (VU Amsterdam – Faculty of Social Sciences), Dimitris Pavlopoulos (VU Amsterdam – Faculty of Science), Classifying employer orientation: how do firms combine wage policy and the use of non-standard employment
13. Eduard Suari-Andreu (Leiden University – Faculty of Law), Max van Lent (Leiden University – Faculty of Law), Time to Give: Health Shocks as a Trigger for Inter-Vivos Transfers
14. Anastasiia Voloshyna (University of Groningen, Faculty of Economics and Business), Agnieszka Postepska (University of Groningen, Faculty of Economics and Business), (), Can hybrid work help close the labor market gender gaps?
15. Mert Akay (TU Delft – Faculty of Industrial Design Engineering), Abhigyan Singh (TU Delft – Faculty of Industrial Design Engineering), Mapping the Public Participation in Climate Resilience Studies through Structured Topic Modeling
16. Lianne Bakkum (VU Amsterdam – Faculty of Behavioural and Movement Sciences), Carlo Schuengel (VU Amsterdam – Faculty of Behavioural and Movement Sciences), Age of entry into the Dutch child protection system of children of parents with intellectual disability: A case-control study
17. Thales Bertaglia (Utrecht University – Faculty of Law, Economics and Governance), Catalina Goanta (Utrecht University – Faculty of Law, Economics and Governance), Adriana Iamnitchi (Maastricht University – Faculty of Science and Engineering), The Monetisation of Toxicity: Analysing YouTube Content Creators and Controversy-Driven Engagement
18. Marissa Bultman (Netherlands Court of Audit), Eline Smit (Netherlands Court of Audit), Evaluating the Effectiveness of the Wi2021 Integration Law in Accelerating Labor Market Participation of Asylum Status Holders in the Netherlands (Work in Progress)
19. Mª Ángeles Caraballo (University of Seville), Oksana Liashenko (University of Seville), Social attitudes do matter. A worldwide perspective
20. Giovanni Cassani (Tilburg University, Tilburg School of Humanities and Digital Sciences), Stijn Rotman (Tilburg University, Tilburg School of Humanities and Digital Sciences), Drew Hendrickson (Tilburg University, Tilburg School of Humanities and Digital Sciences), Boosting fertility predictions: a bottom-up, data-driven, cross-sectional Light Gradient Boosting Machine model for fertility prediction from survey data
21. Qian Chen (Tilburg University, Tilburg School of Social and Behavioural Sciences), Jonas Everaert (Tilburg University, Tilburg School of Social and Behavioural Sciences), Bennett Kleinberg (Tilburg University, Tilburg School of Social and Behavioural Sciences), Measuring inflexible and biased interpretations using linguistic analyses to reveal pathways to depression and anxiety
22. Juliette de Wit (University of Groningen, Faculty of Economics and Business), Maite Laméris (University of Groningen, Faculty of Economics and Business), Sjoerd Beugelsdijk (Darla Moore School of Business, University of South Carolina), National identification and voting behaviour
23. Andrea Gradassi (University of Amsterdam – Faculty of Social and Behavioural Sciences), Scarlett Slagter (University of Amsterdam – Faculty of Social and Behavioural Sciences), Lucas Molleman (University of Amsterdam – Faculty of Social and Behavioural Sciences), Social influence of high-status peers in adolescents social networks
24. Rolf Granholm (University of Groningen – Faculty of Behavioural and Social Sciences), Anne Gauthier (Netherlands Interdisciplinary Demographic Institute (NIDI)), Gert Stulp (University of Groningen – Faculty of Behavioural and Social Sciences), Measuring the relative importance of fertility determinants for recent birth cohorts across 7-19 countries with GGS II data using microsimulation
25. Saurabh Khanna (University of Amsterdam – Faculty of Social and Behavioural Sciences), Knowing Unknowns in an Age of Incomplete Information
26. Angel Lazaro (Wageningen University & Research – Wageningen Social Science Group), Roger Cremades (Wageningen University & Research – Wageningen Social Science Group), Eveline van Leeuwen (Wageningen University & Research – Wageningen Social Science Group), Steering sustainable food systems: the complex co-evolution of consumer preferences, sustainable restaurants, and policymaking
27. Maël Lecoursonnais (Linköping University – Institute for Analytical Sociology), Selcan Mutgan (Linköping University – Institute for Analytical Sociology), Life-Course Trajectories of Experienced Segregation
28. Gabriele Mari (Erasmus University Rotterdam – School of Social and Behavioural Sciences), Emanuele Fedeli (University of Milan “La Statale”), Child Penalties and Public Childcare Provisions Under Fiscal Austerity
29. Tamara Mtsentlintze (Utrecht University – Faculty of Science), Esmee Dekker (Utrecht University – Faculty of Science), Damion Verboom (Utrecht University – Faculty of Science), European Value Maps: can data visualizations contribute to increased tolerance in society?
30. Vittorio Nespeca (TU Delft – Faculty of Technology Policy and Management ), Tina Comes (Tu Delft – Faculty of Technlogy, Policy and Management), Frances Brazier (TU Delft – Faculty of Technlogy, Policy and Management), Learning to select information exchange hubs: Capturing the emergence of boundary spanning in volatile conditions
31. Keenan Ramsey (Twente University – Faculty of Behavioural, Management and Social Sciences), Anne van Dongen (Twente University – Faculty of Behavioural, Management and Social Sciences), Robbert Sanderman (Twente University – Faculty of Behavioural, Management and Social Sciences), Assessing the scope of mental health (non)-recovery in the aftermath of the COVID-19 pandemic
32. Agata Troost (University of Groningen – Faculty of Spatial Sciences ), Jaap Nieuwenhuis (University of Groningen – Faculty of Behavioural and Social Sciences), Jonathan Mijs (Boston University), Exploitation-based class scheme, social inequality and contemporary conflicts: a novel empirical approach
33. Thorid Wagenblast (TU Delft), Social influence in the context of climate change adaptation: analyzing cross-national survey data
34. Jari Zegers (Tilburg University, Tilburg School of Social and Behavioural Sciences), Bennett Kleinberg (Tilburg University, Tilburg School of Social and Behavioural Sciences), Understanding psychological responses to the COVID-19 pandemic with latent class growth analysis
35. Bente Zuijdam (Maastricht University – Faculty of Science and Engineering), Adriana Iamnitchi (Maastricht University – Faculty of Science and Engineering), Demographic and Political Differences in Twitter Abuse: The Case of Dutch Politicians
36. Christian Olesen (University of Amsterdam – Faculty of Humanities), Isadora Paiva (University of Amsterdam – Faculty of Humanities), The CLARIAH Media Suite: An introduction to qualitative media analysis using automatic data enrichments and annotation
37. Adrienne Mendrik (Eyra), Emiel van der Veen (Eyra), Jeroen Vloothuis (Eyra), The Next platform: What do data donation, benchmark challenges and participant recruitment have in common?
38. Angelica Maria Maineri (Erasmus University Rotterdam, School of Social and Behavioural Sciences), Laura Boeschoten (Utrecht University – Faculty of Social Sciences), Niek de Schipper (University of Amsterdam – Faculty of Social and Behavioural Sciences), The impact of constant connectivity on employees’ well-being: a data donation pilot study
39. Anastasiya Alferova (University of Utrecht), Ethical / Legal Implications of Data Control and Privacy in the Digital Marketplace
40. Mohammad Behbahani (Utrecht University – Faculty of Social Sciences), Mahdi Shafiee Kamalabad (Utrecht University – Faculty of Social Sciences), Emmeke Aarts (Utrecht University – Faculty of Social Sciences), Hidden state detection in Relational event history data: an extension of Hidden Markov Model for the Relational Event Model
41. Elena Candellone (Utrecht University – Faculty of Social Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences), Sofia Chelmi (University of Bologna), Community detection in bipartite signed networks is highly dependent on parameter choice
42. Qixiang Fang (Utrecht University – Faculty of Social Sciences), Solichatus Zahroh (Utrecht University – Faculty of Social Sciences), Daniel Oberski (Utrecht University – Faculty of Social Sciences), Enhancing Human Values Prediction from Digital Trace Data through Measurement Knowledge Integration
43. Santiago Gómez-Echeverry (Statistics Netherlands (CBS)), Arnout van Delden (Statistics Netherlands (CBS)), Ton de Waal (Tilburg University, Tilburg School of Social and Behavioural Sciences), Modeling Total Error using Linked Survey and Administrative Data: A Simulation and an Application to the Italian Labor Market
44. Anirudh Govind (KU Leuven, Belgium), Ate Poorthuis (KU Leuven, Belgium), Ben Derudder (KU Leuven & Ghent University, Belgium), Cocooning? A multi-scalar analysis of the determinants of persistent activity space segregation
45. Jiri Kaan (Wageningen University & Research – Wageningen Social Science Group), Yara Khaluf (Wageningen University & Research – Wageningen Social Science Group), Kristina Thompson (Wageningen University & Research – Wageningen Social Science Group), An Agent-Based Model Comparing Two Reward Learning Algorithms In Dynamic Food Environments
46. Jiri Kaan (Wageningen University & Research – Wageningen Social Science Group), Kristina Thompson (Wageningen University & Research – Wageningen Social Science Group), Yara Khaluf (Wageningen University & Research – Wageningen Social Science Group), Agent-Based Models Of Social Network Interventions Promoting Health And Well-being: A Systematic Review
47. Rishabh Kaushal (Maastricht University – Faculty of Science and Engineering), Adriana Iamnitchi (Maastricht University – Faculty of Science and Engineering), Nicole Kilk (Maastricht University – Faculty of Science and Engineering), The Needle in the Haystack: What Platforms Report to the DSA Transparency Database When They Don’t Have To
48. Paul Keuren (UU-FSW and CBS), Marc Ponsen (Statistics Netherlands (CBS)), Ayoub Bagheri (Utrecht University – Faculty of Social Sciences), Expert Embedding Alignment
49. Jonas Klingwort (Statistics Netherlands (CBS)), Yvonne Gootzen (Statistics Netherlands (CBS)), Daniëlle Remmerswaal (Utrecht University – Faculty of Social Sciences), Validating a smart survey travel app: how do GPS measurements compare to reported behavior?
50. Pradeep Kumar (Centerdata), Joris Mulder (Centerdata), Investigating the Impact of Survey Methodologies on Predictive Accuracy in Time Use Modeling
51. Yue Li (Twente University – Faculty of Behavioural, Management and Social Sciences), Marcello A. Gómez-Maureira (Twente University – Faculty of Electrical Engineering, Mathematics and Computer Science), Stéphanie van den Berg (Twente University – Faculty of Behavioural, Management and Social Sciences), Generalizing Behavior Prediction Models from VR Experiences
52. Sanne Peereboom (Tilburg University, Tilburg School of Social and Behavioural Sciences), Inga Schwabe (Tilburg University, Tilburg School of Social and Behavioural Sciences), Bennett Kleinberg (Tilburg University, Tilburg School of Social and Behavioural Sciences), Cognitive phantoms in large language models through the lens of latent variables
53. Abhigyan Singh (TU Delft – Faculty of Industrial Design Engineering), Natalia Romero Herrera (TU Delft – Faculty of Industrial Design Engineering), Razieh Torkiharchegani (TU Delft – Faculty of Industrial Design Engineering), Converging design methods and computational analysis to support community lifelong learning practices
54. Weronika Sojka (Wageningen University & Research – Wageningen Social Science Group), Erkinai Derkenbaeva Derkenbaeva (Wageningen University & Research – Wageningen Social Science Group), Eveline van Leeuwen (Wageningen University & Research – Wageningen Social Science Group), Agent-based modeling and Urban Digital Twin simulations: state of the art for complex circular systems
55. Stéphanie van den Berg (Twente University – Faculty of Behavioural, Management and Social Sciences), Ulrich Halekoh (SDU), Jacob Hjelmborg (SDU), Using computer vision to estimate a Linfoot correlation
56. Sterre van der Kaaij (National Institute for Public Health and the Environment (RIVM)), Lenneke Vaandrager (Wageningen University & Research – Wageningen Social Science Group), Hanneke Kruize (National Institute for Public Health and the Environment (RIVM)), Using Agent-Based Modeling and Group Model Building to Understand How Spatial Interventions Affect Health Behaviours: A Research Protocol
57. Thom Volker (Utrecht University – Faculty of Social Sciences), Carlos Gonzalez Poses (Utrecht University – Faculty of Social Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences), densityratio: An R-package for density ratio estimation
58. Shuai Wang (VU Amsterdam – Faculty of Science), Maria Adamidou (VU Amsterdam – Faculty of Science), Examining LGBTQ+-related Concepts and Their Links in the Semantic Web

Abstracts

Abhishta Abhishta (Twente University – Faculty of Behavioural, Management and Social Sciences), Exploring the role of Internet measurements for economic decision making

Internet presents unique opportunities and challenges for economic decision-making. This talk explores the role of active and passive internet measurements in facilitating this process. Active measurements involve introducing specific packets into the Internet network, providing insights into network performance and user interactions. Passive measurements involve monitoring existing traffic without altering it, offering detailed data on network usage and user behaviour. Through detailed case studies, this talk demonstrates the practical applications of these internet measurements in understanding the economic impacts of significant events, such as the COVID-19 pandemic, elections, and cyber attacks. For instance, it examines the changes in internet traffic patterns during the pandemic, revealing how different regions and demographics adjusted their online activities in response to lockdowns and remote work mandates. The analysis of behavioural changes following major cyber incidents, such as the 2016 DDoS attack on a DNS service provider, illustrates how consumer trust can be influenced by cybersecurity events.
These examples show the potential of internet-based data to inform economic decision-making, marketing strategies, and security measures. Policymakers can use traffic data to assess the effectiveness of public health interventions, while businesses can adjust their marketing strategies based on changes in consumer behaviour observed through internet measurements. Security measures can also be refined by understanding the patterns and impacts of cyber attacks on different business sectors on the internet.
This talk also discusses the mechanics of search engines and the utilisation of Google Alerts as data collection tools, showing their relevance for security decision-making. Google Alerts can help researchers monitor changes and patterns in real-time. These example prove the importance of integrating internet and service-based measurements into empirical research in economic decision-making. By leveraging these tools, economists and business leaders can gain deeper insights into market phenomena, leading to more informed and effective decision-making processes.

Mert Akay (TU Delft – Faculty of Industrial Design Engineering), Abhigyan Singh (TU Delft – Faculty of Industrial Design Engineering), Mapping the Public Participation in Climate Resilience Studies through Structured Topic Modeling

Climate resilience is crucial for equipping cities to handle climate-related challenges. It emphasises the critical need for public participation, i.e., engaging people and communities in processes addressing climate-related issues. Public participation presents a significant challenge to climate resilience, yet limited research offers a comprehensive overview of the relevant discussions on this topic. Existing studies employ conventional literature review methodologies, focusing on narrower perspectives. Therefore, gaining deeper insights requires exploring diverse thematic perspectives across time, disciplines, and context. Addressing this gap necessitates the use of novel and emerging methodological approaches. Hence, this presentation examines contemporary discussions on climate resilience through the lens of public participation by employing Structured Topic Modelling (STM) as a methodological approach. It addresses two interconnected questions: (1) What are the predominant themes, gaps, and challenges in current scholarly debates on climate resilience and public participation in urban settings? (2) How can STM be operationalised to identify these themes and gaps? As a novel perspective for literature review in climate resilience studies, the STM enables identifying and examining the fundamental themes, research gaps, trends, and discussions for public participation in climate resilience by analysing the bibliographic dataset of scientific articles from the Web of Science. In this context, the objectives of this presentation are twofold: (1) illustrate the result of STM analysis and reflect on the strengths and limitations of utilising STM as an emerging methodology for literature review (2) combine interdisciplinary perspectives on urban planning, climate resilience, and computational social science through STM, addressing the relevant conference theme of using computational methods for social science research. We believe that the presentation reveals the potential of STM in better understanding and communication of research findings by providing a better visual representation and ensuring a consistent and objective methodology for the literature review.

Anastasiya Alferova (University of Utrecht), Ethical / Legal Implications of Data Control and Privacy in the Digital Marketplace

Big tech companies’ (BTC) unprecedented access to vast amounts of user data has given them a significant competitive advantage, allowing them to consolidate market dominance, target advertising, and create barriers to entry for new competitors. This article examines the ethical and legal implications of data control and privacy in the digital marketplace, focusing on the intersection of competition and data protection laws.The dominance of BTC in the digital market raises a number of concerns, including privacy violations, unfair competitive practices, and potential violations of fundamental rights. This study examines how these companies use user data to strengthen their market position and the regulatory challenges this creates. The integration of advanced AI technologies such as GPT chat into Apple devices further exacerbates these challenges by introducing new aspects of data usage and competitive dynamics.Digitalization determines the dynamics of legislative norms lagging behind modern market conditions. Governments have considered various tools for adapting to such changes, for example, behavioral economics. The article takes an interdisciplinary approach, combining legal analysis, policy assessment, and ethical review. It includes a comprehensive literature review, case studies of major technology companies, and a comparative analysis of global regulatory strategies. The conclusions will include topics for further discussion on the subject, as well as recommendations aimed at improving the EU regulatory framework to more effectively address the ethical and legal issues associated with data control by large technology companies. The research aims to provide actionable ideas and innovative solutions that offer a balanced approach to protecting the public interest, promoting fair competition, and ensuring data privacy. The purpose of this work is to promote a greater understanding of the ethical and legal complexities of the digital age and stimulate debate about effective regulatory practices.

Lianne Bakkum (VU Amsterdam – Faculty of Behavioural and Movement Sciences), Carlo Schuengel (VU Amsterdam – Faculty of Behavioural and Movement Sciences), Age of entry into the Dutch child protection system of children of parents with intellectual disability: A case-control study

Background
Parents with intellectual disabilities (ID) face challenges in accessing care, meaning that risks to their children can go long undetected. However, heightened scrutiny and lower use of preventive supports may accelerate their entry into the child protection system. To sort out these conflicting expectations, we assessed children’s age at the first child protection measure, comparing parents labelled with and without ID. In addition, we compared the duration of the first measure, and the likelihood of having a sibling in child protection.
Methods
We used a case-control design with microdata from Statistics Netherlands. The population consisted of children in child protection (total N = 91,174; reporting years: 2015-2021). Using proximity score matching, children with at least one parent with ID (N = 4,526) were labelled based on indications for benefits, long term care, and/or sheltered employment (all based on ID), and were matched 1:1 with children with parents without ID, by socioeconomic status and having only one registered parent. Linear and logistic regression models were used for the analyses.
Results
Children with at least one parent labelled with ID were younger at the first child protection measure (Mdifference = 177 weeks, B = -176.76, SE = 5.57, p <.001), had longer child protection measures (Mdifference = 35 weeks, B = 34.68, SE = 4.46, p <.001), and more often had a sibling in child protection (OR = 1.28, SE = 0.04, p <.001), compared to control children.
Conclusions
The findings point to the direction of heightened scrutiny of parents labelled with ID and lower use and/or efectiveness of preventive interventions. The longer duration of child protection measures further supports heightened scrutiny and lower effectiveness of current interventions. This study is a starting point for further exploration of the representation of children with parents labelled with ID in child protection.

Mohammad Behbahani (Utrecht University – Faculty of Social Sciences), Mahdi Shafiee Kamalabad (Utrecht University – Faculty of Social Sciences), Emmeke Aarts (Utrecht University – Faculty of Social Sciences), Hidden state detection in Relational event history data: an extension of Hidden Markov Model for the Relational Event Model

Relational Event History (REH) data are interactions between actors in some way, over time. The Relational Event Model (REM) is a gold standard for analyzing REH data, allowing for the study of how social interactions evolve over time and which factors shape these social interactions. This model accounts for the dynamic patterns and dependencies between actors, parameterizing interaction rates based on both exogenous and endogenous statistics.
Social dynamics are inherently variable, and it is likely that these dynamics shift in response to changing often unobserved circumstances, leading to variations in the model’s parameters. However, traditional REM assumes a constant effect for each statistic throughout the observation period, which may not hold in all contexts. For example, in high-stress environments like surgery rooms, communication dynamics among surgeons may alter dramatically in response to emergencies. Addressing this, research has identified time zones where changepoints occur, enhancing the understanding of how communication behavior shifts instantaneously, such as Apollo 13 mission data.
Nonetheless, changepoint models have limitations; once a data segment is exited, it cannot be revisited. This is in contrast with scenarios like surgical teams, where communication may return to normal after an emergency. To overcome this, we propose integrating REM with Hidden Markov Models (HMMs). This extension, termed HMM-REM, allows for modeling the complexities of temporal relationships and dependencies in REH data more effectively. Thus, this new model helps researchers detect hidden states that influence interactions.
In this work, we apply the HMM-REM model to both synthetic and real-world data to demonstrate its functionality. We use synthetic data to illustrate the model’s capability to detect hidden states and explore social dynamics. This exploration aids in understanding how various entities adjust their interaction patterns dynamically in different hidden states, enhancing our comprehension of complex social interactions in varied situations.

Thales Bertaglia (Utrecht University – Faculty of Law, Economics and Governance), Catalina Goanta (Utrecht University – Faculty of Law, Economics and Governance), Adriana Iamnitchi (Maastricht University – Faculty of Science and Engineering), The Monetisation of Toxicity: Analysing YouTube Content Creators and Controversy-Driven Engagement

YouTube, one of the most popular social media platforms, remains understudied within computational social sciences. Content creators are central to YouTube’s ecosystem and significantly influence their followers, especially vulnerable individuals such as children. These creators often engage in controversial behaviour to generate engagement, despite negative consequences. This study investigates the relationship between monetisation, controversy, and toxicity on YouTube.
We conducted a quantitative analysis of controversial content, focusing on monetisation strategies, engagement patterns, and the prevalence of toxic comments. Using a curated dataset of controversial YouTubers sourced from Reddit, we classified channels into two categories: Consistent Controversy, frequently involved in scandals, and Spike Controversy, which experience temporary or recent surges in controversy. Our dataset included 20 channels, covering 16,349 videos and over 100 million comments.
We analysed video descriptions for monetisation cues to identify monetisation strategies, categorising 15,952 unique URLs linked to various revenue sources. We identified six primary monetisation models, with merchandise sales being the most prevalent. Notably, Spike Controversy Channels exhibited a higher intensity of monetisation efforts than Consistent Controversy Channels, suggesting a more aggressive approach to monetising their content.
To measure toxicity, we employed a Ridge regression model trained on a dataset of YouTube comments labelled for abusive language detection. Our analysis shows that while toxic content tends to generate higher engagement through increased comments, it negatively impacts the number of likes and monetisation cues. This result indicates that controversy-driven engagement does not necessarily translate to financial benefits for content creators.
Our study also uncovers self-moderation practices among YouTubers, where creators alter their content or marketing strategies to mitigate backlash. For example, a YouTuber might change URLs for controversial merchandise to redirect attention away from problematic products. These insights lay the groundwork for future research, particularly in understanding monetisation’s qualitative aspects and toxicity’s delayed effects on audience interaction.”

Marissa Bultman (Netherlands Court of Audit), Eline Smit (Netherlands Court of Audit), Evaluating the Effectiveness of the Wi2021 Integration Law in Accelerating Labor Market Participation of Asylum Status Holders in the Netherlands (Work in Progress)

Despite significant labor shortages in the Netherlands, the labor market participation of refugees with an asylum residence permit (i.e. status holders) lags far behind that of the general working population. Moreover, most status holders who do work, are employed with part-time and temporary contracts. As a consequence, they highly depend on social benefits and often live in poverty.
Status holders’ integration in Dutch society is regulated by the Integration Law, which was recently amended in 2021. Earlier research found the previous integration law (Wi2013) to hinder the labor market integration of asylum migrants due to the lack of a link between integration obligations and societal participation. The new integration law (Wi2021) aims to contribute to ‘quick and full participation in Dutch society, preferably through paid work’. The law seeks to achieve this by focusing on ‘duality’: the combination of societal participation and learning the Dutch language in a classroom setting. While the Wi2013 centralized integration, Wi2021 decentralizes integration by placing municipalities primarily in charge of organizing an integration program for status holders. Nonetheless, the minister of Social Affairs and Employment remains responsible for the policy and partially for its implementation.
This research aims to assess whether the implementation of the new law is likely to achieve its societal goal of contributing to ‘quick and full participation, preferably through paid work’. As part of this study, we seek to evaluate to what extent the Wi2021 is more effective in accelerating labor market participation of status holders compared to the Wi2013 law. To achieve this, we will analyze data from the Dutch Central Bureau of Statistics (CBS) Employing methods that estimate causal effects (such as interrupted time series or regression discontinuity), we will assess the likely effect of the Wi2021 implementation on employment rates among status holders in the Netherlands.

Elena Candellone (Utrecht University – Faculty of Social Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences), Sofia Chelmi (University of Bologna), Community detection in bipartite signed networks is highly dependent on parameter choice

Decision-making processes often involve voting. Human interactions with exogenous entities such as legislations or products can be effectively modeled as two-mode (bipartite) signed networks-where people can either vote positively, negatively, or abstain from voting on the entities. Detecting communities in such networks could help us understand underlying properties: for example ideological camps or consumer preferences. While community detection is an established practice separately for bipartite and signed networks, it remains largely unexplored in the case of bipartite signed networks. In this paper, we systematically evaluate the efficacy of community detection methods on bipartite signed networks using a synthetic benchmark and real-world datasets. Our findings reveal that when no communities are present in the data, these methods often recover spurious communities. When communities are present, the algorithms exhibit promising performance, although their performance is highly susceptible to parameter choice. This indicates that researchers using community detection methods in the context of bipartite signed networks should not take the communities found at face value: it is essential to assess the robustness of parameter choices or perform domain-specific external validation.

María Ángeles Caraballo (University of Seville), Oksana Liashenko (University of Seville), Social attitudes do matter. A worldwide perspective

There is a vast literature showing a negative relationship between various categories of social diversity and quality of institutions and economic growth. The explanation underlying these results is that politicians and bureaucrats may have incentives to favor and/or receive favors from certain groups which induces an inefficient allocation of resources and, in turn, hinders economic growth. In addition, the conflict of interests between groups over, for instance, the consumption of a shared public good and the receipt of transfers, also negatively affects to quality of institutions and economic performance.
Our research is related to this strand of the literature. We focus on ideological diversity and, more precisely, on three of the hottest topics in the current political arena: gender, immigration and environment. The ideological position of the citizenship towards these issues is proxied through questions selected from the World Values Survey. The questions have been classified attending to three categories: attitudes, engagement and confidence. This has allowed us to distinguish several types of individuals according to their responses. By means of computational methods for social science research, we analyze the relationships between the different groups of individuals and quality of institutions and economic growth. To measure quality of institutions and economic growth, we have use data from the World Bank and BTI. We also consider that these relationships can be weakened or strengthened by some characteristic of the society such as political participation, trust and tolerance that can be inferred from selected questions from the World Value Survey.
Our analysis permits to identify the groups that have a most relevant impact on the quality of institutions and economic growth, and the shaping elements of the society that also influence this impact.

Giovanni Cassani (Tilburg University, Tilburg School of Humanities and Digital Sciences), Stijn Rotman (Tilburg University, Tilburg School of Humanities and Digital Sciences), Drew Hendrickson (Tilburg University, Tilburg School of Humanities and Digital Sciences), Boosting fertility predictions: a bottom-up, data-driven, cross-sectional Light Gradient Boosting Machine model for fertility prediction from survey data

In the context of the Predicting Fertility (PreFer) challenge, we implemented a bottom-up, data-driven, cross-sectional solution which scored third in the leaderboard on F1 (with the best precision score). Over multiple runs, we noticed that no variable before 2017 ever appeared among the most useful predictors in a Light Gradient Boosting Machine (LGBM) model trained on all available variables. The model’s F1 was consistently around 0.7 on the validation sets, which were generated based on a random 50% split of the unique households in the available data. We thus pruned features to exclude variables from earlier waves in the LISS panel than 2017 and refitted the model. Feature importance highlighted expected predictors, such as those pertaining to household income, fertility intentions, age, and marital status. We then considered the issue of missing data. Using only dichotomized predictors that indicate whether a response was missing, a LGBM model produced an F1 above 0.6, suggesting that missingness carries information for the task. We suspect this depends on the propensity of people to answer certain questions relating to children if they do not have (or plan on having) any. In addition to adding these binary features that indicate missingness per wave, we created new variables that collapse across year and keep the last known value ( if 2020 was missing but 2019 was not, the feature reflected the value of 2019). Finally, we added features pertaining to changes in income, housing, and marital status. The final model’s F1 was between 0.75 and 0.8 on the validation splits, , while the F1 on the competition hold-out set was 0.71. Altogether, our attempt suggests that bottom-up approaches can reach state-of-the-art performance in predicting fertility; variables related to fertility in the literature influenced predictions the most; imputing variables may mask informative non-response patterns in survey data.

Qian Chen (Tilburg University, Tilburg School of Social and Behavioural Sciences), Jonas Everaert (Tilburg University, Tilburg School of Social and Behavioural Sciences), Bennett Kleinberg (Tilburg University, Tilburg School of Social and Behavioural Sciences), Measuring inflexible and biased interpretations using linguistic analyses to reveal pathways to depression and anxiety

Everyday life is full of ambiguous social situations. People need to interpret such situations to resolve this ambiguity and understand what is happening to them. Imagine receiving a stranger’s gaze while you are giving a talk at a conference. You may interpret this gaze as either positive admiration or negative dissatisfaction. How you interpret the situation may influence how you feel about your public performance. When interpretations are negatively biased and inflexible, they may even set the stage for severe mental health conditions such as depression and anxiety. Theorists and previous studies have implicated distorted interpretation processes (interpretation bias and inflexibility) in the onset, maintenance, and relapse of depression and anxiety. However, current work is limited because it employs questionnaires and cognitive-experimental laboratory tasks with limited ecological validity. This study, combining language analysis, opens new opportunities for understanding and treating depression and anxiety. A total of 210 college students will be invited to complete an online survey, which includes the Patient Health Questionnaire-9, Generalized Anxiety Disorder-7, Lack of Emotional Awareness and Lack of Emotional Clarity scales, the emotional variant of the Bias Against Disconfirmatory Evidence (BADE) task, and a revised open-ended Interpretation Bias Questionnaire (IBQ-R). Then, based on LIWC and other language markers of interest (e.g., first-person and emotional words), we will examine language features associated with interpretations, build a language-based interpretation model, and examine its predictive power for depression and anxiety. The knowledge generated by this project will not only deepen the theoretical understanding of depression and anxiety and their risk factors, but also facilitate the identification of treatment targets and, ultimately, better treatment response.

Juliette de Wit (University of Groningen, Faculty of Economics and Business), Maite Laméris (University of Groningen, Faculty of Economics and Business), Sjoerd Beugelsdijk (Darla Moore School of Business, University of South Carolina), National identification and voting behaviour

We study if and how identification with the nation state relate to individual’s voting behaviour. We theorize that a limited number of ideal types exist that best capture the sources of identification with the nation state, and as a result, people identify with the nation state in distinctive ways. Using a unique survey dataset of Dutch respondents, we identify three ideal types: the ethnic type, who identifies strongly with traditions, symbols, and history; the civic type, who identifies via civic liberties and religious freedom; and the indifferent type, who does not identify strongly with the nation state via any characteristic. We then suggest that the way in which individuals identify with the nation state translates into different political issues being salient to them, namely those issues that are salient to their identity. This, in turn, affects how individuals vote. Considering the three types of national identification (i.e., ethnic, civic, indifferent) and voting behaviour (i.e., turnout and party preferences), our results show that the types of national identification are related to turnout and party preferences in significantly distinctive ways. This is corroborated by findings on individuals’ positions on relevant political issues. We draw two main conclusions. First, it is not only the strength, but also the type of national identification that matters for voting. Second, there is a strong and persistent tension between the ethnic and civic types of national identification that suggests a clash of two normative world views.

Qixiang Fang (Utrecht University – Faculty of Social Sciences), Solichatus Zahroh (Utrecht University – Faculty of Social Sciences), Daniel Oberski (Utrecht University – Faculty of Social Sciences), Enhancing Human Values Prediction from Digital Trace Data through Measurement Knowledge Integration

Computational social scientists are increasingly interested in using digital trace data (e.g., social media posts, user logs) to measure/predict social science constructs like human values. This task is challenging due to inherent issues: “ground-truth” labels, which are typically based on crowd-workers or survey responses, often contain measurement errors, and the data is frequently high-dimensional relative to the number of observations. To address these challenges, we propose two solutions. First, we incorporate measurement models into the loss function of prediction models to mitigate measurement error. Second, we utilise an automatic selection method based on semantic similarity to handle high-dimensional data. Our contributions are validated using a dataset of human values measurements and individuals’ social media likes history. We also compare our methods against state-of-the-art approaches aimed at preventing models from learning spurious correlations including data augmentation and invariant learning.

Jack Fitzgerald (VU Amsterdam – School of Business and Economics), The Need for Equivalence Testing in Economics

Equivalence testing methods can provide statistically significant evidence that relationships are practically equal to zero. I demonstrate their necessity in a systematic reproduction of estimates defending 135 null claims made in 81 articles from top economics journals. 37-63% of these estimates cannot be significantly bounded beneath benchmark effect sizes. Though prediction platform data reveals that researchers find these equivalence testing `failure rates’ to be unacceptable, researchers actually expect unacceptably high failure rates, accurately predicting that failure rates exceed acceptable thresholds by around 23 percentage points. To obtain failure rates that researchers deem acceptable, one must contend that over 75% of published effect sizes in economics are practically equivalent to zero, implying that Type II error rates are likely quite high throughout economics. This paper provides economists with empirical justification, guidelines, and commands in Stata and R for conducting credible equivalence testing in future research.

Jérôme Francisco Conceicao (TU Delft – Faculty of Architecture and the Built Environment), Ana Petrović (TU Delft – Faculty of Architecture and the Built Environment), Maarten Van Ham (TU Delft – Faculty of Architecture and the Built Environment), The operationalisation of contextual poverty from individual and household perspectives: Does it matter how we measure income when estimating neighbourhood effects?

The ever-growing neighbourhood effect literature, analysing the effects of contextual poverty on individual outcomes, predominantly uses the personal incomes of neighbourhood residents to measure contextual poverty. However, using individual income is inconsistent with the literature on poverty measures, which focusses on the household income. The choice of using individual or household income could have an effect on the measure of income distribution, economic segregation and, consequently, the neighbourhood effect estimation. Constructing a poverty indicator based solely on the individuals’ income overlooks, for example, the financial composition of the household, a critical unit when evaluating the individuals’ purchasing power. Additionally, there is quite some variation between studies in the use of gross or net incomes, as well as methodological choices such as selecting the income of male residents. These variation makes it difficult to understand to what extent differences between studies are real differences in estimated neighbourhood effects, or differences caused by choices made by researchers and data availability. This paper investigates the sensitivity of the neighbourhood effect estimates to the operationalisation of contextual poverty. To what extent does selecting an income variable when creating a poverty indicator determine the modelling outcomes? We use longitudinal micro-data from Dutch population registers spanning from 2011 to 2020, which provides detailed income data at the individual and household levels. We consider four potentially influential factors when measuring poverty: income unit (individual vs household), income redistribution (gross vs disposable), household standardisation (equivalence factor), and sample selection (gender groups). We conduct a systematic analysis modelling the effect of each poverty indicator while keeping all other factors constant. The results of this study will shed light on how using individual or household income would lead to different estimations of neighbourhood effects.

Santiago Gómez-Echeverry (Statistics Netherlands (CBS)), Arnout van Delden (Statistics Netherlands (CBS)), Ton de Waal (Tilburg University, Tilburg School of Social and Behavioural Sciences), Modeling Total Error using Linked Survey and Administrative Data: A Simulation and an Application to the Italian Labor Market

The expansion of administrative and Big Data and the increase in the survey’s non-responses have highlighted the relevance of assessing the quality of non-probability samples. To tackle this issue, people usually resort to the Total Error (TE) framework, which divides the error into a measurement and a representation component. Extensive literature focuses on measurement error, often using a combination of data from different sources to evaluate whether the observed variables adequately capture the concept intended to be measured. Another branch of the literature has centered on the representation error, assessing how respondents are selected in the sample, leading to systematic differences between the population and the observed units. However, research modeling both of these components simultaneously is still scant. In the present study, we address this gap by jointly modeling the measurement and the representation errors, combining recent advances in both areas. We conducted a simulation study to evaluate our TE model under different specifications of measurement and representative errors. Additionally, we performed a case study analysis using a combination of Italian administrative registers and the Labor Force Survey (LFS) to evaluate the total error in the income variable. Our preliminary results show that our model adequately captures the different error sources and provides a good strategy for assessing the TE when using a combination of probability and non-probability data.

Anirudh Govind (KU Leuven, Belgium), Ate Poorthuis (KU Leuven, Belgium), Ben Derudder (KU Leuven & Ghent University, Belgium), Cocooning? A multi-scalar analysis of the determinants of persistent activity space segregation

Recently, ethnic segregation studies have looked beyond residential neighbourhoods to include the set of locations people visit during their daily activities (i.e., activity spaces). Such work has suggested that segregation may be experienced and deliberately (re)produced across locations — an idea researchers have likened to people seeking the safety of cocoons. Currently, it remains unclear if this (re)production of segregation is an outcome of urban structure, i.e., the relative distribution of people and activities, or deliberate, i.e., people seeking cocoons. That is, are people making trips only to locations physically accessible from their residential neighbourhoods? Or, are people being more discerning and visiting subsets of accessible locations corresponding to their desired experiences of segregation?

We investigate using the case of Rotterdam with data from the Dutch Central Bureau of Statistics (CBS). For each residential neighbourhood, we determine three ethnic activity space (AS) segregation values, at multiple scales, up to ten kilometers. These segregation values are based on

– the total population accessible based on travel along the extant street network (the maximal theoretical AS),
– the population people come into contact with based on their actual AS (the experienced), and,
– the hypothetical population determined by randomizing people’s AS (the counterfactual).
Minimal differences between the first two values would indicate that segregation is an outcome of urban structure.
However, large differences would suggest the influence of people’s choices and necessitate a comparison of the latter two values to determine cocooning (i.e., counterfactual > experienced = cocooning).
Initial findings suggest that segregation is an outcome of people’s choices rather than urban structure. However, such segregation cannot always be categorized as cocooning and is dependent on the neighbourhood under consideration and the scale of analysis.”

Andrea Gradassi (University of Amsterdam – Faculty of Social and Behavioural Sciences), Scarlett Slagter (University of Amsterdam – Faculty of Social and Behavioural Sciences), Lucas Molleman (University of Amsterdam – Faculty of Social and Behavioural Sciences), Social influence of high-status peers in adolescents social networks

Dispositions for prosociality undergo major changes during adolescence, a period of increased sensitivity to peer influence and incipient internalization of societal norms. However, the proximate mechanisms for the development of prosocial preferences are poorly understood. Here, we show that high-status peers affect adolescents’ prosocial decision making. Participants repeatedly chose to either donate money to a charity or keep it for themselves and could revise their decision upon observing the (opposite) decisions of either a high-status or low-status peer from their classroom. Participants tended to conform to peer behavior (both generous and selfish), often reversing their initial preference. This pattern was especially strong when observing a high-status peer. Our findings suggest that high-status peers act as important signalers of prosocial norms and can be instrumental for the diffusion of prosocial behaviour. By using an incentivized task in a naturalistic setting and extending the experimental work with computer simulations, we bring evidence for the role of real world (high-status) peers in the development of prosocial preferences, and provide a potential path for interventions aimed at spreading cooperative norms

Rolf Granholm (University of Groningen – Faculty of Behavioural and Social Sciences), Anne Gauthier (Netherlands Interdisciplinary Demographic Institute (NIDI)), Gert Stulp (University of Groningen – Faculty of Behavioural and Social Sciences), Measuring the relative importance of fertility determinants for recent birth cohorts across 7-19 countries with GGS II data using microsimulation

While demographic research has uncovered a wide range of determinants of fertility outcomes, it has been difficult to estimate the relative importance of these determinants. This is because the fertility process is complex, and many of the determinants of fertility outcomes are interdependent. It is therefore difficult to estimate independent effects of these determinants with statistical techniques commonly used in fertility research, like event history analysis and life table approaches. Another shortcoming in fertility studies is that the physiological constraints on human fertility have rarely been explicitly modelled, despite the fact that they are the most proximate determinants of fertility outcomes, and now more relevant than ever with increasing mean ages at first birth. We address these issues with a microsimulation model we developed that reproduces the entire reproductive life courses of individual women. The model includes information on reproductive physiology, reproductive behaviour, educational attainment, and union events. The reproductive physiology part of the model is based on modelling work by Henri Leridon and clinical data. For the behavioural part of the model, we use data mainly from the Generations and Gender Survey II. We apply our simulation model to a wide range of countries. By doing this we can not only uncover the most important determinants of fertility outcomes within each country, but we can also compare determinants between countries. What makes our approach different from statistical and life table approaches is that we explicitly model the mechanisms of the fertility process based on empirical data and fertility theory, and treat fertility as a process over the entire reproductive life course of a woman.

Christian Olesen (University of Amsterdam – Faculty of Humanities), Isadora Paiva (University of Amsterdam – Faculty of Humanities), The CLARIAH Media Suite: An introduction to qualitative media analysis using automatic data enrichments and annotation

The Media Suite is one of the research environments developed within the Dutch CLARIAH research infrastructure. As an innovative digital research environment, the Media Suite is a networked, university-level access point to a large variety of digital collections – comprising key broadcast, film, paper and oral history collections from NISV, Eye Filmmuseum, the KB and DANS. Moreover the environment offers new ways of browsing, searching and analyzing the collections made available with digital tools developed specifically for the environment. The environment’s tools facilitate among other approaches exploratory research and browsing, close reading and qualitative analysis based on video annotation tools as well as data-driven modes of distant reading and visualization of collection data. Beyond digitized collection items – amounting to a couple of millions of audiovisual items – the environment is also the unique access point to automatic data enrichments, such as automatic speech recognition (ASR) data of broadcast collections and optical character recognition (OCR) of historical paper collections. In addition to introducing The Media Suite, this presentation will discuss the steps of creating, annotating and analyzing a corpus of materials from different collections available in the environment by discussing 1) the scope and aims of the Media Suite environment as a tool for distant and close reading of multimedia items, and 2) how to create a corpus of items from several collections in the environment and explore and annotate them.

Jiri Kaan (Wageningen University & Research – Wageningen Social Science Group), Yara Khaluf (Wageningen University & Research – Wageningen Social Science Group), Kristina Thompson (Wageningen University & Research – Wageningen Social Science Group), An Agent-Based Model Comparing Two Reward Learning Algorithms In Dynamic Food Environments

Food environments are riddled with ultra-processed foods and cues that encourage overeating. Repeated consumption of these foods, coupled with exposure to their cues, can condition people to eat beyond their metabolic needs at the sight or smell of them. Although conditioning is not the only pathway that influences eating behavior, understanding how conditioning affects eating behavior is critical when simulating policies intended to tackle the obesogenic environment. Although well-validated reward learning algorithms exist to formalize conditioning, direct comparisons of these algorithms in food environment models are lacking. This gap is significant because comparing these algorithms under similar conditions is essential to understanding the role of the food environment in eating behavior and designing effective public health policies. In this study, we replicate a previous agent-based model that used Temporal Difference Learning (TD) as reward learning algorithm. We then extend the agent-based model by incorporating the Rescorla-Wagner (RW) model, which is beneficial for scenarios involving immediate associations, such as classical conditioning, where understanding the strength of an association in response to immediate outcomes is crucial. Our replication and extension result in a similar “lock-in” effect, in which early exposure to a food environment dominated by ultra-processed foods leads to a non-trivial preference for these foods. To further enhance our understanding of learning dynamics in varying food environments, we introduce two additional model extensions: a generalized RW model and a two-phased RW model. These extensions significantly alter learning trajectories, providing a more comprehensive understanding of how different learning algorithms respond to changes in food environments. Overall, our results underscore the importance of a health-promoting food environment, as preferences for ultra-processed foods appear robust across all models.

Jiri Kaan (Wageningen University & Research – Wageningen Social Science Group), Kristina Thompson (Wageningen University & Research – Wageningen Social Science Group), Yara Khaluf (Wageningen University & Research – Wageningen Social Science Group), Agent-Based Models Of Social Network Interventions Promoting Health And Well-being: A Systematic Review

Social networks are complex adaptive systems characterized by dynamic processes that feedback into one another, influencing health and well-being in a nonlinear manner. However, most health interventions focus solely on individuals and do not leverage these social network processes. Social network interventions, which seek to use or modify the characteristics of social networks to improve the effectiveness of health interventions, have yielded promising results. Yet, they often lack designs that fully estimate the impact of these networks. The complexity and dynamic nature of social networks pose challenges that traditional methodologies may not fully address. Computational approaches, such as agent-based modeling, offer powerful tools to estimate the impact of social networks in interventions. These tools can potentially be used by policymakers and health practitioners to forecast various scenarios for social network interventions. Consequently, agent-based models are increasingly employed to test the effectiveness of social network interventions under different circumstances. Despite their growing use, there is currently no comprehensive overview of social network interventions tested with the aid of agent-based models and their performance in various contexts. Therefore, we will conduct a systematic literature review to address this knowledge gap. The review will adhere to PRISMA-S guidelines. We searched the Scopus, Web of Science, and PubMed databases for papers on agent-based models simulating social network interventions to enhance the effectiveness of health interventions. Our search identified 1,282 papers with 16 papers remaining after exclusion. Data will be extracted to determine which type of social network intervention was simulated, how it performed, and in what health and well-being context. Furthermore, special attention will be paid to the theories and processes that underpin the agent-based models. Overall, we will provide understanding of how agent-based models have been utilized in social network interventions, thereby guiding future research and health interventions.

Pim Kastelein (Netherlands Bureau for Economic Policy Analysis), Brinn Hekkelman (Netherlands Bureau for Economic Policy Analysis), Suzanne Vissers (CPB), Predicting Persistence of Labor and Health Shocks

Rishabh Kaushal (Maastricht University – Faculty of Science and Engineering), Adriana Iamnitchi (Maastricht University – Faculty of Science and Engineering), Nicole Kilk (Maastricht University – Faculty of Science and Engineering), The Needle in the Haystack: What Platforms Report to the DSA Transparency Database When They Don’t Have To

To promote transparency, the European Union has adopted the Digital Services Act (DSA), which requires that very large online platforms (VLOPs) share their content moderation decisions with meaningful related information, referred to as Statement of Reasons (SoRs), to the DSA Transparency Database maintained by the EU. This database contains daily data dumps with all SoRs from all platforms. A dashboard with basic data analysis visualizations is also provided and includes an advanced search option. However, this search tool returns only a limited number of SoRs, and it can only be performed based on mandatory fields of SoRs. Therefore, the free-form text fields in the SoRs may carry important information that is missed by the advanced search tools.
In this work we analyze what platforms report in optional fields. We make three contributions. First, we propose an automated approach to retrieve SoRs on a large scale based on user-supplied keywords that are searched in free text optional fields. Second, we perform text analysis to study frequent words and phrases used by platforms in the explanations of their content moderation decisions. Third, we verify that platforms use specific terminology that uniquely identifies them.
Our results contribute to a better understanding of the moderation decisions that are not easily visible in the Transparency Database dashboard.

Paul Keuren (UU-FSW and CBS), Marc Ponsen (Statistics Netherlands (CBS)), Ayoub Bagheri (Utrecht University – Faculty of Social Sciences), Expert Embedding Alignment

In this research, we look into the possibility of measuring the alignment between two expert-created Knowledge Systems and multiple different embeddings. For both the contained Thesaurus and Taxonomy, various metrics are defined and applied in conjunction with other classical and state-of-the-art methods. On top of this, we fine-tune a state-of-the-art model with information from the knowledge system to find whether this will improve the final performance.
To determine the validity of the results, we use both a dimensional reduction, as well as a plot where the retrieval chance is offset by the number of retrieved items. We found that state-of-the-art methods might not outperform classical methods depending on the number of items retrieved. That fine-tuning an embedding on a knowledge structure, does not yield a better-performing network. And that the best-performing embeddings, do not show agreement with the expert. Finally, we conclude that the applied metrics do not indicate an alignment between the expert and the embedding.

Saurabh Khanna (University of Amsterdam – Faculty of Social and Behavioural Sciences), Knowing Unknowns in an Age of Incomplete Information

The technological revolution of the Internet has digitized the social, economic, political, and cultural activities of billions of humans. While researchers have been paying due attention to concerns of misinformation and bias, these obscure a much less researched and equally insidious problem — that of uncritically consuming incomplete information. The problem of incomplete information consumption stems from the very nature of explicitly ranked information on digital platforms, where our limited mental capacities leave us with little choice but to consume the tip of a pre-ranked information iceberg. This study makes two chief contributions. First, I leverage the context of Internet search to propose a novel metric quantifying ‘information completeness’, i.e. how much of the information spectrum do we see, when browsing the Internet. I then validate this metric using 6.5 trillion search results extracted from daily search trends across 48 nations for one year. Second, I find causal evidence that awareness of information completeness while browsing the Internet reduces resistance to factual information, hence paving the way towards an open-minded and tolerant mindset.

Jonas Klingwort (Statistics Netherlands (CBS)), Yvonne Gootzen (Statistics Netherlands (CBS)), Daniëlle Remmerswaal (Utrecht University – Faculty of Social Sciences), Validating a smart survey travel app: how do GPS measurements compare to reported behavior?

Smart surveys combine passive data collection by the device sensors (e.g., accelerometer, GPS) with (inter)active data provided by the respondent (e.g., response to prompts based on the passively collected data). The interest in such smart surveys in official statistics is increasing because traditional diary surveys, such as travel surveys, are burdensome for respondents and suffer from measurement errors (e.g., underreporting and recall errors). In recent years, a substantial amount of research has been conducted into the feasibility of smart travel surveys. However, more attention must be paid to validating respondents’ measurements and provided information in smart surveys. Such empirical validation studies and their results are crucial for a deeper understanding of the data and the quality of the data collected by smartphones. We present such a validation study based on a large-scale experiment using data from the general population and conducted by Statistics Netherlands. The data are collected in 2022-2023 and include respondents for whom both app data and responses from a web questionnaire are available for the identical reporting period. The data from the web questionnaire are used to validate the app data in combination with the applied algorithms. Two target variables are considered: first, a binary variable of whether the respondent is stationary (stop) or moving (track). Second, a categorical variable regarding the mode(s) of transport used during a track. The results of this comprehensive analysis yield important conclusions about how similar app measurements and web responses are. In addition, we will report on the effects of respondent interaction and the extent to which these interactions influence the app data quality. Furthermore, we will shed light on lessons learned from this pioneering smart survey validation study, what to consider in such validation studies, and what recommendations we derive for future validation studies of smart (travel) surveys.

Pim Koopmans (Leiden University – Faculty of Law), Max van Lent (Leiden University – Faculty of Law), Marike Knoef (Tilburg University, Tilburg School of Economics and Management), The Impact of Retirement on Household Finances: Causal Evidence from Transaction Data

This paper contributes to the literature that studies the impact of retirement on household finances and financial behavior, often using survey or yearly administrative data. We use high-quality Dutch transaction data to estimate the causal effect of retirement on households’ financial outcomes. We use the discontinuity imposed by Statutory Retirement Age (SRA) and variation in the SRA in order to measure causal effects. The monthly data allow us to estimate the direct short-run impact using RD and DiD designs. Our findings show a positive spike in net flow balance at retirement, which financially constrained households use to pay off debts. Debts decline especially for low income, low wealth, blue collar workers, and social insurance recipients. In addition, we see a gradual increase in the end of month balance over time, that is not directly caused by retirement itself.

Sander Kraaij (University of Cologne), Jan Kabátek (University of Melbourne), Sacha Kapoor (Erasmus University Rotterdam, School of Economics), Systemic Discrimination in Firing

Do firms discriminate against women and ethnic minorities in firing decisions? Why? We investigate these questions using administrative data from the Netherlands and discontinuous increases in minimum wages at the birthdays of youth workers. Age-wage increases are orthogonal, therefore, to other worker characteristics that may be relevant for firing decisions, including their race, gender, or productivity. We leverage the orthogonality of these wage increases to measure the willingness of firms to pay (WTP) to retain workers of different ethnicities and genders. We identify WTP distributions across large firms to identify firms with extremely low or high WTP for disadvantaged workers. We find that the market is willing to pay less to retain workers from certain ethnic groups compared to natives, but willing to pay more to retain those from certain other groups. We find no gender differences in WTP. We also find significant dispersion across firms in the WTP for migrant relative to native workers. We consider various mechanisms that can explain our results, including heterogeneous hiring standards across ethnic groups, learning, discriminatory preferences of managers and coworkers, and costly coordination among decision-makers. Our approach lets regulators identify extreme discriminators in the market and enables firms themselves to identify extreme discriminators within the firm using observational data.

Pradeep Kumar (Centerdata), Joris Mulder (Centerdata), Investigating the Impact of Survey Methodologies on Predictive Accuracy in Time Use Modeling

Gender differences in time use are a critical factor distinguishing the lives of men and women, particularly in developing countries. Research consistently shows that men tend to allocate more time to market activities or productive labor, while women spend a disproportionate amount of time on unpaid care and domestic work, often referred to as reproductive labor (Ilahi, 2000; Rubiano-Matulevich and Viollaz, 2019). This imbalance creates challenges for women who choose to participate in the labor market, forcing them to balance personal and family responsibilities.
The primary source of time use data comes from time use surveys or multi-purpose surveys that include a time use module. These surveys generally ask respondents to recall their activities over the previous 24 hours (Fisher, 2015). However, one major issue with this method is recall bias, where respondents may not accurately remember their activities.
An innovative approach to measuring time use involves imputing activity-specific time allocations from physical activity data collected via accelerometers worn by respondents. Our earlier research in Malawi used machine learning techniques and a unique socioeconomic survey conducted in 2017.
In a follow-up experiment in Malawi, 720 households and 1,440 respondents were divided between two survey methods: (1) a 24-hour recall time use diary and (2) a real-time smartphone Time Tracker App (Daum et al., 2019). Respondents were balanced by gender and urban/rural residence. Each participant wore a research-grade physical activity tracker (ActiGraph wGT3X-BT) for nine days, and their height and weight were recorded. Participants were visited three times during data collection, with the first group completing the 24-hour recall module three times and the second group using the Time Tracker App throughout.
This presentation examines whether the effectiveness of machine learning models depends on whether they are trained on 24-hour recall data versus real-time app diary data. It also explores the results and treatment effects from both survey methods in reporting time use activities.

Angel Lazaro (Wageningen University & Research – Wageningen Social Science Group), Roger Cremades (Wageningen University & Research – Wageningen Social Science Group), Eveline van Leeuwen (Wageningen University & Research – Wageningen Social Science Group), Steering sustainable food systems: the complex co-evolution of consumer preferences, sustainable restaurants, and policymaking

The transition towards more sustainable food systems is essential for achieving climate targets and the Sustainable Development Goals (SDGs). Achieving goal 12, responsible consumption and production, in particular, will require changes in consumer behavior, the adoption of sustainable practices by suppliers, and, most importantly, the implementation of policies that foster such sustainable behaviors. Whether it is the availability of supply or the increasing demand that drives sustainability transient consumption is often discussed in economic circles. From the complex systems perspective, it is their interaction that matters. This article explores how consumer preferences and restaurant menus co-evolve to contribute to a more sustainable food system and how this co-evolution is affected by different policy instruments. We use a spatially explicit agent-based model of the catering industry in Amsterdam as a case study. The model is built using spatial microsimulation to expand on survey data from a discrete choice experiment about restaurants in Amsterdam. We observe that in the absence of policy interventions, it takes quite a lot of time for changes in diet patterns to occur. Multiple cities are working on their sustainability. Based on the generated insights, we expect our research to contribute to the current debate on the policy interventions to achieve sustainable urban food systems and the SDGs, and to be of consequence across multiple cities worldwide

Maël Lecoursonnais (Linköping University – Institute for Analytical Sociology), Selcan Mutgan (Linköping University – Institute for Analytical Sociology), Life-Course Trajectories of Experienced Segregation

Motivation
Dynamics and consequences of segregation are commonly studied within isolated domains (e.g., neighborhoods or schools) and use point-in-time exposure measures. Recent advancements emphasize that these domains (1) are likely to influence each other and (2) are expected to have both independent and multiplicative effects on socioeconomic outcomes. Given that individuals with distinct socioeconomic backgrounds experience everyday life differently, it is difficult to infer the implications of diverse experiences in separate domains from isolated contexts.
Objective, data and methods
In this study, we investigate how various facets of segregation collectively shape life-course trajectories of individuals. Drawing on Swedish administrative data, we track cohorts of primary school students into adulthood over almost 30 years. We use individual-level, annual information on classmates during education years, colleagues at the workplace and nearby neighbors. We focus our empirical analyses on three domains: the neighborhood, the school, and the workplace as they make up the majority of individuals’ time use. For each individual-year, we measure exposure to the top and bottom 20% of the income distribution. The combination of these six measures constitutes what we term experienced segregation.
Results
Preliminary results suggest several stylized facts on experienced segregation over the life-course. First, segregation is higher among affluent groups than disadvantaged ones, with greater concentration of the affluent groups. Second, we observe strong positive correlations in exposure to affluence and poverty between neighborhood and school, but lower associations at workplace and university. Third, exposure levels follow similar patterns over the life-course across socioeconomic groups. Finally, levels of exposure follow a socioeconomic gradient, with students from the highest income quintile consistently more exposed to affluence and less to poverty across all activity domains throughout their lives.

Yue Li (Twente University – Faculty of Behavioural, Management and Social Sciences), Marcello A. Gómez-Maureira (Twente University – Faculty of Electrical Engineering, Mathematics and Computer Science), Stéphanie van den Berg (Twente University – Faculty of Behavioural, Management and Social Sciences), Generalizing Behavior Prediction Models from VR Experiences

Virtual Reality (VR) has become a prevalent tool for presenting stimuli in social psychology research. Although the human behaviors in VR environment (VRE) is widely explored, there remains a gap in capturing and modelling the behavioral outcomes in VRE and explore how these indications affect behavior in real-life. To address this, we propose an approach aimed at predicting behavior in Virtual Reality experiments, which will allow us to assess the impact of tasks on prosocial behaviors. In this study, we will make use of experiments that systematically manipulate interaction levels and sensory settings within VR environments. Using complementary measures, including physiological responses, interaction, position, and stimulus properties, this multi-faceted analysis approach is designed to enable us to construct an advanced statistical model that predicts participants’ behavior based on their experience in VRE’s. The resulting behavior prediction models will provide valuable insights into the mechanisms driving prosocial behavior and demonstrate the utilization of VR as a robust tool for behavioral research. This research is aimed to advance our understanding of how immersive VR experiences can influence human behavior, offering empirical evidence to support the development of reliable, generalizable behavior prediction models. Furthermore, it has potential to enable innovations in social psychology, contributing to the potential of VR to generate deep insights into human behavior.

Angelica Maria Maineri (Erasmus University Rotterdam, School of Social and Behavioural Sciences), Laura Boeschoten (Utrecht University – Faculty of Social Sciences), Niek de Schipper (University of Amsterdam – Faculty of Social and Behavioural Sciences), Claartje ter Hoeven (Erasmus University Rotterdam, School of Social and Behavioural Sciences), The impact of constant connectivity on employees’ well-being: a data donation pilot study

In the knowledge economy, the use of digital platforms, e.g., Slack, to connect employers among each other is extensive, to the point that thousands of teams worldwide employ them. Next to the advantages of such platforms, which enable to connect people who work remotely and also to carve out space for entertainment and release, the inherent danger of these tools is that they blur the boundaries between work and personal life. This constant connectivity, defined as a constant availability to the work organisation beyond working hours, can be detrimental for worker’s well-being because it lowers psychological detachment, e.g., taking (mental) breaks from work, which is important for employees to restore energy (Büchler et al., 2020 [1]). The detrimental effect could be attenuated by a preference for integration (i.e., when individuals like to have blurry boundaries between work and private life) instead of segmentation (i.e., when individuals prefer to keep work and private life separate).
In this study, we replicate part of Buchler et al. 2020’s findings using novel data. We investigate whether constant connectivity has a detrimental effect on employees’ well-being mediated by psychological detachment, and we also investigate whether this effect is stronger for segmenters (vs. integrators). To answer our question, we use a combination of survey data and actual Slack access logs, acquired via data donation using the D3I infrastructure, which allows us to quantify how much individual employees connect to work via Slack outside of their working hours. In this way, we further improve on the original study by being able to compare self-reported to actual constant connectivity. The use of digital trace data to measure constant connectivity in this context may allow us to better quantify and understand an important phenomenon of nowadays’ work environments, whereby self-reported measures may be inaccurate due to recall bias.

Gabriele Mari (Erasmus University Rotterdam – School of Social and Behavioural Sciences), Emanuele Fedeli (University of Milan “La Statale”), Child Penalties and Public Childcare Provisions Under Fiscal Austerity

Child penalties affecting women’s earnings are relatively large in Italy and contribute to gender pay gaps worldwide. Evidence on the effectiveness of public policies in mitigating child penalties is mixed. We revisit the question of whether, how, and for whom child penalties might be influenced by public childcare provisions. Whilst previous studies have studied expansions, we focus on the effects of constraints limiting these provisions in times of fiscal austerity.
Specifically, we examine public spending restrictions enforced since 2001 by the Domestic Stability Pact (DSP) for Italian municipalities above 5,000 inhabitants. In a fuzzy regression discontinuity (FRD) design, we leverage the DSP-induced discontinuity as an instrument for the supply of public childcare. The design allows us to tease out the causal effects of a more limited supply of public childcare.
We systematised more than a decade of data from municipal budgets with rich information on public childcare provisions. We find a sizeable decrease in public childcare supply under the fiscal austerity regime dictated by the DSP. The decrease is driven by total childcare spots and accepted applications rather than number of centres, educators, and total applications. However, when combining municipal data with social security records on payslips and employment histories for over 400,000 women over the period 2001-2015, our FRD results suggest little influence of austerity-related limits in public childcare provisions on women’s average earnings after childbirth. In the next month, we will complete our paper with – among others – an analysis of disparities depending on women’s socioeconomic status.

Jordy Meekes (Leiden University – Faculty of Law), Maddalena Ronchi (Northwestern University), Mind the Cap: The Effects of Regulating Bankers’ Pay

In this paper we investigate how restrictions to the possibility of paying large bonuses affect employees’ pay schemes and firms’ ability to attract and retain workers. Using administrative data from Statistics Netherlands, we study the Dutch bonus cap that sets a 20% limit to the ratio of variable to fixed pay for all workers employed in the banking industry. Comparing banks to financial institutions not covered by the regulation, results based on a dynamic differences-in-differences specification show that treated employees experience a sharp drop in variable pay which is not fully compensated by an increase in fixed pay, especially for talented workers. We also study whether the policy affects banks’ ability to attract and retain talented workers.

Adrienne Mendrik (Eyra), Emiel van der Veen (Eyra), Jeroen Vloothuis (Eyra), The Next platform: What do data donation, benchmark challenges and participant recruitment have in common?

Data donation, benchmark challenges, and participant recruitment are all software services available on the Next web platform. They are developed by Eyra, co-created with social sciences and humanities researchers from various universities and co-funded by ODISSEI (https://www.eyra.co/software-services). They are integrated into the Next web platform and share reusable modules within the open source Next mono codebase (https://github.com/eyra/mono). The Next web platform functions as an online operating system. In other words, researchers sign in and can readily use the available software services. Like Apps on an operating system, software services can be made available to some or all Next platform users. They are integrated into the project structure on the Next platform as templates. Researchers simply create a new project, choose a project item, such as data donation or benchmark challenge and they can immediately start filling out the required information to publish their study or challenge online. By using the Next web platform you contribute to the maintenance of the sustainable software ecosystem. Each software service benefits from and contributes to the collaborative ecosystem. This collaborative approach not only provides resource efficient maintenance but also enhances efficiency in software development for academic research. On top of this, the Next platform also functions as an integration hub for third party software services, such as services from SURF, Centerdata, and Qualtrics. These services are connected to the Next platform enabling interoperability. For instance, performing data donation studies within the LISS panel (through integration with Centerdata), using university credentials to sign in on the Next platform (through SURFconext), transferring donated data to SURF research cloud for analyses and having participants fill out a Qualtrics questionnaire. In this presentation we will demo the Next platform, talk about future plans and welcome new ideas for valuable software services on Next.

Mohsen Monji (Concordia University, Montreal, Canada), Machine Learning Approaches for Exploring the Social Determinants of Mental Health in Canada: Findings from the Canadian Community Health Survey (2017-2018)

In recent years, there has been growing concern about a rise in mental health problems in Canada, with reports showing an increase in the prevalence of anxiety and depression in Canadian society. However, these mental health problems are not proportionately distributed across the population and significant disparities in mental health outcomes exist among sub-populations. Among the key contributors to these disparities are social determinants, which are the conditions in which individuals are born, grow, live, work, and age. These include factors such as age, gender, race, socio-economic status, housing conditions, and food security, which influence individuals’ access to resources and opportunities, leading to inequalities in mental health. To reduce population mental health disparities, it is crucial to have a comprehensive understanding of the social factors shaping inequalities in mental health outcomes.
Existing research on the social determinants of mental health in Canada mainly relies on traditional statistical analyses, with limited application of machine learning approaches, especially in sociological research on mental health.This study aims to bridge this gap by developing and comparing several machines learning models, including logistic regression, SVM, and decision trees (random forests) to analyze the social determinants of mental health in Canada. Using data from the nationally representative Canadian Community Health Survey (2017-2018, N=130,000), this study examines the application of machine learning models to predict self-rated mental health and psychological distress and to identify social drivers of mental health outcomes in Canada. By using machine learning approaches to study the social determinants of mental health, this study contributes to both computational sociological research and broader research on population mental health. The findings of this study inform data-driven and evidence-based policies, promoting more inclusive and targeted strategies for enhancing mental well-being in Canada at the population level.

Tamara Mtsentlintze (Utrecht University – Faculty of Science), Esmee Dekker (Utrecht University – Faculty of Science), Damion Verboom (Utrecht University – Faculty of Science), European Value Maps: can data visualizations contribute to increased tolerance in society?

One of the most complicated challenges of our time is deeply rooted polarization in our society. To tackle the different crises (e.g., climate, migration) that we are facing today, it is important for people to work together. However, the deep social, political, and economic divides lead us to distrust each other, which only increases conflict. Social media filter bubbles, selected exposure and “echo chambers” further contribute to this polarization and especially the perception of polarization. Namely, one important distinction is the difference between the actual and the perceived polarization in society. To build bridges in our society, we need people to become more open minded to the opinions of others. We therefore propose an intervention to reduce the perception of polarization and thereby increasing the tolerance to opposite opinions on central societal and political topics.
This intervention is based on European Value Maps (EVM), a visual representation of data collected within European Value Study. The maps visualize opinions of people on crucial societal topics such as gender equality, environment protection, trust to societal and governmental organisations and moral beliefs. EVMs allow to: -observe the sheer diversity of opinions held by the participants of the survey, -interact with the maps to explore the underlying data, -locate themselves on the maps. EVM are built on the hypothesis that showing the actual distribution of opinions to people, leads observers to think that the world is not black and white, but instead exists out of a variety of different gradients of opinions. This may consequently lower the threshold to start the conversation with someone with a (slightly) different opinion and eventually become more open minded. In this talk, we will present the project EVM and the results of the initial user studies to assess the effectiveness of the maps.

Vittorio Nespeca (TU Delft – Faculty of Technology Policy and Management ), Tina Comes (Tu Delft – Faculty of Technlogy, Policy and Management), Frances Brazier (TU Delft – Faculty of Technlogy, Policy and Management), Learning to select information exchange hubs: Capturing the emergence of boundary spanning in volatile conditions

To be resilient, different formal and informal groups, such as governmental agencies, NGOs, and communities, need to coordinate effectively when responding to crises. Crucial to this coordination is the exchange of information across these groups, particularly in the volatile settings typical of crisis response. Informational Boundary Spanners (IBSs) serve as promising information exchange ‘hubs’ to facilitate this inter-group communication. However, the current understanding of the mechanisms leading to the emergence of IBSs remains limited. First, a metric is necessary to quantitatively analyze the emergence of IBSs, yet such a metric is currently unavailable. Second, while a potential mechanism for IBS emergence is the ability to learn who provides high-quality information, this mechanism has not been systematically tested. This study advances crisis resilience by providing key components for measuring and understanding the emergence of IBSs. It introduces a novel metric to identify emergent IBSs and uses this metric to investigate the role of learning as a fundamental mechanism for IBS emergence in volatile environments. The metric and the learning mechanism are formalized using an Agent-Based Model. A case study on information sharing in a disaster scenario demonstrates the metric’s validity and confirms that learning is indeed a mechanism for effective IBS emergence in high-volatility settings. Such emergence is, however, contingent upon sufficient inter-group connections and stable information sources. This study aims to establish a foundation for exploring the mechanisms underlying IBS emergence, thereby enhancing inter-group information exchange and supporting crisis resilience.

Samin Nikkhah Bahrami (Utrecht University – Faculty of Law, Economics and Governance), Karlijn Morsink (Utrecht University – Faculty of Law, Economics and Governance), Chris Barret (Cornell University- Department of Economics ), On targeting; Predictors of Expected Consumer Welfare from Catastrophic Drought Insurance

Financial products are increasingly complex, and consumers often struggle to make financial decisions that enhance their welfare. The scope to improve these decisions through additional information provision and changing choice architectures of financial products is limited. Recent efforts try to improve the choice quality of decision-makers — also outside of the financial decision-making space – by providing advice about optimal choices based on the characteristics of decision-makers. One approach is to use individual-specific information, such as past health expenditures for health insurance advice. Another approach is to identify characteristics that predict good or bad quality choices and provide advice by targeting product information and advice to individuals with similar traits. Using this approach for financial decision-making is, however, challenging because optimal choice quality is often heterogeneous and depends on the preferences and beliefs of consumers, which are difficult to easily observe.
This study explores predictors of expected consumer welfare from catastrophic drought insurance for pastoral households in Ethiopia. The insurance uses a satellite-based normalized difference vegetation index to monitor pasture quality, triggering indemnity payments when pasture quality falls below a certain threshold. This insurance can enhance welfare for low-income pastoral households. Still, its expected consumer welfare depends on household characteristics like risk attitudes, loss expectations, and the correlation between pasture quality and livestock losses. We use Group Lasso, a machine-learning feature selection method, to identify positive expected consumer welfare predictors from insurance. The objective is to determine household characteristics that predict whether they will benefit from this insurance.
This research reveals that different livestock types and decisions to purchase insurance on extensive and intensive margins result in distinct predictors. Key factors that positively influence welfare from insurance choices include education, religion, familiarity with the insurance and its promoter, and previous purchase of the insurance, among others.

Ceciel Pauls (VU Amsterdam – Faculty of Science), Michel Klein (VU Amsterdam – Faculty of Science), Stef Bouwhuis (VU Amsterdam – Faculty of Social Sciences), Objective or subjective employment precariousness? Comparing definitions to a topic model based on user-generated data.

There exists great heterogeneity in the literature regarding both the definition and the operationalization of employment precariousness (EP). Because of the lack of consensus on what EP entails, there is an unfulfilled need for a universally recognized definition and operationalization of EP. As a consequence, authors often operationalize EP based on the variables available in the data. The literature on EP can be roughly divided into two approaches: one of which relates to the objective contractual arrangement (objective EP) and the other to an individual’s experience (subjective EP). However, it is unclear how theoretically and data-driven definitions of EP, such as such and subjective EP, compare to data ‘in the wild’. In this study, we use an objective definition of employment precariousness based on the Employment Precariousness Scale (EPRES) and subjective EP as formulated by Mai et al. to determine how prevalent user-generated topics in relation to EP compare to both the objective and subjective definitions of EP. We aim to gain understanding in the ways in which individuals generate discourse about the various dimensions of EP and how this discourse related to objective and subjective EP. We apply Bi-term Topic modeling on user-generated data from Dutch employment forums to identify a number of k latent employment-related topics. Our preliminary results demonstrate that the dimensions in the EPRES scale are not in alignment with the topics regarding EP discussed in online discourse.

Dimitris Pavlopoulos (VU Amsterdam – Faculty of Social Sciences), Roberta Varriale (Sapienza University), Mauricio Garnier-Villarreal (VU Amsterdam – Faculty of Social Sciences), Cross-country differences in employment mobility in the presence of measurement error. A multiple-group hidden Markov model using linked administrative and survey data

In this paper, we investigate whether measurement error can bias cross-country differences in employment mobility. We particularly focus on the comparison between two countries with very different labour markets: the Netherlands and Italy and we study mobility between permanent employment, temporary employment, self-employment and non-employment. For this purpose, we define a multigroup mixed hidden Markov model (MgHMM) with two independent observed indicators for employment status. These two indicators come from linked administrative data from the National Statistical Institutes of Italy (Istat) and the Netherlands (CBS) and survey data from the Labour Force Survey for the years 2017-19. The dimensionality of the data is therefore quite different for the two sources: administrative data cover the whole population, while survey data cover only a sample. The MgHMM is flexible to model measurement error in both data sources in both countries. The results of our analysis indicate that, when correcting for random measurement error, cross-country differences in employment mobility over time are smaller than originally thought. Error-corrected estimates of over time mobility from temporary to permanent employment, self-employment and non-employment to temporary or permanent employment are much smaller than the relevant observed mobility rates. For example, the 3-month transition rate from temporary to permanent employment was never larger than 9.7% in Italy or 7.7% in the Netherlands, while the 3-month transition rate from non-employment to temporary employment never exceeded 4.1% in Italy or 13.0% in the Netherlands. Random error seems more present in the Labour Force Survey than administrative data in both countries. In administrative data, random error seems to bias only estimates on self-employment in the Netherlands and on temporary employment in Italy.

Sanne Peereboom (Tilburg University, Tilburg School of Social and Behavioural Sciences), Inga Schwabe (Tilburg University, Tilburg School of Social and Behavioural Sciences), Bennett Kleinberg (Tilburg University, Tilburg School of Social and Behavioural Sciences), Cognitive phantoms in large language models through the lens of latent variables

Large language models (LLMs) increasingly reach real-world applications but are poorly understood. The size and complexity of LLMs complicate the study of potential higher-order constructs such as attitudes or behavioural tendencies. Inspired by ethology and psychology, an alternative approach to studying LLMs is to treat them as participants in psychology experiments. Recent studies administering psychometric questionnaires to LLMs report human-like traits in LLMs. However, using psychometric instruments developed for humans presupposes equivalence in the internal representation of a construct in LLMs and humans. In psychometrics, suchlike constructs are known as latent variables: unobservable, abstract constructs that are measured through observable variables. Yet, typical analytical procedures rarely investigate the internal representations of latent phenomena in LLMs, and resort to comparisons of aggregate dimension scores instead. This study corrects this misalignment and applies formal psychometric methods to investigate if and how the latent representations of psychological traits differ between humans and LLMs. A latent variable modelling approach was used to compare a representative human sample with LLM responses on two validated personality inventories (HEXACO-60 and Dark Side of Humanity Scale). Our findings indicate that administering psychometric inventories can create the illusion of human-like traits in LLMs, which does not withstand formal psychometric analyses and introduces the risk of misattributing “human-like” latent phenomena to LLMs. We highlight the need for psychometric analyses of LLM responses to avoid chasing cognitive phantoms.

Keenan Ramsey (Twente University – Faculty of Behavioural, Management and Social Sciences), Anne van Dongen (Twente University – Faculty of Behavioural, Management and Social Sciences), Robbert Sanderman (Twente University – Faculty of Behavioural, Management and Social Sciences), Assessing the scope of mental health (non)-recovery in the aftermath of the COVID-19 pandemic

Background: The COVID-19 pandemic had profound impacts on mental health globally. While research highlights the resilience of mental health in the general population, some may be left behind unanimous recovery is assumed post-pandemic. Specifically, attention is needed for those who do not experience recovery alongside broader population trends. Utilizing existing datasets presents an opportunity to harmonize disparate data sources into a robust sample to comprehensively address this issue.
Objectives: This project aims to leverage data from multiple cohort studies to create a harmonized dataset for investigating mental health (non)-recovery in the Dutch population after the COVID-19 pandemic. The initial objective is to develop a versatile pipeline capable of integrating heterogeneous datasets. The resulting harmonized dataset is the foundation for subsequent aims identifying, characterizing, and predicting mental health (non)-recovery. Ultimately, we seek to clearly communicate findings through visual decision aids, providing stakeholders with actionable insights to identify and prioritize support for individuals at risk.
Methods: Longitudinal data from Dutch cohort studies, LISS Panel and Lifelines, with data pre-, during, and post-pandemic were selected. Harmonization is facilitated by developing a pipeline for generic data-cleaning and pre-processing. In the harmonized dataset, (non)-recovery is operationalized for a three-step strategy to identify, characterize, and predict (non)-recovery. Descriptive analyses identify persistent mental health impairments, while further characterization explores factors associated. Machine learning models are used to identify and create visual mappings of potential risks.
Results: Preliminary results are expected to demonstrate the feasibility and effectiveness of the pipeline created to facilitate harmonization.
Conclusions/Implications: Using computational techniques to understand (non)-recovery post-pandemic advances both research practices and understanding concerning mental health in crisis. These insights are pivotal for informing monitoring, interventions and policies aimed at supporting populations at risk. Moreover, the methodological framework developed offers a scalable solution for robust evaluation of mental health impacts in future crises.

Raoul Schram (Utrecht University – Faculty of Social Sciences), Samuel Spithorst (Utrecht University – Faculty of Social Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences), Metasyn: Generate synthetic tabular data in a transparent, understandable, and privacy-friendly way.

Social scientists often handle highly privacy-sensitive information, such as mental health questionnaires, personal income data, or social media information. This makes it ethically and legally problematic for the researcher to share the real data publicly as part of the research process, even if the data is pseudonymized. Among other issues, this can create reproducibility problems: while the analysis code for the research might be published, other researchers cannot reproduce the results without the original data. One solution that can improve the situation is to crea

Mónika Simon (University of Amsterdam – Faculty of Social and Behavioural Sciences), The anatomy of a conspiracy theory: A multi-modal investigation of the
evolution of distrust narratives surrounding “”Kastepiracies””

Much has been uncovered about the psycho-social predictors and correlates of conspiracy theories, highlighting their profound impact on interpersonal relationships, inter-group relationships, and democratic institutions at large. These simplified explanations of complex realities have a powerful appeal and persuasive impact. They create an environment ripe for misinformation by undermining trust in traditional gatekeepers, encouraging extremist belief systems and actions, and leading to disengagement from conventional political activities. However, there is a significant gap in understanding how narrative features like simplified language, logical fallacies and biases, sensationalism, and negativity present in various (social) media content contribute to the emergence and spread of conspiracy theories, fostering societal distrust. To address this gap, we employ state-of-the-art automated content analytic techniques (textual and visual) to study ‘Katespiracies’ — a series of conspiracy theories that arose around the prolonged public absence of the Princess of Wales following major abdominal surgery in early 2024. A failed attempt by Kensington Palace to debunk these theories with a “”doctored”” family photo, which was given a kill notice by the Associated Press, exacerbated the spread of conspiracy theories and raised more suspicion. This persisted until a video was shared by Kate detailing her cancer diagnosis on March 21st, 2024 that effectively put an end to almost all conspiracy theorizing around her absence. This clear timeline of the emergence and rapid demise of Katespiracies allowed us to examine the evolution of (dis)trust narratives in social media and the British press. Drawing on theories of narrative persuasion, agenda setting, and framing, we study how specific narrative elements and textual and visual features shape the evolution of ‘Katespiracies’ in both social and legacy media. Our insights derived from a unique multi-modal dataset offer actionable insights for combating conspiracy theories by identifying and addressing problematic narrative features that fuel their spread on (social) media.

Abhigyan Singh (TU Delft – Faculty of Industrial Design Engineering), Natalia Romero Herrera (TU Delft – Faculty of Industrial Design Engineering), Razieh Torkiharchegani (TU Delft – Faculty of Industrial Design Engineering), Converging design methods and computational analysis to support community lifelong learning practices

Energy learning communities are institutions involving various stakeholders that engage in lifelong learning practices integrating knowledge from research, government, industry, and society to accelerate energy transition. A challenge that learning communities often face is the context-dependent nature of problems and strategies, making it difficult to understand what changes to make, when, how, and with whom. We present the work of Greengage, an interactive digital platform that supports energy learning communities in developing knowledge about their local socio-technical environment.
In this presentation, we showcase our ongoing work and discuss the intended direction of three data-oriented stages of Greengage to support energy learning communities in the context of secondary school. In the data collection stage, buildings’ performance from indoor climate comfort sensors and energy meters is integrated with social data on experiences, opinions, and community preferences to prompt individuals’ contribution to community learning. This stage aims to support engaging and playful reporting activities as well as inclusive and representative lifelong monitoring practices. In the data processing stage, both technical and social computational analyses are explored, including machine learning and text analysis, to develop integrated benchmarks of building performance and community learning performance. In the feedback loop stage, a technical-social radar aims to provide real-time and historical indicators of the alignments and misalignments of the community’s values, preferences, and experiences and the connection to factual knowledge from sensors and meters.
The development and implementation of these stages follow a participatory and iterative approach to exploring schools’ needs and abilities to develop and apply knowledge about their local environment to experiment and transform current practices. Overall, our presentation contributes to the discussion on using design methods combined with computational techniques and highlights the benefits and challenges of integrating technical and social contextual data to support lifelong learning practices.

Weronika Sojka (Wageningen University & Research – Wageningen Social Science Group), Erkinai Derkenbaeva Derkenbaeva (Wageningen University & Research – Wageningen Social Science Group), Eveline van Leeuwen (Wageningen University & Research – Wageningen Social Science Group), Agent-based modeling and Urban Digital Twin simulations: state of the art for complex circular systems

Digital Twins simulations do not have a consistent definition in the literature. Nowadays, we can already identify multiple definitions and sets of sub-tools that they include, depending on their purpose or specificities of the industry that they are being designed for. Nonetheless, in contemporary urban practice, cities are increasingly turning to various kinds of Urban Digital Twins and other simulation models, e.g. Agent-Based Models (ABM), to simulate and analyse complex issues. This study aims to address the knowledge gap lying at the intersection of these two practices and to indicate opportunities for their integration.
In addition, the transition to sustainable policies requires significant resources and timely implementation based on evidence. Therefore, encompassing technological, social, environmental, and financial aspects affecting complex systems and their ongoing interplay is crucial. Lack of coherence in this aspect, hinders efforts to optimize current city management systems and align policies with the principles of the circular economy. Integrating ABM into the Digital Twins simulations is an evident solution to bridge those aspects.
In this study, we conduct a literature review and analyse real-life examples and pilot projects to investigate how they approach the system holistically and to broaden the understanding of those tools including their similarities and differences. We aim to explore whether ABMs can be considered as a component or a sub-system of a Digital Twin and construct a conceptual framework for our next studies.
We expect that the results of this exploration will determine the benefits of existing practices together with challenges that the cities might have encountered both during their creation and after their occurrence. A practice review will serve for determining the variables causing changes, risks, and opportunities for our system’s design in the future.

Koen Steenks (VU Amsterdam – Faculty of Social Sciences), Stef Bouwhuis (VU Amsterdam – Faculty of Social Sciences), Dimitris Pavlopoulos (VU Amsterdam – Faculty of Science), Classifying employer orientation: how do firms combine wage policy and the use of non-standard employment

There is a growing scholarly interest in the pivotal role that employer policies play in shaping social inequality. Particularly, employer orientation (EO), i.e. how employers combine wage policy and the use of non-standard employment (NSE), may influence social disparities as they influence inequalities regarding job security and income. Previous empirical research on employer policies has mostly focused on either wage policy or the use of NSE. Research on how employers combine this is mostly theoretical. These studies usually distinguish between two types of orientations: external and internal. Externally oriented entities base their policies on the fluctuating dynamics of supply and demand within the labour market, and often use NSE and performance-based wages. Internally oriented firms more often use permanent contracts and offer their employees a continuously increasing wage based on administrative rules such as wage scales.
To our knowledge, this study is the first to empirically investigate how employers combine wage policy and the use of NSE. To do so, we use register data on employment from Statistics Netherlands linked to the structure of earnings survey. Since employer orientation is a multi-dimensional latent variable, we employ Latent Class Analysis (LCA) to cluster employers with similar orientations. In addition, we explore how firm size, sector, whether a firm is a multinational, and labour union activity influence a firm’s EO.
The preliminary results show that most employers are either flexible with respect to wages or contracts, indicating that there exists a trade-off between wage policies and the use of NSE to generate flexibility. This contrasts with the theoretical distinction between internal and external orientation where high (or low) wage flexibility goes together with high (or low) contract flexibility.

Eduard Suari-Andreu (Leiden University – Faculty of Law), Max van Lent (Leiden University – Faculty of Law), Time to Give: Health Shocks as a Trigger for Inter-Vivos Transfers

In this study we investigate the effects of health status as a predictor of giving via inter-vivos wealth transfers. To that end, we use high-quality administrative data for the whole Dutch population. We construct a measure of negative health shocks and use it to carry out an event study that exploits the random timing of the shock. Our results show a significant and positive increase in the probability of giving wealth during the years following a health shock. Several extensions of our baseline analysis (using the diagnoses related to the shock as well as data on the children of individuals who experience the health shocks) indicate that the transfers we observe are mostly intergenerational transfers. In addition they show that the wealth transfers respond to increased mortality and that they fit in a model of intergenerational altruism. Our findings are relevant for social policy and for tax policy.

Kristina Thompson (Wageningen University & Research – Wageningen Social Science Group), Johan van Ophem (Wageningen University & Research – Wageningen Social Science Group), Investigating socio-economic status’s role in the intergenerational transmission of mortality

What makes someone long-lived? Although lifespans worldwide are generally increasing, some people enjoy a longevity advantage, while others do not. In the Netherlands, there is similarly evidence of disparities in life expectancy. Understanding the sources of this variation is crucial to identifying ways of narrowing the gap between the short-and long-lived.
Already, research suggests that an individual’s survival chances are heavily tied to the lifespans of their family members. That is, longevity can considered something that is inherited from one generation to the next. Likewise, research suggests that socio-economic status is a key determinant of health and mortality, and is also influenced by the socio-economic status of previous generations.
To fully unravel how mortality and socio-economic status are related, intergenerational data on both are necessary. Such datasets are extremely rare. The newly-linked Historical Sample of the Netherlands (HSN) and the System of Social-statistical Datasets (SSD) offers researchers a unique ability to study how socio-economic status and mortality are related across generations. This dataset stands out for the ability to link up to three generations of kin, and for containing the cause of death of the final generation.
With this dataset, we will explore the moderating role of socio-economic status on the intergenerational transmission of mortality. Exploiting cause-of-death information will further illuminate any inherited and/or socio-economic patterns in mortality.

Agata Troost (University of Groningen – Faculty of Spatial Sciences ), Jaap Nieuwenhuis (University of Groningen – Faculty of Behavioural and Social Sciences), Jonathan Mijs (Boston University), Exploitation-based class scheme, social inequality and contemporary conflicts: a novel empirical approach

Much of current quantitative social research on inequality uses proxies (income, education) to assess a person’s social position and experiences. We argue that we need better data on social class dynamics, as well as on personal assets and wealth. Data capturing people perception of social classes, including their own class identity, can help with explaining the mechanisms of social inequality and related societal conflicts. In December 2023 we won the ODISSEI LISS panel data grant financing a survey collecting such data, which can be matched with Dutch administrative register datasets and other LISS surveys on topics such as one’s housing situation, political views and activities. The grant gave us a unique opportunity to design survey questions on individual perceptions of inequality, work autonomy, and social class.
Our novel data, in combination with the existing LISS and Microdata datasets, allow us to answer a key question in contemporary research on social classes, based on theories inspired by thinkers such as E.O. Wright and Bourdieu. Is class identity predominantly determined by the economic or cultural capital? Can the relative lack of class-based political mobilisation be explained by low levels of class consciousness? Or are attitudes about ethnic divisions, or maybe feelings of powerlessness and precarity more important? This research allows for exploring issues surrounding social inequality among the growing concerns about divisions and polarisation of the Dutch society.

Stéphanie van den Berg (Twente University – Faculty of Behavioural, Management and Social Sciences), Ulrich Halekoh (SDU), Jacob Hjelmborg (SDU), Using computer vision to estimate a Linfoot correlation

As long as data are multivariate normally distributed, Pearson correlation coefficients are perfect estimators of bivariate relationships. But many traits in social and medical sciences are not normally distributed. Existing methods to estimate mutual information (the basis for correlation) show bias and large variance. Deep convolutional neural networks have shown great performance on image data. Here we make use of this strength by transforming data sets into images, and let deep learning do what it is good at. We compare performance with numerical methods to estimate mutual information.

Sterre van der Kaaij (National Institute for Public Health and the Environment (RIVM)), Lenneke Vaandrager (Wageningen University & Research – Wageningen Social Science Group), Hanneke Kruize (National Institute for Public Health and the Environment (RIVM)), Using Agent-Based Modeling and Group Model Building to Understand How Spatial Interventions Affect Health Behaviours: A Research Protocol

Background: Western society is structured in a way that is conducive to physical inactivity and unhealthy food consumption. Spatial interventions in the built environment, such as restructuring greenspaces, might be important in promoting healthy behaviours. However, insights into the system that influences these behaviours are lacking, and the effectiveness of spatial interventions remains unclear. This research protocol outlines the methodology that will be used to understand how spatial interventions in the built environment affect physical activity (PA) and healthy food consumption, particularly focusing on greenspace interventions, using a systems approach. Methodology: We will design Agent-Based Models (ABM) to study adult health behaviour at the neighborhood level in the Netherlands. ABMs are particularly suited for this research because they can simulate heterogeneous individual behaviours and their adaptations to interventions in the built environment. The ABM will be based on existing theories (such as COM-B) and data collected from 2-3 living labs in the Netherlands. A living lab involves municipalities working together with citizens to co-create and implement spatial interventions aimed at promoting PA and/or healthy food consumption. The data collection for this study will involve two phases. First, Group Model Building (GMB) will engage citizens, neighborhood professionals, and scientific experts to identify factors and relationships in the living environment that contribute to healthy behaviour. In the second phase, these factors will be measured among citizens before and after the intervention to determine its impact. Measurements will include walkability, levels of physical activity, and citizens’ experience with the spatial intervention.

Oskar Veerhoek (Radboud University Nijmegen – Faculty of Management Sciences), The Springboarding Organization

As wage setters, organisations play a pivotal role in upward intergenerational mobility. Which types of organisations contribute to upward intergenerational mobility remains unclear. As a theoretical contribution, this paper proposes that springboarding organisations shape upward mobility. Springboarding organisations are organisations that substantially increase lifetime earnings of employees by boosting their future earnings. They achieve this by facilitating rapid accumulation of their employees’ cultural and social capital. Cultural and social capital are built through extensive on-the-job training, networking, and employer reputation. The boost in future earnings occurs when employees convert cultural and social capital into economic capital. This paper makes two empirical contributions: (1) identification of springboarding organization characteristics and (2) estimation of impact on upward intergenerational mobility. For identification, the characteristics are size, sector, types of employees, and age. For estimation of impact, the method is variance decomposition. This research is made possible by the 2022 ODISSEI Microdata Access Grant. This grant covers access to the CBS microdata, a comprehensive administrative data set of the Dutch government. With the CBS microdata, 4.5 million people in the Dutch labor market are followed from 2006 to 2022. In the data, an organization is classified as a springboarding organization based on the ratio between employees’ future wages and their wage during first year of employment. The CBS microdata are based on tax records, family links, and organization registers. This study is based on theory from social stratification (Bourdieu) and labor economics (Becker). It aims to contribute to the relatively new field of intergenerational mobility within labor economics.

Thom Volker (Utrecht University – Faculty of Social Sciences), Carlos Gonzalez Poses (Utrecht University – Faculty of Social Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences), densityratio: An R-package for density ratio estimation

The density ratio (i.e., the ratio of the distribution of two datasets) is a workhorse in many computational social science tasks, such as sample selection bias adjustment, non-parametric two-sample testing, change-point detection and synthetic data utility evaluation. The key advantage of the density ratio in these applications lies in its ability to identify where and how two distributions differ. Over the past years, advanced methods have been developed to accurately estimate the density ratio from two samples. Despite these innovations, tools for density ratio estimation are rather inaccessible, because existing software only implements a narrow range of estimation techniques, is relatively slow, and/or lacks user-friendliness. To make the tools from the density ratio estimation literature available to computational social science researchers and beyond, we present the R-package densityratio. The densityratio-package is designed to support novice and advanced users in a wide range of practical situations. It contains a comprehensive suite of methods for density ratio estimation, including novel extensions to deal with high-dimensional data. All methods efficiently estimate the density ratio between two input datasets using non-parametric kernel-based estimation techniques implemented in C++. Automatic hyperparameter tuning through fast, multi-core cross-validation minimizes the need for model specification on behalf of the end-user, allowing researchers to focus on their substantive questions. Densityratio makes it easy for users to not only estimate density ratios, but also to inspect, validate, and extend their functionality; two-sample testing, prediction, and plotting are built-in, allowing researchers to use the estimated density ratio in subsequent tasks and visualize the output of the model. In the presentation, we demonstrate the densityratio-package in two empirical examples in the domain of sample selection bias and two-sample testing. As such, we illustrate how densityratio makes density ratio estimation a useful and accessible tool in the toolbox of computational social scientists.

Anastasiia Voloshyna (University of Groningen, Faculty of Economics and Business), Agnieszka Postepska (University of Groningen, Faculty of Economics and Business), Can hybrid work help close the labor market gender gaps?

The work-from-home experiment initiated by the Covid pandemic has transformed the global work environment. Employers now offer the option to continue working remotely, and policymakers recognize the potential of hybrid work to address labour market shortages by enabling greater participation and longer working hours for those with informal care responsibilities. This study empirically tests Goldin’s (2014) hypothesis using Dutch administrative data and sub-sample population surveys. We investigate whether remote work has led to a more equal labour market regarding promotion rates, job hopping, participation, weekly hours, and wages among men and women in the Netherlands, particularly those with informal caregiving duties. We analyse post-pandemic data, abstracting from the detrimental effects of lockdowns, and examine both the impact of individual hybrid work and the hybrid work of partners, aiming to uncover new pathways toward a gender-equal labour market. The study begins by merging work-from-home data from the Labour Force Survey with administrative employment and family condition records to explore the labour outcomes of interest. Using this sub-sample, we train a Random Forest Model to predict the probability of working from home for the larger population and identify changes in work-from-home patterns before and after the Covid pandemic. Additionally, we introduce heterogeneity through between/within-industry variation in statutory provisions on flexible work arrangements in collective agreements. We conduct a difference-in-difference analysis employing matching techniques to correct for individual characteristic imbalances. Preliminary results indicate increased labour force participation among women with minor children and an increase in average hours worked, primarily among those sometimes/always working from home. Furthermore, women with children employed by companies that allow remote work are less likely to switch jobs. Finally, we find that a male partner’s status of sometimes/always working from home is significantly associated with higher employment probabilities and increased hours worked after childbirth of mothers, as well as with a somewhat shorter time to the next birth within the household.

Thorid Wagenblast (TU Delft), Social influence in the context of climate change adaptation: analyzing cross-national survey data

To adapt against the impacts of climate change, adaptation across all scales is needed. This includes adaptation on individual or household, community or local, and national and international levels. Social interaction and influence connect individuals in their communities. Furthermore, they are, next to risk and coping perceptions, key drivers of private climate change adaptation decisions and community-level adaptation. Nonetheless, so far, there is limited knowledge of how people influence each other in the context of climate change adaptation. Using cross-national survey data (Netherlands, US, Indonesia, England), we identify patterns of social interaction in this context. We describe different groups based on interaction frequency and content. There appear to be stark differences in communication based on context and country. Indonesian respondents interact more on the topic of flooding and adaptation compared to their Western counterparts, and so do people who perceive the risk as higher. Discussion often evolve around the perceived risk and coping strategies to mitigate the risk, both on individual and community level. Furthermore, we link the likelihood of being connected within a social network to household characteristics like income or risk perception. These archetypes of interactions and interactors are then used to update social influence in an agent-based model of household flood adaptation uptake, testing different communication policies and their effectiveness under the influence of the refined social interaction.

Shuai Wang (VU Amsterdam – Faculty of Science), Maria Adamidou (VU Amsterdam – Faculty of Science), Examining LGBTQ+-related Concepts and Their Links in the Semantic Web

The past years witnessed a significant adoption of LGBTQ+ ontologies and structured vocabularies in libraries. Some of them are published as linked data in the semantic web. Homosaurus is among the most popular ones with links from/to the QLIT, GSSO, Wikidata, and LCSH, etc. Over the past years, three versions of Homosaurus have been released with updates every half a year. Despite its rapid development, little has been reported about the properties of these links. In this study, we first retrieve all the mappings and links between them as well as links about concept replacement to form an integrated knowledge graph, together with some obtained links about redirection. Using them, we perform qualitative and quantitative analysis. We discuss the discovery of missing links using weakly connected components. We analyze concept drift and change by providing examples of the convergence and divergence of concepts. Finally, we discuss some potential issues with publishing related multilingual information in the semantic web and the consequences of our findings in practice in libraries, heritages, and online literature databases.

Jari Zegers (Tilburg University, Tilburg School of Social and Behavioural Sciences), Bennett Kleinberg (Tilburg University, Tilburg School of Social and Behavioural Sciences), Understanding psychological responses to the COVID-19 pandemic with latent class growth analysis

There is increasing evidence that the psychological responses to the COVID-19 pandemic are heterogenous. In this paper, we model subgroups in a dataset of emotional responses to the pandemic using latent class growth analysis on a panel dataset of UK-based participants. Participants (n=868) rated eight emotions on a nine-point Likert scale in April of 2020, 2021, 2022, and 2023, and provided demographic variables, data on perceived social support and the impact of life events. Using a latent class trajectory analysis, we found evidence for six latent classes with some classes showing patterns of well-coping (decreasing negative emotions and increasing positive emotions) while others showed patterns of poor adjustment. Social support and negative life events were predictive of class membership in a multinomial logistic regression: having lower social support and experiencing negative life events increased the probability of participants belonging to poorly adjusted classes. Our study suggests that most individuals – over the period of four years – were able to adjust well to the pandemic, while a smaller subgroup of individuals struggled considerably. Social support may be of interest as a protective factor and could aid policy makers during future crises.

Bente Zuijdam (Maastricht University – Faculty of Science and Engineering), Adriana Iamnitchi (Maastricht University – Faculty of Science and Engineering), Demographic and Political Differences in Twitter Abuse: The Case of Dutch Politicians

In this study we look at abusive messages on Twitter (now named X) addressed to Dutch politicians. We investigate the impact of the political stance and the politician’s demographic characteristics, such as gender or religion, on the levels of abuse targeted at them. To do this, we employ computational methods to analyse a dataset of more than a million tweets that mention Dutch politicians during the year 2022. We use gpt-3.5 (ChatGPT) to identify who is the target of the abusive message that mention the politician: is it the politician or a different group? We then focus only on the messages that attack the politician and determine which characteristics influence received abuse: are they demographic characteristics (such as gender, race, religion) or aspects of the the political/ideological platform?
We discover evidence that demographic characteristics and political stance matter in received abuse. We report the differences in the effect and significance for the demographic and engagement characteristics (gender, ethnicity, religion, number of tweets and number of followers) and political stance. Moreover, we show that these factors alone cannot fully account for received levels of abuse. We also found that extreme right and conservative orientated politicians are mentioned on the largest number of messages that contain abusive language targeted at other groups. Our study extends prior research and helps inform and guide policies for a diverse and safe digital political discourse.

13.00-14.00 – Parallel Session 2

Pushing the boundaries of social sciences

Chair: Daniel Oberski

Room: Mission 1

David Ferenczi (Maastricht University – School of Business and Economics), Jean-Gabriel Young (University of Vermont), Leto Peel (Maastricht University – School of Business and Economics). Inferring Signed Networks From Contact Patterns
Tom Emery (Erasmus University Rotterdam – Erasmus School of Social and Behavioural Sciences). Network Sampling using the Population Scale Network
Rense Corten (Utrecht University – Faculty of Social Sciences), Joris Mulder (Centerdata), Stein Jongerius (Centerdata), Reggie Cushing (Netherlands eScience Center (NLeSC)). Online Behavioral Experiments in the LISS Panel

Abstracts

David Ferenczi (Maastricht University – School of Business and Economics), Jean-Gabriel Young (no), Leto Peel (Maastricht University – School of Business and Economics), Inferring Signed Networks From Contact Patterns

We consider the problem of reconstructing a network of positive and negative social relationships from the frequency of observed pairwise interactions, e.g., face-to-face encounters or other physical proximity events. The reconstructed network describes three types of relationship between pairs of individuals: positive, when people like each other, neutral when people are indifferent each other, and negative when they dislike each other.
We assume that people that like each other will interact more often and people who do not like each other will tend to avoid each other. However, it is also possible that people do not interact simply because they lack the opportunity to do so, for example, they are not physically collocated at the time. The ambiguity associated with low frequencies of interactions means that negative edges may be overrepresented. We address these issues by modelling time-dependent social interaction groups within which pairs of individuals have the opportunity to freely interact and will interact more frequently the more positive their underlying relationship.
We capture these assumptions in a probabilistic generative model that defines how frequently people interact given their relationship status. We present an efficient MCMC algorithm to perform Bayesian inference that identifies the social interaction groups and reconstructs the signed network of social relationships, and infer the probabilities for interactions depending on the type of relationship between individuals. We showcase some experiments with synthetic and real-life data.

Tom Emery (Erasmus University Rotterdam – Erasmus School of Social and Educational Sciences), Network Sampling using the Population Scale Network

The population scale network file is increasingly used to understand network dynamics in projects using administrative data at Statistics Netherlands. However, the potential for using the population scale network as a samping frame and used to collect self-reported data that is notably absent from the administrative data. In this presentation I present the sampling design for a survey to be fielded in 2025. This survey aims to assess the diffusion of attitudes and norms around formal childcare use. Crucially, the survey identifies ‘communities’ within the network using the infomap algorithm. These communities are similar in size to wijken and are then used to stratify the sample. This potentially enables the study of network behaviours via a survey using a probabilistic sampling method. The sampling design has potentially wide applications in several fields and would enable new insights into network dynamics in real social contexts. During the presentation the logistical and methodological challenges of this approach will be described, and crucial design decisions will be outlined. Potential onward applications, epistemological limitations, and ethical considerations will also be discussed.

Rense Corten (Utrecht University – Faculty of Social Sciences), Joris Mulder (Centerdata), Stein Jongerius (Centerdata), Reggie Cushing (Netherlands eScience Center (NLeSC)), Online Behavioral Experiments in the LISS Panel

Randomized controlled experiments are increasingly popular across the social sciences as the gold standard for identifying causal mechanisms underlying a wide range of social phenomena including collective action, social preferences, or market behavior. Because behavioral experiments are typically conducted with small groups sampled from student populations, a common critique is that they lack both scale and external validity. We present a proof-of-concept for a new facility for online experiments based on the popular Otree platform with participants recruited from panel surveys, more specifically the LISS panel, a long-running, high-quality and representative panel in the Netherlands. This allows researchers not only to run online experiments with larger groups and representative samples, but also to capitalize on the data already available on panel respondents to enrich experimental designs with information on social-economic backgrounds, opinions, demographics, and more. In addition, all data collected In the LISS panel can be merged to CBS microdata. Another promising opportunity is to let panel respondents participate repeatedly in online experiments, in parallel to measurements of the panel survey. We discuss the challenges and opportunities for this facility, the technical implementation, and present results of a first pilot.

Evaluation and policy research

Chair: Vardan Barsegyan

Room: Quest

Joris Broere (Netherlands Institute for Social Research), Javier Garcia Bernardo (Utrecht University – Faculty of Social Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences). Developing a structural causal model for human wellbeing
Kyra Evers (University of Amsterdam – Faculty of Social and Behavioural Sciences), Denny Borsboom (University of Amsterdam – Faculty of Social and Behavioural Sciences), Eiko Fried (Leiden University – Faculty of Social Sciences). How should we simulate the accumulation of adverse life events in computational models?
Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences). Pensynth: Easier Causal Inference through Fast Penalized Synthetic Control Estimation

Abstracts

Joris Broere (Netherlands Institute for Social Research), Javier Garcia Bernardo (Utrecht University – Faculty of Social Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences), Developing a structural causal model for human wellbeing

Globally, academics and governmental institutions are exploring alternatives to GDP as the primary indicator of governmental performance. Developments such as the COVID-19 pandemic and the ongoing climate crisis underscore the insufficiency of relying solely on economic indicators. Numerous alternative indicators have been proposed to replace or complement GDP. However, GDP retains many attractive properties, including a standardized methodology for measurement and the predictability of policy impacts. Therefore, any new indicator must be supported by robust methodologies and models.
In this study, we develop a model for one commonly suggested indicator: human wellbeing. The overall goal of such a model is to use data to inform policymaking. One of the key challenges we address in this study is how to model interventions or policies so their effects can be predicted with reasonable certainty. We argue that structural causal modeling can simulate the impact of governmental policies and predict their effects. We demonstrate a toy model for wellbeing and outline strategies for estimating such a model. We provide an example of how to design an intervention so its effect can be predicted. Additionally, we discuss the methodological challenges that must be overcome before these models can be safely implemented to inform policymaking.

Kyra Evers (University of Amsterdam – Faculty of Social and Behavioural Sciences), Denny Borsboom (University of Amsterdam – Faculty of Social and Behavioural Sciences), Eiko Fried (Leiden University – Faculty of Social Sciences), How should we simulate the accumulation of adverse life events in computational models?

Adverse life events can play a crucial role in the development and maintenance of mental disorders. Empirical data and numerous theories suggest symptoms may be triggered by adversity, such as the death of a spouse, losing your job, or getting divorced. Computational models have been gaining popularity as they help to formalize such theories, mathematically specifying the mechanisms of psychopathology. The key role that adverse life events play for many mental health problems raises the question of how they should be included in computational models. What process can we use to model the occurrence of adverse life events, and can we simulate how they accumulate over time? To make computational models more empirically informed, we used two national panel survey data sets: the Household, Income and Labour Dynamics in Australia (HILDA) and the Swiss Household Panel (SHP), spanning over twenty years. We investigate the frequency and timing between adverse life events, as well as the conditional dependence between types of events. In order to simulate how life events accumulate over time, we first fitted a discrete probability distribution to the number of events experienced per year. By simulating life events as independent draws from this probability distribution each year, we show that treating life events as independent occurrences poorly fits the empirical accumulation of life events. Model performance increases when assuming time dependence between the number of events in the current year with those of the past year. Time dependence is incorporated in a Markov model, which fits a transition matrix to the number of life events experienced per year. Though not a perfect fit, we show how a simple Markov process comes close to empirical data on how life events accumulate over twenty years.

Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences), Pensynth: Easier Causal Inference through Fast Penalized Synthetic Control Estimation

A multitude of social science research questions involves estimating the impact of policy interventions, historical events, natural disasters, or changes in personal life-courses. In such situations, performing a controlled experiment is often infeasible, unethical, or impossible, making it difficult to estimate the causal effect of interest. Fortunately, a trend of ever-growing availability of high-quality register data (e.g., in the Nordic countries or in the Netherlands) provides researchers with a promising alternative: causal effect estimation using observational data.
One of the most prominent methods for this is the synthetic control method. This method, which is related to matching and interrupted time-series, estimates the counterfactual control unit through a weighted average of units in an untreated “donor pool”. However, despite its popularity, this method is not without its disadvantages. Existing implementations are slow and computationally opaque, by default the method performs well only for specific data scenarios, and it is complex to do inference on the resulting causal effect estimate. This hampers the replicability and transparency of research using synthetic controls.
To address these issues, I present the open-source R package pensynth, available on CRAN. This package combines several recent developments from existing research in methodological, computational, and applied literature: a consistent and clear user interface and documentation, hold-out validated penalization to improve counterfactual prediction, and lightning-fast estimation through the use of the clarabel quadratic program solver, which enables experimental support for weighted bootstrap inference of the counterfactual.
In the presentation, I start by briefly explaining the relevant theory of penalized synthetic controls, after which I use interactive code and visuals to show how pensynth can improve workflows in causal inference with observational data. Experimental support for weighted-bootstrap based prediction intervals will be showcased and motivated as well, with ample room for discussion.

LLM in social sciences

Chair: Kevin Wittenberg

Room: Progress

Anna Machens (Twente University – Faculty of Behavioural, Management and Social Sciences). Confidence bounds for zero-shot text-classification with LLMs
Bastián González-Bustamante (Leiden University – Faculty of Governance and Global Affairs). Benchmarking Open-Source Large Language Models in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data
Qixiang Fang (Utrecht University – Faculty of Social Sciences), Javier Garcia Bernardo (Utrecht University – Faculty of Social Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences). Addressing LLM-related Measurement Error in Social Science Modeling Research

Abstracts

Anna Machens (Twente University – Faculty of Behavioural, Management and Social Sciences), Confidence bounds for zero-shot text-classification with LLMs

Generative Large language models (LLMs) like chatGPT have revolutionised text analysis tasks, offering high performance and ease of use. However, their usefulness is often limited by concerns over data privacy, lack of transparency, and reproducibility issues. To address these limitations, we explore the potential of smaller, locally executable open-source LLMs that allow for full control over the model.
This talk investigates whether these smaller models are a suitable research tool for social scientists, focusing on sentiment analysis as a case study.
We evaluate the overall accuracy of our results and explore methods for obtaining confidence bounds. Specifically, we assess the model’s ability to generate its own confidence levels and analyse the model’s next-token probabilities to derive confidence bounds. Additionally, we examine the impact of hyperparameters on the output distribution for a given prompt.
At the end, we give an example by applying sentiment analysis to social media posts about the candidates of the US election.

Bastián González-Bustamante (Leiden University – Faculty of Governance and Global Affairs), Benchmarking Open-Source Large Language Models in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data

This article benchmarked the ability of 16 open-source Large Language Models (LLMs) to perform annotation tasks on political content. The models were deployed locally to identify toxicity and incivility in the digital sphere on a novel protest event dataset comprising almost five million digital interactions. The findings show that Nous-Hermes 2 and fine-tuned versions outperform other LLMs’ zero-shot classification annotations. In addition, Mistal-Openorca, with a smaller number of parameters, is able to perform the task with high performance, which confirms how some small, fine-tuned models deployed locally could offer a good trade-off between performance, implementing costs and computing time.

Qixiang Fang (Utrecht University – Faculty of Social Sciences), Javier Garcia Bernardo (Utrecht University – Faculty of Social Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences), Addressing LLM-related Measurement Error in Social Science Modeling Research

With the advent of large language models (LLMs), the collection of measurements related to social science constructs (e.g., personality traits, political attitudes, human values) has become easier, faster and more affordable. These measurements are subsequently used for modelling of societal and group processes that social scientists typically engage in, where inferences from samples to populations are also made. Valid modelling and inferences, however, requires high-quality measurements or at the very least, methods to deal with the presence of measurement error. Just like traditional questionnaire-based measurements, LLM-based measurements have been shown to suffer from validity and reliability issues.
While there is an abundance of research literature in dealing with measurement error, they focus on questionnaire-based measurement error. It is unclear yet how measurement issues arising from LLMs should be handled in social science modelling research.
This study has two primary objectives. First, we review existing literature to identify practices for addressing LLM-related measurement error, both in computer science and in the social science context. Second, we synthesise these findings with existing measurement modelling literature to propose a practical framework for making valid inferences using LLM-based measurements in social sciences. By bridging the gap between LLM prediction capabilities and social science inference requirements, our framework aims to enhance the reliability and validity of social science research outcomes in the era of LLMs.

ODISSEI Data Collections – future direction

Chair: Peter Lugtig

Room: Royal Lobby

Tim Reeskens: ESS Round 12 and beyond: A push-to-web
Marcel Lubbers: Mode-effects in the NKO elections studies – conclusions from the 2017, 2021 and 2023 surveys
Bella Struminskaya: The future of SHARE
Julia Rokos and Anne Gauthier: The transition to mixed mode in the Generations and Gender Survey: An overview and lessons learnt
Ruud Luijkx: The mixed mode experiment in the fifth wave of the European Values Study

Abstracts

Julia Rokos (Netherlands Interdisciplinary Demographic Institute (NIDI)), Orlaith Tunney (Netherlands Interdisciplinary Demographic Institute (NIDI)), Siyang Kong (Netherlands Interdisciplinary Demographic Institute (NIDI)), The transition to mixed mode in the Generations and Gender Survey: An overview and lessons learnt

Face-to-face interviewing was and still is the dominant mode for survey data collection in the social sciences. However, face-to-face surveys are suffering from increasing costs and declining response rates. Meanwhile, there is increasing evidence for the feasibility of web surveys. Mixed-mode designs can provide greater flexibility and speed when timely information is crucial and face-to-face surveys are too difficult, too expensive, or even impossible to implement.
The Generations and Gender Survey (GGS) has transitioned to mixed-mode data collection with web as a main component in its newest round of data collection. The GGS provides high quality, longitudinal, and cross national open-access data about population and family dynamics. So far, 26 countries/territories have participated or have secured funding for conducting GGS-II Wave 1. In this paper, we will discuss in detail the mode, recruitment strategy, response rate, and data representativity of the GGS-II countries/territories. We will also share the challenges and lessons learned in using mixed-mode to conduct cross-national social surveys.
To date, the majority of the GGS-II countries/territories used self-completion web interviewing as the main mode of data collection. Face-to-face interviewing, telephone interviewing, or paper-and-pencil is used as a fallback plan to reduce non-response and potential observation bias. Two countries have randomly assigned respondents to mode (web mode and another additional mode). Four countries only conducted face-to-face interviews for practical reasons such as internet penetration and speed.
Overall, the transition to mixed-mode in GGS-II shows that it is feasible to conduct a large-scale survey in web mode, whereby the forced move to online-only experienced during COVID-19 helped accelerate this transition. Furthermore, countries who fielded in web mode only showed little deviation from representation of age groups but a slight over-representation of women and, in some countries, an over-representation of higher educated respondents.

Bella Struminskaya (Utrecht University – Faculty of Social Sciences), Tim Reeskens (Tilburg University, Tilburg School of Social and Behavioural Sciences), Marcel Lubbers (Utrecht University – Faculty of Social Sciences). Future-proofing the data collection in large-scale surveys within ODISSEI

Large-scale cross-national survey programmes that traditionally were administered face-to-face have become increasingly expensive and difficult to administer due to the falling response rates and increased fieldwork costs. Many large comparative survey projects such as the European Social Survey (Bottoni & Fitzgerald, 2021; Maslovskaya & Lugtig, 2021), the European Values Study (Luijkx et al., 2021) the Gender and Generations Survey (Emery et al. 2020) are experimenting with ways to transition to the self-administered modes such as online surveys or a mix of online and paper surveys. The increased digitalization in many countries makes this transition possible, but there are challenges. How to transition to another mode or mixed-mode when many of those surveys have originally been designed to be interviewer-administered and, thus, include a complex questionnaire? How to motivate respondents to answer difficult or filtering questions in the absence of an interviewer? How to incorporate additional measurements such as biomarkers, physical measures and cognitive measures? How to equip users with analytical tools that allow analysis of trend or panel data given the potential effects on measurement? In this panel, we will discuss these and other challenges related to logistics, methodology and technological aspects of such transitions using examples from the ODISSEI international and national large-scale survey programmes such as the European Social Survey (ESS), Survey of Health, Ageing and Retirement in Europe (SHARE), European Values Study (EVS), Gender and Generations Survey (GGS), and the Dutch Parliamentary Election Studies (DPES).

Equality of opportunity

Chair: Weverthon Barbosa Machado

Room: Mission 2

Dieuwke Zwier (University of Amsterdam – Faculty of Social and Behavioural Sciences) One Track Mind: Secondary Effects in School Choice and Social Capital in a Stratified System
Sjoerd van Alten (VU Amsterdam – School of Business and Economics), Leandro Carvalho (University of Southern California), Titus Galama (VU Amsterdam – School of Business and Economics). A Chip Off The Old Block? Genetics and The Intergenerational Transmission of Socioeconomic Status
Sander de Vries (VU Amsterdam – School of Business and Economics). Measuring Intergenerational Persistence with Many Variables

Abstracts

Dieuwke Zwier (University of Amsterdam – Faculty of Social and Behavioural Sciences), One Track Mind: Secondary Effects in School Choice and Social Capital in a Stratified System

Socio-economically advantaged students often pursue more demanding and prestigious educational paths compared to their less advantaged peers, even at similar performance levels. These “secondary effects” of socio-economic status (SES) are typically studied in the context of school continuation decisions or track choice but may also be present in school choice. In highly stratified systems with limited direct influence on track placement, advantaged parents may seek alternative strategies, such as school choice, to enhance their children’s educational opportunities. This study examines secondary effects in secondary school choice in the Netherlands, focusing on track offerings (single-track versus multi-track schools) and ability grouping practices (homogeneous versus heterogeneous classes). Moreover, it investigates the role of social capital – resources embedded in peer and parental networks in primary school – in these patterns.
I use unique (linked) full-population register data (N_students = 328,334), peer nomination data in primary schools (N_students = 2,468), and novel web data on school characteristics (N_schools = 1,165) for Dutch students who recently transitioned to secondary school. Hypotheses are tested through regression analyses with a fixed effects estimator to account for selection into primary schools. Findings reveal stratification in secondary school choice. Higher SES students tend to avoid schools only offering pre-vocational education (and no general tracks), prefer schools only offering pre-university education (and no lower tracks), and tend to choose schools with heterogeneous ability classes at intermediate performance levels. There is no strong evidence for the relation between social capital and the type of school chosen. A notable exception is that lower SES students are more likely to avoid pre-vocational schools when integrated into resource-rich parental networks. This suggests that such networks may play a compensatory role, steering students away from decisions that may impede upward track mobility.

Sjoerd van Alten (VU Amsterdam – School of Business and Economics), Leandro Carvalho (University of Southern California), Titus Galama (VU Amsterdam – School of Business and Economics), A Chip Off The Old Block? Genetics and The Intergenerational Transmission of Socioeconomic Status

We study how one generation’s genetics affects the next generation’s economic prospects. We proxy genetic propensity for education by a polygenic index (PGI) for educational attainment. We study the causal impact of this PGI of parents on the socioeconomic outcomes of their children, exploiting the randomization of genetic material at conception, by conditioning on the PGIs of grandparents. To this end, we merge genetic data from the Dutch Lifelines Biobank to annual tax records for the 2006-2021 period. A parent’s genetic makeup influences their children’s education, income, and wealth later in life. These effects are relatively large when compared to the impacts of parent’s genetics on parent’s own outcomes. Effect sizes are also comparable to existing estimates of causal effects of family environment. We study the channels through which parent’s genetics influence their children’s SES. Genetic transmission accounts for at least 2/3 of the effects on education and income, but for only 1/3 of the effect on net wealth. The remaining part of the effect likely arises because parental genes influence the family environment of these children, even when these genes are not transmitted to the child at conception.

Sander de Vries (VU Amsterdam – School of Business and Economics), Measuring Intergenerational Persistence with Many Variables

Using administrative data on complete cohorts of children born in the Netherlands, I relate children’s outcomes to extremely rich information about the parents. As a baseline, I relate children’s incomes to their parents’ incomes only and show that intergenerational income mobility in the Netherlands is lower than previously thought. Combining all information in a machine learning approach, I find that intergenerational income dependence is strongly underestimated when only parents’ incomes are used, in particular for disadvantaged children. The gain in explanatory power is even greater for children’s completed education or criminal behavior. There are no strong gender differences in intergenerational dependence. Finally, neighborhoods, schools, migration background, and extended families explain only a small portion of the variation, suggesting that parental inputs are the key determinant of intergenerational transmission.

Science of Science

Chair: Mathilde Theelen

Room: Expedition

Anne Maaike Mulders (Radboud University Nijmegen – Faculty of Social Sciences), Lanu Kim (Korea Advanced Institute for Science and Technology (KAIST)). Bridging Gaps: Gender, Interdisciplinary Research, and the Path to the Professoriate
Maarten Bosten (Tilburg University, Tilburg School of Humanities and Digital Sciences), Bennett Kleinberg (Tilburg University, Tilburg School of Social and Behavioural Sciences). Shedding light on potential conflicts of interest in Natural Language Processing research
Ana Macanovic (Radboud University Nijmegen – Faculty of Social Sciences), Bas Hofstra (Radboud University Nijmegen – Faculty of Social Sciences). Gendered Accumulation of Fame: Effects of Gender and Network Spillovers on the Representation of Dutch Professors in Media

Abstracts

Anne Maaike Mulders (Radboud University Nijmegen – Faculty of Social Sciences), Lanu Kim (Korea Advanced Institute for Science and Technology (KAIST)), Bridging Gaps: Gender, Interdisciplinary Research, and the Path to the Professoriate

Despite efforts to improve retention of female researchers in universities, women remain underrepresented among full professors in the Netherlands and Europe. Previous explanations for higher attrition rates among women in academia have focused on differing experiences and gender disparities in research output. However, fewer studies have explored qualitative differences in research by men and women. One factor in achieving professorship could be interdisciplinary research. Interdisciplinary research has been commended for its potential to solve complex problems and foster innovation, yet it may be challenging to execute interdisciplinarity in practice and interdisciplinary scholars can experience more difficulties gaining peer-recognition.
This paper investigates gender differences in research interdisciplinarity and its impact on academic careers. We examine whether interdisciplinarity mediates the relationship between gender and achieving full professorship, and if its effects vary between men and women. Our study uses data from scholars who earned doctorate at Dutch universities between 1990-2014, linked to a database of professors in the Netherlands from 2004-2019.
The timed professor data enable us to determine at what point in the career scholars obtain a professorship, and so we use event history analysis to account for variations in the likelihood of becoming a professor based on the year of PhD receipt and career age. While previous analyses of the (gendered) effects of interdisciplinarity on becoming a professor have been limited to a single discipline, our study uniquely examines career consequences of interdisciplinarity across all academic disciplines. To assess the impact of interdisciplinarity across the career, we include a measure of interdisciplinarity during the PhD (using a topic model applied to dissertation abstracts), as well as a yearly measure of interdisciplinarity post-PhD (by cross-linking our scholars’ publications with the OpenAlex publications database containing associated topics).

Maarten Bosten (Tilburg University, Tilburg School of Humanities and Digital Sciences), Bennett Kleinberg (Tilburg University, Tilburg School of Social and Behavioural Sciences), Shedding light on potential conflicts of interest in Natural Language Processing research

Natural Language Processing (NLP) has been hugely influential in the past decade and advancements in generative language models now reach everyday live. Oftentimes, the work underpinning new NLP technologies involves companies who employ academic researchers and publish at academic venues. In this talk, we look at the entanglement of academic research and industry involvement in a near complete dataset of published, peer-reviewed NLP research. Potential conflicts of interest received attention in areas such as medical research but little is known about the problem in NLP research, despite its potentially far-reaching consequences, such as biasing research outcomes and steering a research agenda. We retrieved all published work under the Association of Computational Linguistics (ACL) conferences starting from 2000 and categorized authors as either industry-affiliated (e.g., through tech companies) or not. Our analysis of 47k papers revealed that across the whole time span more than one in four papers (26.85%) have at least one industry-affiliated author, with an increase of industry involvement to currently more than one in three papers. We further show which specific outlets are particularly attractive for industry-affiliated authors and explore characteristics that differ between academic and industry-affiliated work. We discuss implications and suggest ways to mitigate negative consequences for NLP research.

Ana Macanovic (Radboud University Nijmegen – Faculty of Social Sciences), Bas Hofstra (Radboud University Nijmegen – Faculty of Social Sciences), Gendered Accumulation of Fame: Effects of Gender and Network Spillovers on the Representation of Dutch Professors in Media

Engagement of scientists with the general public is becoming increasingly important, with experts yielding considerable power in setting the public agenda. Yet, time in the public spotlight has historically been unequally divided between men and women. The underrepresentation of women in the public sphere can exacerbate the stereotypes positing science as a domain more suitable for men. At the same time, some have suggested that online spaces scientific engagement with the public could empower underrepresented groups. The question of gender media representation is further complicated as gender inequalities persist in terms of scientists’ academic performance and their collaboration networks. We advance the existing literature on gender representation in the media by asking which scientists are present across various public domains and explore whether gender differences in media representation persist when accounting for the indicators noted above.
We curate a fine-grained dataset of 6830 full professors in the Netherlands. For each professor, we reconstruct their academic careers through the metadata of 1.2 million academic publications, citations, and academic topics of expertise. Most uniquely, we crosswalk these data to detailed, time-stamped accounts of media attention, retrieving 784 thousand printed news mentions of professors from LexisNexis, 476 thousand mentions in online news, and attention received on Twitter (now X) using the Altmetric database. We further collect social network data on 1.35 million collaborators of the professors in our data, including collaborators’ publications, citations, and online media and Twitter/X attention.
Our results suggest that gender differences persist in printed news media but are less pronounced in the online sphere. Furthermore, scientific prominence and online attention of one’s collaborators are primarily associated with professors’ online attention, but this relationship is not moderated by gender. These findings help us understand the dynamic media landscape that men and women scientists are embedded in.

14.00-14.30 – Coffee break and poster presentations

14.30-15.30 – Parallel Session 3

FAIR practices

Chair: Angelica Maineri

Room: Mission 1

Shuai Wang (VU Amsterdam – Faculty of Science), Angelica Maineri (Erasmus University Rotterdam, School of Social and Behavioural Sciences), Ronald Siebes (VU Amsterdam – Faculty of Science). Use Cases of FAIR Implementation Profiles in Social Sciences and Humanities
Dorien Huijser (Utrecht University – Faculty of Social Sciences), Otto Lange (Utrecht University – Faculty of Social Sciences), Coosje Veldkamp (Utrecht University – Faculty of Social Sciences). A metadata catalogue for child development cohorts: the Connecting Data in Child Development project
Deborah Thorpe (Data Archiving and Networked Services (DANS)), Ricarda Braukmann (Data Archiving and Networked Services (DANS)). Improving the Transparency of Data Access Procedures

Abstracts

Shuai Wang (VU Amsterdam – Faculty of Science), Angelica Maineri (Erasmus University Rotterdam, School of Social and Behavioural Sciences), Ronald Siebes (VU Amsterdam – Faculty of Science), Use Cases of FAIR Implementation Profiles in Social Sciences and Humanities.

The FAIR Implementation Profiles (FIPs) capture community decisions about data management in communities. In this presentation, we would like to show recent progress with creating FIPs for communities in social sciences and humanities. We present two use cases. In the first use case, we take the FIPs as a valuable resource as suggestions for data management plans (DMPs). By creating a mapping between questions in DMPs and questions in FIPs, we can extract the corresponding answers in FIPs as suggestions for researchers when they write their DMPs. In the second use case, we explore how FIPs can be used to capture decisions in research infrastructures. More specifically, we compare three FIPs: the ODISSEI portal, the LISS Data Archive, and the DANS Data Stations.

Dorien Huijser (Utrecht University – Faculty of Social Sciences), Otto Lange (Utrecht University – Faculty of Social Sciences), Coosje Veldkamp (Utrecht University – Faculty of Social Sciences), A metadata catalogue for child development cohorts: the Connecting Data in Child Development project

Over the course of decades, 6 longitudinal cohort studies previously united in the Consortium of Individual Development (CID; YOUth, L-CID, RADAR, TRAILS, Generation R, and NTR) have collected a wealth of multidisciplinary data, encompassing social, neural, psychiatric and physical development of thousands of children in the Netherlands. Due to their sensitivity, the collected data could not be made publicly available. In order to make the cohort data FAIR, a metadata catalogue and metadata schema were developed in the project “Connecting Data in Child Development” (CD2), which currently enlists over 1000 measures, and more to come. In this session, we will demonstrate the catalogue and how it was created, highlighting both technical and community challenges. Furthermore, we will discuss exchange with external infrastructures, among which ODISSEI, and the sustainability of the catalogue in the larger context of research infrastructures.

Deborah Thorpe (Data Archiving and Networked Services (DANS)), Ricarda Braukmann (Data Archiving and Networked Services (DANS)), Improving the Transparency of Data Access Procedures”

The benefits of making data available for reuse are recognized by many. While some datasets can be openly available, others contain sensitive or personal data that needs to be protected. Many repositories, including the DANS Data Stations, provide options to publish datasets restricted access: Access is only granted after a request is evaluated by the data owner.
To optimally facilitate reuse, however, detailed information about access procedures and requirements need to be available. Currently, it is often difficult for a user to assess whether they are eligible to reuse data and under what conditions. Requirements vary and may include requests for motivations, or limitations on the reuse purpose.
Providing standardized ways to describe data access requirements is one of the goals of the ODISSEI Portal work. In the ODISSEI Portal, metadata from various Dutch social science data providers can be found in one interface. The Portal aims to also provide standardized and machine-actionable information about data access conditions and broker access to datasets through the Data Access Broker (DAB).
To better understand and describe access conditions, ODISSEI and DANS conducted a survey that aimed to enhance and refine our knowledge of common practices. The survey results will aid us to identify common conditions that should be included in our standardization effort. Moreover, the results will improve our available guidance – for instance the Guidebook for Depositing Restricted Access and related Data Access Protocol templates for depositors.
In this presentation, we showcase the results of the survey outlining the most applied data access restrictions. We discuss the resulting recommendations and next steps toward standardization. Overall, we hope to inspire the adoption of a more structured description of data access conditions and procedures to enhance the ‘A’ of FAIR (Findable, Accessible, Interoperable, Reusable): providing the exact conditions under which the data are accessible.

Approaches to studying health

Chair: Wojtek Jablonski

Room: Quest

Richard Karlsson Linnér (Leiden University – Faculty of Law). Large-Scale Simulation of Population Disease Risk using High-Performance Computing
PingPing Song (Erasmus University Medical Center), Luc Coffeng (Erasmus University Medical Center), Tom Emery (Erasmus University Rotterdam, School of Social and Behavioural Sciences). Modeling population networks in COVID-19 transmission: insights from the Dutch population and implications for future pandemic
Nadya Ali (Tilburg University, Tilburg School of Social and Behavioural Sciences), Marion van den Heuvel (Tilburg University, Tilburg School of Social and Behavioural Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences). A population-based study of the effects of poverty on maternal and child perinatal health

Abstracts

Richard Karlsson Linnér (Leiden University – Faculty of Law), Large-Scale Simulation of Population Disease Risk using High-Performance Computing

Genomic risk prediction is an emerging technology that holds great promise to revolutionize precision medicine and population screening programmes. However, there is much academic disagreement across fields and disciplines about the future relevance and societal impact of new genetic testing technologies. This project aims to inform this debate by generating a harmonized scientific resource of the disease risk distributions that future genetic tests could eventually predict. Data-informed simulation of population disease risk will be conducted using large-scale biobank data and high-performance computing infrastructures. The distributions will cover a wide range of common illnesses in a series of standardized scenarios. The resource will be disseminated to the scientific community as a way to accelerate health, social-science, and policy research on the future implications of genetic testing technologies. A key objective is to evaluate the added value of genetic tests in addition to already established risk factors, e.g., to help determine their economic value for healthcare or health-contingent decisions (e.g., insurance, or retirement). Moreover, the resource can guide forward-looking legislation by elucidating a stationary target of the future predictive accuracy, which has implications for public health policy, consumer protection law, public insurance systems (e.g., health insurance or pension systems), and beyond. This presentation will summarize the project plan, present preliminary results, and discuss how the resource can enrich research in the health and social sciences.

PingPing Song (Erasmus University Medical Center), Luc Coffeng (Erasmus University Medical Center), Tom Emery (Erasmus University Rotterdam, School of Social and Behavioural Sciences), Modeling Population Networks In Covid-19 Transmission: Insights From The Dutch Population And Implications For Future Pandemic

INTRODUCTION: The COVID-19 pandemic has demonstrated the important role of population networks in the spread of infectious disease outbreaks. Clustering of risk factors and behaviours has made some groups more vulnerable to the negative consequences of the pandemic, as well as the interventions that aim to control it. This has caused both new, and reinforced existing social inequalities, related to poverty and pre-existing health issues.
METHOD: In this ongoing project, we will synthesize evidence on how data on population networks and mobility can be translated to inform infectious disease models. We are now investigating a comprehensive large-scale population network dataset derived from the Statistics Netherlands. This dataset encapsulates all Dutch residents’ demographic profiles and geographical clusters such as households, schools, workplaces, neighborhoods, and the broader community, accessed via the ODISSEI project. The mobility patterns will be derived from a mapping of individuals’ registered home addresses and their frequently visited places, such as schools or workplaces. Agent-based models will be developed to reproduce the disease transmission and to evaluate the effectiveness of interventions, while considering the population network complexity.
RESULT: The overall goal of this project is to investigate the impact of population connectivity on societal inequalities in transmission and control of COVID-19 in the Netherlands, and to enhance society’s readiness for future pandemics.
DISCUSSION: This project will leverage the evidence translated from several large-scale Dutch national datasets, to lay the foundations for future real-time pandemic modelling, and to provide strategies to more effectively address the disparities amplified by clustering of social inequalities in future epidemic responses.

Nadya Ali (Tilburg University, Tilburg School of Social and Behavioural Sciences), Marion van den Heuvel (Tilburg University, Tilburg School of Social and Behavioural Sciences), Erik-Jan van Kesteren (Utrecht University – Faculty of Social Sciences), A Population-Based Study Of The Effects Of Poverty On Maternal And Child Perinatal Health

Background. Nearly 16% of children around the world live in poverty and those born into it are especially susceptible to its effects. Although poverty is consistently linked to adverse health outcomes, it is a heterogenous phenomenon and depending on the context its associated risk factors can differ both in nature and in severity. Interventions during the first years of life can be both effective and cost efficient. Given the complexity of context-specific poverty risk factors and their interactions with each other, the current project takes a data-driven approach to identifying intervention targets that can aid policymakers. The goal of the study is to investigate the association between family poverty and maternal and infant health in The Netherlands, as well as potential mediating intervention targets. Method. Data used for this study is part of a large population-based dataset consisting of records from official healthcare providers and the Central Bureau for Statistics in The Netherlands. It covers a time period of 10 years (2010-2020) with ~2.6 million pregnancies. We take a two-step approach to the analysis to address the goals of the research question – 1) a descriptive analysis of relevant poverty-related factors in regards to child health; 2) causal pre-registered confirmatory mediation analyses of theory-informed mediators of the effects of poverty on child birth outcomes. After a conceptual pre-registration of the hypotheses we split 10% of the data for feasibility assessment of the planned analyses. Following the 10% holdout sample, an analytic pre-registration of the methods will be submitted, after which analyses will be conducted on the remaining 90% of the cases. Additional robustness tests will also be conducted, as well as analyses replicating the results using different variables to account for income, such as education, which can often be the only proxy for socioeconomic status that researchers have access to.

Boosting the potential of administrative data

Chair: Javier Garcia Bernardo

Room: Progress

Ahmad Hesam (Statistics Netherlands (CBS)), Frank Pijpers (Statistics Netherlands (CBS)). Country-Wide Agent-Based Epidemiological Modeling Using 17 Million Individual-Level Microdata
Christian Fang (Statistics Netherlands (CBS)), Qixiang Fang (Utrecht University – Faculty of Social Sciences), P-values and register data: Perfectly sensible or perfect nonsense?

Abstracts

Ahmad Hesam (Statistics Netherlands (CBS)), Frank Pijpers (Statistics Netherlands (CBS)), Country-Wide Agent-Based Epidemiological Modeling Using 17 Million Individual-Level Microdata

Calibration is a crucial step in developing agent-based models. Agent-based models are notorious for being difficult to calibrate as they can express various degrees of freedom when model parameters are unknown. Models that appear correctly calibrated to match macro-level observed data perform poorly when micro-level insights need to be inferred. As a result, policymakers cannot be certain that an agent-based model can accurately describe the dynamics of the real-world phenomena that the model tries to mimic. To begin tackling this challenge, we developed a methodology for an epidemiological use case at a full population scale of 17 million agents to observe the effects of using microlevel data for the calibration on the accuracy of the microlevel model outcomes. We show that by calibrating a model on national statistics, but using individual-level microdata, we can on average get 36% more accurate model outcomes on a subnational level. Our model implementation performs two orders of magnitude faster than prior work and allows efficient calibration on HPC computer systems.

Christian Fang (Statistics Netherlands (CBS)), Qixiang Fang (Utrecht University – Faculty of Social Sciences), P-values and register data: Perfectly sensible or perfect nonsense?

Quantitative social science research increasingly uses register data, due to its ready availability, relatively low cost, and, crucially, full population coverage. Despite register data not being a random sample from a population – but full population data – researchers routinely calculate inferential statistics such as standard errors, confidence intervals, and p-values in studies based on register data. Inferential statistics, however, are meant for sample data, enabling probabilistic inferences from samples to populations. When data has full population coverage, the use of inferential statistics becomes, at the very least, questionable. We argue that inappropriate use of inferential statistics, especially significance tests, on register data without sufficient justification can lead to overstating the importance of “significant” effects, incorrectly dismissing the existence of “insignificant” effects, and baseless generalisations beyond the spatiotemporal context of the study. This study serves three purposes. First, we seek to refamiliarize social scientists with what inferential statistics are, how they are calculated, and how they were originally intended to be used. Second, we discuss two competing theoretical perspectives – sampling theory and superpopulation theory – why the assumption of random sampling is likely violated in social science research with register data, and why inferential statistics likely has no meaning in the presence of register data. Third, we investigate the status quo of using inferential statistics with register data by conducting a scoping review of 46 empirical studies. These studies used register data and were published in four leading sociological journals in 2020-2023. We find that 80.4% of the reviewed papers applied inferential statistics, but only 13.5% attempted some form of justifications ranging from incorrect to tenuous. We close this contribution with practical guidelines on the use of descriptive and inferential statistics with different types of data in social science research.

Advances in survey methodology

Chair: Danielle McCool

Room: Royal Lobby

Camilla Salvatore (Utrecht University – Faculty of Social Sciences), Bella Struminskaya (Utrecht University – Faculty of Social Sciences), Silvia Biffignandi (University of Bergamo (Retired)). Bayesian Data Integration: Enhancing Inference and Reducing Costs
Angelo Moretti (Utrecht University – Faculty of Social Sciences), Camilla Salvatore (Utrecht University – Faculty of Social Sciences). The Use of New Data Sources in Small Area Estimation of Attitudes towards Climate Change
Emmeke Aarts (Utrecht University – Faculty of Social Sciences). Extracting personalized latent dynamics over time in psychological and behavioral processes utilizing the (multilevel) hidden Markov model

Abstracts

Camilla Salvatore (Utrecht University – Faculty of Social Sciences), Bella Struminskaya (Utrecht University – Faculty of Social Sciences), Silvia Biffignandi (University of Bergamo (Retired)), Bayesian Data Integration: Enhancing Inference and Reducing Costs

Traditional probability sample surveys, the gold standard for population inference, are becoming increasingly expensive and suffer from declining response rates. Thus, moving towards less expensive, but potentially biased, non-probability sample surveys is becoming a more common practice.
We present a novel Bayesian approach to integrate a small probability sample with a larger online non-probability sample survey (possibly affected by selection bias) to improve inferences about logistic regression coefficients and reduce survey costs. The approach can be applied in different contexts. We provide examples from socioeconomic contexts (volunteering, voting behaviour, trust) as well as health contexts (smoking, health insurance coverage). Through simulations and real-data analyses we show that the Mean Squared Errors of regression coefficients are generally lower when implementing data integration with respect to the case of no data integration. Also, using assumed costs for probability and non-probability samples, we show that potential cost savings are evident.
This work is supplemented with a Shiny web app where interactive cost-analyses can be performed as guidance for researchers that are interested in applying the method.
Additionally, we discuss the possibility of extending this approach to smart surveys and those that collect both self-reports and objective measurements.

Angelo Moretti (Utrecht University – Faculty of Social Sciences), Camilla Salvatore (Utrecht University – Faculty of Social Sciences), The Use of New Data Sources in Small Area Estimation of Attitudes towards Climate Change

Climate change is a global problem that has a significant impact on the world’s economy and society. To effectively address climate change, policymakers require reliable estimates of relevant indicators measuring attitudes towards climate change at a sub-national level, given that these vary at a geographical level. Measuring public attitudes towards climate change is crucial in order to investigate the collective action towards sustainable practices. However, nationally representative sample surveys collecting variables around these phenomena, e.g., the European Social Survey (ESS), are not usually designed for producing accurate and precise estimates at sub-national level. In this work, we propose to use small area estimation techniques to obtain reliable estimates of attitudes towards climate change at regional level based on the ESS. The key idea of small area estimation models is to “borrow strength” from the other areas and auxiliary information based on administrative data or the Census, to improve the survey-based estimates. In recent years, the integration of digital trace data (e.g. from websites, social media, google trends) with survey data has gained importance. A novel aspect of our approach is that we include non-traditional auxiliary information, specifically web data, into our model. Our results demonstrate that incorporating web data, in some cases, yields more reliable estimates than the model without them. The results are assessed and discussed via model selection and diagnostics. Finally, we also acknowledge and address certain limitations associated with the use of web data in small area estimation.

Emmeke Aarts (Utrecht University – Faculty of Social Sciences), Extracting personalized latent dynamics over time in psychological and behavioral processes utilizing the (multilevel) hidden Markov model

Facilitated by technological advances such as smartphones, smartwatches, and sensors, it has become relatively easy and affordable to collect data on groups of individuals with a high temporal resolution. These high temporal data are ideal to study processes in daily life as they unfold in real time at an inter-individual level. Within these data, frequently behavioral researchers are interested in the dynamics over time in a construct indirectly measured using various indicators. For example, switches in stress levels, mood, or nonverbal communication. When this latent construct can be represented by mutually exclusive categories, for example a not, mildly, and severely stressed state or a depressive vs non-depressive state, the hidden Markov model (HMM) is a promising novel approach to investigate these latent dynamics over time. The HMM is a probabilistic, unsupervised, longitudinal machine learning method which uncovers empirically derived latent (i.e., hidden) states and the dynamics between these latent states over time. By extending the HMM to the multilevel (also known as mixed or random effects) framework, the model accommodates data of multiple individuals simultaneously – allowing for heterogeneity in model parameters – while estimating one overall group-level HMM. Hence, the multilevel HMM facilitates the study of personalized dynamics in psychological and behavioral processes and the study of individual differences herein.
The usefulness of the multilevel HMM in psychological and behavioral processes is illustrated using two examples. In the first example, data on nonverbal communication between patient-therapist dyads show that interpersonal interaction dynamics uncovered by the multilevel HMM were predictive of patient depression improvement. In the second example, ESM data on five cognitive, affective, and behavioral (CAB) factors unveiled four distinctive transdiagnostic CAB crisis phases and patient individual trajectories herein. In addition, the open-source implementation of the multilevel HMM provided by the R CRAN package mHMMbayes is introduced.

Geography of inequality

Chair: Lucas Spierenburg

Room: Mission 2

Eleanor Bale (University of Sheffield (Partnered with TU Delft)), Professor Gwilym Pryce (University of Sheffield), Ana Petrović (TU Delft – Faculty of Architecture and the Built Environment). Exploring the intersectional social frontiers in Amsterdam and the Hague
Ignacio Urria Yáñez (TU Delft – Faculty of Architecture and the Built Environment), Ana Petrović (TU Delft – Faculty of Architecture and the Built Environment), Maarten van Ham (TU Delft – Faculty of Architecture and the Built Environment). How residential contexts in the Netherlands change over space and time: Multiscale patterns of exposure to social, economic and demographic characteristics
Daria Dementeva (KU Leuven, Centre for Sociological Research), Cecil Meeusen (KU Leuven, Centre for Sociological Research), Bart Meuleman (KU Leuven, Centre for Sociological Research). Unravelling Uncertain Geographic Context Problem for Interethnic Group Relations: Do Intergroup Contact and Perceived Ethnic Threat Depend on How the Neighbourhood Context is Measured?

Abstracts

Eleanor Bale (University of Sheffield (Partnered with TU Delft)), Professor Gwilym Pryce (University of Sheffield), Ana Petrović (TU Delft – Faculty of Architecture and the Built Environment), Exploring the intersectional social frontiers in Amsterdam and the Hague

This paper provides the first empirical analysis of intersectional social frontiers – the borders between adjacent communities where there are steep differences in multiple socioeconomic characteristics simultaneously. There is currently no robust evidence on whether frontiers which occur on multiple dimensions cause a greater divide than single-dimension frontiers. There is also a gap in the evidence on whether adjacent communities which have a strong ethnic social frontier between them but are economically similar form relationships based on their similarities or whether their close proximity, results in an increased feeling of competition over potentially limited resources thus increasing territorial conflict. They remain an undeveloped area of research; meaning how to identify them, their development and evolution, and their prevalence within a city are currently unknown.
By identifying the borders where unemployment and ethnic social frontiers overlap in the regions of Amsterdam and the Hague in the Netherlands using the extensive Dutch registry data available from Statistics Netherlands, this paper explores the presence of intersectional social frontiers through new aggregate measures considering their stability, strength, and the exposure of residents. These measures will allow future researchers to consider the likelihood of a frontier having a long-term impact on residents with stronger, long lasting frontiers potentially impacting on an individual’s mental health and educational attainment along with the crime rates and household mobility of an area. This is the first time in a paper that frontiers have been identified in multiple regions over a twenty year period exploring their evolution and the factors within cities which result in the frontiers developing or dissipating. The discussion surrounding variation between cities is a potentially crucial opportunity for developing policy aimed to reduce social frontiers and increase positive contact between contrasting communities.

Ignacio Urria Yáñez (TU Delft – Faculty of Architecture and the Built Environment), Ana Petrović (TU Delft – Faculty of Architecture and the Built Environment), Maarten van Ham (TU Delft – Faculty of Architecture and the Built Environment), How residential contexts in the Netherlands change over space and time: Multiscale patterns of exposure to social, economic and demographic characteristics

Understanding the sociospatial context to which individuals are exposed to is at the heart of research on inequalities in cities. Exposure to sociospatial contexts, including various spatial scales, can affect people’s life courses. In segregated urban spaces, the attributes that constitute this sociospatial context are not evenly distributed over space and across scales. Against this background, previous research characterised sociospatial contexts based on multiscale measures of population composition. In particular, studies have concentrated on the extent to which residents are exposed to specific population groups within their sociospatial context. However, less attention has been given to characterising sociospatial contexts based on exposure patterns to multiple contextual attributes and understanding how these patterns change over time. This study presents a novel approach to analyse how sociospatial contexts change across spatial scales using multiple social, economic and demographic characteristics. We propose a scalable method to compare the patterns of exposure to different contextual attributes between residential locations, both within and between urban areas over time. Using georeferenced administrative data from Statistics Netherlands (CBS), covering the entire Dutch population from 1999 to 2022, we classify the spatial contexts using attributes measured at various spatial scales. We apply a cluster classification to characterise the representative types of sociospatial contexts present across all major cities in the country and examine the specific sequences of transitions between different cluster types that residential locations exhibit across scales. The spatio-temporal analysis of these patterns of exposure will improve our understanding of the specific geographical scale at which different socioeconomic and demographic groups are clustered in space and how this has changed over time.

Daria Dementeva (KU Leuven, Centre for Sociological Research), Cecil Meeusen (KU Leuven, Centre for Sociological Research), Bart Meuleman (KU Leuven, Centre for Sociological Research), Unravelling Uncertain Geographic Context Problem for Interethnic Group Relations: Do Intergroup Contact and Perceived Ethnic Threat Depend on How the Neighbourhood Context is Measured?

Neighborhoods are crucial in shaping interethnic group relations. However, evidence on which neighbourhood indicators matter and how they impact intergroup relations is mixed, often showing opposite effects. A plausible reason for these inconsistencies is the choice of geographical units used to approximate neighbourhood context and aggregate contextual indicators. This suggests the relevance of the uncertain geographic context problem for intergroup relations.
This study examines the extent to which the definition of neighbourhood context influences findings regarding interethnic group relations in Belgium, defined as intergroup contact and perceived ethnic threat among ethnic majority group members.
Our study employs an innovative multi-geodata source approach, combining intergroup contextual indicators from traditional census data, such as socioeconomic status, ethnic diversity, and population density, with spaces of interethnic encounter constructed from big geodata from OpenStreetMap, and measures of residential and non-residential mobility obtained from geotagged Twitter/X data. We define intergroup contextual indicators based on custom-defined circular buffers of different resolutions and administrative geographies using residential pseudo-addresses from the 2020 Belgian National Election Study.
We apply traditional statistical and machine learning techniques to exemplify the uncertainty induced by contextual measurements for intergroup relations concerning the employed definition of neighbourhood context.
We further discuss the methodological importance of neighbourhood context definition and the aggregation of related contextual indicators. This paper aims to provide a primer on conceptual and methodological framework for addressing the uncertain geographic context problem for interethnic group relations.

Making families

Chair: Pearl Dykstra

Room: Expedition

Nicolás Soler (Erasmus University Rotterdam, School of Social and Behavioural Sciences), Tom Emery (Erasmus University Rotterdam, School of Social and Behavioural Sciences), Agnieszka Kanas (Erasmus University Rotterdam, School of Social and Behavioural Sciences). Mapping the family safety net: A population-scale network perspective on the availability of family support for parents with young children in the Netherlands
Sofia Gil-Clavel (TU Delft – Faculty of Technology, Policy, and Management), Clara H. Mulder (Faculty of Spatial Sciences, University of Groningen, Groningen). Does Twitter data mirror the European North-South family ties divide? A comparative analysis of tweets about family
Xiao Xu (Netherlands Interdisciplinary Demographic Institute (NIDI)), Anne Gauthier (Netherlands Interdisciplinary Demographic Institute (NIDI)), Gert Stulp (University of Groningen – Faculty of Behavioural and Social Sciences). Do People Really Know Their Fertility Intentions? Predicting Self-Reported Fertility Intentions with Open-ended Responses

Abstracts

Nicolás Soler (Erasmus University Rotterdam, School of Social and Behavioural Sciences), Tom Emery (Erasmus University Rotterdam, School of Social and Behavioural Sciences), Agnieszka Kanas (Erasmus University Rotterdam, School of Social and Behavioural Sciences), Mapping the family safety net: A population-scale network perspective on the availability of family support for parents with young children in the Netherlands

Families are important sources of social support for their members. For parents with young children, family members can provide valuable help with the coordination of care, work, and other responsibilities by acting as childminders. However, such family support is not always available: family conflicts, overburdened grandparents, and geographical distance can leave parents without family help. In this contribution, we leverage population-scale family network data sourced from administrative registers to map the availability for family support for all parents with young children in the Netherlands. We do so by representing families as networks of households that are connected to each other by parent, child, and partner relationships. We argue that the characteristics of such networks condition the availability of family support. To make this point, we empirically describe the family networks of parents in terms of their size, structure, and sociodemographic composition, reflecting on the implications for the availability of support for parents with different socioeconomic status. In this way, we attend to how the exchange of support in families takes place both within households, which act as central spaces of social reproduction, and between households, as family members coordinate beyond household walls to meet their needs, wants, and obligations. We then reflect on how our approach complements previous research to help us theorise and study families.

Sofia Gil-Clavel (TU Delft – Faculty of Technology, Policy, and Management), Clara H. Mulder (Faculty of Spatial Sciences, University of Groningen, Groningen), Does Twitter data mirror the European North-South family ties divide? A comparative analysis of tweets about family

Previous research on the relationship between geographical distance and the frequency of contact between family members has shown that the strength of family ties differs between Northern and Southern Europe. However, little is known about how family ties are reflected in peoples’ conversations on social media, despite research showing the relevance of social media data for understanding users’ daily expressions of emotions and thoughts based on their immediate experiences. This work investigates the question of whether Twitter use patterns in Europe mirror the North-South divide in the strength of family ties by analyzing potential differences in family-related tweets between users in Northern and Southern European countries. This study relies on a longitudinal database derived from Twitter collected between January 2012 and December 2016. We perform a comparative analysis of Southern and Northern European users’ tweets using Bayesian generalized multilevel models together with the Linguistic Inquiry and Word Count software. We analyze the association between regional differences in the strength of family ties and patterns of tweeting about family. Results show that the North-South divide is reflected in the frequency of tweets that are about family, that refer to family in the past versus in the present tense, and that are about close versus extended family.

Xiao Xu (Netherlands Interdisciplinary Demographic Institute (NIDI)), Anne Gauthier (Netherlands Interdisciplinary Demographic Institute (NIDI)), Gert Stulp (University of Groningen – Faculty of Behavioural and Social Sciences), Do People Really Know Their Fertility Intentions? Predicting Self-Reported Fertility Intentions with Open-ended Responses

Fertility intention data from surveys often serve as a crucial component in modeling fertility behaviors. Yet, the persistent gap between stated intentions and actual fertility decisions, coupled with the prevalence of uncertain responses, has cast doubt on the overall utility of intentions and sparked controversies about their nature. In this study, we use survey data from a representative sample of Dutch women. With the help of open-ended questions (OEQs) on fertility and Natural Language Processing (NLP) methods, we are able to conduct an in-depth analysis of fertility narratives. Specifically, we train neural text classification models to predict the fertility intentions of respondents from the LISS panel. We tested the model with and without the newly added OEQ responses, which gave us insights into the effectiveness of the addition of textual data and the most impactful factors/narratives influencing fertility intentions. In addition, we annotate the (expert) perceived fertility intentions of respondents and compare them to the predictions made by the models. Through this analysis, we aim to reveal the connections and disparities between self-reported intentions and the narratives reflected in OEQs. Furthermore, by comparing the outputs of machine learning models and experts’ annotations, we evaluate the performance of various large language models (LLMs) on survey data and the added value of including textual data in modeling social phenomena.

15.30-15.45 – Refreshments and transfer to the Progress room

15.45-16.00 – Award Ceremony

ODISSEI Poster Award
Predicting Fertility Benchmark Challenge Award

16.00-17.00 – Closing Keynote

A closing keynote by Guido Imbens, Nobel Prize laureate, and professor of economics at Stanford University.

Modern Experimentation

Abstract

Randomized experiments have become an indispensable part of the toolkit of modern economics. They allow us in settings with complex incentives to estimate causal effects with great credibility. In this presentation, I will discuss some influential experiments. I will also discuss a framework for analyzing experiments and experimental design for online experiments. In this latter part, I will discuss some novel experimental designs in marketplaces and other settings with complex interactions.

17.00-18.00 – Drinks

All day – ODISSEI Marketplace

ODISSEI Marketplace

The ODISSEI Marketplace is an initiative launched last year to provide ODISSEI partners and infrastructure providers with a platform to showcase their facilities and resources. This format enables researchers to explore a wide range of tools, services, and opportunities available through ODISSEI, fostering connections and facilitating access to valuable research infrastructure. The Marketplace will be operating throughout the day and host the following partners and facilities:

DANS
ODISSEI Portal
PORT & Data Donation
SURF & SANE
FIRMBACKBONE Infrastructure
CBS
Centerdata & LISS panel
TDCC-SSH
Netherlands eScience Center
ODISSEI SoDa Team
SSHOC-NL project: the collaborative initiative of ODISSEI & CLARIAH