ODISSEI Conference for Social Science in the Netherlands 2023

This conference invites the social science community to come together and discuss topics such as data, methods, infrastructure, ethics and theoretical work on digital and computational methods in social science research. ODISSEI, the research infrastructure for the social sciences in the Netherlands, brings researchers together with the data, expertise and resources needed to conduct ground-breaking research and to strengthen computational developments in the social sciences.

Conference registration: Due to the capacity of the venue, registration has now closed. If you are interested in attending, please contact communications@odissei-data.nl to discuss the options.

Conference date: 2 November 2023

Location: Media Plaza (Jaarbeurs), Utrecht

Contact: communications@odissei-data.nl

Streaming: because of the high level of interest in attending, the plenary room of the conference (Progress) will be streamed live.

The PDF programme of the conference is available here.

Programme

Below you will find the preliminary programme and the abstracts.

With coffee and tea.

Linnet Taylor is Professor of International Data Governance at the Tilburg Institute for Law, Technology, and Society (TILT), where she leads the ERC-funded Global Data Justice project. Her research focuses on how new sources of digital data are impacting governance, research on human and economic development, and political representation.

The title of the presentation: ‘The god’s eye view? Remote data, power and (data) justice’.

Room: Progress

Chair: Pearl Dykstra

Abstract

This talk will explore the societal implications of the ongoing shift toward linked and enriched social data in research. What are the risks and concerns of the new data sources and analytical practices involved, and are current disciplinary and legal rules sufficient safeguards? Principles of data justice will be explored as benchmarks for the beneficent use of population data.

ODISSEI Flashtalks will provide an introduction to the core ODISSEI facilities and the support they offer researchers.

  • Making sensitive data available via SANE
    Lucas van der Meer, ODISSEI, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)
  • Showcase of the DANS Data Station for the Social Sciences and Humanities
    Ricarda Braukmann, Data Archiving and Networked Services (DANS); Jetze Touber, Data Archiving and Networked Services (DANS)
  • Searching data using the ODISSEI Portal
    Angelica Maria Maineri, ODISSEI, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); The ODISSEI Portal Team
  • ODISSEI SoDa fellowship and SoDa Team support
    Erik-Jan van Kesteren, Utrecht University
  • Exploring the significance of employment for the chronically ill. Showcase of the value of linking CBS microdata to survey data from the National Panel of the Chronically Ill and Disabled and the Dutch Healthcare Consumer Panel.
    Annette Scherpenzeel, Netherlands Institute for Health Services Research (Nivel); Anne Brabers, Netherlands Institute for Health Services Research (Nivel)

Abstracts

Making sensitive data available via SANE
Lucas van der Meer, ODISSEI, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)
Privacy, copyright, and competition barriers limit the sharing of sensitive data for scientific purposes. We propose the Secure Analysis Environment (SANE): a virtual container in which the researcher can analyse sensitive data while the data provider remains in complete control. By following the Five Safes principles, SANE will enable researchers to conduct research on data that until now have hardly been available to them. SANE comes in two variants. Tinker SANE allows the researcher to see, manipulate and play with the data. In Blind SANE, the researcher submits an algorithm without being able to see the data, and the data provider approves the algorithm and output. SANE uses concepts from the CBS Remote Access Environment, the ODISSEI Secure Supercomputer and SURF Data Exchange to build a generic off-the-shelf solution that can be used by any sensitive data provider and researcher. SANE can be used by researchers in any discipline, as illustrated by the involvement of consortia in both the social sciences (ODISSEI) and the humanities (CLARIAH). Potential sensitive data providers include the Dutch Chamber of Commerce (KvK), ING, the National Library of the Netherlands (KB) and the Netherlands Institute for Sound and Vision (NISV).

Showcase of the DANS Data Station for the Social Sciences and Humanities
Ricarda Braukmann, Data Archiving and Networked Services (DANS); Jetze Touber, Data Archiving and Networked Services (DANS)
DANS is the Dutch national centre of expertise and repository for research data. Our domain-specific data archiving and publishing services enable researchers and data stewards to make their data FAIR and share them for reuse where possible. In this presentation, we would like to showcase our new repository: the DANS Data Station Social Sciences and Humanities (SSH). The release of the DANS Data Station SSH is part of the transition from our previous system EASY to a new repository system based on the open source software Dataverse. The Data Station has a number of new features and improvements that make it easier to archive, publish and find data:
– When depositing data, researchers have the option to restrict access to particular files if they cannot be openly shared. The data station also allows for new versions of a dataset if the research is updated.
– Detailed information about the datasets can be added in various metadata fields and we support SSH-specific vocabularies such as the European Language Social Science Thesaurus (ELSST) and vocabularies from CESSDA, the Consortium of European Social Science Data Archives.
– Datasets are assigned persistent identifiers, and the metadata is automatically transferred to the ODISSEI Portal and European data portals to make the data findable for others.
– Finding relevant data in the Data Station is supported through full text and advanced search and various filter options. PDFs, images and videos can be previewed directly.
In this presentation, we will demo these and other features of the Data Station and guide the audience through the process of archiving and finding data for reuse. We will elaborate on the importance of archiving and publishing data in a trustworthy digital repository like our Data Station and how this supports FAIR data and Open Science.

Searching data using the ODISSEI Portal
Angelica Maria Maineri, ODISSEI, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); The ODISSEI Portal Team
Datasets are often scattered across different institutional, national, and international repositories. In the case of CBS microdata, the only public information about available datasets used to be in pdf files, organised thematically, accessible only via the CBS website. This fragmented landscape limits the findability of new data sources a researcher might not be accustomed to, and makes the process of selecting an appropriate data source time-consuming and inefficient. The goal of the ODISSEI Portal is to enable and facilitate the discovery of social sciences datasets across various data providers in the Netherlands within a single interface. The Portal currently collects metadata from social science datasets available at DANS (the Dutch national centre of expertise and repository for research data), DataverseNL and the International Institute for Social History (IISH), from the LISS archive, and from the microdata catalogue available at Statistics Netherlands (CBS). Metadata of these collections is harmonised and enriched using a thesaurus which enables multilingual search, while the user interface allows users to search across all datasets available in these collections to find data relevant for their research. The prototype ODISSEI Portal is publicly available (https://portal.odissei.nl/) in order to be able to get feedback from the community on the design features and functionalities. During the demo presentation, after a brief introduction we will show the latest version of the Portal to the ODISSEI community. A virtual “suggestion box” will be made available to all participants in the session.

ODISSEI SoDa support and fellowship.
Erik-Jan van Kesteren, Utrecht University

The ODISSEI SoDa team helps social scientists with data intensive and computational research. If you work at an ODISSEI organization, we will even do this for free! Visit our booth at the conference so you can:

  • chat with us about your research, see how we can help you.
  • sign up for our upcoming workshops (e.g., supercomputing with the OSSC).
  • look at some projects we have done in the past.
  • try out creating synthetic data with our metasyn software.
  • ask us all about our new fellowship opportunity!

If you want to know more about what we do and how we do it, visit https://odissei-soda.nl.

Exploring the significance of employment for the chronically ill. Showcase of the value of linking CBS microdata to survey data from the National Panel of the Chronically Ill and Disabled and the Dutch Healthcare Consumer Panel.
Annette Scherpenzeel, Netherlands Institute for Health Services Research (Nivel); Anne Brabers, Netherlands Institute for Health Services Research (Nivel)
The prevalence of chronic conditions is expected to rise in the coming years due to population aging and factors like sedentary lifestyles. Our study focuses on the labor participation and quality of life of persons with chronic conditions compared to the general population. For this aim, we linked survey data from the National Panel of the Chronically Ill and Disabled (NPCD) and the Dutch Healthcare Consumer Panel (CoPa, a sample of the general population) with registration data on employment history from Statistics Netherlands (CBS). Both panels are managed by the Nivel.
Our findings showed that persons with chronic conditions, who have not yet retired, were less likely to be working over a four-year period than those without such conditions. Moreover, currently employed persons with chronic conditions experienced more unemployment and illness/disability benefit episodes in their recent history compared to those without chronic illnesses. Labor participation was lowest among persons with cardiovascular diseases, followed by diabetes and lung diseases. Employed persons with chronic conditions more often felt a sense of societal inclusion than non-working counterparts, although this varied depending on the specific type of disease.
In addition to the research findings, we evaluated the process of linking data from the Nivel panels and the CBS, as well as the added value of the linked data. A significant proportion of Nivel panel members consented to the data linkage, and nearly all panel members’ data were successfully linked to the microdata. Our analysis provided insights that could not be obtained from CBS microdata or panel data alone, such as insights into the relationship between chronic conditions, work history, and the sense of belonging in society. This marked the first linkage of Nivel panel data with CBS microdata, a successful starting point for future projects.

Throughout the Conference Day, we invite you to attend the ODISSEI Marketplace from 9:00 AM to 5:00 PM. The Marketplace will present a chance to connect with ODISSEI partners and infrastructure providers and gain valuable insights into how you can elevate your research endeavors. Seize this opportunity and join us!

Room: TransitZone

Partners and infrastructure providers: 

  • LISS panel – Data Quality, Data Linkage and Innovative Measurement Projects in the LISS panel. Joris Mulder, Centerdata; Marcel Das, Centerdata
  • Portal – Demo of the ODISSEI Portal. Angelica Maria Maineri, ODISSEI, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)
  • SANE – Are you SANE? Research with sensitive data in the cloud. Martin Brandt, SURF; Annette Langedijk, SURF
  • ODISSEI SoDa – Social Data Science Team: research support. Erik-Jan van Kesteren, Utrecht University
  • DANS – DANS: Data Station for the Social Sciences and Humanities. Ricarda Braukmann, Data Archiving and Networked Services (DANS)
  • TDCC – Thematic Digital Competence Centres. Nils Arlinghaus, KNAW
  • ASReview – An open source machine learning framework for efficient and transparent systematic reviews. Laura Hofstee, Utrecht University

Abstracts

Data Quality, Data Linkage and Innovative Measurement Projects in the LISS panel.
Joris Mulder, Centerdata; Marcel Das, Centerdata

Collecting survey data for your research seems to be a piece of cake nowadays. Free or low-cost online survey tools, platforms like Amazon’s Mechanical Turk or Prolific, and non-probability-based self-registration panels are widely available. Although these tools and platforms can be useful for pilot studies or for probing whether a certain phenomenon exists in the population, they are not very suitable for making reliable, representative population-level inferences. The LISS panel, which is based on a true probability sample drawn from the population registry by Statistics Netherlands, offers a high-quality online research infrastructure for academic researchers worldwide. It is therefore ideal for research where a good representation of the Dutch population is essential. Founded in 2007 and managed by research institute Centerdata, the panel is composed according to the highest scientific standards. Researchers can field their survey or experiment in the panel or can apply for the yearly LISS ODISSEI call for proposals for funded high-quality data collection. A major advantage is that all data collected in the LISS panel are made available through the LISS Data Archive and can easily be linked to other data, such as the annually fielded longitudinal LISS Core Study, which provides repeated measures for the same individuals and households on a broad range of topics. Furthermore, the data can be linked to registry data from Statistics Netherlands, further enriching your data. The LISS panel also offers data collection through innovative measurement projects. In this presentation we discuss not only data quality and data linkage, but also projects collecting data through wearable devices, speech-to-text technology, data donation of WhatsApp and Google Location data, and the integration of the open source oTree software, which allows for real-time and large-scale online behavioral experiments.

Demo of the ODISSEI Portal.
Angelica Maria Maineri, ODISSEI, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)

Datasets are often scattered across different institutional, national, and international repositories. In the case of CBS microdata, the only public information about available datasets used to be in pdf files, organised thematically, accessible only via the CBS website. This fragmented landscape limits the findability of new data sources a researcher might not be accustomed to, and makes the process of selecting an appropriate data source time-consuming and inefficient. The goal of the ODISSEI Portal is to enable and facilitate the discovery of social sciences datasets across various data providers in the Netherlands within a single interface. The Portal currently collects metadata from social science datasets available at DANS (the Dutch national centre of expertise and repository for research data), DataverseNL and the International Institute for Social History (IISH), from the LISS archive, and from the microdata catalogue available at Statistics Netherlands (CBS). Metadata of these collections is harmonised and enriched using a thesaurus which enables multilingual search, while the user interface allows users to search across all datasets available in these collections to find data relevant for their research. The prototype ODISSEI Portal is publicly available (https://portal.odissei.nl/) in order to be able to get feedback from the community on the design features and functionalities. During the demo presentation, after a brief introduction we will show the latest version of the Portal to the ODISSEI community. A virtual “suggestion box” will be made available to all participants in the session.

Are you SANE? Research with sensitive data in the cloud.
Martin Brandt, SURF; Annette Langedijk, SURF

Privacy, copyright, and competition barriers limit the sharing of sensitive data for scientific purposes. We propose the Secure ANalysis Environment (SANE): a virtual environment based on SURF Research Cloud in which the researcher can analyse sensitive data while the data provider remains in complete control. SANE comes in two variants. Tinker SANE allows the researcher to see, manipulate and play with the data. In Blind SANE, the researcher submits an algorithm without being able to see the data, and the data provider approves the algorithm and output. In this live demonstration we showcase the successful prototype of the SANE environment, which allows researchers to analyse copyright-sensitive data of the Dutch National Library that would otherwise remain unused. We will show both the “Tinker” variant, where the researcher can interact with the data, and the “Blind” variant, where only the algorithm has access.

ODISSEI Social Data Science Team
Erik-Jan van Kesteren, Utrecht University

The ODISSEI SoDa team helps social scientists with data intensive and computational research. If you work at an ODISSEI organization, we will even do this for free! Visit our booth at the conference so you can:

  • chat with us about your research, see how we can help you.
  • sign up for our upcoming workshops (e.g., supercomputing with the OSSC).
  • look at some projects we have done in the past.
  • try out creating synthetic data with our metasyn software.
  • ask us all about our new fellowship opportunity!

If you want to know more about what we do and how we do it, visit https://odissei-soda.nl.

DANS: Data Station for the Social Sciences and Humanities.
Ricarda Braukmann, Data Archiving and Networked Services (DANS)

DANS is the Dutch national centre of expertise and repository for research data. Our domain-specific data archiving and publishing services enable researchers and data stewards to make their data FAIR and share them for reuse where possible. In this presentation, we would like to showcase our new repository: the DANS Data Station Social Sciences and Humanities (SSH). The release of the DANS Data Station SSH is part of the transition from our previous system EASY to a new repository system based on the open source software Dataverse. The Data Station has a number of new features and improvements that make it easier to archive, publish and find data:
– When depositing data, researchers have the option to restrict access to particular files if they cannot be openly shared. The data station also allows for new versions of a dataset if the research is updated.
– Detailed information about the datasets can be added in various metadata fields and we support SSH-specific vocabularies such as the European Language Social Science Thesaurus (ELSST) and vocabularies from CESSDA, the Consortium of European Social Science Data Archives.
– Datasets are assigned persistent identifiers, and the metadata is automatically transferred to the ODISSEI Portal and European data portals to make the data findable for others.
– Finding relevant data in the Data Station is supported through full text and advanced search and various filter options. PDFs, images and videos can be previewed directly.
In this presentation, we will demo these and other features of the Data Station and guide the audience through the process of archiving and finding data for reuse. We will elaborate on the importance of archiving and publishing data in a trustworthy digital repository like our Data Station and how this supports FAIR data and Open Science.

Thematic Digital Competence Centres (TDCC-NES, TDCC-LSH, TDCC-SSH)
Nils Arlinghaus, KNAW

Many research data professionals are already aware of the existence of the Digital Competence Centers (DCC), which offer local support to the data stewards within their institute. More recently, NWO has initiated and funded the launch of three Thematic Digital Competence Centres (TDCC-NES, TDCC-LSH, TDCC-SSH), as part of an assignment from the Ministry of Education, Culture and Science (OCW). These TDCCs are complementary to the existing local DCCs and focus on challenges that go beyond individual institutions. The TDCC-SSH will do so in two ways: by building and strengthening a national network, and by offering funding opportunities for collaborative, non-competitive projects that tackle domain-specific challenges. The Thematic DCC Social Sciences & Humanities will be present at the ODISSEI conference to connect and network with others in the research data landscape, to exchange ideas, and to explore potential collaboration opportunities. If you would like to read more about the TDCCs beforehand, visit www.TDCC.nl.

An open source machine learning framework for efficient and transparent systematic reviews
Laura Hofstee, Utrecht University

To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.
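
As a rough illustration of the active-learning screening loop described above, the following Python sketch retrains a toy relevance model on the records labelled so far and then presents the highest-scoring unlabelled record to the reviewer. It is not ASReview's implementation or API: the function name `screen_with_active_learning`, the word-count scorer and the six example titles are all assumptions made purely for illustration.

```python
from collections import Counter


def screen_with_active_learning(records, oracle, seed_labels):
    """Return the order in which unlabelled records are shown to the reviewer.

    records     -- list of title/abstract strings
    oracle      -- function(index) -> 1 (relevant) or 0 (irrelevant); stands in
                   for the human reviewer (hypothetical helper)
    seed_labels -- dict {index: label} of prior knowledge to start from
    """
    labelled = dict(seed_labels)
    order = []
    while len(labelled) < len(records):
        # "Train" a toy model: each word gets +1 per relevant record it
        # appears in and -1 per irrelevant record.
        weights = Counter()
        for idx, label in labelled.items():
            for word in records[idx].lower().split():
                weights[word] += 1 if label == 1 else -1

        # Certainty-based query: pick the unlabelled record that the current
        # model scores as most likely relevant (ties go to the lowest index).
        unlabelled = [i for i in range(len(records)) if i not in labelled]
        best = max(
            unlabelled,
            key=lambda i: sum(weights[w] for w in records[i].lower().split()),
        )

        # Ask the reviewer, store the label, and retrain on the next pass.
        labelled[best] = oracle(best)
        order.append(best)
    return order


# Made-up example data: 1 = relevant to the review, 0 = irrelevant.
records = [
    "active learning for screening abstracts",
    "soil chemistry of alpine meadows",
    "machine learning to prioritise screening",
    "bird migration patterns in spring",
    "screening titles with active learning",
    "alpine bird habitats and soil",
]
true_labels = [1, 0, 1, 0, 1, 0]

order = screen_with_active_learning(
    records,
    oracle=lambda i: true_labels[i],
    seed_labels={0: 1, 1: 0},  # one known relevant, one known irrelevant record
)
print(order)
```

In this toy run the two remaining relevant records are presented before the irrelevant ones, which is the kind of saving a simulation study would quantify on real screening data.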

11.00-12.15 – Parallel Session 1

Chair: Marco Heilbich

  • Unveiling the Urban Divide: Novel Insights into Economic Segregation Using Fine-Grained Data
    Javier San Millán Tejedor, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE); Clémentine Cottineau, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE)
  • Spatiotemporal analysis of intersectional segregation in the Netherlands
    Ana Petrovic, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE); Maarten van Ham, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE); David Manley, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE)
  • Investigating the prevalence of social frontiers in the City of Amsterdam
    Eleanor Bale, University of Sheffield, Delft University; Duncan Lee, University of Glasgow; Gwilym Pryce, University of Sheffield; Aneta Piekut, University of Sheffield

Unveiling the Urban Divide: Novel Insights into Economic Segregation Using Fine-Grained Data
Javier San Millán Tejedor, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE); Clémentine Cottineau, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE)
Urban economic segregation is on the rise. As differences in wealth and income amplify in the contemporary era, so does the disparate concentration of poverty and affluence in distinct areas of our cities. Yet this relationship is extremely intricate when empirically studied: contextual and contingent factors of particular cities play a notable role, some metropolitan areas exhibit unexpectedly increasing levels of income disparities coupled with decreasing urban segregation, and rises in economic inequality seem to be translated into space with a time lag. Overall, levels of urban segregation vary considerably within and between countries for reasons not very well understood. This convoluted empirical state of the field is partly derived from the absence of good quality data on indicators of urban economic segregation. Cross-country comparisons are usually performed using decennial census data, analyses of economic segregation often need to employ questionable proxies, and geo-coded and longitudinal microdata are scarce. To address these limitations, this paper presents a novel and detailed characterization of urban economic segregation in the Netherlands. Taking advantage of granular and nation-wide microdata from the Dutch Statistical Agency (CBS), levels of income segregation in the biggest metropolitan areas of the country are calculated for every year between 2003 and 2020, using the Spatial Information Theory Index on household data available within very disaggregated areal units (100 m × 100 m cells), together with other spatial and aspatial measurements of segregation. Furthermore, this paper also delves into the study of social and spatial heterogeneity of urban economic segregation, distinguishing between the so-called segregation of affluence and segregation of poverty and analyzing the specific geographical patterns of segregation.
The paper finally estimates the relationship between economic inequality and segregation through a fixed-effects regression model that incorporates, for the first time, a time lag factor.
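
For readers unfamiliar with entropy-based segregation measures, the sketch below computes the simpler aspatial information theory index (often written H) for a handful of areal units. It is not the Spatial Information Theory Index used in the paper, which additionally takes the spatial surroundings of each location into account, and it is not the authors' code; the function names and the household counts are hypothetical.

```python
import math


def entropy(proportions):
    """Shannon entropy (natural log); terms with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


def information_theory_index(units):
    """Aspatial information theory index H.

    units -- list of tuples with group counts per areal unit,
             e.g. (low_income_households, high_income_households)
    Returns 0 when every unit mirrors the overall composition and 1 when
    every unit contains a single group only.
    """
    total = sum(sum(u) for u in units)
    n_groups = len(units[0])
    group_totals = [sum(u[g] for u in units) for g in range(n_groups)]
    overall = entropy([g / total for g in group_totals])
    if overall == 0:
        return 0.0  # only one group present overall: nothing to segregate

    h = 0.0
    for u in units:
        t = sum(u)
        if t == 0:
            continue  # skip empty grid cells
        local = entropy([c / t for c in u])
        # Population-weighted shortfall of local diversity vs. overall diversity
        h += (t / total) * (overall - local) / overall
    return h


# Hypothetical counts for two grid cells
segregated = information_theory_index([(100, 0), (0, 100)])
mixed = information_theory_index([(50, 50), (50, 50)])
print(segregated, mixed)
```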

The spatial pattern of residential segregation in 8 European countries
Lucas Spierenburg, TU Delft – Faculty of Civil Engineering and Geosciences; Romuald Winandy-Proust, École nationale des travaux publics de l’État; Martin De Jaeghere, École nationale des travaux publics de l’État; Oded Cats, TU Delft – Faculty of Civil Engineering and Geosciences
In Europe, integrating immigrants and their descendants successfully is a priority for EU, national, and local authorities. In most European urban areas, this integration is threatened by the uneven spatial distribution of immigrants and their descendants, hereafter called residential segregation. This phenomenon usually decreases the potential for interaction between families of immigrants and the rest of the population, hampering the integration process. This work contributes to a growing body of literature assessing the influence of urban development on residential segregation in European cities. We delineate segregated regions in 485 European urban cores and investigate how they overlap with the urban fabric, using spatial census data from 8 European countries. These regions are built by aggregating spatial units (100 m × 100 m squares) into regions that are homogeneous in terms of demographics, using agglomerative clustering. This allows us to go beyond traditional indicators capturing global properties of segregation patterns. Instead of summarizing such a pattern at the city level with a set of indicators, we identify regions that can be spatially compared to other spatial patterns in the urban fabric, such as the fragmentation of space by physical constraints. In this work, we present two key findings. First, we observe a striking linear relationship between the size of segregated regions and the size of the city. Larger urban cores are characterized by larger segregated regions rather than more regions. Second, we show that residential segregation patterns clearly overlap with urban fragmentation in a substantial proportion of European cities, confirming the “wrong side of the track” phenomenon observed qualitatively in the literature.

Spatiotemporal analysis of intersectional segregation in the Netherlands
Ana Petrovic, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE); Maarten van Ham, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE); David Manley, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE)
Spatial segregation of socioeconomic and ethnic groups affects economic and social functioning of cities as integral urban systems as well as individual outcomes of people, such as income, education or health. Both causes and consequences of segregation include many different processes, such as those related to housing or labour markets, which occur at different spatial scales, ranging from small neighbourhoods to urban regions. Segregation at all these spatial scales changes over time, which becomes particularly relevant in the conditions of increasing economic inequalities, international and internal migration, and population aging. However, most of the empirical evidence about segregation in the Netherlands is cross-sectional and uses single spatial scales. Therefore, it is not clear at which spatial extents segregation is increasing or decreasing, and, therefore, also not straightforward what drives segregation trends and how to deal with the segregation in different places. Moreover, empirical evidence of segregation usually focusses on one specific characteristic of people, such as ethnicity or socioeconomic status, while the sociodemographics of people are more complex. Using individual-level register data from 1999 onwards, geocoded at 100m by 100m grid cells, on the OSSC (ODISSEI Secure Supercomputer), this paper investigates segregation trends in the Netherlands at multiple spatial scales, taking into account various sociodemographic characteristics of people.

Investigating the prevalence of social frontiers in the City of Amsterdam
Eleanor Bale, University of Sheffield, Delft University; Duncan Lee, University of Glasgow; Gwilym Pryce, University of Sheffield; Aneta Piekut, University of Sheffield
The steep and sharp boundaries between contrasting communities – social frontiers – are an under-researched topic in social science, and yet they can strongly influence community cohesion, educational attainment and mental health. This methodological paper addresses some of the gaps in the current research – building upon the contribution of the Dean et al. (2019) social frontier paper – by introducing three new frontier identification methods and testing their success in correctly identifying the ethnicity social frontiers – using the terms non-western and western – in Amsterdam. Furthermore, in order to evaluate the aggregate impact of the social frontiers and to compare their presence in different cities and their evolution over time, an aggregate measure is required – no current social frontier papers include a statistical description of the boundaries. Thus, this paper introduces two aggregate measures and uses them to discuss how social frontiers have changed from 2010 to 2019 in Amsterdam. These aggregate measures are used within a shuffling algorithm to compare the expected level of frontiers to the actual level for a given index of dissimilarity in Amsterdam, exploring the relationship between segregation (as measured by traditional methods) and social frontiers. Ultimately, all of these aim to encourage a discussion about the dynamics and presence of social frontiers in Amsterdam and the impact of cultural distance on cohesion. The paper uses the rich Dutch registry data from Statistics Netherlands.
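
The index of dissimilarity mentioned above has a standard two-group form, D = 0.5 × Σ |a_i/A − b_i/B|, where a_i and b_i are the counts of the two groups in neighbourhood i and A and B are their city-wide totals. The Python sketch below computes it for made-up neighbourhood counts; the function name and the data are hypothetical, and the paper's frontier measures and shuffling algorithm are not reproduced here.

```python
def dissimilarity_index(group_a, group_b):
    """Two-group index of dissimilarity.

    group_a, group_b -- per-neighbourhood counts of the two groups
    Returns a value between 0 (identical distributions across neighbourhoods)
    and 1 (the groups never share a neighbourhood).
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per neighbourhood")
    total_a = sum(group_a)
    total_b = sum(group_b)
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Hypothetical counts for two neighbourhoods
d_uneven = dissimilarity_index([90, 10], [10, 90])
d_even = dissimilarity_index([50, 50], [50, 50])
print(d_uneven, d_even)
```

Because D only compares neighbourhood shares, two cities with the same D can differ in how sharply their neighbourhood compositions change at shared borders, which is why the abstract treats frontiers as a separate quantity.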

Chair: Paulette Flore

  • How to decide when smart surveys are sufficiently mature for implementation?
    Barry Schouten, Statistics Netherlands (CBS); Marc Houben, Statistics Netherlands (CBS)
  • Collecting travel behavior statistics in the Netherlands. Surveys, apps, sensors and register data
    Peter Lugtig, Utrecht University – Faculty of Social Sciences (UU-FSW); Danielle Remmerswaal, Utrecht University – Faculty of Social Sciences (UU-FSW); Yvonne Gootzen, Statistics Netherlands (CBS); Barry Schouten, Statistics Netherlands (CBS)
  • Respondent perceptions on surveys with smart features
    Janelle van den Heuvel, Statistics Netherlands (CBS); Anne Elevelt, Statistics Netherlands (CBS); Barry Schouten, Statistics Netherlands (CBS)
  • For Better or Worse? Digital Skills of the Dutch Population
    Mara Verheijen, Centerdata; Roxanne van Giesen, Centerdata; Patricia Prüfer, Centerdata

How to decide when smart surveys are sufficiently mature for implementation?
Barry Schouten, Statistics Netherlands (CBS); Marc Houben, Statistics Netherlands (CBS)
Smart surveys employ features of smart devices. Keeping respondents at the heart of data collection, they form a bridge between survey and big data. The incentive to go smart is especially strong for surveys that are considered burdensome, non-central to respondents and/or contain topics for which questions form weak proxies for the concepts of interest. In smart survey data collection AI and machine learning methodology play a prominent role in the transformation of ‘smart’ data to statistics. The new types of data and the new types of methodology imply investments in almost all stages of the statistical process; from data collection case management and backoffice systems to real-time processing procedures to post-survey editing and adjustment methods. Given the rationale of reducing measurement error, time series shifts are anticipated, implying that a change to smart data collection needs to be well prepared and introduced.
For official statistical institutes, surveys are often repeated and large-scale, so that solid and robust logistics and architecture are imperative. The business case for ‘going smart’ must be extra strong. In the Eurostat-funded project Smart Survey Implementation, several case studies are investigated, tested and evaluated on their maturity to go smart. In this presentation, we present so-called maturity criteria that have been put forward at all design levels: methodology, IT architecture, logistics and legal. The criteria are illustrated with examples and are open for discussion with the conference attendees.

Collecting travel behavior statistics in the Netherlands. Surveys, apps, sensors and register data.
Peter Lugtig, Utrecht University – Faculty of Social Sciences (UU-FSW); Danielle Remmerswaal, Utrecht University – Faculty of Social Sciences (UU-FSW); Yvonne Gootzen, Statistics Netherlands (CBS); Barry Schouten, Statistics Netherlands (CBS)
This presentation is joint work with Danielle Remmerswaal (UU), Barry Schouten (CBS, UU) and Yvonne Gootzen (CBS). Travel statistics are historically collected using diaries, where people keep track of all the trips they take on weekdays and weekend days, along with the locations and times of starts and ends of trips. Over the past few years, several initiatives have been taken to improve the measurement of travel statistics. In this presentation, results from several studies conducted by Statistics Netherlands are showcased: 1) A smartphone app study was conducted in 2018 and 2022 to follow people using GPS locations and a smartphone app. In 2022 this study was combined with paper diaries for nonrespondents. 2) Road sensor data have been used in combination with population register data from Statistics Netherlands on the home and work locations of the entire Dutch working population to model road traffic intensities during rush hour for all roads in the Netherlands. After briefly discussing both studies separately, this presentation will focus on a possible future for travel statistics in which survey data, smartphone apps, register data, road sensors, and possibly other data on (public) transport may be combined to understand travel behavior at the level of a country or region, and also to understand how individuals make decisions about, for example, their mode of travel. There is no single way to integrate data, but the presentation will explore how different combinations may be useful for different kinds of research questions.

Respondent perceptions on surveys with smart features
Janelle van den Heuvel, Statistics Netherlands (CBS); Anne Elevelt, Statistics Netherlands (CBS); Barry Schouten, Statistics Netherlands (CBS)
Smart surveys potentially offer new features to make the utility of surveys more salient and leverage any objections against surveys. They focus especially on survey topics that are burdensome, non-central to respondents and/or for which questions provide weak proxies. Doing so they often collect data that are not (fully) known to respondents themselves. Consequently, prior expectations of respondents and, importantly, of legal officers needing to set boundaries play an influential role.
The Smart Survey Implementation (SSI) project is a consortium with members from seven different European countries. The consortium aims to consolidate earlier findings from a predecessor European project on smart surveys into a robust and flexible baseline for implementation. In addition, the project aims to involve and engage citizens in the hope of gaining their trust and participation. To reach this goal, it is imperative to examine respondent perceptions within realistic and legitimate smart survey settings.
Therefore, as part of the SSI project, a cross-national survey on smart survey perceptions was introduced. This survey aims to provide empirically supported understanding of how citizens feel about surveys with smart features, including how well they understand what is being measured and what they consent to. This survey was conducted in three countries (Italy, Slovenia and the Netherlands). However, the presentation will focus on the Dutch design and results. The perception survey consists of a paper questionnaire on respondent perceptions about smart surveys and a smart survey with a range of different smart tasks. During the presentation, preliminary results of the survey on smart survey perceptions will be presented.

For Better or Worse? Digital Skills of the Dutch Population
Mara Verheijen, Centerdata; Roxanne van Giesen, Centerdata; Patricia Prüfer, Centerdata
Digitization is a large and complex theme. Strong digital skills are essential for a comprehensive understanding of and active participation in today’s society. Commissioned by the Ministry of the Interior and Kingdom Relations, Centerdata investigated which digital skills are lacking among the Dutch population and what people aspire to learn. While previous research primarily focused on functional and basic skills, this research focuses on the importance of critical digital skills and digital awareness. In this presentation we will outline the methods and the main results of our research. We used an innovative multi-method approach that involved in-depth interviews and (design thinking) focus groups with experts, alongside an extensive online survey experiment conducted in the LISS panel (N=1,392). The digital skills of the Dutch population were measured both subjectively (through self-reported abilities) and objectively (by performing various tasks to gain insight into people’s actual critical digital skills). For instance, respondents were asked to perform a search task, make a purchase, identify fake news and phishing messages, and determine which types of messages should not be shared on social media. The results indicate that most respondents tend to be more digitally skilled in one context, but encounter challenges in another. Above all, this research shows that a significant proportion of the Dutch population struggles with identifying fake web shops and phishing messages, despite believing they are proficient in this area. In an online world where the number of fake web shops and phishing messages is on the rise, it is crucial to address the discrepancy between perceived and actual digital skills, raise awareness of the overestimation of one’s skills and provide support to enhance individuals’ digital competencies.

Chair: Frank Takes

  • Meta-dominance analysis – A tool for the assessment of the quality of digital behavioural data
    Wojtek Przepiorka, Utrecht University – Faculty of Social Sciences (UU-FSW); Andreas Schneck
  • Improving Descriptive Inference by Integrating Probability Sample and Nonprobability Sample
    An-Chiao Liu, Tilburg University, Tilburg School of Social and Behavioural Sciences (TiU-TSB); Sander Scholtus, Statistics Netherlands (CBS); Katrijn Van Deun, Tilburg University, Tilburg School of Social and Behavioural Sciences (TiU-TSB); Ton de Waal, Tilburg University, Tilburg School of Social and Behavioural Sciences (TiU-TSB)
  • Intuitive, accessible, flexible: A JASP module for conditional process models
    Malte Lüken, Netherlands eScience Center (NLeSC); Thijs Vroegh, Netherlands eScience Center (NLeSC); Johnny Doorn, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG); Eric-Jan Wagenmakers, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG)
  • Too good to be true: Linear Probability Model versus Logistic Regression
    Christian Fang, Utrecht University – Faculty of Social Sciences (UU-FSW); Qixiang Fang, Utrecht University – Faculty of Social Sciences (UU-FSW); Paulina Pankowska, Utrecht University – Faculty of Social Sciences (UU-FSW)

Abstracts

Meta-dominance analysis – A tool for the assessment of the quality of digital behavioural data
Wojtek Przepiorka, Utrecht University – Faculty of Social Sciences (UU-FSW); Andreas Schneck
We propose a simple yet comprehensive conceptual framework for the identification of different sources of error in research with digital behavioral data. We use our framework to map potential sources of error in 25 years of research on reputation effects in peer-to-peer online market platforms. Using a meta-dataset comprising 346 effect sizes extracted from 109 articles, we apply meta-dominance analysis to quantify the relative importance of different error components. Our results indicate that 85% of explained effect size heterogeneity can be attributed to the measurement process, which comprises the choice of platform, data collection mode, construct operationalisation and variable transformation. Error components attributable to the sampling process or publication bias capture relatively small parts of the explained effect size heterogeneity. This approach reveals at which stages of the research process researcher decisions may affect data quality most. This approach can be used to identify potential sources of error in established strands of research beyond the literature of behavioral data from online platforms.

Improving Descriptive Inference by Integrating Probability Sample and Nonprobability Sample
An-Chiao Liu, Tilburg University, Tilburg School of Social and Behavioural Sciences (TiU-TSB); Sander Scholtus, Statistics Netherlands (CBS); Katrijn Van Deun, Tilburg University, Tilburg School of Social and Behavioural Sciences (TiU-TSB); Ton de Waal, Tilburg University, Tilburg School of Social and Behavioural Sciences (TiU-TSB)
Many data sets do not come from a known sampling frame, for example, administrative data, social media, or web-scraped data. Treating these nonprobability samples as simple random samples or iid samples may result in selection bias. For example, the distribution of education level in Twitter data will hardly be the same as the distribution of education level in the population. Along with the nonprobability sample, a probability sample that comes from the same population may assist in improving descriptive inference. The target variable of interest may not always be available in the probability sample, while a common set of auxiliary variables which are related to the target variable and are available in both the probability and nonprobability samples may be helpful. With the auxiliary variables, many approaches such as mass imputation, weighting, or doubly robust estimation can then be applied. However, model evaluation and selection are not straightforward. Although many approaches have been proposed, such as cross-validation or information criteria, the best model for prediction is not necessarily the best model for descriptive inference. Often for a prediction task, minimizing the mean squared error of the predicted unit value is of interest, while for descriptive inference, we are interested in minimizing the mean squared error of the population-level estimates. Besides, given that the nonprobability sample may be selective, the best-fitting model in the nonprobability sample may not be the same as the best-fitting model in the population.
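
To make the idea of mass imputation concrete, the sketch below shows one simple variant under stated assumptions: a linear model with a single auxiliary variable is fitted on the nonprobability sample, the target variable is predicted for every unit in the probability sample, and the design-weighted mean of those predictions serves as the population estimate. All function names, data values and weights are hypothetical; the abstract does not specify the models or estimators the authors evaluate.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mass_imputation_mean(nonprob_x, nonprob_y, prob_x, prob_weights):
    """Estimate the population mean of y by imputing y in the probability sample.

    The model is trained where y is observed (nonprobability sample) and applied
    where the design weights are known (probability sample).
    """
    intercept, slope = fit_simple_ols(nonprob_x, nonprob_y)
    imputed = [intercept + slope * xi for xi in prob_x]
    weighted_total = sum(w * y_hat for w, y_hat in zip(prob_weights, imputed))
    return weighted_total / sum(prob_weights)


# Hypothetical data: the nonprobability sample over-represents high values of x.
nonprob_x = [1, 2, 3, 4]
nonprob_y = [3, 5, 7, 9]
prob_x = [0, 1, 2]
prob_weights = [10, 10, 20]

naive_mean = sum(nonprob_y) / len(nonprob_y)
imputed_mean = mass_imputation_mean(nonprob_x, nonprob_y, prob_x, prob_weights)
print(naive_mean, imputed_mean)
```

In this toy setting the naive mean of the nonprobability sample differs from the imputation-based estimate because the two samples cover different ranges of the auxiliary variable. The estimate is only as good as the imputation model, which is why model selection for population-level estimates matters.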

Intuitive, accessible, flexible: A JASP module for conditional process models
Malte Lüken, Netherlands eScience Center (NLeSC); Thijs Vroegh, Netherlands eScience Center (NLeSC); Johnny Doorn, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG); Eric-Jan Wagenmakers, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG)
Conditional process models are widely employed in the social sciences for data analysis. They allow researchers to test various configurations of mediation and moderation effects. A popular implementation of these models is the PROCESS macro for SPSS which has several shortcomings: It requires a commercial, closed-source program, is limited to a fixed number of model configurations, and offers little to visualize the estimated models. The macro only implements conditional process models in the frequentist framework, withholding the advantages of Bayesian modeling.
We aim to overcome these limitations by implementing a more user-friendly module for conditional process models in the free and open-source statistics software JASP (https://jasp-stats.org/): In addition to selecting predefined model configurations, users can also build models step-by-step by specifying paths and relationships between variables. Our module provides visualizations of the models and estimated parameters for each step, making it easy for users to understand the model building process. Moreover, users can compare multiple models that contain different paths and relationships to select the model that is most consistent with data. Finally, we extend the frequentist implementation of conditional process models to a Bayesian framework.
In our implementation, we conceptualize conditional process models as structural equation models (SEMs). We build our module in the R programming language and leverage existing R packages for estimating SEMs in the frequentist and Bayesian frameworks. The implementation is centered around intuitive design and usability while maintaining partial compatibility with the design of the PROCESS macro.
Our module makes conditional process models more accessible to students, teachers, and researchers in the social sciences. It also adds flexibility to current implementations, enabling users to formulate models that are more appropriate for their specific research questions. We plan to release the module with examples and a tutorial that explains its main features.

Too good to be true: Linear Probability Model versus Logistic Regression
Christian Fang, Utrecht University – Faculty of Social Sciences (UU-FSW); Qixiang Fang, Utrecht University – Faculty of Social Sciences (UU-FSW); Paulina Pankowska, Utrecht University – Faculty of Social Sciences (UU-FSW)
Explaining binary outcomes, such as voting behavior, divorce rates, or job hiring, is a common goal in social science research. Traditionally, logit or probit regression models are employed to analyze the relationship between predictors and binary outcomes. However, with the rise of large social science datasets, logistic regression estimation can become slow, and the interpretation of logit or probit coefficients is often perceived as complex and counter-intuitive. As a result, the Linear Probability Model (LPM) has regained attention as a potentially more appealing alternative. The LPM employs a “regular” linear regression (i.e., OLS) with a binary dependent variable. While some statisticians consider LPM controversial due to its violation of key OLS assumptions and the occurrence of negative or greater-than-one estimated probabilities, proponents of LPM dismiss these concerns. They cite simulation studies demonstrating virtually indistinguishable coefficients between LPM and corresponding logistic regression models. Furthermore, they argue that LPM offers advantages, such as coefficients that are more easily interpreted and faster computations. To assess the claims made by LPM proponents, we conduct a Monte Carlo simulation study. We investigate various scenarios, including different variable types, interaction effects, and effect sizes, to determine the extent to which LPM estimates align with logistic regression estimates. Our preliminary findings indicate notable discrepancies between the two sets of estimates under several considered conditions. Specifically, LPM estimates exhibit biases, particularly when the model includes interaction terms or when predictors are skewed or contain outliers. To assist researchers in making informed modeling choices, we conclude by presenting a decision tree that can guide their selection of appropriate modeling techniques.
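
The kind of comparison described above can be illustrated with a single simulated scenario. The sketch below is not the authors' simulation design. It assumes one standard-normal predictor, a logistic data-generating model with hypothetical coefficients, and a comparison of the LPM slope with the average marginal effect implied by the fitted logit model (one possible way to put the two on the same scale). It also counts LPM fitted values outside the unit interval.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_lpm(x, y):
    """OLS of a binary outcome on one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_logit(x, y, iterations=25):
    """Logistic regression with one predictor, fitted by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def simulate(n=4000, b0=-1.0, b1=1.0, seed=2023):
    """One Monte Carlo draw; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if rng.random() < sigmoid(b0 + b1 * xi) else 0 for xi in x]
    lpm_a, lpm_b = fit_lpm(x, y)
    logit_a, logit_b = fit_logit(x, y)
    probs = [sigmoid(logit_a + logit_b * xi) for xi in x]
    # Average marginal effect of x implied by the logit fit.
    ame = sum(logit_b * p * (1.0 - p) for p in probs) / n
    out_of_range = sum(1 for xi in x if not 0.0 <= lpm_a + lpm_b * xi <= 1.0)
    return {
        "lpm_slope": lpm_b,
        "logit_slope": logit_b,
        "logit_ame": ame,
        "out_of_range": out_of_range,
    }


result = simulate()
print(result)
```

With a symmetric predictor and no interactions, the LPM slope and the logit average marginal effect are expected to be close, even though some LPM predictions fall outside [0, 1]. Repeating the draw over many seeds and adding skewed predictors, outliers or interaction terms would be the natural extension towards the scenarios the abstract mentions.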

Chair: Marilù Mioto

  • Debunking and Exposing Misinformation among Fringe Communities: Testing Source Exposure and Debunking Anti-Ukrainian Misinformation among German Fringe Communities
    Marijn ten Thij, Maastricht University – Faculty of Science and Engineering (UM-FSE); Christiern Santos Rasmussen, European University Institute – Department of Political and Social Sciences; Amir Ebrahimi Fard, Maastricht University – Faculty of Science and Engineering (UM-FSE)
  • Community detection on signed networks: untangling co-voting patterns on online social media
    Elena Candellone, Utrecht University – Faculty of Social Sciences (UU-FSW); Javier Garcia-Bernardo, Utrecht University – Faculty of Social Sciences (UU-FSW); Erik-Jan van Kesteren, Utrecht University – Faculty of Social Sciences (UU-FSW)
  • Collective information processing of long-form text in multi-generational social networks
    Javier Garcia-Bernardo, Utrecht University – Faculty of Social Sciences (UU-FSW); Aleksandra Aloric; Andrea Santoro, EPFL; Zohar Neu, University of Bristol; Allison Morgan, Code for America; Mathew Hardy, Princeton University; Tom Griffiths, Princeton University; P.M. Krafft, University of the Arts London
  • The development of misinformation on Dutch social media during the Covid pandemic: An analysis of Tweets using a machine learning model
    Lotte Schrijver, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Rens Vliegenthart, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Pearl Dykstra, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)

Abstracts

Debunking and Exposing Misinformation among Fringe Communities: Testing Source Exposure and Debunking Anti-Ukrainian Misinformation among German Fringe Communities
Marijn ten Thij, Maastricht University – Faculty of Science and Engineering (UM-FSE); Christiern Santos Rasmussen, European University Institute – Department of Political and Social Sciences; Amir Ebrahimi Fard, Maastricht University – Faculty of Science and Engineering (UM-FSE)
Background: In the current digital age, misinformation is a threat to society, as it erodes trust in democracy, increases polarization, and taps into extremist movements’ ideology. As a result, research on stopping misinformation is a very active field. However, this work focuses on a general audience rather than on members of fringe communities. Fringe communities are harder to reach with debunking approaches, as they 1) have higher consumption levels of misinformation, 2) have been shown to be more susceptible to misinformation, and 3) tend to avoid fact-checkers and traditional media. Objective: This study tests whether both traditional debunking and a novel counter-misinformation strategy, source exposure, can lower consumption of misinformation media among fringe communities. Methods: Using snowball sampling of German fringe communities on Facebook, we identified public groups (N = 35) that regularly consumed the two most popular misinformation sources in Germany (Deutschland Kurier and Reitschuster). In collaboration with the fact-checking organization VoxCheck, we conducted two types of interventions: 1) debunking anti-Ukrainian misinformation, or 2) exposing the source of the message as a spreader of misinformation. Serving as gatekeepers of posts in the groups, ten group administrators blocked our interventions. This allowed us to include another intervention, 3) targeting gatekeepers of fringe communities. Results: We find that treated groups have statistically significantly lower consumption two weeks after treatment, compared to the control group. Looking at the long-term effect of the interventions, we find that source exposure has a statistically significant long-term effect of lowering misinformation consumption, whereas we do not observe this outcome for debunking.
More surprising, however, is the fact that we saw statistically significantly lower consumption of misinformation sources in the fringe communities whose group admins rejected our treatment. Conclusions: These results suggest that proactive counter-misinformation interventions have an effect on fringe communities.

Community detection on signed networks: untangling co-voting patterns on online social media
Elena Candellone, Utrecht University – Faculty of Social Sciences (UU-FSW); Javier Garcia-Bernardo, Utrecht University – Faculty of Social Sciences (UU-FSW); Erik-Jan van Kesteren, Utrecht University – Faculty of Social Sciences (UU-FSW)
In the last decades, the pervasive presence of online micro-blogging platforms has revolutionized the way we communicate. With the emergence of the World Wide Web, a significant portion of political discourse has migrated to online media, connecting individuals across countries and diverse perspectives. While these technological advancements have facilitated the exchange of ideas and opinions, they have also given rise to the subtle and ambiguous dissemination of misinformation and propaganda. Consequently, it is crucial to investigate the dynamics of opinion formation and evolution on social media, with a specific focus on identifying inauthentic behaviors and self-organized groups that aim to propagate false information.

In this study, we employ a variety of community detection methods specifically designed for analyzing signed networks, characterized by the presence of positive and negative edges. Our objective is to uncover distinct communities within the network and explore the relationship between network topology and the sociological interpretation of these communities. To determine the most suitable community detection algorithm for real-world scenarios, we devise a pipeline that involves simulating networks with similar structural properties to the actual network.

As a real-world application, we utilize a dataset obtained from a Spanish microblogging platform called Menéame. From this dataset, we construct a co-voting network where a positive edge is created between users if they vote in the same manner on a post, and a negative edge is established if they react differently.

Through our research, we aim to enhance our understanding of the complex interplay between online communication patterns and the formation of opinion-based communities. The insights gained from this investigation will contribute to the development of effective strategies for mitigating the spread of false information and promoting a more informed and reliable online discourse.
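
The co-voting construction described in this abstract can be sketched as follows. The abstract specifies only the sign rule (same vote gives a positive edge, different votes give a negative edge). Summing the signs over all posts that two users both voted on, and dropping pairs whose net weight is zero, are assumptions made here for illustration. The function name, user names and votes are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_covoting_edges(votes):
    """Build signed edge weights from per-post votes.

    votes maps post_id -> {user: +1 or -1}. For every pair of users who voted
    on the same post, the edge weight grows by 1 if they agreed and shrinks by
    1 if they disagreed. Pairs with a net weight of zero are dropped.
    """
    weights = defaultdict(int)
    for post_votes in votes.values():
        for u, v in combinations(sorted(post_votes), 2):
            weights[(u, v)] += 1 if post_votes[u] == post_votes[v] else -1
    return {pair: w for pair, w in weights.items() if w != 0}


# Hypothetical votes on three posts.
votes = {
    "post1": {"ana": 1, "ben": 1, "cas": -1},
    "post2": {"ana": -1, "ben": -1},
    "post3": {"ben": 1, "cas": 1},
}
edges = build_covoting_edges(votes)
print(edges)
```

The resulting dictionary is an edge list of a signed network, which is the kind of input a signed community detection method would take. Other aggregation choices, such as keeping one edge per post or normalising by the number of shared posts, would give a different network.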

Collective information processing of long-form text in multi-generational social networks
Javier Garcia-Bernardo, Utrecht University – Faculty of Social Sciences (UU-FSW); Aleksandra Aloric; Andrea Santoro, EPFL; Zohar Neu, University of Bristol; Allison Morgan, Code for America; Mathew Hardy, Princeton University; Tom Griffiths, Princeton University; P.M. Krafft, University of the Arts London
Previous studies on text transmission in laboratory experiments find that texts often become wildly distorted as they pass from one person to the next. These results contrast with real-world studies, where accuracy and fidelity remain high. A key difference between these domains is the complexity of the network structure—naturalistic settings are characterized by complex interaction mechanisms that provide redundancy and repeated exposure, whereas laboratory tasks typically use simple transmission chains. Here, we investigate the effects of network structure on the transmission of complex, long-form text under controlled laboratory conditions. We performed a large pre-registered experiment of 72 independent multi-generational networks where participants summarized text summaries of a news story about antibiotic resistance, which were then summarized by participants at the next generation. Our key manipulation involved comparing chains, where participants read one other participant’s summary, to networks, where participants read the summaries made by three different participants. In line with our predictions, we found that networks better preserved the content of the original story and were less semantically polarized.
We find that the network condition leads to a consensus on information content, not only between interacting participants in a given network, but also between non-interacting participants in independently tested networks. These results indicate a capacity of networks to reliably filter out specific information content from a given text.
Exploratory analyses reveal that networks also better preserved the positive sentiment of the original story, whereas chains became significantly more negative.

The development of misinformation on Dutch social media during the Covid pandemic: An analysis of Tweets using a machine learning model
Lotte Schrijver, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Rens Vliegenthart, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Pearl Dykstra, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)
In 2020, the WHO declared the abundance of (mis)information during the covid pandemic an ‘infodemic’ (World Health Organization, 2020), exemplifying the view that the world was not only suffering from a health crisis, but also a crisis of information (Hameleers & Brosius, 2022). Despite concerns about misinformation about covid, there is little knowledge about the amount of misinformation over the course of the pandemic (Gabarron et al., 2021). Therefore, we aim to answer the following research question: How has the presence of misinformation in Dutch social media developed during the covid pandemic? We map the development of the amount of misinformation about covid and its characteristics (e.g. incivility, user engagement) on Dutch-language Twitter between February 2020 and February 2022. We train a BERT model to detect misinformation. Previous studies measuring the amount of misinformation about covid have relied on drawing samples and manually fact-checking these (e.g. Kouzy et al., 2020; Pulido Rodríguez et al., 2020). Training a machine learning model allows us to detect misinformation in a large amount of data, and thus arrive at a more fine-grained analysis of the development of misinformation over a long period of time. BERT models have been shown to be able to accurately classify misinformation about covid (e.g. Choudrie et al., 2021; Moffitt et al., 2021). We take a more rigorous approach than previous studies by developing an extensive codebook and providing elaborate coder training. Thus, we further explore machine learning models as a misinformation detection tool during a crisis by answering the research question: To what extent are BERT models able to detect misinformation spread during the covid pandemic? Results and conclusions are not yet available. Preliminary results will be presented at the conference.

Chair: Kristina Thompson

  • The association between Attention Deficit Hyperactivity Disorder and treatment discontinuation for Type 2 Diabetes Mellitus: research based on a nationwide population-based cohort study in the Netherlands
    Catharina Hartman, UMCG Groningen; Tian Xie; Harold Snieder
  • Using machine learning algorithms to build prediction models for wellbeing: A data-driven approach using genetic, environmental, and psychosocial predictors
    Dirk Pelt, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB); Philippe Habets, Amsterdam UMC Locatie VUmc; Christiaan Vinkers, Amsterdam UMC Locatie VUmc; Lannie Ligthart, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB); Toos van Beijsterveldt, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB); René Pool, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB); Meike Bartels, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB)
  • Studying the mental health consequences of COVID-19 lockdowns: A microsimulation modelling approach
    Astrid Pham, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Eva Viviani, Netherlands eScience Center; Ji Qi, Netherlands eScience Center; Kristina Thompson, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG)
  • Inequalities in Healthcare Use during the COVID-19 Pandemic
    Mark Verhagen, University of Oxford, Amsterdam Health and Technology Institute; Arun Frey, University of Oxford; Andrea Tilstra, University of Oxford

Abstracts

Using machine learning algorithms to build prediction models for wellbeing: A data-driven approach using genetic, environmental, and psychosocial predictors.
Dirk Pelt, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB); Philippe Habets, Amsterdam UMC Locatie VUmc; Christiaan Vinkers, Amsterdam UMC Locatie VUmc; Lannie Ligthart, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB); Toos van Beijsterveldt, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB); René Pool, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB); Meike Bartels, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB)
Using longitudinal data from a large cohort (Netherlands Twin Register) collected between 1991 and 2022, we built machine learning (ML) prediction models for adult wellbeing. Three data modalities are included: the phenome, genome, and exposome. The phenome was captured by longitudinal parent- and self-reports, providing an extensive set of psychosocial predictors from childhood to adulthood (2465 predictors). The genome was represented by 60 polygenic scores (PGS), covering a wide range of domains. The exposome was captured by linking participants’ postal codes to objective environmental exposures based on register data, providing information on, for example, urbanization, air pollution, neighborhood socio-economic status, greenspace, and population demographics (734 predictors).
After a feature selection step, we trained stacked ensemble models based on three ML algorithms (random forest, support vector machine, extreme gradient boosting) using 10-fold cross-validation, evaluating their performance (R²) in independent test sets. We used the phenome model as a baseline, and tested whether adding predictors from the other data modalities improved prediction. Sample sizes ranged between 702 and 5874 across analyses.
Performance of the model based on the phenome predictors was high (.701 [.633 – .752], while model performance based on features from the genome (.008 [-.011 – .026]) and exposome (-.006 [-.036 – .019]) was low. Adding genomic predictors to the phenome predictors increased performance marginally (Δ = .011; p = .668), while adding the exposome seemed to only add noise (Δ = -.011; p = .662). The model based on features from all three data modalities increased accuracy marginally (Δ = .006; p = .220).
Our phenome results stress the importance of deep phenotyping to arrive at prediction accuracies needed for personalized prevention and interventions for mental health. At the same time, more research is needed on the predictive power of the genome and exposome for wellbeing.
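
The slightly negative R² values reported above for the genome and exposome models are possible because R² computed on a held-out test set is not bounded below by zero: a model that predicts worse than the test-set mean scores below zero. The following minimal Python sketch illustrates this with made-up numbers; the function name and data are assumptions for illustration and do not come from the Netherlands Twin Register analysis.

```python
def r_squared(y_true, y_pred):
    """Out-of-sample R^2 = 1 - SS_res / SS_tot, using the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical wellbeing scores in a held-out test set (illustrative only).
observed = [4.0, 5.0, 6.0, 7.0, 8.0]

# Predictions that track the outcome closely give an R^2 close to 1.
good = [4.2, 4.9, 6.1, 6.8, 8.1]
# Predictions unrelated to the outcome can do worse than the test-set mean.
poor = [6.5, 5.5, 6.0, 5.5, 6.5]

print(round(r_squared(observed, good), 3))
print(round(r_squared(observed, poor), 3))
```

The second call returns a negative value, which is how a test-set R² such as -.006 should be read: the predictors added no usable signal.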

Studying the mental health consequences of COVID-19 lockdowns: A microsimulation modelling approach
Astrid Pham, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Eva Viviani, Netherlands eScience Center; Ji Qi, Netherlands eScience Center; Kristina Thompson, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG)
The COVID-19 pandemic has highlighted the importance of tailored preparedness plans to cope with potential disease outbreaks. A major challenge for policymakers is finding the balance between containing the disease and supporting mental health and well-being. Although COVID-19 lockdowns and accompanying restrictions mitigated the spread of the virus, many countries witnessed a significant increase in rates of depression during these periods. It is unclear to what extent different lockdown scenarios impacted depression rates, and whether these effects might be felt in the future. In this study, a microsimulation model, COMMA (COvid Mental-health Model with Agents), was developed to simulate the behaviors of individuals under different lockdown scenarios, both actual and hypothetical, and of varying duration, sequence and severity. Ultimately, COMMA will help to understand how the prevalence of depression changed in each scenario. The characteristics of the simulated population were synthesized based on health cohort data from Lifelines. The data included 2,556 participants living in the Dutch city of Groningen during the pandemic. This model is an initial attempt at clarifying the multifaceted relationship between lockdown policies and mental health outcomes, which is often overlooked when deciding on lockdown policies aimed at curbing mortality and hospitalizations. This research may ultimately enrich the evidence base for policymakers considering future pandemic preparedness.

The association between Attention Deficit Hyperactivity Disorder and treatment discontinuation for Type 2 Diabetes Mellitus: research based on a nationwide population-based cohort study in the Netherlands
Catharina Hartman, UMCG Groningen; Tian Xie; Harold Snieder
Previous research suggests that Attention Deficit Hyperactivity Disorder (ADHD) is associated with Type 2 Diabetes Mellitus (T2DM). It may further be hypothesized that poor organization, planning, monitoring skills, and impulsivity, all core features of ADHD, may be involved in both onset and poorer management of T2DM. In line with the latter, the presence of ADHD is associated with a worse prognosis of T2DM, but there is little empirical research on how this may work. The current study investigated whether the presence of ADHD was associated with antidiabetic treatment discontinuation. Using nationwide data from the Central Bureau of Statistics (CBS) of the Netherlands, adult patients who initiated antidiabetic drugs (A10A, A10B) with a one-year washout period were selected. They were followed from the first initiation of antidiabetic drugs to death, emigration, or the end of the study period (2020/12/31), whichever came first. ADHD was the exposure and was defined by diagnosis or dispensations of ADHD medication. T2DM treatment discontinuation was defined as a gap of more than 135 days between dispensed prescriptions. Cox regression models were used to estimate associations. Results showed that the Hazard Ratio (HR) between ADHD and first antidiabetic treatment discontinuation was 1.26 (1.22-1.31). This estimate was 1.16 (1.12-1.20) when adjusted for birth year, age, sex, SES and other psychiatric conditions. Stratified by age, we found larger HRs in older-aged adults: young adults (18-34): 0.98 (0.90, 1.06); middle-aged adults: 1.09 (1.05, 1.14); older-aged adults (65+ years): 1.24 (1.14, 1.34). We conclude that ADHD is associated with poorer adherence to antidiabetic drugs, especially in older-aged adults. This may explain why the prognosis of T2DM is worse in adults with ADHD compared to adults without ADHD. Treatment guidelines on T2DM stress that psychiatric comorbidity needs consideration, but ADHD is rarely mentioned. Our findings indicate that this situation may need to change.
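
The discontinuation rule described in this abstract (a gap of more than 135 days between dispensed prescriptions) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name and the example dates are hypothetical, and the sketch ignores days supplied and censoring at the end of follow-up, which the actual study may have handled differently.

```python
from datetime import date


def first_discontinuation(dispense_dates, max_gap_days=135):
    """Return the date of the last dispensing before the first gap longer
    than max_gap_days, or None if no such gap occurs."""
    ordered = sorted(dispense_dates)
    for previous, current in zip(ordered, ordered[1:]):
        if (current - previous).days > max_gap_days:
            return previous
    return None


# Hypothetical dispensing history for one patient (not CBS data).
history = [date(2019, 1, 10), date(2019, 4, 5), date(2019, 11, 20)]
print(first_discontinuation(history))
```

In this example the first gap (85 days) is within the threshold, while the second gap exceeds it, so the second dispensing date marks the start of the discontinuation.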

Inequalities in Healthcare Use during the COVID-19 Pandemic
Mark Verhagen, University of Oxford, Amsterdam Health and Technology Institute; Arun Frey, University of Oxford; Andrea Tilstra, University of Oxford
The COVID-19 pandemic has led to severe reductions in non-COVID related healthcare use, but little is known about whether this burden is shared equally across the population. This study investigates whether the reduction in administered care disproportionately affected certain sociodemographic strata, in particular marginalised groups. Using detailed medical claims data from the Dutch universal health care system and rich registry data that cover all residents in the Netherlands, we predict expected healthcare use based on pre-pandemic trends (2017 – Feb 2020) and compare these expectations with observed healthcare use in 2020. Our findings reveal a substantial 10% decline in the number of weekly treated patients in 2020 relative to prior years. Furthermore, declines in healthcare use are unequally distributed and are more pronounced for individuals below the poverty line, females, the elderly, and foreign-born individuals, with cumulative relative risk ratios ranging from 1.09 to 1.22 compared with individuals above the poverty line, males, the young, and the native-born. These inequalities stem predominantly from declines in middle and low urgency procedures, and indicate that the pandemic has not only had an unequal toll in terms of the direct health burden of the pandemic, but has also had a differential impact on the use of non-COVID healthcare.
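
The comparison of expected and observed healthcare use can be illustrated with a simple trend extrapolation. The Python sketch below assumes a plain least-squares line through yearly values; the abstract does not state which prediction model was used, and the function name and all numbers are hypothetical rather than taken from the claims data.

```python
def linear_trend_forecast(years, values, target_year):
    """Fit an ordinary least-squares line and extrapolate to target_year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values)) / sum(
        (x - mean_x) ** 2 for x in years
    )
    return mean_y + slope * (target_year - mean_x)


# Hypothetical weekly patient counts in thousands (illustrative only).
expected_2020 = linear_trend_forecast([2017, 2018, 2019], [100.0, 102.0, 104.0], 2020)
observed_2020 = 95.4

# Relative decline of observed use against the pre-pandemic expectation.
relative_decline = 1 - observed_2020 / expected_2020
print(round(expected_2020, 1), round(relative_decline, 3))
```

Computing the same relative decline separately per sociodemographic group and dividing one group's value by another's would give a ratio of the kind reported in the abstract.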

  1. Using Language Models to Improve Regulatory Compliance: A Study on Detecting Sponsored Content on Instagram
    Thales Bertaglia, Maastricht University – Faculty of Science and Engineering (UM-FSE); Stefan Huber, Maastricht University – Faculty of Science and Engineering (UM-FSE); Catalina Goanta, Utrecht University – Faculty of Law, Economics and Governance (UU-REBO); Gerasimos Spanakis, Maastricht University – Faculty of Science and Engineering (UM-FSE); Adriana Iamnitchi, Maastricht University
  2. Less happy with the same: The role of migrant composition in shaping the migrant gap in neighborhood satisfaction in the Netherlands
    Weiyi Cao, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG)
  3. Offline trumps online: name recognition effects in the Five Star Movement 2012 Primary Election
    Giovanni Cassani, Tilburg University, Tilburg School of Humanities and Digital Sciences (TiU-TSH); Francesco Marolla, Tilburg University, Tilburg School of Social and Behavioural Sciences (TiU-TSB); Maria Lucia Miotto, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)
  4. Assessing bidirectional relationships between noise annoyance and health: a cross-lagged panel analysis
    Lion Cassens, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM); Sander van Cranenburgh, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM); Simeon Calvert, TU Delft – Faculty Of Civil Engineering and Geosciences (TUD-CEG); Maarten Kroesen, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM)
  5. From Finding to Re-using Confidential Data
    Freek Dijkstra, SURF; Emma Schreurs, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG)
  6. Mental health workforce in primary care and the allocation of depression treatment: a data linkage study
    Jesper Dros, Netherlands Institute for Health Services Research (Nivel); Christel van Dijk, Zorginstituut Nederland; Koen Böcker, Zorginstituut Nederland; Robert Verheij, Netherlands Institute for Health Services Research (Nivel); Bert Meijboom, Tilburg University; Jan-Willem Dik, Zorginstituut Nederland; Isabelle Bos, Nivel
  7. On Text-based Personality Computing: Challenges and Future Directions
    Qixiang Fang, Utrecht University – Faculty of Social Sciences (UU-FSW); Anastasia Giachanou, Utrecht University – Faculty of Social Sciences (UU-FSW); Ayoub Bagheri, Utrecht University – Faculty of Social Sciences (UU-FSW); Laura Boeschoten, Utrecht University – Faculty of Social Sciences (UU-FSW); Erik-Jan van Kesteren, Utrecht University – Faculty of Social Sciences (UU-FSW); Mahdi Shafiee Kamalabad, Utrecht University – Faculty of Social Sciences (UU-FSW); Daniel Oberski, Utrecht University – Faculty of Social Sciences (UU-FSW), UMCU
  8. The Hidden Divide: School Segregation of Teachers in the Netherlands
    Rafiq Friperson, VU Amsterdam – School of Business and Economics (VU-SBE); Hessel Oosterbeek, University of Amsterdam – Faculty of Economics and Business (UvA-FEB); Bas van der Klaauw, VU Amsterdam – School of Business and Economics (VU-SBE)
  9. Bridging Questionnaires and AI Models for Insights into the Intercorrelations of Big Five Traits
    Anastasia Giachanou, Utrecht University – Faculty of Social Sciences (UU-FSW); Yucheng Chen, Utrecht University – Faculty of Social Sciences (UU-FSW); Qixiang Fang, Utrecht University – Faculty of Social Sciences (UU-FSW)
  10. Farmers’ Adaptation to Climate Change in Different Regions of the World: A Natural Language Processing Systematic Review
    Sofia Gil-Clavel, TU Delft – TPM; Tatiana Filatova, TU Delft – TPM
  11. Migration Policies and Immigrants’ Language Acquisition in EU-15: Evidence from Twitter
    Sofia Gil-Clavel, TU Delft – TPM; André Grow; Maarten J. Bijlsma
  12. Education, unions, and physiology: explaining the gap between intended family size and completed fertility for the 1974-1984 birth cohort of Dutch women
    Rolf Granholm, University of Groningen – Faculty of Behavioural and Social Sciences (RUG-FGMW); Gert Stulp, University of Groningen – Faculty of Behavioural and Social Sciences (RUG-FGMW); Anne Gauthier, Netherlands Interdisciplinary Demographic Institute (NIDI)
  13. Synthetic Instagram Post Generation for Social Media Research
    Lily Heisig, Maastricht University – Faculty of Science and Engineering (UM-FSE); Sander Lardinois, Maastricht University – Faculty of Science and Engineering (UM-FSE); Thales Bertaglia, Maastricht University – Faculty of Science and Engineering (UM-FSE); Adriana Iamnitchi, Maastricht University – Faculty of Science and Engineering (UM-FSE)
  14. Implicit Association Tests: Stimuli Validation from Participant Responses
    Sally Hogenboom, Open University (OU); Katrin Schulz, University of Amsterdam – Faculty of Humanities; Leendert van Maanen, Utrecht University – Faculty of Social Sciences (UU-FSW)
  15. Health in All Networks Simulator (HANS): Building the evidence base to improve the resiliency and health of Amsterdam residents
    Jiri Kaan, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Kristina Thompson, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Annemarie Wagemakers, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Spencer Moore, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG)
  16. Cardiometabolic diseases in autistic adults: research based on a nationwide population-based cohort study from the Netherlands
    Yiran Li, University of Groningen – University Medical Center Groningen (RUG-UMCG); Tian Xie, University of Groningen – University Medical Center Groningen (RUG-UMCG); Lin Li, Örebro University – School of Medical Sciences; Harold Snieder, University of Groningen – University Medical Center Groningen (RUG-UMCG); Catharina Hartman, University of Groningen – University Medical Center Groningen (RUG-UMCG)
  17. Data linkage of health care registries to study silicone breast implants adverse events
    Annemiek Lieffering, Netherlands Institute for Health Services Research (Nivel); Lotte Ramerman, Netherlands Institute for Health Services Research (Nivel); Juliëtte Hommes, Zuyderland Medical Center; Robert Verheij, Netherlands Institute for Health Services Research (Nivel); René van der Hulst, Maastricht University Medical Center; Hinne Rakhorst, Medical Spectrum Twente; Marc Mureau, Erasmus MC Cancer Institute
  18. Evidence for severe mood instability in patients with bipolar disorder: Applying multilevel hidden Markov modelling to intensive longitudinal ecological momentary assessment data
    Sebastian Mildiner Moraga, Utrecht University – Faculty of Social Sciences (UU-FSW); Emmeke Aarts, Utrecht University – Faculty of Social Sciences (UU-FSW)
  19. The (Great) Persuasion Divide? Gender Disparities in Debate Speeches and Evaluations
    Huyen Nguyen, Utrecht University – Faculty of Social Sciences (UU-FSW)
  20. Unveiling discrete choice modeller behaviour: Through a serious game
    Gabriel Nova, TU Delft – Faculty of Technology Policy and Management; Sander van Cranenburgh, TU Delft – Faculty of Technology Policy and Management; Stephane Hess, TU Delft – Faculty of Technology Policy and Management
  21. Data visualization for incomplete data
    Hanne Oberman, Utrecht University – Faculty of Social Sciences (UU-FSW)
  22. Modeling Interactions in Product Reviews: Utilizing Formal Argumentation to Assess Strengths and Weaknesses of Products Across Different Aspects
    Ji Qi, Netherlands eScience Center (NLeSC); Atefeh Keshavarzi Zafarghandi, Centrum Wiskunde & Informatica (CWI), VU Amsterdam – Faculty of Science (VU-Science); Davide Ceolin, Centrum Wiskunde & Informatica (CWI)
  23. Big data governance on digital twin technology for smart and sustainable tourism
    Eko Rahmadian, University of Groningen, Campus Fryslan
  24. The Survey Quality Predictor 3.0
    Lydia Repke, GESIS – Leibniz-Institut für Sozialwissenschaften; Cornelia Neuert, GESIS – Leibniz-Institut für Sozialwissenschaften; Wiebke Weber, GESIS – Leibniz-Institut für Sozialwissenschaften
  25. Consequences of delayed care in chronic disease management programs in Dutch patients with diabetes during the COVID-19 pandemic: a registry-based observational study
    Corinne Rijpkema, Netherlands Institute for Health Services Research (Nivel); Lotte Ramerman, Netherlands Institute for Health Services Research (Nivel); Isabelle Bos, Netherlands Institute for Health Services Research (Nivel); Robert Verheij, Netherlands Institute for Health Services Research (Nivel)
  26. How do regions adopt new technologies? AI adoption in the Netherlands
    Harm-Jan Rouwendal, University of Groningen – Faculty of Spatial Sciences (RUG-FRW); Sierdjan Koster, University of Groningen – Faculty of Spatial Sciences (RUG-FRW); Tersa Farinha, UNU-MERIT
  27. Who is misinformed, who is a conspiracist? Investigation of psychological factors associated with susceptibility to misinformation and conspiracy theories with a network analysis approach
    Selin Topel, Leiden University – Faculty of Social Sciences (UL-FSW); Ili Ma, Leiden University – Faculty of Social Sciences (UL-FSW); Ellen de Bruijn, Leiden University – Faculty of Social Sciences (UL-FSW)
  28. Better Together: Exploring the complex dynamics of classroom interaction through social network analysis
    Nina van Graafeiland, Utrecht University – Faculty of Social Sciences (UU-FSW); Mahdi Shafiee Kamalabad, Assistant Professor; Nienke Smit, Utrecht University – Faculty of Social Sciences (UU-FSW)
  29. Rising through the Ranks: Firms and Intergenerational Mobility
    Oskar Veerhoek, Radboud University Nijmegen – Faculty of Management Sciences (RU-FdM)
  30. COMMA: An agent-based micro-simulation model to study mental health outcomes during COVID-19 lockdowns
    Eva Viviani, Netherlands eScience Center
  31. Setting up a Governance Framework for Secondary Use of Routine Health Data in Nursing Homes: Development Study Using Qualitative Interviews
    Yvonne Wieland-Jorna, Netherlands Institute for Health Services Research (Nivel); Robert A Verheij, Netherlands Institute for Health Services Research (Nivel); Anneke L Francke, Netherlands Institute for Health Services Research (Nivel); Marit Tomassen, Netherlands Institute for Health Services Research (Nivel); Max Houtzager; Karlijn J Joling; Mariska G Oosterveld-Vlug
  32. Maximizing Panel Research Project Management with Cutting-Edge Panel Management Software
    Arnaud Wijnant, Centerdata
  33. The Contagion of Collective Action: Analyzing Spatial Diffusion of Citizen Care Collectives in The Netherlands
    Kevin Wittenberg, Utrecht University – Faculty of Social Sciences (UU-FSW); Rense Corten, Utrecht University – Faculty of Social Sciences (UU-FSW)
  34. How Parental Unemployment Influences Children’s Educational Attainment and Income: The role of Parental Economic Resources and Parental Mental Health
    Flora Zhou, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)

Abstracts

Using Language Models to Improve Regulatory Compliance: A Study on Detecting Sponsored Content on Instagram
Thales Bertaglia, Maastricht University – Faculty of Science and Engineering (UM-FSE); Stefan Huber, Maastricht University – Faculty of Science and Engineering (UM-FSE); Catalina Goanta, Utrecht University – Faculty of Law, Economics and Governance (UU-REBO); Gerasimos Spanakis, Maastricht University – Faculty of Science and Engineering (UM-FSE); Adriana Iamnitchi, Maastricht University

As the scale and influence of social media marketing increase, regulatory bodies worldwide grapple with the challenges of monitoring and enforcing transparency. An inherent part of this task is accurately detecting sponsored content. Current research approaches this issue through machine learning techniques, which often struggle with inconsistent labelling and low inter-annotator agreement. We propose an innovative computational method to enhance annotation accuracy. We employed ChatGPT to augment annotation with automatically generated relevant features and contextual explanations. We conducted a user study to evaluate how our approach can help detect sponsored content. In our user study, participants with varying expertise in annotating sponsored content on social media labelled 200 Instagram posts as Sponsored or Non-Sponsored. We split the participants into three expertise groups: novices with no prior annotation experience, intermediate annotators with previous experience but no formal training, and legal experts knowledgeable in social media regulations. Additionally, we split these groups into those annotating with and without the aid of generated explanations from ChatGPT. Our experiments resulted in a substantial increase in inter-annotator agreement and annotation accuracy across all levels of expertise. Moreover, user experience surveys indicate that explanations enhance the annotators’ confidence, thus improving the quality of their decision-making process in regulatory compliance contexts. These findings underscore the potential of integrating advanced language models within regulatory monitoring procedures, demonstrating the crucial role of computational social science in addressing complex digital phenomena. Despite the need for future research to explore potential biases and extend to a broader range of tasks, our work presents an innovative approach towards improving digital enforcement and increasing transparency in online advertising. We have made our dataset, labels, and GPT predictions publicly available, offering a valuable resource for future investigations in the field.
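
Inter-annotator agreement, the quantity this abstract reports as improving, is commonly measured with a chance-corrected statistic. The abstract does not say which statistic was used; the Python sketch below assumes Cohen's kappa for two annotators, with hypothetical labels (for more than two annotators, Fleiss' kappa or Krippendorff's alpha would be more typical choices).

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    # Share of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Agreement expected by chance, given each annotator's label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six posts: S = Sponsored, N = Non-Sponsored.
annotator_1 = ["S", "S", "N", "N", "S", "N"]
annotator_2 = ["S", "N", "N", "N", "S", "N"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Kappa equals 1 for perfect agreement and 0 when agreement is no better than chance, which makes it more informative than raw percentage agreement when one label dominates.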

Less happy with the same: The role of migrant composition in shaping the migrant gap in neighborhood satisfaction in the Netherlands
Weiyi Cao, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG)

Lower levels of housing and neighborhood satisfaction have been found to be related to constraints in income, housing tenure, household type, housing attributes, and neighborhood conditions. However, socioeconomic profiles and objective housing and neighborhood characteristics cannot explain all the variations in the satisfaction gap between migrant and native households. This paper sheds new light on the role of neighborhood attributes, specifically the migrant composition, in shaping the nativity gap in neighborhood satisfaction in the Netherlands by investigating whether there exists an interactive effect of migrant status by migrant composition. The study runs ordinal logistic regressions using the 2018 wave of the comprehensive national survey on housing conditions and satisfaction (WoON) in the Netherlands. The results show that migrant households are similarly satisfied with the living environment when living in a home and neighborhood of the same quality as non-migrant households. However, when the interaction term between the migrant household dummy and neighborhood migrant composition is included, the key results indicate that the nativity gap in neighborhood satisfaction varies as the migrant composition changes: migrant households express lower satisfaction with the neighborhood’s living environment than their native counterparts when the share of migrants in the neighborhood is less than 17%. Constraints in migrant households’ accessibility to particular amenities and social interactions are possible explanations for this satisfaction gap in native-majority neighborhoods. The paper contributes to the housing satisfaction literature by explaining the nativity gap in residential satisfaction with neighborhood characteristics, which has implications for immigrants’ demands for particular amenities, constraints in social interactions, and their integration into the settlement society.
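
A threshold such as the 17% reported here arises naturally from an interaction term. Assuming, for illustration only, that the migrant dummy interacts linearly with the neighborhood migrant share, the modelled gap on the latent scale is the migrant coefficient plus the interaction coefficient times the share, and it reaches zero where the two cancel. The Python sketch below uses hypothetical coefficients chosen so that the gap closes at 17%; they are not estimates from the WoON analysis.

```python
def satisfaction_gap(share, beta_migrant, beta_interaction):
    """Migrant-minus-native difference on the latent (log-odds) scale."""
    return beta_migrant + beta_interaction * share


def crossover_share(beta_migrant, beta_interaction):
    """Migrant share at which the modelled gap equals zero."""
    return -beta_migrant / beta_interaction


# Hypothetical coefficients, picked only so that the gap closes at 17%.
b_migrant, b_interaction = -0.34, 2.0

print(crossover_share(b_migrant, b_interaction))
# Below the crossover the gap is negative (migrants less satisfied).
print(satisfaction_gap(0.05, b_migrant, b_interaction) < 0)
```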

Paying more to live in native neighborhoods: migrant price differentials in the home-buying market of the Netherlands
Weiyi Cao, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Nico Heerink, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Eveline van Leeuwen, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG)

Few studies have investigated racial or ethnic price differentials in the European context. This paper investigates the patterns of intra- and inter-neighborhood price differentials between migrant and native households buying similar housing in similar neighborhoods in the Netherlands by using the pooled national surveys of Housing Research (WoON) 2015 and 2018 combined with micro-data. Parallel to U.S. studies that have distinguished neighborhoods as Black, integrated, and White neighborhoods, this study categorizes neighborhoods as native, integrated, and migrant ones. Results suggest that some patterns of price differentials found in the Netherlands resemble those in the U.S., yet others are different. First, the negative associations between the migrant share in the neighborhood and housing prices, interpreted as effects of prejudice against ethnic minorities, are also present in all three types of native, integrated, and migrant neighborhoods in the Netherlands; yet the negative relation is much stronger in native neighborhoods than in integrated and migrant neighborhoods. Second, the tipping points pushing prices upward when neighborhood migrant share reaches certain cut-off points, which are interpreted as the result of segregation in the U.S. empirical literature, are not found in this study. In addition, we further test whether intra-neighborhood price differentials due to supplier discrimination vary (1) across three types of neighborhoods of different migrant shares and (2) across homebuyers of different income groups, both of which have rarely been done in previous studies. The key results indicate that migrant households who buy housing in native neighborhoods pay, on average, a significantly higher price (a premium of 9.86%). For high-income migrant households, the price premium declines slightly to 7.88%. While both differentials are only found in native neighborhoods, the magnitudes are higher than those found in U.S. studies, calling for studies to further examine the issue of discrimination against the foreign-born population in the housing market of the Netherlands.

Offline trumps online: name recognition effects in the Five Star Movement 2012 Primary Election
Giovanni Cassani, Tilburg University, Tilburg School of Humanities and Digital Sciences (TiU-TSH); Francesco Marolla, Tilburg University, Tilburg School of Social and Behavioural Sciences (TiU-TSB); Maria Lucia Miotto, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)

We study the impact of (i) online engagement on the MeetUp platform and (ii) previous candidacies in local elections on the electoral prospects of the candidates in the 2012 online primaries of the Italian populist party Five Star Movement (FSM).
While previous research has produced mixed findings regarding the influence of extensive digital media use on intra-party democracy, our study addresses a critical research gap by examining how digital tools and previous candidacies shape the distribution of power within a political party.
Leveraging data from the FSM’s online galaxy, we derive the online network of party members and explore the relationship between their network centrality (to quantify levels of online engagement) and their result in the 2012 primary election. We hypothesise that higher network centrality correlates with better electoral performance, especially considering the emphasis the FSM put on grassroots online activism. We further control for documented influences on the voting process by the voting platform design, especially ballot order effects. We also look at whether candidates who ran in previous local elections (but lost) were advantaged due to name recognition.
Our results show that online activity had no correlation with electoral outcomes, with no differences between candidates who were or were not on MeetUp. Network centrality was also uncorrelated with the election outcome. In contrast, the few candidates who had already run for local office received a consistent advantage, which remains significant after controlling for the influence of the voting platform.
While we considered a single platform to quantify online activity and plan to extend our study to other arenas where the party’s political discourse was articulated, our results suggest that online campaigning efforts were not rewarded, whereas electoral results were heavily conditioned by flaws in the platform design and party endorsements in local elections.
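
Network centrality, used in this abstract as a proxy for online engagement, can be computed in several ways, and the abstract does not specify which measure was chosen. As a minimal sketch, the Python code below assumes normalised degree centrality on an undirected network; the member names and ties are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalised degree centrality for an undirected, unweighted network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    # Each node's number of distinct neighbours, divided by the maximum possible.
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical ties between four party members (illustrative only).
ties = [("anna", "bruno"), ("anna", "carla"), ("anna", "dario"), ("bruno", "carla")]
print(degree_centrality(ties))
```

A member connected to everyone else scores 1.0; correlating such scores with vote counts is one simple way to test the hypothesis stated in the abstract.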

Assessing bidirectional relationships between noise annoyance and health: a cross-lagged panel analysis
Lion Cassens, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM); Sander van Cranenburgh, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM); Simeon Calvert, TU Delft – Faculty Of Civil Engineering and Geosciences (TUD-CEG); Maarten Kroesen, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM)

Noise pollution causes severe health implications, including hearing loss, insomnia, and cardiovascular diseases. Many of these health effects are mediated through increased annoyance and stress levels. With an estimated 22 million cases of high and chronic noise annoyance in Europe, the relationship between noise annoyance and health becomes even more important. While it is often assumed that noise annoyance leads to health implications, some studies have shown that health implications can increase our sensitivity to noise. Due to this increase in noise sensitivity, deteriorating health conditions may lead to increased self-reported noise annoyance. Yet, estimations of health effects from noise almost exclusively rely on cross-sectional data, which cannot account for this possible reverse causality. This study assesses potential bidirectional relationships between noise annoyance and health outcomes using random intercept cross-lagged panel models (RI-CLPM) on LISS panel data. RI-CLPMs are structural equation models specifically designed to analyse panel data regarding the influence that multiple variables have on each other over time. This makes them well suited for an analysis of the bidirectional relationships between noise annoyance and health outcomes. The LISS panel consists of a representative sample of Dutch households and surveys, among others, the health of household members and their housing conditions, including self-reported noise annoyance from neighbours, aviation, and street traffic. By analysing this data, we clarify to what extent health implications affect noise annoyance over time and vice versa.
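
For readers unfamiliar with the model, a bivariate RI-CLPM is usually written as below. This is the standard textbook formulation, offered as a sketch: the symbols are introduced here for illustration, and the abstract does not state the exact specification used. Let x_{it} denote noise annoyance and y_{it} a health outcome for person i at wave t.

```latex
\begin{aligned}
x_{it} &= \mu_{x,t} + \kappa_i + p_{it}, \\
y_{it} &= \mu_{y,t} + \omega_i + q_{it}, \\
p_{it} &= \alpha\, p_{i,t-1} + \beta\, q_{i,t-1} + u_{it}, \\
q_{it} &= \delta\, q_{i,t-1} + \gamma\, p_{i,t-1} + v_{it}.
\end{aligned}
```

Here \kappa_i and \omega_i are the random intercepts capturing stable between-person differences, p_{it} and q_{it} are within-person deviations, \alpha and \delta are autoregressive effects, and \beta and \gamma are the cross-lagged effects: \beta is the path from earlier health to later annoyance, and \gamma the path from earlier annoyance to later health. Comparing \beta and \gamma is what allows the two directions of influence to be separated.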

From Finding to Re-using Confidential Data
Freek Dijkstra, SURF; Emma Schreurs, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG)

The Research Data Exchange (RDX) allows researchers to share confidential data in a controlled and secure manner, while also adhering to legal requirements and institutional policies. The RDX is a prototype that integrates existing data repositories with algorithm-to-data solutions. The prototype was built by SURF in collaboration with the UvA. At first glance, it simply provides glue that allows a seamless workflow from the ODISSEI data portal to the ODISSEI secure analysis environment. Yet, it is more profound: by validating data sharing conditions and providing logging and monitoring of all activities on a confidential dataset, it builds trust into the system and helps to solve the Open Science Dilemma, enabling open science while also addressing the sovereignty issues that can limit the extent to which data can be openly shared.

Mental health workforce in primary care and the allocation of depression treatment: a data linkage study
Jesper Dros, Netherlands Institute for Health Services Research (Nivel); Christel van Dijk, Zorginstituut Nederland; Koen Böcker, Zorginstituut Nederland; Robert Verheij, Netherlands Institute for Health Services Research (Nivel); Bert Meijboom, Tilburg University; Jan-Willem Dik, Zorginstituut Nederland; Isabelle Bos, Nivel

Introduction: In 2014, major reforms took place in Dutch mental healthcare. The mental health nurse, introduced in 2008 as an additional healthcare provider for individuals in need of mental healthcare in general practices, was expected to substitute treatments from general practitioners and providers in basic- and specialized mental healthcare (psychologists, psychotherapists, psychiatrists, etc.). The goal of this study is to investigate the association between the degree of deployment of mental health nurses and treatment by other mental healthcare providers.
Method: We conducted an observational study based on pseudonymized claims data of Dutch health insurers combined with electronic health records from general practices. Healthcare utilization patterns of individuals with depression treated in general practice or basic- or specialized mental healthcare were analyzed between 2014 and 2019 (N=32200). Both the proportion of individuals treated and time to treatment after depression onset were assessed in association with the degree of deployment of the mental health nurse in general practices.
Results: The proportion of individuals with depression treated by the GP (OR 0.94, 95% CI 0.90 – 0.98), in basic mental healthcare (OR 0.85, 95% CI 0.82 – 0.89) and in specialized mental healthcare (OR 0.90, 95% CI 0.87 – 0.94) was lower when mental health nurse deployment was high. Individuals from general practices with high nurse deployment are treated in basic mental healthcare at a later point in time. However, they remain in specialized mental healthcare for approximately the same duration and are not referred back to primary care earlier, irrespective of mental health nurse deployment at the GP.
Conclusion: Treatments for individuals with depression have shifted from the general practitioner and both basic- and specialized mental healthcare to the mental health nurse. Thereby, general practitioners can dedicate more time to other patients. Future research should focus on referral from specialized care back to basic- and primary care, as this anticipated shift has not been found.

On Text-based Personality Computing: Challenges and Future Directions
Qixiang Fang, Utrecht University – Faculty of Social Sciences (UU-FSW); Anastasia Giachanou, Utrecht University – Faculty of Social Sciences (UU-FSW); Ayoub Bagheri, Utrecht University – Faculty of Social Sciences (UU-FSW); Laura Boeschoten, Utrecht University – Faculty of Social Sciences (UU-FSW); Erik-Jan van Kesteren, Utrecht University – Faculty of Social Sciences (UU-FSW); Mahdi Shafiee Kamalabad, Utrecht University – Faculty of Social Sciences (UU-FSW); Daniel Oberski, Utrecht University – Faculty of Social Sciences (UU-FSW), UMCU

Text-based Personality Computing (TPC) refers to automatic personality assessment based on text data (e.g., tweets, essays). Many empirical studies and datasets, plus a few survey papers exist. However, there is not yet a position paper that reflects on the quality of current TPC research, suggests solutions and future directions, and combines perspectives from NLP and social sciences. In this paper, we review existing TPC research and identify 15 challenges deserving the attention of the research community. We organize these 15 challenges into the following 6 topics: personality taxonomies, measurement quality, datasets, performance evaluation, modelling choices, as well as ethics and fairness. Furthermore, in light of these challenges, we offer concrete recommendations for future TPC research:

– Personality taxonomies: Choose Big-5 over MBTI; Try modelling facets and using other taxonomies like HEXACO where appropriate.
– Measurement quality: Pay attention to measurement error in personality measurements, be they based on questionnaires or models; Try to reduce measurement error by design (e.g., choose higher-quality instruments; use better data collection practices); Provide quality evaluation (i.e., validity and reliability) for any new (and also existing) approaches.
– Datasets: Make TPC datasets shareable; they should also contain fine-grained personality measurements and descriptions of the target population.
– Performance evaluation: Report a diverse set of performance metrics; Report R² for a regression task.
– Modelling choices: Make use of the psychometric properties of personality traits when modelling them (e.g., use joint modelling; modify the loss function to preserve the covariance information); For even better predictions, try incorporating personality questionnaire texts, applying data augmentation and dimensionality reduction techniques, as well as incorporating more personality-related variables.
– Ethics and fairness: Avoid unnecessary TPC; Apply TPC to clinical, professional and educational settings; Investigate fairness.
– Lastly, engage in (interdisciplinary) research work with survey methodologists, psychologists, and psychometricians.
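The recommendation on performance evaluation can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `regression_report` and the example scores are hypothetical, and the choice of metrics (MAE, RMSE, R², Pearson r) is only one possible "diverse set".

```python
import math


def regression_report(y_true, y_pred):
    """Return several complementary metrics for a regression task."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    # Residual and total sums of squares, as used in the definition of R^2.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return {
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
        "pearson_r": cov / math.sqrt(ss_tot * var_pred),
    }


# Hypothetical questionnaire-based trait scores and model predictions.
y_true = [2.0, 3.5, 4.0, 1.5, 3.0]
y_pred = [2.5, 3.0, 3.5, 2.0, 3.0]
report = regression_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Note that R² computed this way need not equal the squared Pearson correlation, which is one reason to report both rather than a correlation alone.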

The Hidden Divide: School Segregation of Teachers in the Netherlands
Rafiq Friperson, VU Amsterdam – School of Business and Economics (VU-SBE); Hessel Oosterbeek, University of Amsterdam – Faculty of Economics and Business (UvA-FEB); Bas van der Klaauw, VU Amsterdam – School of Business and Economics (VU-SBE)

We use Dutch register data to document the understudied phenomenon of teacher segregation. We show that teachers in primary and secondary schools in the four largest cities of the country – Amsterdam, Rotterdam, The Hague and Utrecht – are segregated in terms of their migration and social backgrounds. While segregation by social background is not much higher than what would be expected under random teacher-school assignment, segregation by migration background is substantial even after accounting for randomness. Relating schools’ teacher composition to their student composition, we find in most cases that schools with a high proportion of teachers from a particular background tend to have a high proportion of students from that same background.

Bridging Questionnaires and AI Models for Insights into the Intercorrelations of Big Five Traits
Anastasia Giachanou, Utrecht University – Faculty of Social Sciences (UU-FSW); Yucheng Chen, Utrecht University – Faculty of Social Sciences (UU-FSW); Qixiang Fang, Utrecht University – Faculty of Social Sciences (UU-FSW)

In recent years, the field of personality computing has witnessed significant progress, due to the integration of machine learning and natural language processing (NLP). While traditional approaches have relied on questionnaire-based methods, the advent of NLP has brought new opportunities for automated personality detection by analysing user-generated text. In particular, recent models such as BERT (Bidirectional Encoder Representations from Transformers) have demonstrated impressive performance across various tasks including automated personality trait assessment. Automated models are mainly evaluated on how effectively they capture personality traits, while the intercorrelations of those traits are hardly discussed. This contrasts with the traditional approaches, which have extensively analysed those intercorrelations. For example, positive correlations have been found between openness, extraversion, and conscientiousness, while neuroticism has shown negative correlations with conscientiousness. By comparing the traits’ intercorrelations when predicted by automated models with established psychological findings, we can examine whether the automated assessments align with the existing knowledge in the field. Our study aims to investigate and compare the performance and intercorrelations among the Big Five personality traits using two models: the Robustly Optimised BERT Pretraining Approach (RoBERTa), and Bi-directional Long Short-Term Memory (Bi-LSTM). Our experiments on PAN 2015 and PANDORA showed that RoBERTa outperforms Bi-LSTM in predicting personality traits. However, the performance of the models varied across the two datasets, which suggests diversity in the expression of personality traits. In addition, RoBERTa captured the sign (positive or negative) of most correlations present in the original datasets. We only observed differences in the pairings of extraversion-openness and neuroticism-conscientiousness in the PAN 2015 dataset. This indicates that RoBERTa primarily learned the correlations present in the annotated data. Our findings indicate that there are many limitations in the data collection process that need to be addressed in order to create more robust models for personality prediction.
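The sign comparison described in this abstract could be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the helper names `pearson` and `sign_disagreements`, the three-trait subset, and all scores are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sign_disagreements(gold, predicted):
    """Return trait pairs whose correlation sign differs between the
    annotated (gold) scores and the model-predicted scores."""
    mismatches = []
    for a, b in combinations(sorted(gold), 2):
        r_gold = pearson(gold[a], gold[b])
        r_pred = pearson(predicted[a], predicted[b])
        if (r_gold > 0) != (r_pred > 0):
            mismatches.append((a, b))
    return mismatches


# Hypothetical scores for five users on three traits.
gold = {
    "extraversion": [1, 2, 3, 4, 5],
    "openness": [2, 1, 4, 3, 5],
    "neuroticism": [5, 4, 3, 2, 1],
}
predicted = {
    "extraversion": [1, 2, 3, 4, 5],
    "openness": [5, 3, 4, 2, 1],  # the hypothetical model reverses this trait
    "neuroticism": [4, 5, 3, 1, 2],
}
mismatches = sign_disagreements(gold, predicted)
print(mismatches)
```

A pair that appears in `mismatches` is one where the model's predictions imply a relationship between traits opposite to the one in the annotated data.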

Farmers’ Adaptation to Climate Change in Different Regions of the World: A Natural Language Processing Systematic Review
Sofia Gil-Clavel, TU Delft – TPM; Tatiana Filatova, TU Delft – TPM

Climate change is expected to affect global agriculture negatively if farmers fail to adapt incrementally early in the twenty-first century, or to adapt transformationally during the second half of the century. Many publications from anthropology, sociology, economics, and geography discuss the underlying mechanisms of farmers’ adaptation to climate change. However, researchers have not systematically analyzed their findings, mainly because this literature is primarily qualitative and vast. This work aims to analyze the factors that drive farmers’ adaptation to climate change in different regions of the world. Specifically, we aim to answer the following research questions: What drives farmers to adapt to climate change? Are there patterns in the mechanisms found in different regions of the world? Are there differences in the mechanisms behind farmers’ incremental adaptation and transformational adaptation? For this, first, we perform an active learning analysis of publications contained in Scopus in August 2022. Second, we analyze the articles’ findings using semi-supervised natural language processing. Finally, we derive a database that can be used as an input for generalized linear models. Preliminary results show that climate change evaluation factors (knowledge, belief, and concern), together with individual costs and benefits, seem to be associated worldwide with farmers’ transformational adaptation to climate change.

Migration Policies and Immigrants’ Language Acquisition in EU-15: Evidence from Twitter
Sofia Gil-Clavel, TU Delft – TPM; André Grow; Maarten J. Bijlsma

In response to the increasingly complex and heterogeneous immigrant communities settling in Europe, European countries have adopted various civic integration measures. Measures aiming to facilitate language acquisition are considered crucial for integration and cooperation between immigrants and natives. Simultaneously, the rapid expansion of social media usage is believed to change the factors affecting immigrants’ language acquisition. However, only a few previous studies have analyzed whether this is the case. This article uses a novel longitudinal data source derived from Twitter to (1) analyze differences in the pace of immigrants’ language acquisition depending on the migration policies of destination countries and (2) study how the relative sizes of the migrant groups in destination countries, and the linguistic and geographical distances between origin and destination countries, are associated with language acquisition. Results show that immigrants who live in countries with strict language acquisition requirements for immigrants and conservative citizenship policies have the highest median times until language acquisition. Based on Twitter data, we also find that, in line with previous studies, language acquisition is associated with classic explanatory variables, such as the size of the immigrant group in the destination country and the linguistic and geographical distance between origin and destination country.

Education, unions, and physiology: explaining the gap between intended family size and completed fertility for the 1974-1984 birth cohort of Dutch women
Rolf Granholm, University of Groningen – Faculty of Behavioural and Social Sciences (RUG-FGMW); Gert Stulp, University of Groningen – Faculty of Behavioural and Social Sciences (RUG-FGMW); Anne Gauthier, Netherlands Interdisciplinary Demographic Institute (NIDI)

Women in Europe have fewer children than they intend to, resulting in a gap between intended family size and completed fertility. The gap is mainly a result of first-pregnancy attempts being postponed to older reproductive ages, when physiological constraints make successful conception and birth increasingly difficult. Expansion of higher education and current trends in union formation and dissolution are considered two important factors behind this first-pregnancy postponement, which is why they are the focus of this study. Understanding the fertility gap is key to understanding why fertility is declining in Europe, as intended family size has been quite stable around two children per woman, whereas completed cohort fertility is declining. Event history analysis techniques commonly used in fertility research are ill-equipped to model and explain this gap, because they are unable to explicitly model the physiological constraints on fertility. Our novel microsimulation approach grounded in human physiology overcomes this problem. With our model we measure how much education and recent union trends contributed to the fertility gap of a recent cohort of Dutch women. To do this we simulate the entire individual reproductive life courses of Dutch women born between 1974 and 1984, who have recently completed or will soon complete their fertility. We use mainly GGS I and LISS panel data for the behavioural inputs in our model. The physiological inputs are based on extensive modelling work by demographers and clinical literature. We validate our model with cohort data from the Human Fertility Database. By comparing our simulation estimates with regression estimates using our simulated data, we investigate whether failing to explicitly model the reproductive process leads to incorrect inferences. Our study brings us closer to understanding the gap between intended family size and completed fertility, as well as measuring how much fertility is affected by its individual determinants.

Synthetic Instagram Post Generation for Social Media Research
Lily Heisig, Maastricht University – Faculty of Science and Engineering (UM-FSE); Sander Lardinois, Maastricht University – Faculty of Science and Engineering (UM-FSE); Thales Bertaglia, Maastricht University – Faculty of Science and Engineering (UM-FSE); Adriana Iamnitchi, Maastricht University – Faculty of Science and Engineering (UM-FSE)

Social media platforms, such as Instagram, are valuable data sources for research. However, these platforms often restrict data sharing, making research replication difficult. Additionally, privacy concerns and ethical considerations complicate data sharing further. Moreover, real data is difficult to collect for infrequent instances, such as undisclosed sponsored content. Synthetic data provides a solution to these challenges, but only if it accurately represents the real data. This research uses large language models to generate synthetic Instagram posts and proposes evaluation measures to assess synthetic data quality. We focus on Instagram captions representative for distinguishing sponsored from non-sponsored content. We use GPT-3.5 to generate captions and assign authorship to such posts. We explore different prompt engineering techniques and suggest best practices for synthetic data generation. Our proposed evaluation metrics include simple statistical analysis, topic modelling, text complexity measures, sentiment analysis and measures extracted from the emerging networks of co-occurring hashtags, tagged users, and authors. We find that GPT-3.5 generates very realistic individual Instagram captions that include appropriate emojis, URLs, hashtags and user tags. However, it stays very generic, thus failing to provide (without explicit prompting) the special cases in which we are particularly interested. More importantly, it fails to output realistic distributions across text properties (such as sentiment) and network properties (such as hashtag co-occurrence). We develop a set of dataset characterisation metrics and a methodology for curating the synthetic dataset to better fit the metrics of real datasets.

Implicit Association Tests: Stimuli Validation from Participant Responses
Sally Hogenboom, Open University (OU); Katrin Schulz, University of Amsterdam – Faculty of Humanities; Leendert van Maanen, Utrecht University – Faculty of Social Sciences (UU-FSW)

The Implicit Association Test (IAT, Greenwald et al., 1998) is a popular instrument for measuring attitudes and (stereotypical) biases. Greenwald et al. (2021) proposed a concrete method for validating IAT stimuli: appropriate stimuli should be familiar and easy to classify – translating to rapid (response times < 800 ms) and accurate (error < 10%) participant responses. We conducted three preregistered analyses to explore the theoretical and practical utility of these proposed validation criteria. We first applied the proposed validation criteria to the data of 15 IATs that were available via Project Implicit. A bootstrap approach with 10,000 ‘experiments’ of 100 participants showed that 5.85% of stimuli were reliably valid: we are more than 95% confident that a stimulus will also be valid in a new sample of 18-25-year-old US participants. Most stimuli (78.44%) could not be reliably validated, indicating a less than 5% certainty in the outcome of stimulus (in)validity for a new sample of participants. We then explored how stimulus validity differs across IATs. Results show that only some stimuli are consistently (in)valid. Most stimuli show between-IAT variances, which indicates that stimulus validity differs across IAT contexts. In the final analysis we explored the effect of stimulus type (images; nouns; names; adjectives) on stimulus validity. Stimulus type was a significant predictor of stimulus validity. Although images attain the highest stimulus validity, raw data show large differences within stimulus types. Together the results indicate a need for revised validation criteria. We finish with practical recommendations for stimulus selection and (post-hoc) stimulus validation.
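A bootstrap check of this kind could look roughly like the Python sketch below. It rests on assumptions that the abstract does not confirm: the criteria are applied to the mean response time and the error rate of each resampled 'experiment', a stimulus is labelled reliably invalid when it passes in fewer than 5% of experiments, and the names `bootstrap_validity` and `classify` and the simulated responses are hypothetical.

```python
import random


def bootstrap_validity(responses, n_experiments=10_000, sample_size=100,
                       max_rt_ms=800.0, max_error=0.10, seed=0):
    """Share of bootstrap 'experiments' in which a stimulus meets both criteria.

    responses: list of (response_time_ms, is_error) tuples, one per participant.
    """
    rng = random.Random(seed)
    valid = 0
    for _ in range(n_experiments):
        # Resample participants with replacement.
        sample = rng.choices(responses, k=sample_size)
        mean_rt = sum(rt for rt, _ in sample) / sample_size
        error_rate = sum(err for _, err in sample) / sample_size
        if mean_rt < max_rt_ms and error_rate < max_error:
            valid += 1
    return valid / n_experiments


def classify(share, confidence=0.95):
    """Label a stimulus from the share of experiments in which it was valid.
    The symmetric lower threshold is an assumption made for this sketch."""
    if share > confidence:
        return "reliably valid"
    if share < 1 - confidence:
        return "reliably invalid"
    return "not reliably validated"


# Hypothetical stimulus: mostly fast responses with a low error probability.
data_rng = random.Random(1)
responses = [(data_rng.gauss(650, 120), data_rng.random() < 0.03)
             for _ in range(500)]
share = bootstrap_validity(responses, n_experiments=1_000)
print(share, classify(share))
```

Fixing the seed keeps the sketch reproducible; a real analysis would run this per stimulus and per IAT.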

The Amsterdam Social Network Interventions for Health Simulator (ASNIHS): Building the evidence base to improve the resiliency and health of Amsterdam residents
Jiri Kaan, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Kristina Thompson, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Annemarie Wagemakers, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Spencer Moore, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG)

Recent advances in social network research have highlighted a variety of strategies for leveraging social networks to improve community-based interventions. Nonetheless, the role of social networks in interventions is frequently under-theorized or taken for granted. Furthermore, predicting which type of social network intervention would work best in which neighbourhood is sometimes difficult to test in practice. Therefore, in our study, we will develop an agent-based model to create an intervention planning tool called the Amsterdam Social Network Interventions for Health Simulator (ASNIHS). The ASNIHS will be developed in part in collaboration with community members and policy stakeholders, for example, through network mapping or interviews, to improve the development and adoption of social network strategies in Amsterdam health promotion initiatives. Other data would come from pre-existing data sources, such as ODISSEI and HELIUS. The ASNIHS will enable researchers and stakeholders to virtually test the impact of different social network intervention strategies in neighbourhoods on the resilience, health, and well-being of their residents; either with observed or hypothetical network characteristics. Simultaneously, the model will account for multiscale interactions and bidirectional feedback loop interactions, such as those originating from the communities’ geo-spatial environment. By making the agent-based model available for stakeholders via workshops, we hope to serve the community better and gain feedback that will help us improve the ASNIHS for future use. Understanding the efficacy of different social network interventions across neighbourhoods may contribute to improving population health while reducing health disparities.

Cardiometabolic diseases in autistic adults: research based on a nationwide population-based cohort study from the Netherlands
Yiran Li, University of Groningen – University Medical Center Groningen (RUG-UMCG); Tian Xie, University of Groningen – University Medical Center Groningen (RUG-UMCG); Lin Li, Örebro University – School of Medical Sciences; Harold Snieder, University of Groningen – University Medical Center Groningen (RUG-UMCG); Catharina Hartman, University of Groningen – University Medical Center Groningen (RUG-UMCG)

Previous studies suggest that Autism Spectrum Disorder (ASD) is associated with cardiometabolic diseases, but little is known about the full risk profile of cardiometabolic diseases in autistic men and women in different parts of the lifespan. We are currently conducting a longitudinal, population-based cohort study to investigate the prospective associations between ASD and cardiometabolic diseases including diabetes, hypertension, dyslipidemia, stroke, angina pectoris, myocardial infarction, and heart failure in adults with and without ASD. Additionally, we will explore whether the associations differ by sex and age. Using nationwide data from the Central Bureau of Statistics (CBS), adults born between January 1, 1920, and December 31, 1996, without pre-existing cardiometabolic diseases are identified. The study period is from January 1, 2014 to December 31, 2020. Incident cardiometabolic disease events are identified according to diagnoses in CBS. Medication dispensation is additionally used to identify diabetes, hypertension, and dyslipidemia. ASD is identified according to the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) diagnostic code in CBS. Hazard ratios (HR) with 95% confidence intervals (CI) will be calculated using a Cox proportional hazards regression model, with ASD as a time-varying exposure. Sex, birth year, highest education level, income, obesity, tobacco use, and comorbid psychiatric disorders will be adjusted for in multiple steps of model fitting. Stratified analyses by sex and age bands (18-34, 35-64, and ≥65 years) will be conducted. The findings may inform diagnosis, monitoring, and interventions to reduce cardiometabolic risk in autistic adults. On the 2nd of November, we will be able to present the results of this study during the ODISSEI conference.

Data linkage of health care registries to study silicone breast implants adverse events
Annemiek Lieffering, Netherlands Institute for Health Services Research (Nivel); Lotte Ramerman, Netherlands Institute for Health Services Research (Nivel); Juliëtte Hommes, Zuyderland Medical Center; Robert Verheij, Netherlands Institute for Health Services Research (Nivel); René van der Hulst, Maastricht University Medical Center; Hinne Rakhorst, Medical Spectrum Twente; Marc Mureau, Erasmus MC Cancer Institute

Objective: To assess the association of silicone breast implants (SBIs) with non-specific health symptoms and medical specialist care utilization. Design: Retrospective cohort studies (2013-2021) using data linkage with 1) the Dutch Breast Implant Registry, which includes almost all patients undergoing breast implant surgery, 2) Nivel Primary Care Database, containing routine electronic health records from over 500 general practices, 3) the Dutch health insurance claims database, including virtually all medical specialist care, and 4) Statistics Netherlands, providing socioeconomic indicators. Participants: Women with SBIs for cosmetic reasons and several control groups. Outcomes: The occurrence of thirteen non-specific health symptoms presented in general practice from one year before implantation until three years after implantation. Visits to medical specialists were examined over a three-year period prior to explantation of SBIs. Results: Women with SBIs had increased odds of non-specific health symptoms after implantation compared to before implantation (OR second year after implantation 2.13, 95% CI 1.22 to 3.72), as well as compared to women without SBIs (adjusted OR 1.44, 95% CI 1.04 to 1.98). Women who underwent explantation of SBIs were more likely to have visited >5 different medical specialties in comparison with control groups of women who underwent SBI replacement surgery (16.1% vs. 10.5%; p<0.001) and women without SBIs (16.1% vs. 3.7%; p<0.001). Among explantation patients, women who underwent explantation because of non-specific health symptoms were more likely to have visited >5 different medical specialties compared to women who underwent explantation for other reasons (31.6% vs. 14.6%; p<0.001). Conclusions: The research findings suggest an association between having SBIs for cosmetic reasons and non-specific health symptoms, which may be related to high medical specialist care utilization. Data linkage of health care data has been demonstrated to be useful in epidemiological studies.

Evidence for severe mood instability in patients with bipolar disorder: Applying multilevel hidden Markov modelling to intensive longitudinal ecological momentary assessment data
Sebastian Mildiner Moraga, Utrecht University – Faculty of Social Sciences (UU-FSW); Emmeke Aarts, Utrecht University – Faculty of Social Sciences (UU-FSW)

Bipolar disorder (BD) is a chronic psychiatric condition characterized by large shifts in mood, energy, and cognitive functioning. Recently, the conceptualization of BD has shifted from alternating discrete episodes to a model of chronic cyclical mood instability. Recognizing and quantifying this mood instability may improve care and calls for high-frequency measures coupled with advanced statistical models. To uncover empirically derived mood states, a multilevel hidden Markov model (HMM) was applied to 4-month ecological momentary assessment (EMA) data in twenty patients with BD. EMA data comprised self-reported questionnaires (5 per day) measuring manic and depressive constructs using 12 items. Manic and depressive symptoms were further assessed by weekly administered self-reported questionnaires (i.e., Altman Self-Rating Mania Scale and Quick Inventory for Depressive Symptomatology Self-Report). Alignment between uncovered mood states and weekly questionnaires was assessed with a multilevel linear model. HMM uncovered four mood states (euthymic, manic, mixed, and depressive) that aligned with weekly symptom scores. On average, the duration of the states was <24h: patients remained in the mixed state the shortest (8.8h), followed by the manic (9.1h), depressive (16.3h), and euthymic (17.1h) states. States switched more frequently than weekly data suggested. In almost half of the patients, significant mood instability was observed. Large individual differences were observed in state duration and switching. The results suggest that mood instability could be a key feature of BD, which should be considered in theoretical and clinical conceptualizations of the disorder. Quantifying mood instability has the potential to improve the care of patients with bipolar disorder on a very individual scale.
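State durations like those reported above can be related to the self-transition probabilities of a hidden Markov model. The Python sketch below uses the standard result that, in a discrete-time Markov chain, the expected number of consecutive steps spent in state i is 1 / (1 - a_ii). It is an illustration only: the transition matrix and the spacing of 3 hours between measurements are hypothetical, and the abstract does not say how the authors derived their durations.

```python
def expected_dwell_times(transition_matrix, hours_per_step):
    """Expected time spent in each state before leaving it.

    In a discrete-time Markov chain the dwell time in state i is geometric
    with mean 1 / (1 - a_ii) steps, where a_ii is the self-transition
    probability on the diagonal of the transition matrix.
    """
    durations = []
    for i, row in enumerate(transition_matrix):
        stay = row[i]
        durations.append(hours_per_step / (1.0 - stay))
    return durations


states = ["euthymic", "manic", "mixed", "depressive"]
# Hypothetical group-level transition probabilities (each row sums to 1).
transition_matrix = [
    [0.80, 0.05, 0.05, 0.10],
    [0.10, 0.70, 0.15, 0.05],
    [0.15, 0.15, 0.50, 0.20],
    [0.10, 0.05, 0.10, 0.75],
]
# Assumed spacing between consecutive measurements, in hours.
durations = expected_dwell_times(transition_matrix, hours_per_step=3.0)
for state, hours in zip(states, durations):
    print(f"{state}: {hours:.1f} h")
```

In a multilevel model each patient would have their own transition matrix, so the same calculation per patient would show the individual differences in state duration mentioned in the abstract.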

The (Great) Persuasion Divide? Gender Disparities in Debate Speeches and Evaluations
Huyen Nguyen, Utrecht University – Faculty of Social Sciences (UU-FSW)

Do men and women persuade differently? Are they evaluated differently? This research investigates spoken verbal tactics across genders and their impacts on performance evaluations using a novel data set of 1517 speech transcripts, evaluation scores, and demographic data from the highest-profile inter-varsity debate tournaments between 2008 and 2018. Female debaters use a more personal and disclosing speaking style, with more hedging phrases and non-fluencies in their speeches. In their answers to opponents’ questions, women negate less, while having longer and more vague answers. In terms of evaluation, across debates, having a less analytical speaking style and more positive sentiment is associated with higher scores for speeches by women, but not by men. Within debates, except for non-fluencies, there is no robust evidence of gender-specific evaluation standards. These findings suggest that the gender score gap in top-tier debate tournaments is due to gender differences in persuasion tactics, rather than discrimination.

Data visualization for incomplete data
Hanne Oberman, Utrecht University – Faculty of Social Sciences (UU-FSW)

Missing data are ubiquitous in the social and human data sciences. Computational pipelines that accommodate incomplete data require exploration and evaluation of the missingness. The R package ggmice (amices.org/ggmice) enables data analysts to visualize incomplete and imputed data.

Modeling Interactions in Product Reviews: Utilizing Formal Argumentation to Assess Strengths and Weaknesses of Products Across Different Aspects
Ji Qi, Netherlands eScience Center (NLeSC); Atefeh Keshavarzi Zafarghandi, Centrum Wiskunde & Informatica (CWI), VU Amsterdam – Faculty of Science (VU-Science); Davide Ceolin, Centrum Wiskunde & Informatica (CWI)

Product reviews serve as critical sources of information for potential buyers and valuable feedback for reviewed entities. Recognizing their pivotal role, companies increasingly appreciate the efficacy of helpful reviews as a marketing tool. Extensive interdisciplinary research, ranging from philosophy to artificial intelligence, has investigated factors influencing the prediction of helpful reviews. This study focuses on the creation of a valuable resource for new users by effectively analyzing product reviews and highlighting the strengths and weaknesses associated with different aspects of a product. To achieve this, we employ a multi-step approach. Firstly, we adopt a review segmentation technique, dividing each review into discrete chunks, typically clauses, wherein each chunk represents a specific aspect of the product. Next, we apply a modified version of the TextRank algorithm to assign a numerical rank to each chunk, reflecting its relative importance within the overall review. We further employ topic modeling using BERT-based transformers to group chunks into aspects. By merging the chunks back into reviews, we ascertain the aspects covered in each review. Subsequently, we analyze the attack relationships between reviews pertaining to specific aspects of the product. Leveraging argument mining approaches, we identify more reliable reviews based on the attack network of reviews, enabling us to discern the strengths and weaknesses of the product across different aspects through these reliable reviews. To evaluate the proposed approach, we have developed a series of GUI software tools within the scientific workflow platform Orange3 and applied them to the Amazon Product Review dataset. The implementation in Orange3 offers intuitive graphical interfaces, tunable components, and visualization tools, enhancing user understanding of the underlying mechanisms and the significance of the output. The proposed framework contributes to enhancing the understanding of product interactions in reviews, aiding both consumers and businesses in making informed decisions.
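The chunk-ranking step can be illustrated with a plain TextRank-style sketch in Python. This is not the authors' modified algorithm, whose details the abstract does not give: the word-overlap similarity, the normalised PageRank update, the function names, and the example chunks are all assumptions made for illustration.

```python
import math


def similarity(a, b):
    """Word-overlap similarity between two chunks, normalised by log lengths."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # skip one-word chunks to avoid a zero denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(chunks, damping=0.85, iterations=50):
    """Rank chunks by running weighted PageRank on their similarity graph."""
    n = len(chunks)
    weights = [[similarity(chunks[i], chunks[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


# Hypothetical review chunks about one product.
chunks = [
    "the battery life is great",
    "battery life lasts two days",
    "the screen scratches easily",
    "great battery and solid battery life",
]
scores = textrank(chunks)
for chunk, score in sorted(zip(chunks, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {chunk}")
```

Chunks that share vocabulary with many other chunks receive higher scores, which is the sense in which the rank reflects a chunk's relative importance within the review.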

Big data governance on digital twin technology for smart and sustainable tourism
Eko Rahmadian, University of Groningen, Campus Fryslan

We are witnessing a massive increase in the use of digital technologies. As one of the emerging concepts in artificial intelligence, machine learning, and the Internet of Things (IoT), Digital Twin (DT) technology can predict system responses before they occur. DT can be described as a virtual representation that defines the comprehensive physical and functional characteristics of the life cycle of a product. DT applications have been implemented in many sectors, including smart cities. Considering the rapid growth of new ICT applications in the tourism industry and digitization through IoT, we suggest that DT has the potential to be implemented in smart and sustainable tourism. By utilizing big data and other supporting resources, stakeholders will be able to create a virtual representation of a relevant region both by analyzing the flow of visitor activity and by determining the impact of their geographic and temporal patterns on other aspects and policies as a leverage on the use of big data for statistical products. However, we are also aware that compliance with regulations and communication among stakeholders such as data scientists, data analysts, statisticians, managers, and business analysts have become important issues. There are also numerous concerns about security, privacy, and trust. Therefore, our paper proposes a conceptual framework for DT on smart and sustainable tourism and a documentation framework for architectural decisions as a recommended way of governing such systems. This documentation framework provides benefits that shape how each stakeholder communicates and interacts using the system while adhering to rules and regulations to ensure trustworthiness, accountability, and transparency. With a theoretical case study and three case scenarios on the use of mobile positioning data in Indonesia as examples, we intend to demonstrate the applicability of our work.

Consequences of delayed care in chronic disease management programs in Dutch patients with diabetes during the COVID-19 pandemic: a registry-based observational study
Corinne Rijpkema, Netherlands Institute for Health Services Research (Nivel); Lotte Ramerman, Netherlands Institute for Health Services Research (Nivel); Isabelle Bos, Netherlands Institute for Health Services Research (Nivel); Robert Verheij, Netherlands Institute for Health Services Research (Nivel)

Background: During the COVID-19 pandemic, chronic disease management programs (CDMP) at the general practitioner (GP) for Dutch patients with chronic conditions, such as diabetes, were delayed/postponed. This may have resulted in patients requiring unplanned care in other healthcare settings due to the development of acute complaints. Therefore, we investigated the changes in contact rates for patients with diabetes in 2020-2021 compared to 2019 regarding 1) CDMP care, 2) additional care in general practices, 3) care at out-of-hours GP services and 4) hospital care. Method: We linked GP data from the Nivel Primary Care Database with hospital data from Statistics Netherlands (CBS), to analyze proportional differences in contact rates across all four types of care, shown for each quarter of 2020-2021 compared to 2019. Results: Preliminary results indicated that for patients with diabetes, the care from CDMP decreased in 2020 and 2021 compared to 2019. In Q1-Q3 of 2020 and Q3 of 2021, GPs were contacted less often for additional diabetes care in general practices, but in the other quarters, there was increased GP care for diabetes compared to the same periods in 2019. During the pandemic, patients with diabetes visited out-of-hours GP services more often for their diabetes, except for Q3 of 2020 and 2021. These patients also had increased hospital care for diabetes in 2020, except for a brief decrease in Q2 compared to 2019. Conclusion: Delayed/postponed CDMP care for patients with diabetes may have led to increased diabetes-related care at out-of-hours GP services and hospitals in most quarters of 2020 and 2021 compared to 2019. It is crucial to maintain monitoring due to potential long-term consequences of COVID-19 beyond 2021.
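The quarter-by-quarter comparison described in the Method can be sketched as follows. This is a minimal illustration only: the function name and all numbers are hypothetical placeholders, not values or code from the study, and it assumes that "proportional difference" means the relative change against the same quarter of 2019.

```python
def proportional_difference(rate, baseline):
    """Relative change of a contact rate versus the same quarter in the baseline year."""
    if baseline == 0:
        raise ValueError("baseline rate must be non-zero")
    return (rate - baseline) / baseline


# Hypothetical contacts per 1,000 patients per quarter (illustrative numbers only).
baseline_2019 = {"Q1": 200.0, "Q2": 180.0}
pandemic_2020 = {"Q1": 150.0, "Q2": 198.0}

# Negative values mean fewer contacts than in the same quarter of 2019.
changes = {
    quarter: proportional_difference(pandemic_2020[quarter], baseline_2019[quarter])
    for quarter in baseline_2019
}
```

A negative value for a quarter then reads as a drop in contacts relative to 2019, and a positive value as an increase.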

How do regions adopt new technologies? AI adoption in the Netherlands
Harm-Jan Rouwendal, University of Groningen – Faculty of Spatial Sciences (RUG-FRW); Sierdjan Koster, University of Groningen – Faculty of Spatial Sciences (RUG-FRW); Tersa Farinha, UNU-MERIT

How do regions adopt new digital technologies: by following general trends or first movers? This paper studies AI adoption in regional labour markets. We use the demand for AI-related skills in online job vacancies in the Netherlands for the period 2010-2020 as a measure of AI adoption in firms. We document a rapid increase in AI-related vacancies over the past decade, especially in professional and technical occupations and in urban areas. Moreover, we find spatial differences in AI adoption that can only partly be explained by sector structure and the effect of agglomeration economies. We hypothesize that firms are more prone to adopt AI if similar firms in the same region already use AI. This would indicate that local knowledge spillovers remain important in the regional adoption of new digital technologies and the subsequent automation process.

Who is misinformed, who is a conspiracist? Investigation of psychological factors associated with susceptibility to misinformation and conspiracy theories with a network analysis approach
Selin Topel, Leiden University – Faculty of Social Sciences (UL-FSW); Ili Ma, Leiden University – Faculty of Social Sciences (UL-FSW); Ellen de Bruijn, Leiden University – Faculty of Social Sciences (UL-FSW)

In today’s digital era, global events like the COVID-19 pandemic, the war in Ukraine, and climate change have the potential to trigger swift dissemination of misinformation through social media platforms. During times of heightened uncertainty, individuals may also resort to conspiracy thinking as a maladaptive way to cope with anxiety and uncertainty. Unfortunately, this widespread misinformation and endorsement of conspiracy theories can have severe negative consequences on both personal and societal levels, including increased polarization, mental health problems, and rising aggression. Despite the evident dangers of misinformation spread and conspiracy thinking, our understanding of specific factors that either escalate or mitigate susceptibility to these tendencies remains limited. Past research has investigated various psychological factors, such as Intolerance of Uncertainty, information processing biases, and personality traits, but often in isolation or with a narrow focus. The multifaceted nature of these tendencies demands an interdisciplinary approach. In this study, we aimed to uncover individual and psychological factors influencing susceptibility to misinformation and conspiracy thinking. We collected data from a sample of 214 healthy adults (ages 18-35) residing in the UK through an online study. Participants completed self-report questionnaires and an information sampling task. By constructing a network based on the various questionnaire constructs and computational model parameters, we gain insights into the interactions between psychological factors linked to misinformation and conspiracy thinking. This integrative approach provides a more comprehensive understanding of the complex relationships among these factors. Our study contributes to the advancement of knowledge in this critical area of research and can potentially inform strategies to address susceptibility to misinformation and conspiracy thinking. 
Understanding the underlying factors in the context of their interplay is essential for developing targeted interventions to promote a better informed and resilient society.

Better Together: Exploring the complex dynamics of classroom interaction through social network analysis
Nina van Graafeiland, Utrecht University – Faculty of Social Sciences (UU-FSW); Mahdi Shafiee Kamalabad, Assistant Professor; Nienke Smit, Utrecht University – Faculty of Social Sciences (UU-FSW)

The quality of classroom interaction is an important factor in the effectiveness of language lessons, but managing and sustaining interaction in large, diverse classrooms is a complex and dynamic process: it changes constantly and is affected by multiple internal and external components. Social network analysis is a computational method that may be suited to investigating such complex dynamic systems. Applying social network models to classroom data allows us to quantify how students and teachers interact in real time and to unravel the drivers behind this behavior. The first study in this project compared the assumptions and principles of complex dynamic systems theory as a metatheory with those of social network analysis as a toolkit, and investigated how a specific social network model, the Relational Event Model (REM), can be applied to classroom data in order to better explore this complex dynamic system. In this study, we generated synthetic network data to test the use of this model in an educational context, in preparation for the analysis of real-life data in the following studies. The REM can provide accurate estimates of how specific drivers (e.g., timing, inertia, reciprocity, and gender) might shape which interactions occur and when. In turn, it can inform us about the evolution of the system over time.

Rising through the Ranks: Firms and Intergenerational Mobility
Oskar Veerhoek, Radboud University Nijmegen – Faculty of Management Sciences (RU-FdM)

Across the developed world income inequality is on the rise whereas intergenerational mobility is in decline. Current generations will likely experience less equality of opportunity than any generation since the Second World War. Many studies of intergenerational mobility focus on the transmission of human capital from parents to children, but few consider the role of firms. This project is made possible by the ODISSEI Microdata Access Grant, which funds access to CBS microdata. With recently published CBS microdata files, this project sets out to study the effect of firms on intergenerational mobility in the Netherlands. A conceptual model of firm impact on mobility is developed based on insights from sociology and economics. Firms are expected to affect intergenerational mobility through a combination of firm pay premia and socioeconomic inclusiveness of hiring and promotion. The results of this study will add to the burgeoning interdisciplinary literature on intergenerational mobility. They will also have policy implications for both firms and governments.

COMMA: An agent-based micro-simulation model to study mental health outcomes during COVID-19 lockdowns
Eva Viviani, Netherlands eScience Center

The COVID-19 pandemic has led to an increase in known risk factors for mental health problems. This evidence has underscored an urgent need for models that can project mental health outcomes over time, explore lockdown scenarios, and target the needs of specific populations. Here we describe COMMA (COvid Mental-health Model with Agents), a new open-source microsimulation model developed to help address these questions. COMMA takes as input demographic information (age structures, population size, etc.) and lockdown policies operationalised as a set of action probabilities. The lockdown policies affect the likelihood of individuals developing mental health issues, notably depression, based on their demographic profile. Implemented purely in Python, COMMA has been designed with equal emphasis on performance, ease of use, and flexibility. Users can customise lockdown scenarios and population characteristics and execute simulations on a standard laptop within minutes. In a collaboration between the Netherlands eScience Center and Wageningen University, COMMA has already been employed to assess the impact of various lockdown strategies on the mental well-being of a population resembling the demographic profiles of the inhabitants of the Groningen area.

Setting up a Governance Framework for Secondary Use of Routine Health Data in Nursing Homes: Development Study Using Qualitative Interviews
Yvonne Wieland-Jorna, Netherlands Institute for Health Services Research (Nivel); Robert A Verheij, Netherlands Institute for Health Services Research (Nivel); Anneke L Francke, Netherlands Institute for Health Services Research (Nivel); Marit Tomassen, Netherlands Institute for Health Services Research (Nivel); Max Houtzager; Karlijn J Joling; Mariska G Oosterveld-Vlug

Background: In the nursing home sector, reusing routinely recorded EHR data for knowledge development and quality improvement is in its infancy. A data governance framework (who may access the data, under what conditions, and for what purposes) can help build trust in appropriate and responsible data reuse. Little guidance is available on the development and implementation of data governance frameworks in practice. Objective: To describe the development process of a governance framework for the “Registry Learning from Data in Nursing Homes” – a Dutch national registry for EHR data on care delivered by nursing home physicians. Methods: Interviews were conducted with stakeholders representing practices, policies, and research in the nursing home sector. The main aim was to explore perspectives regarding the Registry’s aim, data access criteria, and governing bodies’ tasks and composition. Interview topics and analyses were guided by 8 health data governance principles. Interview results, together with legal advice and consensus discussions, were used to shape the rules, regulations, and governing bodies of the framework. Results: Stakeholders saw the involvement of nursing home residents, nursing home physicians, boards of directors, and scientists as a prerequisite for a trustworthy data governance framework. For the Registry, their involvement can be achieved through a consent procedure, transparency, and a position in a governing body. In addition, a data request approval procedure ensures that data reuse by third parties aligns with the aims of the Registry, benefits the nursing home sector, and protects the privacy of data subjects. Conclusions: Stakeholders’ views, expertise, and knowledge of other frameworks and relevant legislation serve to inform the application of governance principles to the contexts of both the nursing home sector and the Netherlands.
Engagement of the full range of stakeholders in an early stage of governance framework development is important to generate trust in appropriate and responsible data reuse.

Maximizing Panel Research Project Management with Cutting-Edge Panel Management Software
Arnaud Wijnant, Centerdata

Effective panel research requires the seamless integration of various components, from data collection methods such as CAWI (Computer-Assisted Web Interviewing) and PAPI (Paper and Pencil Interviewing) to robust software solutions, efficient helpdesk support, incentivization strategies, stringent security measures, and comprehensive monitoring of privacy. This paper explores the critical role of panel management software in facilitating successful panel research projects while ensuring optimal data quality and participant satisfaction based on the experiences we have with projects like the LISS panel project. Panel management software serves as the cornerstone of panel research, offering invaluable features to researchers. It enables the creation and management of online surveys using CAWI, allowing for efficient data collection and analysis. Additionally, it facilitates the seamless transition to PAPI when needed, ensuring flexibility in diverse research scenarios. To ensure smooth operations, a helpdesk support system is part of panel management. Assistance plays a vital role in resolving technical issues, addressing participant queries, and ensuring uninterrupted research activities. Incentives are integral to maintaining panelist engagement and motivation. The software enables the implementation of incentives. These incentives not only enhance participant retention but also encourage active participation, yielding reliable and high-quality data. Security and privacy are of paramount importance in panel research. Robust software solutions employ stringent security measures to safeguard participant information, ensuring compliance with data protection regulations. Monitoring plays an important role in maintaining data quality and panel integrity. Effective panel management software offers comprehensive monitoring capabilities, allowing researchers to track participant activity, detect anomalies, and identify potential sources of bias or fraudulent behavior. 
This monitoring ensures the reliability and credibility of research outcomes. In conclusion, panel management software forms the backbone of successful panel research projects. By leveraging these essential components, researchers can conduct rigorous and reliable studies while upholding participant privacy and data integrity.

The Contagion of Collective Action: Analyzing Spatial Diffusion of Citizen Care Collectives in The Netherlands
Kevin Wittenberg, Utrecht University – Faculty of Social Sciences (UU-FSW); Rense Corten, Utrecht University – Faculty of Social Sciences (UU-FSW)

In recent years, the Netherlands has seen a resurgence of collective action by citizens to take charge and organize the provision of (in)formal care services among themselves. To date, efforts to understand which communities are able to mobilize themselves for such collective action have been scarce and have yielded low explanatory power. Existing research predominantly takes the perspective that the emergence of collectives can be explained by static characteristics of residents and their wider community. In this paper, we expand beyond this perspective and argue that the emergence of care collectives may also be explained endogenously, which means that citizen collectives in one area can shape the conditions for collectives to emerge in its vicinity. We argue that this may be caused by direct contact between citizens and collectives, but also indirectly through legitimization processes and improved understanding between citizens and local government. We take a two-step approach to test our hypotheses. First, we use a data-driven approach to predict the emergence of citizen collectives for care based on administrative data. We do so to adjust for potential confounding causes of spatial clustering. We then investigate the spatial correlation of the residuals of this model to test for diffusion of collective action. We differentiate between within-municipality and between-municipality diffusion. Preliminary results indicate that citizen collectives are indeed spatially clustered, even when accounting for demographic, political and geographic characteristics, which supports the notion that citizen collectives shape conditions for new collectives to emerge in their surroundings. The explainability of collectives based on administrative data at the zip code level is low.
We are still investigating the stability of the results under various model specifications, exploring the addition of more data in the predictive model, and alternative measurement of what constitutes a neighboring region.
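The second step of this approach, checking whether model residuals are spatially correlated, can be illustrated with a small sketch. The abstract does not name the statistic used; Moran's I is assumed here as one common choice, and the residuals and neighbour matrix below are toy values, not data from the study.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix.

    weights[i][j] > 0 marks area j as a neighbour of area i.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_total = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / weight_total) * (cross / variance_sum)


# Four hypothetical areas along a line: 0-1, 1-2 and 2-3 are neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered_residuals = [1.0, 1.0, -1.0, -1.0]    # similar values sit next to each other
alternating_residuals = [1.0, -1.0, 1.0, -1.0]  # neighbours always differ

i_clustered = morans_i(clustered_residuals, chain)
i_alternating = morans_i(alternating_residuals, chain)
```

A positive value indicates that neighbouring areas have similar residuals, which is the pattern one would expect under diffusion. The choice of weights matrix corresponds to the question of what counts as a neighbouring region.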

How Parental Unemployment Influences Children’s Educational Attainment and Income: The Role of Parental Economic Resources and Parental Mental Health
Flora Zhou, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)

Although the negative impact of parental unemployment on children’s outcomes has been investigated in previous work, the mechanisms underlying the intergenerational effect remain unclear, and the heterogeneous effect of parental unemployment across families with different income levels has not yet been fully addressed. This research examined the effect of parental unemployment on children in the Netherlands, focusing on children’s highest educational attainment and early-career income. We explored whether the intergenerational effect of unemployment is explained by two essential mechanisms, namely parental economic resources and parental mental health. Furthermore, we investigated how family income buffers the intergenerational effect of parental unemployment. We linked data from the Longitudinal Internet Studies for the Social Sciences (LISS) and the Social Statistical Data sets from Statistics Netherlands (SSD), deriving children’s information from the SSD and their parents’ information from the LISS panel. Our final sample consists of 1283 children born between 1989 and 1996. Given that we aim to estimate the effect of parental unemployment on children’s educational attainment and income when they were between 23 and 30 years old, we only measured parental information from before the children reached age 23, in order to address potential issues of causal ordering. The Structural Equation Modeling (SEM) results indicate that parental unemployment does not influence children’s educational attainment and early-career income after controlling for parental economic resources and parental mental health before the incidence of unemployment. Despite the lack of evidence for an intergenerational effect of unemployment, we found that unemployment significantly decreases paternal income and maternal mental health. Our results also suggest that maternal mental health after unemployment positively influences children’s educational attainment.

13.00-14.00 – Parallel session 2

Chair: Adriana Iamnitchi

  • Social networks’ influence on planetary diet adoption: An agent-based modelling approach
    Anh Pham, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Kristina Thompson, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Spencer Moore, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Roger Cremades, Utrecht University – Faculty of Geosciences (UU-Geo); Dr. Laura Bouwman, Wageningen University
  • Modelling urban economic segregation
    Clémentine Cottineau, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE)
  • Conditional Effects of Media and Interpersonal Communication: An Agent-Based Approach to Opinion Formation on Brexit in London
    Isabela Zeberio, University of Amsterdam – Faculty of Humanities, European Studies

Social networks’ influence on planetary diet adoption: An agent-based modelling approach
Anh Pham, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Kristina Thompson, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Spencer Moore, Wageningen University & Research – Wageningen Social Science Group (WUR-SSG); Roger Cremades, Utrecht University – Faculty of Geosciences (UU-Geo); Dr. Laura Bouwman, Wageningen University
The current global food system is a major contributor to greenhouse gas emissions and has propagated an obesogenic environment. Wider adoption of planetary diets, which focus on eating locally, seasonally, and plant-based, may help improve both human and environmental health. Evidence suggests that planetary diets may lower the risk of several common chronic diseases, and may help nurture sustainable food systems, improve biodiversity and reduce the use of pesticides and fertilizers. To foster a broader shift toward planetary diets, a promising approach is using social networks to encourage positive food choices. In theory, people’s behaviors are strongly influenced by social norms and the behaviors of others. However, previous studies have mainly investigated social networks in terms of negative health behaviors and outcomes, e.g. obesity. There is scant evidence of the role social networks may play in the adoption of positive health behaviors. This study helps to address this gap. We study how social networks may influence the diffusion of planetary diets. To that end, we are developing an agent-based model (ABM) to simulate the reluctance of individual food choices driven by their social networks’ influences. Using an ABM will allow us to test different hypotheses about social contagion and understand how social phenomena, specifically the shift towards a plant-based diet, could emerge from individual interactions. To ensure a realistic simulation structure, we put forth an unprecedented approach for this line of research, which is using residential demographics and social network data to model agents’ attributes and interactions. The data were collected in northern cities of the Netherlands by Lifelines. With this model, we seek to understand the degree to which social networks could encourage the adoption of planetary diets. We will test and observe social shifts in our synthetic population’s diets under varying durations and sequences of social ties.
Ultimately, we aim to estimate the extent to which broader planetary diet adoption is plausible.
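As a rough illustration of how social contagion of a diet can be simulated with agents, the sketch below uses a simple threshold rule. This is an assumption chosen for brevity, not the authors' model: agents sit on a toy ring network rather than one built from Lifelines data, and an agent adopts the diet once the share of adopting neighbours reaches a fixed threshold. All names are hypothetical.

```python
def step(adopted, neighbours, threshold):
    """Return the set of adopters after one synchronous update."""
    updated = set(adopted)
    for agent, nbrs in neighbours.items():
        if agent in adopted or not nbrs:
            continue
        share = sum(1 for n in nbrs if n in adopted) / len(nbrs)
        if share >= threshold:
            updated.add(agent)
    return updated


def simulate(neighbours, seeds, threshold, max_steps=50):
    """Run updates until nothing changes; return final adopters and counts per step."""
    adopted = set(seeds)
    history = [len(adopted)]
    for _ in range(max_steps):
        updated = step(adopted, neighbours, threshold)
        if updated == adopted:
            break
        adopted = updated
        history.append(len(adopted))
    return adopted, history


# Toy network: six agents on a ring, each tied to the agent on either side.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# A low threshold lets adoption spread from a single seed; a high one blocks it.
spread, spread_history = simulate(ring, seeds={0}, threshold=0.5)
stalled, stalled_history = simulate(ring, seeds={0}, threshold=1.0)
```

Varying the threshold, the seed set, or the network structure is the kind of experiment such a model supports, here in a deliberately minimal form.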

Modelling urban economic segregation
Clémentine Cottineau, TU Delft – Faculty of Architecture and the Built Environment (TUD-ABE)
Urban economic segregation is socially harmful and is driven (among other factors) by increasing economic inequality, although the two concepts are studied in different disciplines (respectively geography/sociology and economics) and at distinct geographical scales (respectively the urban and the national scale). In this presentation, I would like to discuss the opportunity of integrating multidisciplinary knowledge about economic inequality and urban segregation in causal models belonging to two complementary epistemologies: statistical modelling and agent-based modelling. I will first present the preliminary results of a multidisciplinary systematic review of the literature on the links between economic inequality and urban segregation, focusing on the causal paths and hypothesized channels and mechanisms through which they impact one another. I will then present the empirical and simulation strategies considered to identify the causal factors of urban economic segregation in the Netherlands, and the role of CBS microdata in this exercise.

Conditional Effects of Media and Interpersonal Communication: An Agent-Based Approach to Opinion Formation on Brexit in London
Isabela Zeberio, University of Amsterdam – Faculty of Humanities, European Studies
This study aims to explore the complex dynamics of media and interpersonal communication in order to gain a deeper understanding of their role in shaping public opinion. It focuses specifically on the unique scenario of the Brexit campaign in London, where the decision of the United Kingdom to leave the European Union not only highlighted the significance of public opinion on European integration but also revealed the fragility of EU regime support. Although London was a pro-Remain city within a pro-Leave country, its exceptional case has been overlooked in the existing literature, which attributes the outcome of Brexit to the influence of Eurosceptic media. This raises the question of whether the influence of media has been overstated, as studies often examine its role without considering the broader context and without sufficiently balancing potential counter-effects, such as interpersonal communication. To address these gaps, we propose the application of a data-driven agent-based modeling (ABM) framework using the GAMA (GIS Agent-based Modeling Architecture) platform. The ABM framework is based on a simple political information flow model that acknowledges both media and interpersonal communication as important sources of political information, and on the filter hypothesis, which suggests that the discussion network can moderate the effect of the media depending on the composition of the network (homogeneous or heterogeneous) and the content of the message (congruent or dissonant with respect to prior attitudes). The results are not conclusive at this stage, as the work is still in progress. The primary contribution of this article is the use of an ABM built in GAMA to simulate the conditional effect of media and interpersonal communication. This study aims to demonstrate the analytical value of computational social science approaches in addressing the limitations of traditional experimental and observational studies.

Chair: Marjolijn Das, Statistics Netherlands (CBS), School of Social and Behavioural Sciences (EUR-ESSB)

  • A Longitudinal Whole Population Network for the Netherlands
    Edwin de Jonge, Statistics Netherlands (CBS); Jan van der Laan, Statistics Netherlands (CBS); Marjolijn Das, Statistics Netherlands (CBS); Vincent de Heij, Statistics Netherlands (CBS); Daniëlle ter Haar, Statistics Netherlands; Lucille Mattijssen, Statistics Netherlands; Marieke de Vries, Statistics Netherlands
  • Decline in Education Segregation between 2009 and 2020
    Marieke de Vries, Statistics Netherlands (CBS); Marjolijn Das, Statistics Netherlands (CBS); Daniëlle ter Haar, Statistics Netherlands (CBS); Vincent de Heij, Statistics Netherlands (CBS); Edwin de Jonge, Statistics Netherlands (CBS); Jan van der Laan, Statistics Netherlands (CBS); Lucille Mattijssen, Nederlandse Arbeidsinspectie
  • You as well? Separation of Couples and the Prevalence of Union Dissolution in Their Social Network
    Willem Vermeulen, Netherlands Interdisciplinary Demographic Institute (NIDI); Marjolijn Das, Statistics Netherlands (CBS)

A Longitudinal Whole Population Network for the Netherlands
Edwin de Jonge, Statistics Netherlands (CBS); Jan van der Laan, Statistics Netherlands (CBS); Marjolijn Das, Statistics Netherlands (CBS); Vincent de Heij, Statistics Netherlands (CBS); Daniëlle ter Haar, Statistics Netherlands; Lucille Mattijssen, Statistics Netherlands; Marieke de Vries, Statistics Netherlands
In 2020 CBS developed a network covering the entire Dutch population of October 2018 (Van der Laan et al. 2023). Since then, the network has been developed further, and it now covers the period from 2009 up to and including 2020. For each year, the network covers the entire Dutch population on the 1st of January. The network consists of five main types of relations: family, household, neighbours and neighbourhood, colleagues, and classmates. Each of the layers has been improved relative to its previous version. For each year the network consists of approximately two billion relations. Because the network data span more than a decade, they enable interesting new research into demographic developments.
We will describe the rules used to derive the network, with a focus on the differences from the previous version, and will describe properties of the network.

Decline in Education Segregation between 2009 and 2020
Marieke de Vries, Statistics Netherlands (CBS); Marjolijn Das, Statistics Netherlands (CBS); Daniëlle ter Haar, Statistics Netherlands (CBS); Vincent de Heij, Statistics Netherlands (CBS); Edwin de Jonge, Statistics Netherlands (CBS); Jan van der Laan, Statistics Netherlands (CBS); Lucille Mattijssen, Nederlandse Arbeidsinspectie
Statistics Netherlands (CBS) has constructed a network encompassing all Dutch residents. This network, consisting of five layers – household members, family, neighbours, colleagues, and classmates – has been derived from administrative data. Various choices and methods have been employed to identify these relationships.
This year, the CBS has released an Education Segregation dashboard based on this network. The dashboard provides insights into the level of segregation among different educational levels. Education segregation occurs when individuals with the same educational background cluster within the network. Strong segregation in a society is typically viewed as unfavourable, as it can reinforce social polarization and hinder social mobility. Segregation within the network is measured using a random walk, followed by adjustments for the person’s surrounding environment.
The dashboard shows that segregation between individuals with different educational levels in the Netherlands has decreased between 2009 and 2020. This decrease was particularly prominent among individuals with lower levels of education: their network included relatively fewer individuals with the same educational level and more individuals with different educational backgrounds. In 2020, segregation was highest among individuals holding a master’s degree (from a university or hbo) and lowest among those with a bachelor’s degree.
The dashboard provides insights into education segregation at the municipal level. It shows for example that segregation in the Randstad region is generally higher compared to the rest of the Netherlands. This allows policymakers and other interested parties to obtain a more localized understanding of education segregation in their respective regions.

You as well? Separation of Couples and the Prevalence of Union Dissolution in Their Social Network
Willem Vermeulen, Netherlands Interdisciplinary Demographic Institute (NIDI); Marjolijn Das, Statistics Netherlands (CBS)
Previous union dissolution research shows that couples are more likely to separate when their parents or siblings have separated before. While it is likely that social norms regarding union dissolution are not only shaped by family members, it remains unclear whether earlier separations among other types of connections can be linked to higher risks of union dissolution as well. This study investigates if, and to what extent, couples are more likely to separate when union dissolution is more prevalent in their broader social network. We examine this question in the Dutch context, using a sample based on Dutch Register Data, which includes Dutch 25-to-50-year-old married and cohabiting couples (N=961,255), whom we follow between 1 October 2018 and 1 March 2020. For each couple, the Dutch Population Network 2018 (derived by Statistics Netherlands) is used to identify family members, neighbors, coworkers, and the parents of their children’s classmates. We use logistic regressions to test our hypotheses, taking into account the type of connection to other households, as well as how long ago union dissolution in these households occurred. Preliminary results indicate that couples are more likely to separate when union dissolution is more prevalent in their network, and that more recent union dissolutions in a couple’s network are associated with a larger increase in the couple’s union dissolution risk than less recent union dissolutions.

Chair: Daniel Oberski

  • A platform to configure and monitor data donation studies
    Laura Boeschoten, Utrecht University – Faculty of Social Sciences (UU-FSW); Theo Araujo, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG); Adrienne Mendrik, Eyra Leap B.V.; Niek de Schipper, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG); Heleen Janssen, University of Amsterdam; Emiel van der Veen, Eyra Leap B.V.
  • FIRMBACKBONE
    Wolter Hassink, Utrecht University – Faculty of Law, Economics and Governance (UU-REBO); Wim Coreynen, Zhejiang University School of Management; Peter Gerbrands, Utrecht University – Faculty of Law, Economics and Governance (UU-REBO); Daniel Oberski, Utrecht University – Faculty of Social Sciences (UU-FSW); Rutger Schilpzand; Arjen van Witteloostuijn; Zaman Ziabakhshganji
  • Teaching computational methods and programming with Jupyter Notebooks
    Caspar van Leeuwen, SURF

Abstracts

A platform to configure and monitor data donation studies
Laura Boeschoten, Utrecht University – Faculty of Social Sciences (UU-FSW); Theo Araujo, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG); Adrienne Mendrik, Eyra Leap B.V.; Niek de Schipper, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG); Heleen Janssen, University of Amsterdam; Emiel van der Veen, Eyra Leap B.V.
Digital traces left by citizens during the natural course of modern life hold an enormous potential for social-scientific discoveries, because they can measure aspects of our social life that are difficult or impossible to measure by more traditional means. As of May 2018, the EU General Data Protection Regulation obliges any entity, public or private, that processes personal data of citizens of the European Union to provide that data to the data subject (the person to whom the data pertains) upon their request, in digital format. Most major private data processing entities, comprising social media platforms as well as internet service providers, search engines, photo storage providers, e-mail providers, banks, energy providers, and online shops comply with this right to data access, by providing the data subjects with so-called ‘Data Download Packages’ (DDPs).
We have introduced a workflow and corresponding software to allow the collection and analysis of digital traces in DDPs, while preserving the right to privacy and data protection of research participants. However, a researcher preparing a data donation study needs expertise in various domains: IT and programming to configure the study, but also privacy preservation, ethics and the use of an appropriate methodology. To guide and assist researchers through this challenging process, we are developing an online platform allowing researchers to configure, host and monitor their data donation studies. During this presentation, I discuss the key functionalities of this platform such as data extraction, data storage and progress monitoring, and how they align with the GDPR and ethical requirements.

FIRMBACKBONE
Wolter Hassink, Utrecht University – Faculty of Law, Economics and Governance (UU-REBO); Wim Coreynen, Zhejiang University School of Management; Peter Gerbrands, Utrecht University – Faculty of Law, Economics and Governance (UU-REBO); Daniel Oberski, Utrecht University – Faculty of Social Sciences (UU-FSW); Rutger Schilpzand; Arjen van Witteloostuijn; Zaman Ziabakhshganji
Since the advent of the internet and the rise of new digital technologies such as cloud computing, web scraping, and machine learning, the toolbox to collect and analyze data has expanded enormously, as has its complexity. With the exponential growth of digital data, universities, businesses, and governments now have access to a potentially unprecedented wealth of information. However, academic researchers and students in (applied) economics and adjacent fields (e.g., strategy, management, entrepreneurship, economic geography, history) currently lack access to a comprehensive and longitudinal data source on Dutch companies. Often-heard complaints are that such data are too expensive for researchers to obtain and not user-friendly. These developments have raised the importance of secure data management and the protection of data ownership and intellectual property rights. The aim of this presentation is twofold. First, we present a conceptual framework to develop a better understanding of the dynamic and evolutionary process of platform development for data sharing for scientific research, specifically in the social sciences, business, and economics. Second, we illustrate the case of FIRMBACKBONE, which is a data infrastructure that contains information on Dutch companies. The information is derived from the Commercial Register of the Dutch Chamber of Commerce. The data are provided in a secure data environment and will be accessible for students and researchers through SANE. Currently, FIRMBACKBONE contains information on the number of employees, economic sector, region, and financial indicators. In addition, the data can be enriched by including additional information from researchers that utilize FIRMBACKBONE. Finally, by having scraped information from corporate websites, we can distill additional information through text analyses. In this presentation, we will pay attention to the architecture of the infrastructure, and we will provide some initial statistical analyses.

Teaching computational methods and programming with Jupyter Notebooks
Caspar van Leeuwen, SURF
Computational methods and programming are increasingly important to social sciences. Thus, development of these skills is getting a more prominent place in the curriculum of social studies. As researchers, many of you are also responsible for teaching. But how do you teach computational methods and programming to students? What environment would you use? Jupyter Notebooks provide an interface to easily mix code and text. They are commonly used in research, but also very practical for teaching. They provide a natural way of mixing instructions, sample code and exercises in a single file. Jupyter Notebooks can be run locally, on a laptop, but that requires each student to set up their own software environment, something that can be tricky and time consuming. The SURF Jupyter for Education service provides a Jupyter Notebook environment in which you, as a teacher, can easily share notebooks and data with your students, and where you have control over your software environment. This talk will consist of two parts. The first part will focus on the general Jupyter ecosystem: how Jupyter Notebooks work, how they can be executed on a remote server and what (public) Jupyter services exist. The second part will focus more specifically on the Jupyter for Education service that SURF has developed, and how it can facilitate programming courses.

Chair: Colette Bos

  • Utility and privacy in generating synthetic social science data
    Chang Sun, Maastricht University – Faculty of Science and Engineering (UM-FSE); Michel Dumontier, Maastricht University – Faculty of Science and Engineering (UM-FSE); Flavio Hafner, Netherlands eScience Center (NLeSC)
  • Are occupations “bundles of skills”? Identifying latent skill profiles in the labour market using topic modeling
    Marie Labussière, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG); Thijs Bol, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG)
  • Dealing with overfitting in Sentiment classification
    Minsi Li, Twente University – Faculty of Behavioural, Management and Social Sciences (UT-BMS)

Abstracts

Utility and privacy in generating synthetic social science data
Chang Sun, Maastricht University – Faculty of Science and Engineering (UM-FSE); Michel Dumontier, Maastricht University – Faculty of Science and Engineering (UM-FSE); Flavio Hafner, Netherlands eScience Center (NLeSC)
Modern computational social science aims to gain insights into the human experience by utilizing various data sources and methods. However, the sensitivity of both qualitative and quantitative data concerning individuals, due to privacy concerns and data protection issues, hinders the fullest use of social science data. One promising approach to tackle this challenge is to generate synthetic social science data that is structurally and statistically similar to the real data. The synthetic data will be useful in the exploratory research phase to determine the usability of the real data for answering specific research questions. In this work, we collaborated with NRO, CBS, and SURF (OSSC) to generate realistic and privacy-preserving synthetic data using cognitive student data from CBS. We advanced the basic Generative Adversarial Network model with four components: transformation, sampling, conditioning, and network training with differential privacy. The generator was designed to capture the relations between variables in real data and simulate the same relations in the synthetic data. To motivate the generator to create diverse and representative synthetic data, we apply Wasserstein distances with gradient penalty and then group the training samples in the discriminator. Finally, we provide a privacy guarantee through a differential privacy approach that injects Gaussian noise into the penalty gradients in the training process. Under a certain differential privacy threshold, the synthetic data will not leak sensitive information originating in the source data. We evaluated the quality of the synthetic data by comparing the analysis results on real and synthetic data and assessed the privacy risk using information disclosure measures and attacker models.
We found that stronger protection of privacy reduces the quality of the synthetic data in terms of similarity to the original data, so that the synthetic data become less “useful” as a direct proxy for those data. Therefore, this work also discusses the trade-off between data utility and privacy in generating and using synthetic data in practical applications in social science.

Are occupations “bundles of skills”? Identifying latent skill profiles in the labour market using topic modeling
Marie Labussière, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG); Thijs Bol, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG)
In the face of rapid technological change, a growing body of research has argued that the key determinant of employment growth is the nature of the tasks performed on the job and the extent to which they are complementary to IT devices. However, to make inferences about the task or skill content of jobs, previous studies have often relied on standard occupational classifications, thereby implicitly assuming that occupations consist of well-defined and homogeneous “bundles of tasks”. In this article, we evaluate this assumption using a unique dataset of 60 million online job postings in the UK. Based on the skill requirements of job ads, we map the skill structure of the labor market and analyze its relationship to existing occupational classifications. Rather than assuming an a priori skill structure, we identify latent skill profiles in the job postings using the biterm topic model. This allows us to define each job ad by a probability vector in a parsimonious and meaningful skills space. Then, for each occupation, we define its empirical distribution over the skill space based on the job ads categorized in that occupation. Are the empirical distributions of occupations distinct, or do they substantially overlap? We investigate this question using the Maximum mean discrepancy (MMD), a non-parametric distance between probability distributions. Our findings reveal both significant overlap between occupations and significant heterogeneity in skill content within occupations, even using the detailed 3-digit classification. These results challenge the usefulness of occupations as proxies for skills and offer new perspectives for the analysis of labor market stratification. In a final step, we propose a data-driven typology of skills using the density-based clustering method DBSCAN.
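For readers unfamiliar with the maximum mean discrepancy, the comparison of two occupations' distributions over the skill space can be illustrated with a minimal sketch. It computes the standard (biased) estimate of the squared MMD with a Gaussian kernel. The kernel choice, the bandwidth `gamma`, the function names and the toy topic-probability vectors are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two vectors; gamma is an assumed bandwidth."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between two samples of vectors."""
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, gamma) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)

# Hypothetical job ads, each described by a probability vector over three skill topics.
occupation_a = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]
occupation_b = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.6, 0.3]]

same = mmd_squared(occupation_a, occupation_a)       # identical samples
different = mmd_squared(occupation_a, occupation_b)  # clearly separated samples
print(same, different)
```

A value near zero indicates that two occupations are hard to tell apart in the skill space, while larger values indicate more distinct skill profiles.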

Dealing with overfitting in Sentiment classification
Minsi Li, Twente University – Faculty of Behavioural, Management and Social Sciences (UT-BMS)
Social media allows researchers to identify public emotions from social media posts and comments with sentiment classification, which can be done with a machine learning method. Researchers are expected to manually specify the types of emotion in a small set of their data, the input dataset, which is further divided into a training and testing dataset. The training dataset is used to fine-tune the pre-trained language model on the researchers’ data. The testing dataset measures the accuracy of sentiment classification of the fine-tuned language model on researchers’ datasets.
However, applying a pre-trained language model to a new dataset often encounters failures, such as overfitting, which refers to the fine-tuned language model learning too much information from the training dataset. The inaccuracy and non-transparency of sentiment classification invite scholars’ concerns.
Therefore, this study aims to provide practical tips to overcome overfitting. This study presents indicators of overfitting. Furthermore, this study proposes that researchers expand input datasets and create balanced training datasets. A training dataset is imbalanced when the emotion categories contain uneven numbers of examples. For instance, to distinguish three types of emotions, positive, neutral, and negative, the training dataset is imbalanced when there are 1000 positive comments, 500 neutral, and 5000 negative ones. This study offers two methods to address an imbalanced training dataset: expanding the training dataset and bootstrapping.
This study addresses the proper application of sentiment classification by detecting and avoiding overfitting. This study offers explicit guidance to fine-tune a pre-trained language model. In addition to sentiment classification, this study has implications for studies that need to classify texts. Sentiment classification is a sub-category method in text classification. These practical tips also apply to categorizing public opinions and policy documents.
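As an illustration of the balancing step, the sketch below uses the class counts from the example in the abstract (1000 positive, 500 neutral, 5000 negative). It assumes that "bootstrapping" means resampling the smaller classes with replacement until each class matches the largest one; the abstract does not spell out the procedure, and the function name `bootstrap_balance` and the placeholder comments are hypothetical.

```python
import random
from collections import Counter, defaultdict

def bootstrap_balance(texts, labels, seed=0):
    """Oversample minority classes with replacement up to the majority class size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        # Draw the missing examples at random, with replacement, from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((text, label) for text in items + extra)
    rng.shuffle(balanced)
    return balanced

# Placeholder comments with the class counts used in the abstract's example.
labels = ["positive"] * 1000 + ["neutral"] * 500 + ["negative"] * 5000
texts = [f"comment {i}" for i in range(len(labels))]

balanced = bootstrap_balance(texts, labels)
counts = Counter(label for _, label in balanced)
print(counts)
```

Resampling of this kind is normally applied to the training split only, after the testing dataset has been set aside, so that duplicated comments cannot appear in both.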

Chair: Zoltán Lippényi

  • Assessing the impact of school-based interventions using synthetic control methods
    Gijs Custers, Erasmus University Rotterdam, School of Law (EUR-ESL); Erik-Jan van Kesteren, Utrecht University – Faculty of Social Sciences (UU-FSW); Oisin Ryan, Utrecht University – Faculty of Social Sciences (UU-FSW)
  • Ethnic diversity and social trust: The role of geographical size and ethnic background
    Mathijs Kros, Utrecht University – Faculty of Social Sciences (UU-FSW); Joris Broere, Netherlands Institute for Social Research (SCP)
  • Dutch municipal social care policies and informal care relations: Linking survey and registry data to analyze the consequences for solidarity and autonomy
    Gita Huijgen, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Tom Emery, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)

Abstracts

Assessing the impact of school-based interventions using synthetic control methods
Gijs Custers, Erasmus University Rotterdam, School of Law (EUR-ESL); Erik-Jan van Kesteren, Utrecht University – Faculty of Social Sciences (UU-FSW); Oisin Ryan, Utrecht University – Faculty of Social Sciences (UU-FSW)
Evaluating policy programs in the educational field can be challenging due to the complex structure of the educational landscape. In this project, we leverage the rich CBS microdata to estimate the effectiveness of an extended school week intervention in Rotterdam South. The intervention is one of the main pillars of the National Program Rotterdam South (NPRZ) and aims to improve test scores and primary school advice in approximately 30 primary schools. To estimate the impact of the intervention, we use synthetic control methods, which allow us to not only estimate the average causal effect of the intervention, but also to investigate the variability of the intervention among schools. We discuss the application of synthetic control methods in the educational field and possible advantages and drawbacks. Finally, we consider the implications of the results for the extended school week intervention.

Ethnic diversity and social trust: The role of geographical size and ethnic background
Mathijs Kros, Utrecht University – Faculty of Social Sciences (UU-FSW); Joris Broere, Netherlands Institute for Social Research (SCP)
One of the most contentious debates in the sociological literature is whether ethnic diversity results in the erosion of social trust. This paper contributes to this debate by considering the role of contact with neighbors and comparing effects across ethnic groups and residential areas of varying sizes: neighborhoods and 50-meter radii. We studied 61,127 people in the Netherlands in over 11,000 neighborhoods, using data from 2012 until 2020. We used register data of Statistics Netherlands to calculate ethnic diversity within 50 meters of someone’s residence, as well as in their neighborhoods. First, we find that the likelihood that Dutch natives trust other people and have contact with their neighbors decreases with ethnic diversity. Second, we find that the negative effect of diversity on contact with neighbors is stronger when diversity is measured within 50 meters of someone’s residence compared to the neighborhood level, and that less contact with neighbors in turn reduces trust. Third, we find that ethnic diversity generally does not affect minorities’ likelihood to trust people or have contact with their neighbors. Yet we find that minorities’ likelihood to trust and have contact with neighbors was generally positively correlated with the proportion of co-ethnics in their residential area.

Dutch municipal social care policies and informal care relations: Linking survey and registry data to analyze the consequences for solidarity and autonomy
Gita Huijgen, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Tom Emery, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)
After long-term care reforms in 2007 and 2015, responsibilities for social care provision and informal care support have been delegated to Dutch municipalities. The scope and generosity of municipal social care policies may vary geospatially and over time, resulting in different levels of support for informal caregivers and care recipients. This may in turn affect the degree of (1) solidarity and (2) autonomy exhibited by both caregiver and care recipient, two key elements underlying ambivalence in care relations, which negatively relates to psychological well-being. While earlier research shows that long-term care policies may shape informal care in terms of its prevalence, intensity and impact, the influence on autonomy, especially for informal caregivers, is largely neglected. The case of the Netherlands may therefore serve as a natural experiment which helps to uncover the extent to which municipal social care policies have an influence on solidarity and experienced autonomy within the social context of people in need of long-term care from 2007-2020. Because higher levels of public care and support may be especially beneficial to those who typically provide or receive informal care, we will also examine to what extent gender, poverty and migration background affect the relationship between local care policies on the one hand and solidarity and autonomy on the other. To this end, we will analyze survey data from SHARE linked to administrative data from CBS. The former will be used for measures on experienced autonomy and solidarity, while the latter allows us to identify which respondents receive what social care services, which informal caregivers are compensated via the local social care policy, and who they care for, as well as certain solidarity indicators.

14.30-15.30 – Parallel Session 3

Chair: Elizaveta Sivak

  • Accelerating progress in the social sciences: the potential of benchmarks.
    Paulina Pankowska, Utrecht University – Faculty of Social Sciences (UU-FSW); Adrienne Mendrik, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Tom Emery, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Javier Garcia-Bernardo, Utrecht University – Faculty of Social Sciences (UU-FSW)
  • What’s Next? An infrastructure for Supporting Benchmarks in the Social Sciences.
    Adrienne Mendrik, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Jeroen Vloothuis, Eyra; Neo Cheung, Eyra; Emiel van der Veen, Eyra; Tjerk Nan, Eyra; Rowdy van Looy, Eyra
  • Fertility prediction challenge, Episode I: does survey data beat population registries?
    Gert Stulp, University of Groningen; Elizaveta Sivak, University of Groningen; Malvina Nissim, University of Groningen; Tom Emery, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Javier Garcia-Bernardo, Utrecht University – Faculty of Social Sciences (UU-FSW); Adriënne Mendrik, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Paulina Pankowska, Utrecht University – Faculty of Social Sciences (UU-FSW)

Abstracts

Accelerating progress in the social sciences: the potential of benchmarks
Paulina Pankowska, Utrecht University – Faculty of Social Sciences (UU-FSW); Adrienne Mendrik, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Tom Emery, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Javier Garcia-Bernardo, Utrecht University – Faculty of Social Sciences (UU-FSW)
Social scientists aim to create explanations of the world. For each social phenomenon, scientists have proposed a myriad of theories to explain its working mechanisms. Traditionally, these theories are tested by translating them into statistical models and assessing the significance of the model coefficients. This approach, however, is not free of shortcomings. It can result in the specification of a variety of models that represent competing theories, are based on different statistical techniques, or include different predictors of the social phenomena studied. While equally plausible and well-justified, these models often provide contradictory results and lead to inconsistent findings. Currently, there is no framework that allows for the comparison of these models and the question of which model works better under which circumstances remains unanswered. As a result, it is difficult to evaluate conflicting theories and monitor progress. We argue that benchmarks can be used as such a standard frame of reference and accelerate progress in the field of social sciences. They have large potential for answering long-standing questions in the field and can drive it forward. We define a benchmark as a standardised validation framework that allows for the direct comparison of the prediction accuracy of various models which address the same research problem. The use of benchmarks has led to progress and breakthroughs in many fields including computer and data science, physics, biomedicine, and the humanities. Whilst existing evidence does not allow us to fully comprehend the benefits of benchmarking in our field, we demonstrate its potential through the Fragile Families Challenge, as well as our own pilot benchmark challenge on predicting precarious employment. We then use these experiences to provide recommendations that need to be met to fully realise the potential of benchmarking in the social sciences.

What’s Next? An infrastructure for Supporting Benchmarks in the Social Sciences.
Adrienne Mendrik, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Jeroen Vloothuis, Eyra; Neo Cheung, Eyra; Emiel van der Veen, Eyra; Tjerk Nan, Eyra; Rowdy van Looy, Eyra
Benchmarks are standardized validation frameworks that allow for the direct comparison of various models that address the same research problem. In benchmarks, predefined evaluation criteria (metrics) are used to compare how well different methods can predict an outcome variable (truth criterion) given the predictors (data) at hand. The use of benchmarks has touched many fields of science, including computer and data science, physics, biomedicine, and the humanities. Participants are invited to submit their models to the benchmark, which is commonly referred to as organizing a challenge. Other frequently used terms are shared or common tasks and competitions. Within the ODISSEI benchmarking task, benchmarks were introduced in the social sciences. Funded by ODISSEI and an NWO VIDI grant awarded to Gert Stulp (RUG), Eyra developed an open source benchmark infrastructure starter kit. A fertility prediction challenge that will be organized by Elizaveta Sivak and Gert Stulp will function as a pilot for the infrastructure. A first version of the infrastructure has been released on the Next platform, an open source web platform with re-usable modules developed by Eyra that functions as an integration hub for various web applications that empower science and support the workflow of researchers. This first version was tested during the SICSS-ODISSEI Summer School 2023 with a fertility prediction pilot challenge using LISS panel data provided by Centerdata. During this presentation, we will present the infrastructure and discuss lessons learned from the summer school pilot and plans for future developments.

Fertility prediction challenge, Episode I: does survey data beat population registries?
Gert Stulp, University of Groningen; Elizaveta Sivak, University of Groningen; Malvina Nissim, University of Groningen; Tom Emery, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Javier Garcia-Bernardo, Utrecht University – Faculty of Social Sciences (UU-FSW); Adriënne Mendrik, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Paulina Pankowska, Utrecht University – Faculty of Social Sciences (UU-FSW)
The social sciences have uncovered many factors associated with fertility outcomes, but have rarely assessed their predictive ability. Benchmarking predictive ability can give us insight into which factors are most important and how well we can explain fertility behaviour, and also drive scientific progress. However, prediction benchmarks in social sciences are still rare. We conducted a pilot fertility prediction benchmark at SICSS-ODISSEI. Seven teams competed to predict having a(nother) child within the next three years (2020-2022) based on data up to and including 2019. The first phase was based on survey data from the LISS Panel, and the second phase on administrative data collected by Statistics Netherlands (CBS). For both datasets, predictive ability is low: the best F1 score is 0.59 for LISS and 0.54 for the CBS data. The best models are only able to identify half of the positive cases. In the case of LISS, the most important variable is fertility intentions, followed by several other factors related to views and behaviour (division of childcare labor, political views, frequency of participant’s contact with the mother) and several socio-demographic variables (urban place of residence, marriage status, age, dwelling type, cohabitation, having children). This demonstrates the importance of parity-progression fertility intentions for predicting the timing of children and also likely explains why the accuracy in the case of CBS is low despite the huge sample size: CBS datasets include many theoretically relevant socio-demographic factors but lack direct measures of attitudes, values, and behavior. The pilot’s results show modest predictability of having a new child in the next three years, with theory-identified factors being important but not very predictive. To further test fertility theories, advanced methods such as neural networks and transfer learning should be used that can leverage huge longitudinal datasets and combine the strengths of survey data and administrative data.
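For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision (the share of predicted positives that are correct) and recall (the share of actual positives that are found). The minimal sketch below computes it from scratch; the labels are made-up toy values chosen only to show the calculation and are not data from the challenge.

```python
def f1_score(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 marks a positive case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical toy labels: 1 = had a(nother) child within the window, 0 = did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = f1_score(y_true, y_pred)
print(precision, recall, f1)
```

In this toy case the model finds half of the positive cases (recall 0.5), and the F1 score combines that with the precision of its positive predictions.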

Chair: Gabriele Mari

  • A data-driven approach shows that individuals’ characteristics are more important than their networks in predicting fertility outcomes
    Gert Stulp, University of Groningen – Faculty of Behavioural and Social Sciences (RUG-FGMW); Elizaveta Sivak, University of Groningen – Faculty of Behavioural and Social Sciences (RUG-FGMW)
  • Childcare in the Netherlands
    Tom Emery, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)
  • The impact of peers on fathers’ labour supply
    Jordy Meekes, Leiden University – Faculty of Law (UL-Law); Max van Lent, Leiden University – Faculty of Law (UL-Law)

Abstracts

A data-driven approach shows that individuals’ characteristics are more important than their networks in predicting fertility outcomes
Gert Stulp, University of Groningen – Faculty of Behavioural and Social Sciences (RUG-FGMW); Elizaveta Sivak, University of Groningen – Faculty of Behavioural and Social Sciences (RUG-FGMW)
Social influences on fertility behaviour are well-established. Individuals learn from, receive support from, and perceive pressure from people in their social network regarding having children. Previous research has focused on identifying specific network characteristics in small networks in relation to fertility. In this study, we take a comprehensive, data-driven approach to assess the impact of various network characteristics on people’s fertility outcomes. We use unique personal network data from Dutch women to predict different fertility outcomes and employ LASSO regression, which can handle the inclusion of many variables, prevents overfitting, and leads to sparse models including only the most important variables. Our models were able to explain between 0% and 40% of the out-of-sample variation in the different outcomes we used. Individual characteristics were more important for all outcomes than network variables. Network composition was also important, in particular the people in the network who wanted children and those who wanted to be childfree. Structural network characteristics, based on the relations between people in the networks, hardly mattered. We discuss to what extent our results provide support for different mechanisms of social influence, and the advantages and disadvantages of our data-driven approach in comparison to traditional approaches.

Childcare in the Netherlands
Tom Emery, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB)
Existing research has identified a large socioeconomic gradient in childcare which is not wholly explained by cost and access (Van Lancker and Ghysels 2016). The Netherlands represents an exceptionally interesting case in the development of childcare policy and the analysis of this socio-economic gradient. Around 45% of children under the age of 3 attend formal childcare in the Netherlands, making it the fifth highest usage behind Denmark, Sweden, Belgium, and Luxembourg. Using detailed administrative data covering 10 years and 3.5 million children, this analysis uses multi-channel sequence analysis to illustrate that overarching childcare strategies differ markedly across the population and that this partially explains the low uptake of private childcare amongst low-income households. These parental employment and childcare sequences covering the first 48 months of a child’s life are then used to create a typology of 12 different childcare strategies. These strategies reveal distinct differences in the evolution of formal childcare usage and maternal employment over the four years and a clear socio-economic gradient. Specifically, the results suggest that when the mother does not have a university degree, childcare strategies either do not involve formal childcare or consist of highly fragile, volatile, and complex childcare arrangements. By contrast, children with higher-educated mothers show less volatility in their formal childcare, use fewer formal childcare providers, and utilize formal childcare providers earlier in order to facilitate greater maternal employment. These problems are particularly acute for non-co-resident parents, parents with a migration background, and those working in volatile economic sectors. These findings highlight specific points of failure in the provision of formal childcare for these women, and potential policy remedies to these problems are discussed.
The results can help policy-makers better understand how childcare reforms can be seeded in a population and supported through measures to accelerate diffusion.

The impact of peers on fathers’ labour supply
Jordy Meekes, Leiden University – Faculty of Law (UL-Law); Max van Lent, Leiden University – Faculty of Law (UL-Law)
Gender gaps in the Dutch labour market remain persistent. An important explanation for this observation is the slow pace of change in gender norms and culture, which change through people’s interactions with peers. So far, research on peer effects on work hours or leave taking has almost exclusively focused on mothers. New whole-population spatial network data and state-of-the-art research methods enable the study of peer effects from neighbours, colleagues and family. Using advanced micro-econometric techniques and Dutch administrative microdata, this project will study how fathers’ labour supply decisions are affected by their male peers upon having children.

Chair: Giovanni Cassani

  • Characterizing Online Abuse and Threats to Politicians
    Isabelle van der Vegt, Utrecht University – Faculty of Social Sciences (UU-FSW)
  • Understanding intentions for retirement with computational text analysis
    Arjen De Wit, VU Amsterdam – Faculty of Social Sciences (VU-FSW); Elisabet Doodeman, VU Amsterdam – Faculty of Social Sciences (VU-FSW); John Mohan, University of Birmingham (UK)
  • Detecting Democratic Backsliding in Assessment Reports Using Computational Social Science Tools
    Asya Zhelyazkova, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Clara Egger, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Reggie Cushing, Netherlands eScience Center (NLeSC)

Abstracts

Characterizing Online Abuse and Threats to Politicians
Isabelle van der Vegt, Utrecht University – Faculty of Social Sciences (UU-FSW)
Online abuse and threats directed at politicians have been on the rise in several countries across the world. This trend raises concerns about the safety and wellbeing of politicians and the impact this phenomenon may have on democracy as a whole. The current project computationally analyses a dataset of all tweets directed at political party leaders in the Netherlands throughout the entire year of 2022. Previous research on this topic has primarily focused on the UK, and has typically examined shorter timeframes. The effects of gender and ethnic minority status were estimated for six different linguistic measures of abuse, namely toxicity, severe toxicity, identity attacks, profanity, insults, and threats, all obtained via the Google Perspective API. Marked differences between tweets directed at male and female politicians were found, with additional differences based on the ethnic minority status of politicians. In addition to characterizing the abuse and threats received by Dutch politicians, I critically reflect on the use of Twitter data to study this phenomenon (especially in the post-API age), as well as the challenges associated with using out-of-the-box tools such as Google Perspective to measure abusive and threatening language. Suggestions will be given to further improve the study of this worrying phenomenon.

Understanding intentions for retirement with computational text analysis
Arjen De Wit, VU Amsterdam – Faculty of Social Sciences (VU-FSW); Elisabet Doodeman, VU Amsterdam – Faculty of Social Sciences (VU-FSW); John Mohan, University of Birmingham (UK)
Across the social sciences, different computational techniques have been used to classify answers to open-ended survey questions, such as (supervised or unsupervised) LDA, Naive Bayes, Support Vector Machine and BERT. In our project we explore the performance of different computational techniques in analysing an open question in which respondents are asked to imagine their future lives.
The 1958 National Child Development Study (NCDS) in the United Kingdom tracks an original cohort of 12,000 individuals born in one week in 1958. In the 2008 wave of the survey, when respondents were 50, they were asked to write a short free-text response to a question about how they envisioned their life at age 60 (n=7,378).
We are particularly interested in distilling information about the willingness to volunteer. This specific dataset is interesting because respondents were all Boomers approaching retirement. While the Boomer generation has been referred to as a possibly untapped pool of volunteers, it remains an open question how large the willingness to volunteer actually is and how the importance of volunteering ranks relative to other goals for one’s future life.
In this study we explore different techniques to map future intentions to volunteer based on these open answers. Topic modelling seemed the most suitable technique for our purposes but did not yield meaningful topics. A second option is supervised machine learning. Both Naive Bayes and Support Vector Machine turned out to perform adequately in terms of precision and accuracy. Manual inspection of the output gives additional insights into what the model did and did not pick up. By describing our methodological journey and reflecting on the pros and cons of different choices, we help the field to better understand the potential of computational text analysis in analysing answers to open-ended survey questions.
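To make the two evaluation measures named above concrete, here is a minimal sketch of how precision and accuracy could be computed when a classifier's output is compared with hand-coded answers. The labels are invented for illustration (1 = answer mentions volunteering, 0 = it does not) and are not NCDS data; the function name is likewise hypothetical.

```python
# Illustrative sketch only: the labels below are made up, not study data.

def precision_and_accuracy(y_true, y_pred, positive=1):
    """Return (precision, accuracy) for binary labels.

    precision = true positives / all predicted positives
    accuracy  = correct predictions / all predictions
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = correct / len(pairs)
    return precision, accuracy

# Hypothetical hand-coded labels and classifier predictions.
hand_coded = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

precision, accuracy = precision_and_accuracy(hand_coded, predicted)
print(f"precision={precision:.2f} accuracy={accuracy:.2f}")
```

When only a small share of answers mention volunteering, accuracy alone can look high even if few volunteering answers are found, which is one reason to report precision alongside it.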

Detecting Democratic Backsliding in Assessment Reports Using Computational Social Science Tools
Asya Zhelyazkova, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Clara Egger, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Reggie Cushing, Netherlands eScience Center (NLeSC)
The combined influence of COVID-19 emergency measures, toxic polarization and the rise of illiberal democratic regimes has put a halt to democratic advances. Many mature and young democracies have experienced democratic backsliding. Despite initially promising signs of liberalization, countries like Poland and Hungary show authoritarian tendencies. Mature democracies such as France and the UK have also recorded a loss of democratic quality. In such a context, democratic backsliding has attracted the attention of international agencies or consortia, which regularly assess the quality of democracy in different countries.
Nevertheless, such attempts suffer from subjectivity bias, as they mostly rely on qualitative expert judgments. Yet, we lack a comparative view of the dimensions and quality of democratic assessments. To fill this gap, our paper addresses the question: To what degree do assessment reports vary in grading countries by traits of democratic quality and over time? We develop and apply computational text analysis tools that map dimensions of democratic quality in texts and assess the precision of democratic assessments. Theoretically, we focus on three well-established dimensions of democracy: “electoral”, “participatory” and “liberal”. We distinguish between country features related to free and fair elections, ‘positive’ political rights contributing to pluralism and ‘negative’ civil rights protecting institutions and individuals from the state. Empirically, we propose a taxonomy of indicators for democratic quality using the individual country reports produced by the European Commission, Freedom House, the Bertelsmann Foundation, and other international agencies. The reports cover all Council of Europe countries between 1999 and 2022. The rich data allow us to apply digital tools to detect the emphasis (i.e. coverage) of democratic quality indicators in different countries and across time. Based on the analysis, we discuss the merits and limits of computational approaches for the study of democracy.

Chair: Peter Lugtig

  • The willingness to donate DNA for science in the Dutch LISS panel
    Richard Karlsson Linnér, Leiden University – Faculty of Law (UL-Law); Manisha Jain, University of Wisconsin – Madison
  • You can make a difference – but can you also not make one?
    Marika de Bruijne, Centerdata
  • Assessing Mobile Instant Messenger Networks with Donated Data
    Rense Corten, Utrecht University – Faculty of Social Sciences (UU-FSW); Laura Boeschoten, Utrecht University – Faculty of Social Sciences (UU-FSW); Stein Jongerius, Centerdata; Joris Mulder, Centerdata; Bella Struminskaya; Thijs Carrière; Adriënne Mendrik

Abstracts

The willingness to donate DNA for science in the Dutch LISS panel
Richard Karlsson Linnér, Leiden University – Faculty of Law (UL-Law); Manisha Jain, University of Wisconsin – Madison
The accumulation of large genetic data is crucial for the scientific advancement of genetic testing and precision medicine. However, studies show that various participation biases threaten the validity of genetic research. To better understand the decision to participate and its relationship with economic preferences and socioeconomic characteristics, we studied the stated willingness to donate DNA for science among 5,366 members of the Dutch LISS panel. There were two randomized conditions, varying (i) the information on benefits and risks, and (ii) the intended financial incentive. The first condition had little effect, suggesting insensitivity to the information material. Proposing a higher incentive had a significant but modest effect, suggesting that offering higher incentives is not cost-effective. Reasons not to donate DNA were concentrated on personal risks, e.g., privacy violations and data exploitation. Accordingly, stated risk willingness was strongly associated, followed by trust and positive reciprocity. Revealed economic preferences were not associated. The study replicated findings for general health and confidence in science or societal institutions; found conflicting evidence for education and religiosity; and failed to replicate findings for age, sex, and ethnicity. We conclude by proposing strategies to encourage participation, e.g., to reallocate resources from incentives to risk-minimizing or compensatory measures.

You can make a difference – but can you also not make one?
Marika de Bruijne, Centerdata
When the goal is unity, how do you avoid creating differences? Aiming for a unified mode (unimode) design in mixed-mode surveys is a well-known strategy, yet there is limited specific guidance on implementing surveys that adhere to this principle. For large European social surveys such as ESS (European Social Survey) and SHARE (Survey of Health, Ageing and Retirement in Europe), we have developed an infrastructure to facilitate the development of multi-language mixed-mode surveys.
During this presentation, we will discuss the development of mixed-mode web and paper-and-pencil surveys. While these two self-administered survey modes share similarities, they also have fundamental differences that can lead to measurement error if overlooked. Focusing on the questionnaire development process, we will look at question wording, response formats, visual design, functional features specific to web surveys, multi-device web versus paper as user interface, and data entry. We will describe the trade-offs between different survey design strategies and demonstrate how a controlled process aids the identification of unintended differences and enforces decision making. Our findings will help you to unscramble the multitude of design choices and to better manage the questionnaire development process.

Assessing Mobile Instant Messenger Networks with Donated Data
Rense Corten, Utrecht University – Faculty of Social Sciences (UU-FSW); Laura Boeschoten, Utrecht University – Faculty of Social Sciences (UU-FSW); Stein Jongerius, Centerdata; Joris Mulder, Centerdata; Bella Struminskaya; Thijs Carrière; Adriënne Mendrik
Social media play an increasingly important role in society, as related to the diffusion of (dis)information, polarization, civic and political participation, well-being, and social cohesion. However, the vast majority of the research on social media focuses on the “traditional” social media platforms such as Twitter and Facebook but largely ignores mobile instant messenger services (MIMSs) such as WhatsApp, Signal and Telegram, even though these now rival social media platforms in popularity. A key reason for the scarcity of research is that, in contrast to traditional social media platforms, MIMSs are hardly accessible to researchers since they typically lack a public web interface and any centrally collected data are proprietary. As a result, the scarce empirical research on MIMSs relies on specific publicly accessible WhatsApp groups or on surveys among small convenience samples. Consequently, this research provides little insight into the overall network topology of instant messenger networks: we lack knowledge about even the most basic topological features of the societal-scale instant messenger network.
While MIMS data are not directly accessible to researchers, the EU General Data Protection Regulation grants users themselves the right to electronic copies of their data. Taking advantage of this fact, we employ the innovative approach of data donation among respondents of a high-quality panel in the Netherlands to collect user data while preserving their privacy. Focusing on WhatsApp as the most popular MIMS, this study collects the first measurement of MIMS usage in a national probability sample.
We report first results for data collected among respondents of the LISS panel in early 2023. We describe core features of the network topology, the group structure, and socioeconomic predictors of use patterns. Furthermore, we assess the method in terms of selectivity and validity.

Chair: Angelica Maria Maineri

  • A survey of FAIR implementation practice in the Dutch SSH communities
    Angelica Maria Maineri, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Shuai Wang, VU Amsterdam – Faculty of Science (VU-Science); Navroop Singh, VU Amsterdam – Faculty of Science (VU-Science); Elena Beretta, VU Amsterdam – Faculty of Science (VU-Science); Tobias Kuhn, VU Amsterdam – Faculty of Science (VU-Science); Ronald Siebes, VU Amsterdam – Faculty of Science (VU-Science)
  • Developing a Linked Open Data “Concept Registry” for Granular Data Documentation in the Social Sciences
    Pascal Siegers, GESIS Leibniz-Institute for the Social Sciences; Antonia May, GESIS Leibniz-Institute for the Social Sciences; Dagmar Kern, GESIS Leibniz-Institute for the Social Sciences; Claudia Saalbach, German Socio-economic Panel; Jana Nebelin, German Socio-economic Panel; Andreas Daniel, German Centre for Higher Education Research and Science Studies; Ben Zapilko, GESIS Leibniz-Institute for the Social Sciences; Fakhri Momeni, GESIS Leibniz-Institute for the Social Sciences; Knut Wenzig, German Socio-economic Panel
  • Semantic search in data repositories
    Vyacheslav Tykhonov, DANS-KNAW; Jacco van Ossenbruggen, VU; Ronald Siebes, VU; Fjodor van Rijsselberg, DANS-KNAW; Thomas van Erven, DANS-KNAW; Wim Hugo, DANS-KNAW; Andrea Scharnhorst, DANS-KNAW

Abstracts

A survey of FAIR implementation practice in the Dutch SSH communities
Angelica Maria Maineri, Erasmus University Rotterdam, School of Social and Behavioural Sciences (EUR-ESSB); Shuai Wang, VU Amsterdam – Faculty of Science (VU-Science); Navroop Singh, VU Amsterdam – Faculty of Science (VU-Science); Elena Beretta, VU Amsterdam – Faculty of Science (VU-Science); Tobias Kuhn, VU Amsterdam – Faculty of Science (VU-Science); Ronald Siebes, VU Amsterdam – Faculty of Science (VU-Science)
The FAIR Expertise Hub for the Social Sciences has been established to support data communities in the social sciences in improving their compliance with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability). Despite wide endorsement of the principles, FAIR implementation has been heterogeneous across research communities. The SSH communities, in particular, encounter challenges related to a) findability and accessibility with highly sensitive data; b) lack of semantic resources to make data interoperable; c) lack of machine-actionable resources supporting reusability. Over the past months, the FAIR Expertise Hub has supported some SSH data communities in defining their FAIR Implementation Profile (FIP), that is, a collection of choices regarding the application of the FAIR principles. In the presentation, we will review the outcomes of the work done on the FIPs by presenting the preliminary results on the state of FAIR in the Dutch SSH communities, and discuss the challenges and bottlenecks experienced in the process as well as the plans ahead.

Developing a Linked Open Data “Concept Registry” for Granular Data Documentation in the Social Sciences
Pascal Siegers, GESIS Leibniz-Institute for the Social Sciences; Antonia May, GESIS Leibniz-Institute for the Social Sciences; Dagmar Kern, GESIS Leibniz-Institute for the Social Sciences; Claudia Saalbach, German Socio-economic Panel; Jana Nebelin, German Socio-economic Panel; Andreas Daniel, German Centre for Higher Education Research and Science Studies; Ben Zapilko, GESIS Leibniz-Institute for the Social Sciences; Fakhri Momeni, GESIS Leibniz-Institute for the Social Sciences; Knut Wenzig, German Socio-economic Panel
Reusing research data is an integral part of research practice in the social and economic sciences. To find relevant data, researchers need adequate search facilities. However, a comprehensive, thematic search for research data is made more difficult by inconsistent or missing semantic indexing of data at the level of social science concepts (e.g., representing the theory language). Either data are not documented at a granular level, or primary investigators use their ad-hoc terminology to describe their data. From the users’ perspective, the lack of theory language in data documentation impedes effective data searches and thus significantly limits the research potential of existing data collections. Because there is currently no semantic model for indexing the semantic content of data in the social sciences, developing a concept registry for measurement-level documentation of research data will improve the data’s findability and interoperability. This will also support research infrastructures in harmonizing their indexing practices. We present results from a pilot study testing the core components of a social science concept registry. First, we developed a data model for the Concept Registry using Unified Modeling Language (UML). All links are created and managed in the form of so-called RDF triples. Second, we developed an annotation application for indexing specific questions/variables with social science concepts. The two SKOS-compliant thesauri, “Thesaurus Social Sciences” (TheSoz) and “Standard Thesaurus Economics” (STW), are integrated into the annotation application, which could be extended to other resources like ELSST.
Third, we illustrate the empirical application of the concept registry with examples from three large-scale survey programmes (German Socio-Economic Panel, German General Social Survey, National Academics Panel Study). The initial focus is on variables and questions with overlapping content in the three survey programmes, as they form a sound basis for cross-linking with concepts.
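As a rough illustration of what managing links as RDF triples means, the sketch below stores subject-predicate-object statements as plain Python tuples and looks up all variables indexed with one concept. It is not the registry's data model, which the abstract does not specify: every URI under example.org is hypothetical, and the choice of `dcterms:subject` to link a variable to a concept is an assumption. The SKOS, RDF and Dublin Core namespaces themselves are the standard ones.

```python
# Illustrative sketch: all example.org URIs and the variable names are hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/registry/"

CONCEPT = EX + "concept/life-satisfaction"

# Each statement is one (subject, predicate, object) triple.
triples = {
    (CONCEPT, RDF + "type", SKOS + "Concept"),
    (CONCEPT, SKOS + "prefLabel", "life satisfaction"),
    # Assumption: variables point to concepts via dcterms:subject.
    (EX + "variable/survey-a/q12", DCTERMS + "subject", CONCEPT),
    (EX + "variable/survey-b/v045", DCTERMS + "subject", CONCEPT),
}


def variables_for(concept_uri, graph):
    """Return all subjects annotated with the given concept, sorted by URI."""
    return sorted(
        s for s, p, o in graph
        if p == DCTERMS + "subject" and o == concept_uri
    )


linked = variables_for(CONCEPT, triples)
for uri in linked:
    print(uri)
```

Because both hypothetical variables point to the same concept URI, a search for that concept finds them even though the surveys name and word the questions differently, which is the cross-linking idea described above.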

Semantic search in data repositories
Vyacheslav Tykhonov, DANS-KNAW; Jacco van Ossenbruggen, VU; Ronald Siebes, VU; Fjodor van Rijsselberg, DANS-KNAW; Thomas van Erven, DANS-KNAW; Wim Hugo, DANS-KNAW; Andrea Scharnhorst, DANS-KNAW
This paper reports on current research into the semantic enrichment of data and the use of enriched metadata fields in search. Different data sources, when brought together in a portal as in the case of the ODISSEI Portal, come with their own metadata standards, and sometimes also their own data models. Data and metadata harmonisation is therefore a central challenge. Not surprisingly, many data stewards and data scientists work on harmonisation pathways in building the ODISSEI Portal. This paper presents one facet of this work, executed at DANS and demonstrated on the Dataverse repository platform on which the ODISSEI Portal is built. The core of the work concerns automatically supported enrichment of metadata (fields) based on data mining of the data itself and linking to external controlled vocabularies. Semantic web technology provides the tools. We demonstrate first findings and discuss further engineering challenges.

David Lazer is University Distinguished Professor of Political Science and Computer Sciences, Northeastern University, and Co-Director of NULab for Texts, Maps, and Networks. His research focuses on the nexus of network science, computational social science, and collaborative intelligence.

The title of the presentation: ‘Models for online behavioral research’.

Room: Progress

Chair: Tom Emery

Abstract

The study of the social dimensions of the internet is at an acute crisis point. There has been a general rolling back of data access for studying what happens online, most dramatically with the recent elimination of API-based access to Twitter (now X). This talk will outline various paradigms for empirical research on the online behaviors of people and platforms, as the field transitions from its reliance on Twitter data to an uncertain future, with a particular focus on the potential for creating an infrastructure utilizing user-sourced data on human and platform behavior.

The Early-career Researcher Session offers room for aspiring academics at the beginning of their career to present their projects. This session allows them to demonstrate innovative research questions and ideas that have the potential to shape the future of their field.

  • Neighborhoods embeddings using Street-Level Images and Human similarity perceptions
    Francisco Garrido-Valenzuela, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM); Oded Cats, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM); Sander van Cranenburgh, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM)
  • Housing crisis and mental health in the Northern Netherlands
    Agata Troost, University of Groningen – Faculty of Spatial Sciences (RUG-FRW)
  • Networks in the market for researchers
    Flavio Hafner, Netherlands eScience Center (NLeSC); Christoph Hedtrich, Uppsala University
  • The Sound of Disgust: Differentiating Disgust Vocalisations through Computational Approaches
    Roza Kamiloglu, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB); Christiaan Meijer, Netherlands eScience Center (NLeSC); Disa Sauter, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG); Joshua Tybur, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB)

Abstracts

Neighborhoods embeddings using Street-Level Images and Human similarity perceptions
Francisco Garrido-Valenzuela, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM); Oded Cats, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM); Sander van Cranenburgh, TU Delft – Faculty of Technology, Policy and Management (TUD-TPM)
In the era of rapid urbanization and complex urban challenges, such as social inequalities and resilience, gaining a deeper understanding of urban spaces is crucial for urban planners and policymakers to maintain or improve citizens’ well-being standards. Having an accurate and nuanced understanding of the wide variety of urban spaces facilitates effective neighborhood planning, prioritizing interventions, and resource allocation. However, traditional methods to characterize neighborhoods often fail to capture the multidimensional nature and intricate relationships defining urban spaces. In particular, they do not incorporate human feelings and perceptions into their characterizations, which provide valuable insights into people’s experiences and interactions with urban environments. Human perceptions thus support the design of inclusive, people-centered urban interventions, enhance livability, and promote sustainable development. To address this gap, we propose a method to create urban embeddings that incorporate human perceptions of urban space. Our method involves three steps. First, the city is divided into small spatial units, each representing a neighborhood with a collection of street-level images (SLI). Second, we conduct a survey using SLI in which participants must compare triplets of neighborhoods (each represented by five SLI) and choose the odd one out. Images are drawn from a database of millions of geotagged SLI that we have collected from across the Netherlands. Third, we generate urban vectors using a deep-embedding model. The resulting embeddings encompass not only observable neighborhood features like building types, street furniture, and transportation infrastructure but also (physically unobservable) human perceptions of neighborhoods. We (preliminarily) find context-aware representations that encompass the spatial, visual, and socio-economic complexities of neighborhoods. The method is able to differentiate various sorts of residential, commercial, and rural neighborhoods. By providing insights into urban dynamics and uncovering hidden patterns, we believe our method can support urban planners and city policymakers with evidence-based decision-making to effectively address the complex challenges that cities face today.
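One way to picture how odd-one-out judgments relate to an embedding space is sketched below: given three neighborhood vectors, the predicted odd one out is the item left outside the most similar pair. This is a common simplifying assumption for triplet tasks, not the authors' deep-embedding model, and the three toy vectors are invented.

```python
import math

# Illustrative sketch only: the vectors are made up, and cosine similarity
# is an assumed similarity measure, not one stated in the abstract.


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def odd_one_out(a, b, c):
    """Return the index (0, 1 or 2) of the item outside the most similar pair."""
    # Key = index of the item that is NOT part of the pair.
    pair_similarity = {
        2: cosine(a, b),
        1: cosine(a, c),
        0: cosine(b, c),
    }
    return max(pair_similarity, key=pair_similarity.get)


# Hypothetical embeddings for three neighborhoods.
toy = {
    "n1": [1.0, 0.1, 0.0],
    "n2": [0.9, 0.2, 0.1],
    "n3": [0.0, 0.1, 1.0],
}
choice = odd_one_out(toy["n1"], toy["n2"], toy["n3"])
print("predicted odd one out:", list(toy)[choice])
```

Training an embedding on survey responses then amounts to adjusting the vectors so that predictions like this agree with the choices participants actually made.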

Housing crisis and mental health in the Northern Netherlands
Agata Troost, University of Groningen – Faculty of Spatial Sciences (RUG-FRW)
There has already been research on spatially concentrated deprivation affecting individual mental health (see e.g. Visser et al., 2021, for an overview), yet little is known about the specific influence of the lack of access to affordable housing in the Northern Netherlands. This research combines the large-scale administrative datasets of the Statistics Netherlands (CBS) Microdata with detailed questionnaires and measurements of mental health and well-being from the longitudinal Lifelines study (https://www.lifelines.nl/). We start by investigating which areas of the Netherlands are most affected by the housing crisis, or in other words, where in the Netherlands we encounter the biggest scarcity of affordable housing. We employ various measurements of that scarcity and in doing so consider different predictors of housing (un)affordability, at different spatial scales. For creating the “bespoke neighbourhoods”, or areas of investigation other than the administrative programme units, we use GIS and other programmes such as Equipop (https://www.equipop.kultgeog.uu.se/?languageId=1).
Once we identify areas struggling with the biggest scarcity of affordable housing, considering the needs of the local population, we study the influence of housing scarcity and housing insecurity on mental health, taking into account financial insecurity. In our analysis we use longitudinal models which, in addition to modelling the influence of spatial context (competitiveness of the housing market), control for the household context and individual health history. We also theoretically explore why people settle in competitive housing market areas (e.g. because of educational or work-related reasons).

Networks in the market for researchers
Flavio Hafner, Netherlands eScience Center (NLeSC); Christoph Hedtrich, Uppsala University
We study the role of networks in the labor market for young scientists in the United States. Nearly one in five PhD graduates who publish after PhD graduation do so at a university where their advisor has a former co-author; graduates who have such a connection are twice as likely to match with the university, even within fine-grained peer groups. We document a citation premium of 10 to 30% for graduates placed through the advisor’s network. Further analysis indicates that network placements are associated with private benefits on both sides of the market, with no evidence for aggregate productivity effects.

The Sound of Disgust: Differentiating Disgust Vocalisations through Computational Approaches
Roza Kamiloglu, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB); Christiaan Meijer, Netherlands eScience Center (NLeSC); Disa Sauter, University of Amsterdam – Faculty of Social and Behavioural Sciences (UvA-FMG); Joshua Tybur, VU Amsterdam – Faculty of Behavioural and Movement Sciences (VU-FGB)
Ew! Yuck! Ugh! These are vocalisations that often accompany disgust, which serves the crucial evolutionary function of motivating avoidance of the pathogens that cause infectious diseases (pathogen disgust). It can be elicited by a variety of sensory stimuli, especially those associated with smells and tastes that could indicate the presence of pathogens. For instance, we might experience pathogen disgust when we smell spoiled food or walk into a dirty public restroom. We might, however, also experience other forms of disgust, such as repugnance at violations of social norms or moral standards. For example, watching a news report about a company exploiting its workers, or witnessing dishonesty such as lying or cheating, might elicit disgust (moral disgust). Here, we investigate whether the vocalisations accompanying pathogen and moral disgust are acoustically distinguishable, just as their functions are distinct. We employed computational methods that allow us to capture pathogen and moral disgust vocalisations using rich audio samples from daily life. The acoustic dataset consisted of 75,504 data points: 88 acoustic features extracted from each of 858 disgust vocalisations compiled from YouTube. Six machine classifiers with 5-fold cross-validation tested whether the two types of vocalisations can be classified based on their acoustic structure. The AUC (Area Under the Curve) of 0.68, a measure of how well the logistic regression model performed, indicated better-than-random classification performance (random: 0.50). Moreover, a pre-registered perception study demonstrated that naïve listeners (n = 200) are able to infer at better-than-random guessing levels whether the vocalising person was experiencing pathogen or moral disgust. Our approach shows that pathogen and moral disgust vocalisations are acoustically distinct, and that listeners are sensitive to these differences. We demonstrate that applying the tools of machine learning to a rich body of audio data can reveal systematicity in complex and high-dimensional behavioural domains like human vocal expressions.
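For readers unfamiliar with the AUC, the sketch below shows one standard way to compute it: as the share of (positive, negative) pairs in which the positive example receives the higher classifier score, with ties counted as half. It also checks the dataset size quoted in the abstract. The labels and scores are invented for illustration and are not the study's data.

```python
# Illustrative sketch only: labels and scores below are made up, not study data.


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Size of the acoustic dataset as reported in the abstract.
N_VOCALISATIONS = 858
N_FEATURES = 88
n_data_points = N_VOCALISATIONS * N_FEATURES

# Hypothetical scores (1 = pathogen disgust, 0 = moral disgust).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc(toy_labels, toy_scores)

print(n_data_points, round(toy_auc, 3))
```

A classifier that cannot separate the two classes at all scores 0.50 on this measure, which is the baseline against which the reported 0.68 is compared.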


Photo of ODISSEI Conference 2022 by MG Fotografie / Michel Groen