Understanding and addressing the social impact of COVID-19 through Open Data Infrastructure

11th March 2021, 14:00 (CET, Paris Time)

On March 11th, the OECD, in partnership with the Dutch Open Data Infrastructure for Social Science and Economic Innovations (ODISSEI), organised a webinar presenting several outstanding examples of innovative uses of social data during the Covid-19 pandemic. The webinar also discussed which barriers to data access and use became apparent during the crisis and which obstacles were removed.

The first session, on innovative data analysis during the Covid-19 pandemic, was moderated by Monika Queisser, Head of the Social Policy Division at the OECD Directorate for Employment, Labour and Social Affairs. The presenters were Hans-Martin von Gaudecker, Professor for Applied Microeconomics at the University of Bonn in Germany; Serina Chang, PhD student in Computer Science at Stanford University in the United States; and Isaac Delestre, Research Economist at the Institute for Fiscal Studies in the UK.

Hans-Martin von Gaudecker presented his research on labour supply effects during the initial Covid-19 lockdown in the Netherlands. The research made use of the Dutch LISS panel, a probabilistic online panel running since 2007 whose advantages include the possibility of following individuals in the long run and the ability to link about 85 percent of the respondents to Dutch administrative microdata. A broad set of questions relevant to the pandemic has been asked in six waves since March 2020. The research showed relatively small reductions in the number of employed. While working hours in the Netherlands fell substantially in March 2020, they bounced back close to the pre-lockdown level by the summer, suggesting that the Dutch support programmes worked well. There were also strong reductions in mental health scores in March, but these scores were back to normal levels by May. However, there were differential effects within families, depending on which parent buffered the shock. Men who suddenly became the main caregiver suffered the strongest decline in mental health, though their score did not fall below that of women.

Serina Chang’s new mobility network model aimed to explain inequities during the Covid-19 pandemic and to inform the reopening of the economy. As people rapidly and unpredictably change their behaviour, answers to questions such as when schools should be reopened need to be adapted over time, and thus there is a need for granular real-time data. Serina Chang and her co-authors relied on anonymised cell phone data from SafeGraph in the United States to capture human contact with a lag of about one week. The data cover various points of interest (restaurants, grocery stores, churches, etc.), as well as neighbourhood blocks at the Census level. A machine-learning algorithm then analysed how many people from each neighbourhood block visited each point of interest, and who got infected where, in each hour of the first wave of the Covid-19 pandemic. A first use case considered the re-opening of points of interest under occupancy caps; it showed that a small reduction in visits can lead to a strong reduction in infections. A second use case on disparities across neighbourhoods showed that infection disparities may arise from differences in mobility alone; for example, people in lower-income neighbourhoods were able to reduce their mobility less than others and were thus exposed to a higher risk of infection.

Isaac Delestre presented research on the effects of Covid-19 on personal finances, using evidence from anonymised bank account data in the United Kingdom sourced from a mobile app called Money Dashboard (MDB). The anonymised transaction-level dataset has a daily frequency, links users’ bank accounts and automatically categorises expenses and incomes; it also includes basic demographics. Even though the sample is skewed towards a younger and more southern population than the overall UK population, it matches the income distribution well after reweighting on the basic demographics. A particular use case of this data was studying the effects of the Self-Employment Income Support Scheme (SEISS), which granted applicants 80% of their pre-pandemic profits, capped quarterly at GBP 7 500. The data showed that the lockdown induced a clear income drop in March 2020, followed by a substantial positive spike as SEISS was paid out for the first time in May 2020. Mortgage payments and consumer spending (excluding household bills and financial products) initially decreased, but spending in particular later recovered.

Given the different types of data used by the three presenters, the barriers to data access and use they encountered were also different. Hans-Martin von Gaudecker noted that the time lag of six to twelve months for access to administrative data in the Netherlands was still considerable. With real-time administrative data, it would be possible to better analyse shocks like the Covid-19 pandemic. Both Serina Chang and Isaac Delestre highlighted that a lot of real-time, fine-grained and scalable data are available and useful for policy insights. But Serina Chang also mentioned that standards and regulations relating to these data are missing. She sees a clear need for better legislation around data security as well as better privacy and transparency.

The panel discussion, on opportunities seized and missed in leveraging data infrastructures for policy-relevant insights during the Covid-19 pandemic, was led by Tapani Piha, an advisor to the Finnish Ministry of Social Affairs and Health on digital health and the use of health data. The panellists were Pearl Dykstra, Director of the Open Data Infrastructure for Social Science and Economic Innovations in the Netherlands; Kamel Gadouche, Director of the French Centre for Secure Remote Access to Data, CASD; Amy O’Hara, Executive Director of the Georgetown Federal Statistical Research Data Center; and Ben Goldacre, Director of the DataLab at Oxford University and OpenSafely.

The panellists highlighted different types of access difficulties, both related and unrelated to the pandemic. Amy O’Hara explained that access to administrative microdata in the United States is provided through a network of 31 physical locations. Because the number of individuals allowed on the premises was severely reduced (for example to 25% of usual working spaces in the case of Georgetown), fewer researchers were able to work with the data. This highlighted that the lack of virtual data access was a significant burden during the pandemic. Pearl Dykstra explained that in order to create ODISSEI, they had to convince the Dutch academic community that data are a shared good and a shared responsibility. ODISSEI provides data from Statistics Netherlands, links the data to surveys, and provides support for the LISS panel. Throughout the Covid-19 pandemic, they were able to provide very quick access to the LISS panel, but public health data are currently not available to them. Creating this access likely requires outreach to ministries and municipalities. Kamel Gadouche explained that a persistent problem with regular administrative data is the time lag between collection and release, and that as of March 2021, there were still many researchers waiting for data covering the time before and during the pandemic. Ben Goldacre and Pearl Dykstra, however, cautioned that the need for speed should not take precedence over data security, and that many research questions in fact require neither real-time nor highly disaggregated data.

The panellists also pointed out ways in which the pandemic has pushed forward their data infrastructure projects. Ben Goldacre explained that prior to the pandemic, researcher access to health data was difficult. As a result of the pandemic-induced sense of urgency, legal changes and pro-bono work, they were able to build a fully open-source platform covering 55 million people in real time (OpenSafely). Since the OpenSafely platform does not let people download data, the analytics applications are embedded on the data platform, and every use of the data leaves a re-usable trace. The initial pushback on data access led to strong privacy and transparency levels on the platform. Kamel Gadouche explained that both the coverage of social data and access to it have improved.

Turning to the outlook for the future, the panellists referred to different investment needs in order to build public trust regarding data safety. Amy O’Hara explained that database systems in the United States needed to be modernised, requiring politically challenging negotiations and substantial investments. Ben Goldacre championed open code in combination with secured data, while Kamel Gadouche cautioned that sometimes there can be a trade-off between the speed of data availability and data quality. Pearl Dykstra stressed that governments should make more data available and that researchers should engage in more policy-relevant research partnerships.

Questions raised by the audience included what governments' post-pandemic priorities should be in terms of the real-time data points that they should seek to promote and regulate, and what governments should prioritise in the interest of “catastrophe” readiness; whether a by-product of the pandemic was a broader appreciation within the research and policy-making community of the rich tapestry of commercial and public data sources; and whether the recent increase in online social science research panels was in some way coordinated in order to avoid duplication.

Banner Photo by Chris Liverani on Unsplash