Automated Data Linkage

A promising, though high-risk, area of computer science is automated data linkage. It has the potential to substantially reduce the time researchers spend manually linking datasets during analysis. Linking datasets is one of ODISSEI’s core aims. The social data within ODISSEI is highly structured, and persistent identifiers and standardised codes are used pervasively. However, the degree of automated linkage is restricted by the lack of infrastructure to support it. Computer science techniques are advancing through deep learning methods, which help identify and link more opaque constructs in data that are less well defined and structured, such as families, social networks, neighbourhoods or even cultural groups. The potential of such techniques in the social sciences is considerable, but these approaches require high levels of expertise. This subtask will explore these high-risk, high-reward approaches and examine whether they can complement the more functional and established manual and semi-automated linkages that will be made during the development of the ODISSEI Portal. Work will start with the construction of a knowledge graph of ODISSEI data collections and data sources in order to assess the scope and potential for automated linkage.
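
To make the knowledge-graph idea concrete, the following is a minimal sketch in Python, not a description of the actual ODISSEI implementation. It assumes a graph expressed as (subject, predicate, object) triples in which each data collection declares the identifier types it carries. The dataset names, the `hasIdentifier` predicate and the `linkable_pairs` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (subject, predicate, object) triples. The dataset names and
# identifier types are invented for illustration, not real ODISSEI metadata.
TRIPLES = [
    ("survey_a", "hasIdentifier", "person_id"),
    ("survey_a", "hasIdentifier", "municipality_code"),
    ("register_b", "hasIdentifier", "person_id"),
    ("register_b", "hasIdentifier", "household_id"),
    ("geo_c", "hasIdentifier", "municipality_code"),
    ("archive_d", "hasIdentifier", "document_id"),
]


def linkable_pairs(triples):
    """Map each pair of datasets to the identifier types they share."""
    # Group datasets by the identifier type they declare.
    datasets_by_identifier = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "hasIdentifier":
            datasets_by_identifier[obj].add(subject)

    # Any two datasets declaring the same identifier type are link candidates.
    shared = defaultdict(set)
    for identifier, datasets in datasets_by_identifier.items():
        for left, right in combinations(sorted(datasets), 2):
            shared[(left, right)].add(identifier)
    return dict(shared)


pairs = linkable_pairs(TRIPLES)
for (left, right), identifiers in sorted(pairs.items()):
    print(left, "<->", right, "via", sorted(identifiers))
```

In this toy graph, datasets that share no identifier type with any other dataset (here `archive_d`) do not appear in the result, which is one simple way a graph of this kind could indicate where automated linkage is out of scope without further work.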

Project team Distributed analysis techniques: Jacco van Ossenbruggen (VU Amsterdam – Task leader).

Questions regarding automated data linkage? Contact Lucas van der Meer (ODISSEI Coordination Team).