Pilot with a Trusted Third Party procedure in the ODISSEI Secure Supercomputer (OSSC)

18 June 2024

Written by Lucas van der Meer (CTO, ODISSEI) and Evgeniia Krichever (Communications Manager, ODISSEI)

What is the OSSC?

The ODISSEI Secure Supercomputer (OSSC) consists of an enclave of Statistics Netherlands (CBS) within the domain of SURF. This virtual IT environment offers a high performance computing environment that meets the legal, technical and security requirements of CBS. 

The OSSC has been developed with security as its hallmark. Each research project has its own computing environment, which is strictly separated from the environments of other projects. Data cannot leave the environment, except for the back-and-forth exchange between CBS and SURF. Approved researchers can only access the OSSC through CBS’ existing analysis environment (the CBS Remote Access environment; RA). This way, CBS maintains full control over the OSSC environment, as mandated by the CBS law.

Need for evaluation

Since the inception of the OSSC five years ago, the way it is operated has changed. The working arrangements between CBS, SURF and the researchers have matured to serve researchers better. In addition, the processing agreements at both CBS and SURF were updated following General Data Protection Regulation (GDPR) jurisprudence. The supercomputer Cartesius was replaced by Snellius. Finally, the need for a so-called Trusted Third Party procedure within the OSSC has become apparent. All this resulted in an updated legal framework for the OSSC.

What’s new in the OSSC’s legal framework?

Most crucially, detailed and precise procedures for CBS and SURF jointly operating the OSSC are now formalised. These procedures are an integral part of the OSSC and ensure that CBS adheres to the CBS law, while allowing researchers to analyse CBS microdata using SURF computing facilities. In addition, the Service Level Agreements between SURF and CBS were defined. The system’s technical architecture and the security measures taken by both parties are now part of the agreement. These measures include an end-to-end VPN from CBS to the OSSC, sandboxed and separate InfiniBand partitions, filesystem audit logging, Linux system logging, and external penetration testing. Importantly, the legal framework now also includes the possibility of operating a so-called Trusted Third Party procedure.

The Trusted Third Party

The concept of a Trusted Third Party (TTP) is used frequently in cryptography when two parties want to interact with each other but do not want to disclose specific sensitive information to one another. A third party that is trusted by both of the original parties facilitates the interaction.

In the context of the OSSC, the TTP procedure supports researchers who want to combine their research data with CBS microdata but cannot send the research data to CBS directly. This is the case with some sensitive personal data, such as genetic data. Such data is treated as strictly confidential and is generally not shared with third parties, especially in combination with identifying information. A second reason is that the data may be too large to be processed effectively by CBS’ own systems.

In a TTP procedure within the OSSC, SURF can now act as a party that is trusted by both CBS and the researcher to carry out a predefined set of data processing steps that make the linking of data possible.

So how does the TTP work? 

Imagine a complex research project using CBS microdata linked to genetic data from the Netherlands Twin Register (NTR). In this project, the three parties that are involved in the process are NTR as a researcher, CBS as a microdata provider, and SURF as a Trusted Third Party. 

CBS allows authorised researchers to access sensitive CBS microdata under strict conditions in a trusted research environment, as stipulated in its access conditions. In the default situation, NTR would send its data to CBS for linkage with CBS microdata. Although NTR has formal consent from its participants to do so, NTR has agreed with them not to share genetic data with parties that could identify the subjects of the study, such as CBS. This is where the TTP comes into play: the linkage data is shared not with CBS but with SURF, which follows a carefully formulated procedure to link the datasets to each other, without accessing either party’s sensitive data and without CBS having access to NTR’s sensitive data.

This is done by separating CBS’ and NTR’s sensitive data from the data’s identifiers, sharing only these identifiers between CBS, NTR and SURF in a specific order, after which the identifiers are linked by SURF. Finally, the three parties send the processed identifiers and datasets to the OSSC for future analysis by the researcher (NTR in this case).

It is important to note that SURF does not become the data controller of the sensitive data. In this way, the conditions set by all three parties are met: CBS shares its sensitive data only with NTR and not with SURF, NTR does not share its own data with CBS, and SURF does not have access to the sensitive data of either CBS or NTR.
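The separation of identifiers from sensitive data can be illustrated with a small Python sketch. It is not the actual OSSC procedure, whose exact steps and order are not described here. All names in it (split_records, link_identifiers, join_payloads, the person_id field and the toy records) are hypothetical, and it assumes that both datasets share a common identifier and that random record keys are an acceptable stand-in for whatever keys the real procedure uses.

```python
import secrets


def split_records(records, id_field):
    """Split each record into an identifier and a payload.

    The two halves are connected only by a random record key, so the
    identifier table can be handed over without the sensitive payload.
    """
    id_table = {}   # record key -> identifier (seen by the linking party)
    payloads = {}   # record key -> sensitive attributes (not seen by the linker)
    for record in records:
        key = secrets.token_hex(8)
        id_table[key] = record[id_field]
        payloads[key] = {k: v for k, v in record.items() if k != id_field}
    return id_table, payloads


def link_identifiers(ids_a, ids_b):
    """Linker role: match record keys with equal identifiers; no payloads involved."""
    key_by_identifier = {identifier: key for key, identifier in ids_b.items()}
    return {
        key_a: key_by_identifier[identifier]
        for key_a, identifier in ids_a.items()
        if identifier in key_by_identifier
    }


def join_payloads(crosswalk, payloads_a, payloads_b):
    """Analysis environment: combine payloads via the crosswalk, without identifiers."""
    return [{**payloads_a[a], **payloads_b[b]} for a, b in crosswalk.items()]


# Toy data, invented purely for illustration.
registry = [
    {"person_id": "p1", "income": 30000},
    {"person_id": "p2", "income": 45000},
    {"person_id": "p3", "income": 52000},
]
cohort = [
    {"person_id": "p2", "score": 0.7},
    {"person_id": "p3", "score": -0.2},
    {"person_id": "p4", "score": 1.1},
]

registry_ids, registry_payloads = split_records(registry, "person_id")
cohort_ids, cohort_payloads = split_records(cohort, "person_id")

# The linker receives only the two identifier tables.
crosswalk = link_identifiers(registry_ids, cohort_ids)

# The payloads and the crosswalk meet only in the analysis environment.
linked = join_payloads(crosswalk, registry_payloads, cohort_payloads)
print(linked)
```

In this sketch the function playing the linker's role never receives a payload, and the joined records contain no identifier, which mirrors the division of access described above. A real procedure would add steps that the sketch leaves out, such as secure transfer and the agreed order of exchanges.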

The farmer, wolf, goat and cabbage

Some of this may be made clearer through an analogy with the river crossing puzzle. A farmer is on his way to the market with a wolf, a goat and a cabbage. He comes across a river. Luckily for the farmer, who cannot swim, there is a boat. However, the boat can only hold one other passenger besides him. The farmer cannot leave the goat alone with the wolf (the wolf would eat the goat), nor can he leave the cabbage alone with the goat (the goat would eat the cabbage). The task is to devise a plan that meets these conditions and succeeds in as few crossings as possible.

This situation is somewhat comparable to the research case described above: the farmer, wolf, goat and cabbage are all bound by specific rules about who may be left with whom, just like CBS, SURF and the researcher. They also share a common goal. By executing the steps in a specific order, that goal can be reached while following all the rules.

To solve the problem, the farmer first crosses the river with the goat, leaves it on the far bank and returns alone. He then takes the wolf across, leaves it there and returns with the goat. He leaves the goat on the starting bank, crosses over with the cabbage and returns alone. Finally, he brings the goat to the far bank for the second time, which solves the problem in seven crossings. (An alternative solution is to swap wolf and cabbage in the above order.)
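For readers who like to check such puzzles by machine, the solution above can be reproduced with a short breadth-first search. The Python sketch below is only an illustration of the puzzle, not of the TTP procedure. The state encoding (one 0 or 1 per participant for the starting or far bank) and the function names are choices made for this example.

```python
from collections import deque

NAMES = ("farmer", "wolf", "goat", "cabbage")


def is_safe(state):
    """A state is safe if the goat is never left with the wolf or the
    cabbage on a bank where the farmer is absent."""
    farmer, wolf, goat, cabbage = state
    if goat == wolf and farmer != goat:
        return False
    if goat == cabbage and farmer != goat:
        return False
    return True


def solve():
    """Breadth-first search for a shortest sequence of crossings.

    Each state is a tuple (farmer, wolf, goat, cabbage), where 0 means
    the starting bank and 1 means the far bank. The result lists what
    the farmer takes along on each crossing.
    """
    start, goal = (0, 0, 0, 0), (1, 1, 1, 1)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        farmer = state[0]
        # The farmer crosses alone (None) or with one passenger from his bank.
        for passenger in (None, 1, 2, 3):
            if passenger is not None and state[passenger] != farmer:
                continue
            nxt = list(state)
            nxt[0] = 1 - farmer
            if passenger is not None:
                nxt[passenger] = 1 - farmer
            nxt = tuple(nxt)
            if is_safe(nxt) and nxt not in seen:
                seen.add(nxt)
                cargo = "nothing" if passenger is None else NAMES[passenger]
                queue.append((nxt, path + [cargo]))
    return None


plan = solve()
for number, cargo in enumerate(plan, start=1):
    print(f"Crossing {number}: the farmer takes {cargo}")
```

Because breadth-first search explores all plans of a given length before any longer ones, the plan it returns uses the minimum number of crossings, which is seven here. As in the research case, the rules are checked after every single step, not only at the end.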

Pilot

A pilot of this TTP procedure will be carried out in the coming months with the Netherlands Twin Register (NTR). In the pilot, CBS, SURF, ODISSEI and NTR are testing whether the procedure can be executed error-free while adhering to all security measures. Moreover, the parties are investigating whether adjustments are needed to make the procedure available by default to certain OSSC projects. The pilot results are expected after the summer, followed by a blog post detailing the procedure and results.

About the partners

  • CBS
    Statistics Netherlands (CBS) collects a wide range of data for its statistical tasks, many of them microdata at the level of individual persons or organisations. Protection of the confidentiality of the data has the highest priority for CBS. 
  • SURF
    A cooperative of Dutch education and research institutions dedicated to enhancing digital services and fostering knowledge sharing through innovation.
  • ODISSEI
    The national research infrastructure for the social sciences in the Netherlands, facilitating groundbreaking research through data, expertise, and resources.
  • NTR
    The Netherlands Twin Register is a national register in which twins, multiples and their parents, siblings, spouses and other family members participate. The research carried out by the Netherlands Twin Register focuses on the role of heritability in mental and physical health.