Secure ANalysis Environment (SANE)

What is SANE?

The Secure ANalysis Environment (SANE) is a virtual computer that is fully sealed off from the outside world. It houses pre-approved analysis software (such as R and Jupyter notebooks) and offers the option of authorising access to sensitive data.

This setup empowers the data provider to retain absolute control while enabling researchers to conveniently study the data. SANE, a collaborative effort by ODISSEI, CLARIAH, and SURF, provides a platform for researchers across disciplines to analyse sensitive data. 

SANE runs on the ISO 27001-certified SURF Research Cloud platform and has been tested in an independent penetration test. It addresses the hesitation of data providers to share sensitive datasets with researchers. With SANE, data providers remain in full control of their data, as stipulated by the General Data Protection Regulation (GDPR).

Purpose

The use of non-academic datasets is limited by the lack of solutions that let researchers analyse sensitive data safely while data providers stay in charge. SANE tackles this by following the Five Safes principles. It ensures that researchers can export only the output created when analysing sensitive data (after verification by the data provider), but never the data itself. Under the strictest conditions, if the data provider so chooses, researchers cannot view the data while working with it (for details, see below).
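The export rule described above can be pictured as a simple gate. The following is a minimal sketch, not SANE's actual implementation: the class names `OutputRequest` and `ExportGate` and their methods are hypothetical, and in practice the approval step is a check carried out by the data provider.

```python
from dataclasses import dataclass, field


@dataclass
class OutputRequest:
    """An analysis output a researcher asks to take out of the environment."""
    researcher: str
    content: str
    approved: bool = False  # only the data provider changes this


@dataclass
class ExportGate:
    """Hypothetical gate: nothing leaves until the data provider approves it."""
    pending: list = field(default_factory=list)

    def submit(self, researcher: str, content: str) -> OutputRequest:
        # The researcher hands in an output; it is queued for review.
        request = OutputRequest(researcher, content)
        self.pending.append(request)
        return request

    def approve(self, request: OutputRequest) -> None:
        # Stands in for the data provider's verification of the output.
        request.approved = True

    def export(self, request: OutputRequest) -> str:
        # Unverified outputs are refused; the data itself is never exportable.
        if not request.approved:
            raise PermissionError("output has not been verified by the data provider")
        return request.content


gate = ExportGate()
request = gate.submit("researcher-1", "summary table of regression coefficients")
```

Calling `gate.export(request)` at this point raises `PermissionError`; it returns the content only after `gate.approve(request)` has been called.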

Currently, the infrastructure to safely provide access to data is missing. Consequently, many potential data providers, including governments, heritage institutes, and commercial entities, are hesitant to disclose their datasets. Valuable datasets therefore remain unused, despite the potential for academic breakthroughs if they were made accessible.

Difference between Tinker and Blind SANE

SANE comes in two versions: Tinker SANE and Blind SANE. 

Tinker SANE

In Tinker SANE, the researcher gets to see, experiment with, and manipulate the data. The “tinker” variant is most appropriate when the researcher combines several different data sources and specific characteristics of the combined data determine the subsequent analytical steps.

Blind SANE

In Blind SANE, the researcher submits an algorithm, and the data provider prevents the researcher from seeing the data. This is typical in situations with copyright barriers. The “blind” variant can be used for large datasets whose data structure is known to the researcher, such as historical newspapers at the National Library (KB) or historical TV broadcasts at the Netherlands Institute for Sound and Vision (B&G).
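The blind workflow can be illustrated with a short sketch. It is an illustration under assumptions rather than SANE's actual interface: `run_blind`, `word_counts`, and the sample documents are hypothetical. The researcher writes the algorithm knowing only the data structure (here, an iterable of text strings), and only the algorithm's result comes back.

```python
from collections import Counter
from typing import Callable, Iterable


def run_blind(algorithm: Callable[[Iterable[str]], object],
              documents: Iterable[str]) -> object:
    """Run a submitted algorithm on data the researcher cannot see.

    Hypothetical helper: only the algorithm's return value is handed back,
    and that result would still need the data provider's verification
    before it could be exported.
    """
    return algorithm(documents)


def word_counts(documents: Iterable[str]) -> dict:
    """Researcher-side algorithm: counts lower-cased words across all texts."""
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return dict(counts)


# Provider-side data, invented here to stand in for protected texts.
protected_documents = ["Storm hits coast", "Coast guard rescues crew"]

result = run_blind(word_counts, protected_documents)
```

Because a submitted algorithm could in principle return the raw data, the data provider's verification of the output remains essential in this variant.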

Why SANE?

The advantages of using SANE 

Facilitating data access

SANE broadens access to confidential datasets for researchers by providing tools that help data providers minimise the risk of confidentiality breaches. Data providers remain in full control of their data, as stipulated by the General Data Protection Regulation (GDPR). 

Extending existing datasets

Because SANE offers a secure environment for data analysis and an extra layer of data protection, data providers are able to share sensitive data for research purposes while minimising the risk of disclosure. This allows researchers to use richer datasets for their projects.

Well-known analysis tools

Through SANE, researchers can use the analysis tools they are used to. Tinker SANE is a standard Windows machine, and both Tinker and Blind SANE offer a wide variety of analysis tools, with RStudio and Jupyter Notebooks pre-installed.

Generic solution

SANE establishes standard specifications for developers at ODISSEI and CLARIAH, enabling them to create customisable analytical tools compatible with a variety of data providers, eliminating the need for individual adjustments. This makes SANE a valuable, future-proof option independent of current institutes.

High security standards

SANE is a specific configuration of SRAM (SURF Research Access Management) and SRC (SURF Research Cloud), and it passed an independent penetration test by a specialised company in November 2023. This makes SANE part of SRAM and SRC’s ISO 27001 certification. ISO 27001 is an international standard for information security management. SURF performs periodic internal audits and assessments to ensure continuous improvement and adjustment of the Information Security Management System.

Scalable cloud infrastructure

Because SANE has a cloud-based infrastructure, its resources can be scaled up as needed. SANE currently runs on SURF HPC Cloud, where a machine with 64 GB RAM or an A10 GPU is just a few clicks away. In the future, SANE will be able to run on any cloud provider, including Microsoft Azure and Amazon Web Services (AWS), as well as on-premises at the data provider.

Use Case

The ‘FIRMBACKBONE’ project is an initiative of Utrecht University (UU) and the Vrije Universiteit Amsterdam (VU Amsterdam). In this project, the sensitive dataset from the KvK, the Chamber of Commerce in the Netherlands, is used and enriched with unstructured open data.

How to set up SANE?

Setting up SANE typically takes about 30 minutes and involves collaboration between the data provider and a researcher. Any data provider using the SURF Research Cloud can offer their data through SANE. This generally requires a free-of-charge SURF Research Cloud contract, a condition met by the majority of Dutch research and educational institutions. The researcher, in turn, needs sufficient SURF Research Cloud funding to use SANE. This can be organised through a SURF E-infra grant or an institutional agreement between SURF and the institution the researcher is affiliated with.

Once these requirements are met, the data provider creates a Collaborative Organisation (CO) to manage users and roles, as well as the virtual analysis environment itself. This process deploys an off-the-shelf environment made available by SURF.

Additional questions about using SANE can be addressed to the SANE Project Manager, Lucas van der Meer (lucas@odissei-data.nl).

About the collaborators

  • ODISSEI: The national research infrastructure for the social sciences in the Netherlands, facilitating groundbreaking research through data, expertise, and resources.
  • SURF: A cooperative of Dutch education and research institutions dedicated to enhancing digital services and fostering knowledge sharing through innovation.
  • CLARIAH: A distributed research infrastructure for the humanities and social sciences, providing access to extensive digital data collections and user-friendly applications.