Enabling IP-Preserving Computations on Sensitive Data

Machine learning and AI need domain-specific training data for their various use cases. Often this data is sensitive and falls under privacy regulations. In this article, we introduce the Apheris Compute Gateway as a solution for contributing sensitive data to ML projects.
Jan Stuecke
Product Marketing
Published 8 May 2023

Today, every organization faces more and more data regulation: GDPR, HIPAA, PCI, and CCPA, to name but a few. And we all know regulators won’t stop there. Most recently, the US published a strategy paper [1] outlining guidelines that strengthen privacy rights while also enabling data collaboration for analytics and machine learning. One can perceive regulations as an unnecessary burden… or as a request from society to do better.

For example, GDPR takes the perspective that an individual’s personal information is, in a way, their intellectual property and must be treated like a company’s proprietary data. At the same time, we all understand that collaborations based on sensitive data will become increasingly important for solving pressing global questions, and that machine learning will play a crucial role.

TL;DR: In this post, we will look at the architecture of the Apheris platform for federated learning with a special focus on the Compute Gateway. We will see how it enables secure and privacy-preserving computations for everyone. As a Germany-based company, we had to find a powerful, scalable, and integrated yet adaptive solution, compliant even with the most restrictive regulations.

Privacy-Preserving Computations

Let’s start by defining some key terms.

A data custodian is an organization that wants to provide data to machine learning or analytics projects without giving data away or allowing direct access.

Privacy-preserving for Apheris means that data never moves outside of a data custodian’s environment. In addition, data is never directly accessible from the outside (not even by Apheris). At the same time, a machine learning model is the intellectual property of a third party and should be treated with the same care for privacy.

Federated learning with Apheris enables the separation of data, model and computation on any data while still achieving the same quality of computational results. This is what we mean by privacy-preserving computations.
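
To make the separation of data and computation concrete, here is a minimal Python sketch of a federated mean. It is an illustration only, not the Apheris SDK or the platform’s actual algorithm: the function names `local_summary` and `federated_mean` and the sample values are assumptions. Each custodian computes aggregates locally and only those aggregates are combined, yet the result matches what pooling the raw data would give.

```python
from typing import List, Tuple


def local_summary(values: List[float]) -> Tuple[float, int]:
    """Runs inside one custodian's environment and returns only aggregates."""
    return sum(values), len(values)


def federated_mean(summaries: List[Tuple[float, int]]) -> float:
    """Combines per-site (sum, count) pairs without ever seeing raw values."""
    total = sum(site_sum for site_sum, _ in summaries)
    count = sum(site_count for _, site_count in summaries)
    if count == 0:
        raise ValueError("no data points were contributed")
    return total / count


# Hypothetical private datasets held by two different custodians.
site_a = [1.0, 2.0, 3.0]
site_b = [10.0, 20.0]

summaries = [local_summary(site_a), local_summary(site_b)]
result = federated_mean(summaries)

# For comparison only: the mean if the raw data had been pooled centrally.
pooled = sum(site_a + site_b) / len(site_a + site_b)
```

In this toy case `result` and `pooled` agree, which is the sense in which a federated computation can match a centralized one. Real model training is more involved, and very small sites may need additional privacy controls on the aggregates themselves.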

This approach is already being used in the wild today in, for example, drug discovery, imaging diagnostics, conversational AIs, and risk modelling. By respecting all sides and their rights, Apheris has already enabled collaboration among direct competitors who needed each other’s data to succeed in their projects. Let’s look at the Apheris Platform in more detail to understand how this is possible.

The Apheris Platform for Federated Learning & Analytics

The Apheris Platform consists of three parts: the Apheris SDK, the Compute Orchestrator, and the Compute Gateway.

Figure: The Apheris Platform

A user, for example a machine learning engineer, connects to the Apheris Platform via the Apheris SDK (shown on the right side of the figure). They can easily leverage their existing code base using their favorite machine learning library by adding just a few lines of glue code.

The second part of the platform is the Compute Orchestrator, which sits in the middle. After a user sends a computation request, it is checked for security and compliance throughout the system. If the computation is approved, the Compute Gateway pulls it and executes it against the dataset.

The Compute Gateway is key for allowing computations without making your data directly accessible to a third party. It runs within a data custodian’s environment and acts as an anonymity layer between the model a user wants to train and the data the model should be trained on.

The Compute Gateway checks at certain intervals whether the Orchestrator has cleared a compute request and, if so, pulls it. Once a request has been pulled, a final security and compliance check is run against the asset policy that you, as a data custodian, have defined. Only then can the computation be executed. All of this ensures that you, as a data custodian, stay in full control of your data. Always.

Neither Apheris nor the user requesting computations will ever see the data. In addition, all communication within the Apheris Platform is encrypted in transit with TLS 1.2+, and data at rest is encrypted with AES-256.

Security-by-design

When enabling computational access, it is important to get security right. The enterprise-grade security features within Apheris enable fine-grained, role-based access control using asset policies for each dataset, as well as encryption at rest and in transit. All of this ensures robust information security, especially when combined with the security baked into the architecture of the platform. In the following sections, we look at only some of our security features. To get the full picture, read our detailed White Paper on Security & Privacy within the Apheris Platform.

Pull Never Push

The Compute Gateway only allows egress traffic, so requests can never be pushed to it: it is a pull-only architecture. At certain intervals, the Compute Gateway asks the Orchestrator whether a compute request is available for a dataset registered with the Compute Gateway. If so, it pulls the request and assesses it for compliance with the defined asset policy. If the request complies, it is executed.
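
The pull-only flow can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Apheris code: `OrchestratorStub`, `GatewaySketch`, the request fields, and the operation names are all hypothetical, and an in-memory list stands in for the outbound network call the gateway would make.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ComputeRequest:
    request_id: str
    dataset_id: str
    operation: str


class OrchestratorStub:
    """In-memory stand-in for the Orchestrator (hypothetical interface)."""

    def __init__(self) -> None:
        self._cleared: List[ComputeRequest] = []

    def clear(self, request: ComputeRequest) -> None:
        # A request lands here only after the Orchestrator-side checks passed.
        self._cleared.append(request)

    def next_request(self, dataset_ids: Set[str]) -> Optional[ComputeRequest]:
        # Called BY the gateway; the Orchestrator never contacts the gateway.
        for index, request in enumerate(self._cleared):
            if request.dataset_id in dataset_ids:
                return self._cleared.pop(index)
        return None


class GatewaySketch:
    """Polls for cleared requests and re-checks each against a local policy."""

    def __init__(self, orchestrator: OrchestratorStub,
                 policies: Dict[str, Set[str]]) -> None:
        self.orchestrator = orchestrator
        self.policies = policies  # dataset_id -> operations the custodian permits
        self.executed: List[str] = []
        self.rejected: List[str] = []

    def poll_once(self) -> None:
        request = self.orchestrator.next_request(set(self.policies))
        if request is None:
            return
        # Final check against the custodian-defined policy before anything runs.
        if request.operation in self.policies[request.dataset_id]:
            self.executed.append(request.request_id)  # placeholder for execution
        else:
            self.rejected.append(request.request_id)

    def run(self, polls: int, interval_seconds: float) -> None:
        for _ in range(polls):
            self.poll_once()
            time.sleep(interval_seconds)


orchestrator = OrchestratorStub()
orchestrator.clear(ComputeRequest("r1", "dataset-a", "mean"))
orchestrator.clear(ComputeRequest("r2", "dataset-a", "export_rows"))
orchestrator.clear(ComputeRequest("r3", "dataset-z", "mean"))  # not registered here

gateway = GatewaySketch(orchestrator, {"dataset-a": {"mean", "count"}})
gateway.run(polls=4, interval_seconds=0.0)
```

Here `r1` is executed, `r2` is rejected by the local policy even though the Orchestrator cleared it, and `r3` is never pulled because its dataset is not registered with this gateway. The key design point is that every interaction starts from the gateway side.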

Every Data Asset Has a Policy

Access control is a crucial design aspect within privacy-preserving data collaborations. Apheris leverages asset policies that define which operations can be performed on datasets and by whom, as well as the privacy controls applied to the computation and the outputs.

Figure: Apheris Asset Policies

Apheris enables you as a data custodian to be very granular when configuring your asset policies to ensure they meet your organization’s and any regulatory compliance needs. For example, you can enable or disable computations down to the functional level, meaning you can allow certain statistical algorithms to run against a dataset or not.
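
As a rough illustration of function-level control, the sketch below models an asset policy as an allowlist of users and operations with a default of deny. The `AssetPolicy` class, its field names, and the example identifiers are assumptions made for this post and do not reflect the actual Apheris policy format.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AssetPolicy:
    """Hypothetical policy: who may run which operations on one dataset."""
    dataset_id: str
    permitted_users: FrozenSet[str] = frozenset()
    permitted_operations: FrozenSet[str] = frozenset()


def is_permitted(policy: AssetPolicy, user: str, operation: str) -> bool:
    # Deny by default: both the user and the operation must be listed.
    return user in policy.permitted_users and operation in policy.permitted_operations


policy = AssetPolicy(
    dataset_id="clinical-trial-2023",
    permitted_users=frozenset({"alice@partner.example"}),
    permitted_operations=frozenset({"mean", "histogram"}),
)

allowed = is_permitted(policy, "alice@partner.example", "mean")
blocked_operation = is_permitted(policy, "alice@partner.example", "describe")
blocked_user = is_permitted(policy, "bob@other.example", "mean")
```

With this policy, only the listed user can run `mean` or `histogram`. Any other operation or user is refused, and an empty policy permits nothing.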

By using this privacy functionality, a data custodian can prevent reverse engineering of data. The Apheris team is more than happy to support you with such configurations. If you already have questions, feel free to discuss your individual needs with our technical team.

Enterprise-Grade Security

Access Management

By default, any data in your organization is only accessible to the organization itself - no one else has access. You, as a data custodian, have full control to decide which data from your organization you want to contribute to a collaboration and which operations on this data you want to permit.

Within asset policies, you can specify which users from other organizations, or which specific collaborations, are allowed to run computations on your data.

On a user level, you can specify which roles a particular user in your organization can have, and thereby define that user’s set of permissions. Depending on the permission set, a user can have computational access, register datasets with the Compute Gateway, manage users, or manage other aspects.
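
The mapping from roles to permissions could look roughly like the following Python sketch. The role names, the `Permission` enum, and the helper functions are hypothetical. Only the three permission types come from the paragraph above.

```python
from enum import Enum
from typing import Dict, FrozenSet, Set


class Permission(Enum):
    RUN_COMPUTATIONS = "run_computations"
    REGISTER_DATASETS = "register_datasets"
    MANAGE_USERS = "manage_users"


# Hypothetical roles; each role bundles a fixed set of permissions.
ROLE_PERMISSIONS: Dict[str, FrozenSet[Permission]] = {
    "data_scientist": frozenset({Permission.RUN_COMPUTATIONS}),
    "data_steward": frozenset({Permission.REGISTER_DATASETS}),
    "admin": frozenset(Permission),  # every permission
}


def permissions_for(roles: Set[str]) -> Set[Permission]:
    """Union of the permissions granted by each role; unknown roles grant nothing."""
    granted: Set[Permission] = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, frozenset())
    return granted


def can(roles: Set[str], permission: Permission) -> bool:
    return permission in permissions_for(roles)


steward_can_register = can({"data_steward"}, Permission.REGISTER_DATASETS)
steward_can_manage_users = can({"data_steward"}, Permission.MANAGE_USERS)
combined = permissions_for({"data_scientist", "data_steward"})
```

A user holding several roles gets the union of their permissions, and a role that is not defined contributes nothing.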

Logging and Auditing

The Apheris Platform logs all operations and activities related to datasets for traceability purposes, including the actions taken, who performed them, and any artifacts generated. This is to ensure compliance with regulatory requirements or your organization’s forensic needs.
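
To show what such a log entry might capture, here is a small Python sketch that serializes one audit record as a JSON line. The field names and the `audit_record` helper are assumptions for illustration. The actual log schema of the platform is not described in this post.

```python
import json
from datetime import datetime, timezone
from typing import List


def audit_record(actor: str, action: str, dataset_id: str,
                 artifacts: List[str]) -> str:
    """Builds one JSON log line: who did what, to which dataset, producing what."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "dataset_id": dataset_id,
        "artifacts": list(artifacts),
    }
    return json.dumps(record, sort_keys=True)


line = audit_record(
    actor="alice@partner.example",
    action="run_computation",
    dataset_id="dataset-a",
    artifacts=["result-001.json"],
)
entry = json.loads(line)
```

One self-describing record per line keeps the log easy to append to, search, and hand over during an audit.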

Data Security

Data is encrypted both at rest and in transit. TLS 1.2+ is used for end-to-end encryption, ensuring that data is secure from man-in-the-middle or interception attacks, whether it is in transit between your computer and the Apheris Platform or within the platform between its components.
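
If you write your own client code, you can enforce the same TLS floor on your side. The sketch below uses Python’s standard `ssl` module to build a client context that refuses anything older than TLS 1.2. The helper name `make_client_context` is ours, and this is not how the Apheris SDK is necessarily configured.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client-side TLS context that verifies certificates and requires TLS 1.2+."""
    context = ssl.create_default_context()  # certificate and hostname checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

Such a context can be passed to standard-library clients, for example via the `context` argument of `urllib.request.urlopen`.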

ISO 27001 Certification

Apheris is certified to ISO 27001, an international standard for managing information security.

Privacy Controls

The Apheris federated infrastructure provides strong privacy controls to ensure that data custodians maintain full control over their data. With privacy-enhancing technologies and computational controls, computations are performed on real data, while only computation results are shared, without revealing any private or sensitive information.

These controls allow data custodians to ensure that only computations that fit their specific use case are permitted. Only results that conform to pre-defined criteria regarding the risk of reverse engineering can leave the environment. To assess your individual needs and the best approach, please consider discussing your project with us directly. This mapping of access against compliance and internal IP protection requirements provides a robust privacy framework for anyone who wants to productize their data access.
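
One familiar example of such a criterion is a minimum group size: aggregate results for very small groups are suppressed because they could point back to individuals. The Python sketch below illustrates the idea. This post does not say which criteria the platform applies, so the threshold, the function name `releasable_counts`, and the cohort names are purely illustrative assumptions.

```python
from typing import Dict, Optional

MIN_GROUP_SIZE = 10  # hypothetical threshold chosen by the data custodian


def releasable_counts(counts: Dict[str, int],
                      min_size: int = MIN_GROUP_SIZE) -> Dict[str, Optional[int]]:
    """Replaces any group count below the threshold with None before release."""
    return {
        group: (count if count >= min_size else None)
        for group, count in counts.items()
    }


raw_counts = {"cohort_a": 124, "cohort_b": 3, "cohort_c": 10}
released = releasable_counts(raw_counts)
```

Here the count for `cohort_b` is withheld, while the other two groups meet the threshold and are released unchanged.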

In this blog post, we could only give an overview of the most important security and privacy capabilities that data custodians can leverage with the Apheris Platform and the Compute Gateway. To get the full picture, read our White Paper on Security & Privacy and feel free to discuss your needs with us directly.

[1]: National Strategy to Advance Privacy-Preserving Data Sharing and Analytics
