Key concepts & features
The Apheris Compute Gateway is a software agent that allows data-driven algorithms to be easily and safely run on sensitive and distributed data.
To allow secure, private and governed computations, the Apheris Product is built upon three key concepts:
- Data Residency: Raw, sensitive data never needs to leave its origin. It stays behind the data custodian's firewall, which protects it against unauthorized access and breaches and preserves its integrity and confidentiality.
- Granular Access Control: Data access is granted strictly for the intended computational purpose, in line with privacy regulations and internal policies. Only the access needed for a specific task is permitted, which reduces the risk of data misuse and limits the exposure of sensitive information (data minimization). Beyond securing sensitive data, Apheris helps users meet regulatory mandates by ensuring that data use is both justified and transparently recorded. The provided features let data custodians implement purpose-based access control.
- Secure and Private Federation: At the core of the Apheris Solution is a federation engine that provides a range of security properties. Apheris adds a further layer of privacy and security measures on top to harden the framework against threats, creating a secure and controlled environment for data processing and analysis.
Key components
The Apheris Compute Gateway connects to two other main components:
- Apheris CLI: a powerful command line interface for Data Scientists to interact with Apheris.
- Federation Orchestrator: enables the secure federation of computations and aggregation of results.
Compute Gateways reside within the environment of a data provider/data custodian organization and ensure only computations matching the data custodian's requirements can be executed on a given dataset.
Compute Gateways fetch validated computations from the Federation Orchestrator. The Orchestrator receives computations submitted via the Apheris CLI and manages the federation of computations and the secure aggregation of results. Results are then securely returned to the CLI. Please visit our architecture page for details.
The Federation Orchestrator is hosted on security-hardened infrastructure run by the Apheris team (for a different setup, contact us via support@apheris.com).
Main features
The descriptions below cover Apheris-specific terms only. For a glossary of machine learning terms, we found Google's Machine Learning Glossary quite helpful.
Governance Portal
The Governance Portal is a UI where Computational Governance configurations can be set. More specifically, the Governance Portal can be used to:
- Register data to the Compute Gateway.
- Create and manage Asset Policies.
- Monitor and trace incoming computations.
Computational Governance
Computational Governance is a method to control, supervise, and track all aspects of computations on data. It works by enabling a data owner to evaluate incoming compute requests, enforce privacy and security properties and oversee the release of results. This allows them to ensure the privacy and security of their data.
Federated Learning
Federated learning, federated evaluation, and federated analytics require infrastructure to move machine learning models back and forth, train and evaluate them on local data, and then aggregate the updated models. The Apheris Solution provides this capability in an easy, scalable, and secure way. In combination with computational governance, Apheris provides an enterprise-ready solution for data-based collaborations.
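The aggregation step mentioned above can be illustrated with a small, generic sketch. It is not the Apheris implementation: the text does not name the aggregation method, so sample-weighted averaging (often called federated averaging) is assumed here, and the function name, the weight layout and the sample counts are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average locally trained weights, weighting each site by its sample count."""
    if not updates:
        raise ValueError("at least one update is required")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    first_weights, _ = updates[0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        acc = [0.0] * len(values)
        for weights, n in updates:
            for i, v in enumerate(weights[name]):
                acc[i] += v * n / total
        averaged[name] = acc
    return averaged


# Two sites train locally; only model weights and sample counts are shared,
# never the raw data.
site_a = ({"layer": [1.0, 2.0]}, 100)
site_b = ({"layer": [3.0, 4.0]}, 300)
global_weights = federated_average([site_a, site_b])
print(global_weights)
```

In this sketch the site with more samples pulls the global model further towards its local result, which is one common design choice among several possible aggregation rules.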
Organization
An organization is a group of users, commonly representing a company, institution or subsidiary, that can be governed together. Within an organization, each user can be assigned one or more roles that define how they can interact with the Apheris product.
User
A user is a person with a valid account. Each user belongs to exactly one organization. Users are managed using the Governance Portal.
Roles
Roles are named collections of permissions. One or more roles can be assigned to a user. Types of roles include:
- Owner: Has full administrative access to the entire organization, can add, remove and edit users in the organization, can add and manage datasets, and can create asset policies. This role also includes Data Scientist rights.
- Data Steward: Can add datasets and manage only these datasets.
- Data Scientist: Can explore accessible datasets and run federated computations.
See Managing Users for more details.
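To make "named collections of permissions" concrete, the sketch below models the three roles as sets and computes the union of permissions for a user with several roles. It is purely illustrative: the permission names are invented for this example and do not reflect how Apheris stores or names permissions.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical permission names, chosen only for this illustration.
DATA_SCIENTIST: FrozenSet[str] = frozenset({"explore_datasets", "run_computations"})
DATA_STEWARD: FrozenSet[str] = frozenset({"add_datasets", "manage_own_datasets"})
# The Owner role also carries Data Scientist rights, as described above.
OWNER: FrozenSet[str] = DATA_SCIENTIST | frozenset(
    {"manage_users", "add_datasets", "manage_datasets", "create_asset_policies"}
)

ROLES: Dict[str, FrozenSet[str]] = {
    "Owner": OWNER,
    "Data Steward": DATA_STEWARD,
    "Data Scientist": DATA_SCIENTIST,
}


def effective_permissions(role_names: Iterable[str]) -> FrozenSet[str]:
    """Return the union of the permissions of all roles assigned to a user."""
    permissions = set()
    for name in role_names:
        permissions |= ROLES[name]
    return frozenset(permissions)


# A user holding two roles gets the permissions of both.
print(sorted(effective_permissions(["Data Steward", "Data Scientist"])))
```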
Beneficiaries
A beneficiary is a user of the Apheris product who is allowed to submit computations on specific datasets and to view a dataset's metadata and dummy data. Beneficiaries are defined within Asset Policies.
Dataset
A dataset is a single object consisting of real data and dummy data and is created using the Governance Portal:
- Real data: The original data that is registered to the Apheris product as a dataset and on which Data Scientists can launch remote computations and retrieve results. Real data always stays within the organization's environment and is never shown in its raw format to other parties.
- Dummy data: Non-private, non-sensitive data that is representative of the real data. It is typically derived from the real data, either by de-identifying it or by generating synthetic data from it. Beneficiaries with granted permissions on a dataset cannot view the real data but can see the dummy data. The dummy data should follow the same schema and include the same data types and data characteristics as the real data. This enables Data Scientists to understand the characteristics of the dataset and to build and test federated computations.
For more information, see Managing datasets.
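A data custodian may want to check that dummy data really follows the schema of the real data before registering a dataset. The following is a minimal sketch of such a check, assuming rows are represented as plain dictionaries; the helper names and the example columns are hypothetical and not part of the Apheris product.

```python
from typing import Any, Dict, List

Schema = Dict[str, type]


def infer_schema(rows: List[Dict[str, Any]]) -> Schema:
    """Infer column names and Python types, requiring all rows to agree."""
    if not rows:
        raise ValueError("cannot infer a schema from an empty table")
    schema = {column: type(value) for column, value in rows[0].items()}
    for row in rows[1:]:
        if {column: type(value) for column, value in row.items()} != schema:
            raise ValueError("rows do not share a single schema")
    return schema


def dummy_matches_real(real_rows: List[Dict[str, Any]],
                       dummy_rows: List[Dict[str, Any]]) -> bool:
    """True if dummy data has the same columns and value types as real data."""
    return infer_schema(real_rows) == infer_schema(dummy_rows)


# Invented example records; the check runs on the custodian's side only.
real = [{"age": 54, "diagnosis": "A"}, {"age": 61, "diagnosis": "B"}]
good_dummy = [{"age": 40, "diagnosis": "B"}]
bad_dummy = [{"age": "40", "diagnosis": "B"}]  # age is a string here

print(dummy_matches_real(real, good_dummy))
print(dummy_matches_real(real, bad_dummy))
```

This only compares column names and types. Checking that the dummy data also shares statistical characteristics with the real data would need additional, dataset-specific tests.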
Asset policy
An asset policy specifies which users (beneficiaries) in an organization can perform which computations on which assets (including datasets) in Apheris. Only users with an Owner role can add an asset policy to an organization. Once applied to datasets, an asset policy allows its beneficiaries to perform the actions it permits on those datasets.
Default asset policy
When a user with the Owner or Data Steward role creates a dataset, a default asset policy is always created as well. This asset policy contains a view permission that is granted to the creator of the dataset as well as to users with Owner roles within that organization. This means that for all new datasets, all Owners in the organization are automatically granted view permissions. These permissions allow Owners and Data Stewards to view the dummy data and the metadata that was generated for the dataset. They also allow them to run test computations on the dummy data.
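The rule described above (view permission for the dataset creator plus every Owner in the organization) can be sketched as follows. The `AssetPolicy` class, its fields and the user names are assumptions made for illustration; they do not describe the actual Apheris data model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class AssetPolicy:
    """Hypothetical, simplified asset policy for one dataset."""
    dataset_id: str
    permission: str
    beneficiaries: FrozenSet[str]


def default_asset_policy(dataset_id: str, creator: str,
                         org_roles: Dict[str, Set[str]]) -> AssetPolicy:
    """Grant 'view' to the dataset creator and to all Owners of the organization."""
    owners = {user for user, roles in org_roles.items() if "Owner" in roles}
    return AssetPolicy(dataset_id, "view", frozenset(owners | {creator}))


# Invented organization: user name -> assigned roles.
org = {
    "alice": {"Owner"},
    "bob": {"Data Steward"},
    "carol": {"Data Scientist"},
}
policy = default_asset_policy("dataset-1", creator="bob", org_roles=org)
print(sorted(policy.beneficiaries))
```

Here the Data Steward who created the dataset and the organization's Owner receive view permission, while the Data Scientist does not until a further asset policy names them as a beneficiary.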
Model Registry
The Apheris Model Registry is a repository of federation-ready models which have been ported to run on Apheris Compute Gateways. The Model Registry can contain two categories of models:
- Apheris pre-defined models: These models have been ported by the Apheris team and reviewed for security, privacy and compliance aspects. If privacy-enhancing technologies (PETs) are suitable for a given model, the Apheris team will implement those PETs and enable their configuration for Data Scientists and Data Custodians respectively.
- Custom models: Custom models are models submitted by the Data Scientist organization (model creator) to the Model Registry. This allows Data Scientists the flexibility to define and run their own custom models provided they have been approved by the data custodian.
Compute Specs
The compute specification (Compute Specs) is defined by the Data Scientist and specifies which computation they intend to run on which data. This includes the specification of:
- a runtime environment (e.g. a model from the Model Registry or custom model)
- a (collection of) dataset(s) to be used
- maximum hardware requirements
For a Compute Spec to be executed, it must comply with the requirements defined within the asset policies governing the data.
Please see Compute Specs for more details.
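The relationship between a Compute Spec and the asset policies governing its datasets can be pictured with the sketch below. It is a simplified illustration under stated assumptions, not the Apheris validation logic: the class and field names are hypothetical, one policy per dataset is assumed, and the hardware limits are carried in the spec but not checked.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class AssetPolicy:
    """Hypothetical policy: who may run which models on one dataset."""
    dataset_id: str
    beneficiaries: FrozenSet[str]
    allowed_models: FrozenSet[str]


@dataclass(frozen=True)
class ComputeSpec:
    """Hypothetical spec: runtime (model), datasets and hardware limits."""
    submitted_by: str
    model: str
    dataset_ids: Tuple[str, ...]
    max_cpu: int = 1          # carried along, not validated in this sketch
    max_memory_gb: int = 2    # carried along, not validated in this sketch


def complies(spec: ComputeSpec, policies: Iterable[AssetPolicy]) -> bool:
    """True only if every requested dataset has a policy permitting the request."""
    if not spec.dataset_ids:
        return False
    by_dataset = {policy.dataset_id: policy for policy in policies}
    for dataset_id in spec.dataset_ids:
        policy = by_dataset.get(dataset_id)
        if policy is None:
            return False  # no policy means no access
        if spec.submitted_by not in policy.beneficiaries:
            return False
        if spec.model not in policy.allowed_models:
            return False
    return True


policies = [
    AssetPolicy("dataset-1", frozenset({"carol"}), frozenset({"model-a"})),
    AssetPolicy("dataset-2", frozenset({"carol"}), frozenset({"model-a", "model-b"})),
]
ok_spec = ComputeSpec("carol", "model-a", ("dataset-1", "dataset-2"))
bad_spec = ComputeSpec("carol", "model-b", ("dataset-1", "dataset-2"))

print(complies(ok_spec, policies))
print(complies(bad_spec, policies))
```

The check is deliberately deny-by-default: a dataset without a matching policy, or a single dataset whose policy does not allow the model, causes the whole spec to be rejected.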
Data-driven analytics
Data-driven analytics is an umbrella term for statistics, machine learning and deep learning.
GPU-support
An Apheris model can run on a given GPU if the Docker image it uses contains the CUDA drivers required by that GPU.