Achieving Data Privacy in ML/AI

In this article, we explore Privacy Enhancing Technologies (PETs) and highlight their need in the context of ML/AI projects that use sensitive or regulated data. We start by demystifying PETs and providing concrete examples to help CTOs and product owners make informed decisions about how to improve data protection.
Inken Hagestedt
Privacy Expert
Published 29 March 2024

Abstract

In the rapidly evolving landscape of Machine Learning (ML) and Artificial Intelligence (AI), the safeguarding of sensitive data emerges as a paramount concern for CTOs and product owners in digitally mature organizations. This article delves into the realm of Privacy Enhancing Technologies (PETs), illuminating their necessity in the context of ML/AI projects that utilize sensitive or regulated data. We begin by demystifying PETs, providing tangible examples and explaining why they are indispensable for maintaining data privacy and security in AI-driven solutions. However, reliance on a singular PET solution often reveals limitations in scope and scalability, incurring significant costs in time and resources.

Our focus shifts to an innovative paradigm: the integration of flexible PET approaches with computational governance. We spotlight solutions like Apheris, which exemplify this method, offering a tested, scalable framework that seamlessly integrates into existing technology stacks with minimal code alterations. This approach not only enhances data security but also ensures compliance with evolving regulations, thus providing a comprehensive solution for enterprises dealing with sensitive data in their ML/AI endeavors.

This article aims to equip CTOs and product owners with a nuanced understanding of PETs in the ML/AI sphere, guiding them towards making informed decisions that align with their organization’s digital maturity and strategic objectives. The insights presented herein underscore the importance of adopting a multifaceted approach to data privacy, ultimately enabling enterprises to leverage the full potential of AI and ML while upholding the highest standards of data confidentiality.

Introduction

In the rapidly advancing world of Machine Learning (ML) and Artificial Intelligence (AI), the safeguarding of sensitive data stands as a cornerstone of ethical and effective technology deployment. As organizations increasingly lean into AI-driven solutions to harness the power of data, the need for robust data privacy measures has never been more critical. This urgency is particularly pronounced for CTOs and product owners, who face the dual challenge of leveraging data for innovation while ensuring its confidentiality and integrity.

Privacy Enhancing Technologies (PETs) have emerged as vital tools in this endeavor, offering ways to protect data privacy. However, the landscape of PETs is complex and evolving: a single solution often falls short of the multifaceted privacy needs of large-scale, sophisticated projects, or meets them only at a cost to data utility. This article provides an overview of PETs, explores the limitations of relying on a single PET solution, and advocates for a more nuanced approach. We examine why integrated solutions like Apheris, which combine flexible PET strategies with computational governance, represent the future of data privacy in ML/AI, offering scalability, efficiency, and seamless integration into existing technology stacks.

What are PETs?

PETs are methods to protect the privacy and security of an individual and their data in collaborative (machine learning) computations while maintaining functionality and utility. Their primary purpose is to enable data-driven technologies and analytics while safeguarding the privacy and security of the data subjects. PETs are particularly crucial in the field of ML and AI, where they help balance the need for large datasets with privacy concerns.

Example PETs, their pros and cons

| PET | Definition | Pros | Cons |
| --- | --- | --- | --- |
| Data anonymization and pseudonymization | Alters personal data to prevent identification. | Effectively protects identity. | Risks re-identification with additional data sources obtained e.g. through advanced data mining. |
| Differential privacy | Offers a mathematical privacy guarantee that individual data points' contribution cannot be extracted. | Safe for aggregate analysis. | Hard to select algorithm and parameters to maintain data analysis accuracy. |
| Homomorphic encryption | Allows computation on encrypted data. | Ensures privacy in untrusted environments. | Involves significant computational overhead, and not all computations are supported. |
| Secure multi-party computation (SMPC) | Enables joint computations while keeping inputs private. | Enhances privacy in collaborative settings. | Complex and computationally expensive. |
| Federated learning | Trains models across decentralized datasets without sharing raw data. | Reduces privacy risks by keeping data local. | Challenging to manage and still susceptible to privacy attacks. |
| Data masking | Obscures specific data within a database. | Allows functional use while protecting information. | Risk of reverse engineering if not properly implemented. |
| Synthetic data | Artificially generated data mimicking real data's statistical properties. | Mitigates privacy risks; useful for training models. | May not accurately capture real data's complexities and biases. |
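
To make the first and sixth rows concrete, here is a minimal Python sketch of pseudonymization and masking. The record layout, salt handling, and truncated hash length are illustrative assumptions, not a production design:

```python
# A minimal sketch of pseudonymization and data masking; real deployments
# need proper salt/key management, rotation, and a re-identification
# risk assessment.
import hmac
import hashlib

SECRET_SALT = b"rotate-me-and-store-me-in-a-vault"  # illustrative only

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a keyed hash (a pseudonym)."""
    return hmac.new(SECRET_SALT, identifier.encode(), hashlib.sha256).hexdigest()[:16]

def mask_email(email: str) -> str:
    """Obscure the local part of an email address."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

record = {"name": "Ada Lovelace", "email": "ada@example.org", "diagnosis": "..."}
safe_record = {
    "patient_id": pseudonymize(record["name"]),
    "email": mask_email(record["email"]),
    "diagnosis": record["diagnosis"],  # still sensitive; masking alone is not enough
}
print(safe_record)
```

Note the comment on the last field: as the table's "Cons" column warns, identifiers can be hidden while the remaining attributes still enable re-identification when joined with other data sources.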

In an era where ML and AI systems often require vast amounts of data to improve accuracy and functionality, the challenge lies in harnessing this data while respecting individual privacy and adhering to stringent data protection regulations like the GDPR, HIPAA, or CCPA. PETs help address this challenge by providing mechanisms to use and analyze data without exposing sensitive information, thereby ensuring that ML and AI applications can operate within the legal frameworks designed to protect personal privacy.

However, while PETs offer significant benefits in safeguarding data in ML applications, they are not a panacea.

“PETs are at different stages of development and will likely need to be part of data governance frameworks to ensure they are used properly in line with the associated privacy risks. Many of these tools are still in their infancy and limited to specific data processing use cases.” - Emerging privacy enhancing technologies: Maturity, opportunities and challenges, OECD

The shortcomings of a singular PET solution

The idea that a single PET solution can comprehensively address all privacy and compliance concerns is overly optimistic. Each of these technologies has unique strengths and limitations and works best in specific contexts and scenarios. For instance, techniques like differential privacy (DP) might protect individual identities in the final result but can compromise data accuracy, impacting the effectiveness of ML models. Similarly, homomorphic encryption offers robust security during computation at the cost of computational efficiency, but provides no protection for the final (decrypted) result.
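
A small sketch makes this privacy-utility tension tangible. Below, a mean query is released under the Laplace mechanism at several illustrative values of the privacy parameter epsilon; tighter privacy (smaller epsilon) visibly degrades accuracy. The toy salary data and clipping bounds are assumptions for the example:

```python
# A minimal sketch of the differential-privacy accuracy trade-off,
# using the Laplace mechanism on a mean query.
import numpy as np

rng = np.random.default_rng(0)
salaries = rng.normal(50_000, 10_000, size=1_000)  # synthetic toy data

def dp_mean(data, epsilon, lower=0.0, upper=100_000.0):
    """Release a mean with epsilon-DP via the Laplace mechanism."""
    clipped = np.clip(data, lower, upper)
    sensitivity = (upper - lower) / len(clipped)  # max influence of one record
    noise = rng.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

true_mean = salaries.mean()
for eps in (0.01, 0.1, 1.0):
    noisy = dp_mean(salaries, eps)
    print(f"epsilon={eps}: mean={noisy:,.0f} (error {abs(noisy - true_mean):,.0f})")
```

The same trade-off that is manageable for a single aggregate query becomes much harder to tune across the many queries an ML training run effectively makes, which is the "hard to select algorithm and parameters" caveat from the table above.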

Additionally, the rapidly evolving landscape of data privacy regulations and the increasing sophistication of cyber threats mean that relying solely on one type of PET can leave gaps in privacy and security measures. Therefore, it's crucial to understand that while PETs are integral tools for privacy protection in ML and AI, they must be part of a broader, multi-layered strategy tailored to the specific needs and risks of each project.

The impact of these limitations on enterprise ML and AI

The limitations of PETs can have significant impacts on ML and AI projects, especially in regulated industries. These limitations can affect various aspects of project development and deployment, including scalability, code portability, computational efficiency, and accuracy.

  • Scalability: As enterprises scale their ML/AI projects, the demand for data processing increases exponentially. Some PETs, like homomorphic encryption or secure multi-party computation, can introduce substantial computational overhead, making them less viable for large-scale applications. This scalability challenge can hinder the ability to process large datasets efficiently, a necessity for robust ML/AI models.

  • Code portability and rewrites: Implementing PETs often requires specialized knowledge and can lead to significant changes in existing codebases. For example, a custom pre-processing pipeline for anonymization and data masking might necessitate substantial rewrites, because such code is customized to the specific data it processes. The portability of these solutions across platforms and technologies can also be limited, requiring additional resources for adaptation and testing.

  • Computational efficiency: Some PETs can dramatically reduce computational efficiency. The increased computational load leads to longer processing times and higher operational costs; the micro-benchmark after this list illustrates the gap for homomorphic encryption. In time-sensitive applications this can be a critical drawback, potentially making certain PETs impractical for real-time processing needs.

  • Accuracy in regulated industries: In sectors like healthcare, finance, and legal, where accuracy is paramount, the use of PETs can sometimes compromise the quality of insights derived from data. Techniques like data masking or synthetic data generation might not capture the intricate patterns and nuances present in the original data, leading to less accurate ML models. This trade-off between privacy and accuracy can be particularly challenging in regulated industries, where decisions based on AI models can have significant consequences.
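
To substantiate the computational-efficiency point, here is a rough micro-benchmark sketch, assuming the open-source python-paillier package (`pip install phe`). Absolute timings depend on hardware, but the plaintext/ciphertext gap is typically several orders of magnitude:

```python
# A rough micro-benchmark of homomorphic-encryption overhead using
# Paillier encryption, which supports addition on ciphertexts.
import time
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)
values = list(range(1, 501))

start = time.perf_counter()
plain_sum = sum(values)
plain_time = time.perf_counter() - start

start = time.perf_counter()
encrypted = [public_key.encrypt(v) for v in values]  # encryption dominates
enc_sum = encrypted[0]
for c in encrypted[1:]:
    enc_sum = enc_sum + c  # addition performed on ciphertexts
hom_time = time.perf_counter() - start

assert private_key.decrypt(enc_sum) == plain_sum
print(f"plaintext sum: {plain_time:.6f}s, homomorphic sum: {hom_time:.3f}s")
```

And this is for simple addition; Paillier supports only addition and scalar multiplication on ciphertexts, so the richer arithmetic that ML training requires needs fully homomorphic schemes with even greater overhead.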

“PETs should not be regarded as a silver bullet to meet all of your data protection requirements…” - ICO

For example, while DP is the go-to method for mitigating attacks that extract whether a given data point was part of the training data set (membership inference attacks), it does not protect against reconstructing data that is merely similar to the training data. This follows from the definition of DP, which bounds the influence of any single record but says nothing about distribution-level information, and it has been confirmed empirically by Zhang et al. Moreover, when a model must not only preserve privacy but also produce predictions that meet fairness goals, DP can reduce fairness by amplifying bias towards the more popular training data points, as shown by Bagdasaryan et al. In sum, a PET that mitigates one privacy threat can fail to mitigate another, or even amplify it, so a carefully chosen combination of PETs should be used.
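
For reference, the formal guarantee reads as follows, where M is the training mechanism, D and D' are datasets differing in a single record, and S is any set of outputs:

```latex
% (epsilon, delta)-differential privacy: for all neighboring datasets
% D, D' (differing in one record) and all output sets S,
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,] + \delta
```

The bound constrains each individual record's influence on the output, which is exactly what membership inference exploits, but it places no restriction on what the model reveals about the overall data distribution, which is why reconstruction of similar-looking data remains possible.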

The case for a multi-layered, integrated PET approach

Incorporating a multi-layered, integrated approach to PETs in ML and AI is vital for robust data protection. This strategy addresses a wider range of privacy and security concerns than singular PET solutions, which on their own are rarely sufficient for privacy-preserving computation. By combining various PETs, organizations can ensure comprehensive protection, with each technology addressing a specific aspect of data privacy, from securing individual identities to safeguarding data in operation.
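
As a minimal illustration of such layering, the sketch below combines federated learning's "keep data local" principle with a differential-privacy-style step that clips and noises a model update before it leaves the data owner. The clip norm and noise scale are illustrative and not calibrated to a formal privacy budget:

```python
# A minimal sketch of layering two PETs: bounding one participant's
# influence (clipping), then adding noise, before a federated update
# is shared with the aggregator.
import numpy as np

rng = np.random.default_rng(42)

def privatize_update(update: np.ndarray, clip_norm: float = 1.0,
                     noise_std: float = 0.1) -> np.ndarray:
    """Clip an update to a norm bound, then add Gaussian noise."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / norm)
    return clipped + rng.normal(0.0, noise_std, size=update.shape)

local_update = rng.normal(size=10)   # stand-in for a gradient or weight delta
shared_update = privatize_update(local_update)  # only this leaves the site
```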

A key advantage of this approach is the ability to balance data privacy with utility. Different PETs can be strategically employed based on the data's sensitivity and the project's scale, optimizing the balance between maintaining data utility and ensuring privacy. This flexibility is also crucial in adapting to diverse regulatory requirements, enabling compliance with various standards like GDPR and HIPAA.

Moreover, a multi-layered approach provides a dynamic defense against evolving digital threats while preserving scalability and computational efficiency. It allows more efficient PETs to be used for large-scale processes while reserving resource-intensive PETs for highly sensitive operations, optimizing both performance and security.

5 benefits of a multi-layered approach to PETs

  • Enhanced data privacy and security: Integrating multiple PETs offers robust protection against data breaches, crucial for handling sensitive data and avoiding legal or reputational damages.

  • Regulatory compliance: A multi-layered PET approach facilitates adherence to data protection regulations like GDPR and HIPAA, reducing non-compliance risks.

  • Operational efficiency: A combination of PETs allows enterprises to optimize resource use, balancing computational efficiency with necessary data protection for scalable ML solutions.

  • Balanced data utility and privacy: Multiple PETs enable better maintenance of data quality while ensuring privacy, crucial for effective ML applications.

  • Innovation and competitive advantage: Enterprises using a sophisticated PET approach can safely leverage more data for innovation, giving them a competitive edge.

Federated learning meets computational governance

Apheris adopts a practical approach to data privacy in ML and AI, utilizing multiple PETs. This approach allows data custodians to set controls that align with specific compliance requirements. Central to their strategy is the use of federated learning, enabling collaborative ML model development across decentralized data sources without direct data sharing. This method reduces data transfer risks and upholds privacy.
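
The aggregation step at the heart of federated learning can be sketched in a few lines. This is the generic federated averaging (FedAvg) idea, shown for intuition only, not Apheris' actual implementation:

```python
# A minimal illustration of federated averaging: model parameters,
# not raw records, leave each site, and the server combines them
# weighted by local dataset size.
import numpy as np

def fed_avg(client_weights: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """Weighted average of client models, proportional to local data size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three hospitals train locally and share only their parameters.
site_models = [np.array([0.9, 1.1]), np.array([1.0, 0.8]), np.array([1.2, 1.0])]
site_sizes = [1_000, 500, 2_000]
global_model = fed_avg(site_models, site_sizes)
print(global_model)
```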

Complementing this, Apheris integrates a computational governance framework which allows data custodians to define which PETs must be used, effectively enforcing compliant data processing. To simplify the usage of these PETs, Apheris offers ready-to-use sample implementations. Additionally, the system maintains detailed logs of all activity, essential for audit trails and regulatory compliance. The combination of clear permissions for data use, PET implementation, and logs offers a streamlined solution for enterprises to securely exploit AI and ML potential while adhering to data privacy and compliance standards.
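
To give a feel for what computational governance means in practice, here is a hypothetical policy sketch. The schema, field names, and enforcement helper are invented for illustration and do not represent Apheris' actual configuration format:

```python
# A hypothetical computational-governance policy: the data custodian
# declares which computations are allowed and which PETs each one
# requires, and the platform enforces it before anything runs.
policy = {
    "dataset": "hospital_a/oncology_records",
    "allowed_computations": ["federated_training", "aggregate_statistics"],
    "required_pets": {
        "federated_training": ["secure_aggregation", "differential_privacy"],
        "aggregate_statistics": ["differential_privacy"],
    },
    "audit": {"log_all_requests": True, "retention_days": 365},
}

def is_permitted(computation: str, pets_applied: list[str]) -> bool:
    """Check a requested computation against the custodian's policy."""
    if computation not in policy["allowed_computations"]:
        return False
    return set(policy["required_pets"][computation]) <= set(pets_applied)

print(is_permitted("federated_training",
                   ["secure_aggregation", "differential_privacy"]))  # True
print(is_permitted("federated_training", ["secure_aggregation"]))    # False
```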

Apheris: a closer look

The Apheris approach, incorporating federated learning along with multiple PETs, offers a robust solution to ensuring privacy in ML and AI projects. This combination addresses data privacy and security effectively, allowing the use of diverse data sources for model training without the need for data centralization. This ensures adherence to various data privacy laws and minimizes breach risks.

Figure: An integrated governance, privacy, and security layer for compliant ML. Reliable privacy controls help data custodians set the appropriate level of privacy depending on specific data characteristics.

In addition, Apheris provides guidance on which PETs to use for which data and models via our Trust Center and expert advice. Our experts implement PETs directly or draw on best-in-class libraries, so there is no need to integrate with poorly maintained ones.

A critical advantage of Apheris' method is its minimal impact on existing codebases, significantly aiding in scalability. Product builders can integrate Apheris' solutions with minimal code porting and rewrites, efficiently scaling their applications for larger and more complex data sets.

The use of multiple PETs, in addition to federated learning, allows for a more nuanced approach to data privacy. It enables data custodians to select the most appropriate technology based on specific data characteristics and compliance requirements. This multi-PET strategy enhances the robustness and accuracy of AI models while ensuring global regulatory compliance.

Apheris' system also provides clear audit trails, essential for regulatory compliance and transparency. The combination of federated learning, multiple PETs, and computational governance results in an agile, scalable, and cost-effective approach for developing AI-driven products. It empowers product builders to expand their solutions securely and responsibly, addressing the contemporary challenges of data privacy in AI.

Conclusion

In conclusion, the exploration of PETs in ML reveals that a singular PET approach, while beneficial, is insufficient for the complex demands of modern data privacy. The multi-layered strategy, exemplified by Apheris’ integration of federated learning with computational governance, offers a more comprehensive solution. This approach effectively addresses scalability, compliance, and data utility challenges in ML/AI projects.

Looking forward, the evolving landscape of AI necessitates continuous innovation in data privacy strategies. The commitment to robust, adaptable PET strategies will be crucial for ethical and sustainable AI development. Choosing the right PET strategy is not merely a technical consideration; it's fundamental to building trust and ensuring responsible AI growth in the future. Enterprises embracing a multi-layered PET approach will be better equipped to harness AI's potential while upholding data privacy and security standards.
