7 Myths About Federated Learning

In our daily work as a company that builds a platform for federated and privacy-preserving data science, customers often ask us to clarify concepts around federated learning. This article highlights 7 common myths about federated learning (FL) and, using practical examples, shows you exactly why they are misleading.
Marie Roehm
Marketing
Published 9 September 2022

Myth #1: Federated learning is only applicable for mobile devices

When conducting initial research into federated learning, many people first see examples involving edge devices. This is due to how the technology developed in its earliest days.

In 2016, a team of Google researchers introduced a paradigm-changing concept: the ability to train machine learning models on user interactions with mobile devices, without the raw data leaving those devices.
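To make the idea concrete, here is a minimal sketch of federated averaging, the aggregation scheme commonly associated with that early work. It is an illustration under simplifying assumptions, not the code of any production system: the toy linear model, the function names (`local_update`, `federated_average`, `run_federated_training`) and the two hypothetical "devices" are all invented for this example. Each client trains on its own data, and only the resulting model weights are sent back and averaged.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair held locally by one client


def local_update(weights: Sequence[float], data: Sequence[Sample],
                 lr: float = 0.01, epochs: int = 5) -> List[float]:
    """Train y = w*x + b on one client's own data with gradient descent."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client weights, weighted by each client's number of samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def run_federated_training(clients: Sequence[Sequence[Sample]],
                           rounds: int = 100) -> List[float]:
    """The server only ever sees model weights, never the clients' samples."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Hypothetical data: two "devices" holding samples of y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
weights = run_federated_training(clients)
print(weights)  # should end up close to w = 2, b = 1
```

The same loop applies whether the clients are thousands of phones or a handful of company servers; only the number of participants and the size of their datasets change.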

Since then, this concept has been further developed. Some papers split federated learning into two categories, "cross-device federated learning" and "cross-silo federated learning", and the approach is applicable to all scales of federation.

A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection (Source)

At Apheris, we differentiate between several scales of federation: internally across several business units in a company, in a one-to-one collaboration with a partner in your supply chain, as part of a multi-party industry collaboration, or even across entire data ecosystems such as those envisioned by Gaia-X or Catena-X.

Myth #2: Only data scientists have to care about federated learning

One of federated learning's key features is the ability to do data science on data that is not visible.

This sounds like it is only relevant for data scientists, doesn't it?

But federated learning is far more than that. If done right, federated learning can be a scalable, technical solution to today's legal and business problems.

When combined with other privacy-enhancing technologies, FL enables companies to streamline some of the most complex and time-consuming processes within their organization. They can stay compliant, protect the IP of sensitive data assets, maintain privacy and keep data sovereignty when leveraging data to create business value.

Today, we can solve all of this with the help of a single, scalable solution.

These issues would otherwise occupy thousands of highly skilled employees in companies across all industries!

Data Scientist

  • Gains access to more data

  • Trains AI models of higher quality

  • Brings AI successfully into production

Compliance

  • Knows that the IP of data and models stays protected

  • Data sovereignty is retained

  • Compliance, security and privacy are maintained

Executive

  • Enhances data & partnership potential

  • Increases the ROI of AI and data science projects

  • De-risks investments into AI

Myth #3: Federated learning preserves the privacy of data

Not quite: federated learning on its own does not protect the privacy and IP of your data sets. It does have considerable privacy advantages over traditional, centralized machine learning, since a model can be trained while the personal training data stays on the data owner's own devices or servers.

Nevertheless, model parameters can still carry sensitive features that can be exploited to reconstruct or infer related personal information. To mitigate this, additional privacy-preserving technologies such as Differential Privacy, together with comprehensive IT security measures, must be employed to protect the IP and privacy of data and models.
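As an illustration of what such an additional safeguard can look like, the sketch below clips a model update to a maximum norm and adds Gaussian noise before it leaves the client, which is the basic mechanism behind many differentially private training schemes. It is a simplified example under stated assumptions, not a description of how Apheris or any specific library implements it: the function names and parameters (`max_norm`, `noise_multiplier`) are hypothetical, noise is added on the client side rather than at the aggregator, and the privacy budget (epsilon) that a real deployment would need to track is not computed.

```python
import math
import random
from typing import List, Sequence


def clip_update(update: Sequence[float], max_norm: float) -> List[float]:
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def privatize_update(update: Sequence[float], max_norm: float,
                     noise_multiplier: float,
                     rng: random.Random) -> List[float]:
    """Clip the update, then add Gaussian noise scaled to the clipping bound."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm  # noise standard deviation
    return [v + rng.gauss(0.0, sigma) for v in clipped]


# Hypothetical model update computed by one participant.
rng = random.Random(0)
raw_update = [3.0, 4.0]  # L2 norm of 5.0
noisy_update = privatize_update(raw_update, max_norm=1.0,
                                noise_multiplier=1.1, rng=rng)
print(noisy_update)  # this, not raw_update, would be shared
```

Clipping bounds how much any single participant can influence the shared model, and the noise masks what remains. More noise means stronger protection but lower model utility, which is why the trade-off has to be chosen per use case.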

There is no generic framework for data privacy - only the combination of different privacy-enhancing approaches enables optimal implementation of real-world use cases (Source: Apheris)

Myth #4: Federated learning is a theoretical concept and not applied in production

In a published paper, a team of Google researchers explains how they use federated learning in a commercial, global-scale setting to train, evaluate and deploy models to improve search suggestion quality without direct access to the underlying user data. Companies such as Apple and Samsung have already followed suit with similar use cases.

Besides that, there are plenty of other examples across all industries:

Pharma

MELLODDY is a project from a large consortium in the pharma space, involving companies such as Amgen, Bayer, MERCK, Novartis, AstraZeneca, and more. It aims to leverage the world’s largest collection of small molecules with known biochemical or cellular activity to enable more accurate predictive models and increase efficiencies in drug discovery.

Manufacturing

The European project MUSKETEER is focused on two industrial scenarios - smart manufacturing and healthcare. MUSKETEER aims to create a validated, federated, privacy-preserving machine learning platform that is interoperable, scalable, and efficient enough to be deployed in real use cases.

Healthcare

In a recent example, a team of NVIDIA researchers published a sensational use case for federated learning: predicting clinical outcomes in patients with COVID-19. They showed that federated learning can reach an impactful result that would otherwise be unachievable using only local data and centralized training.

Enterprise-grade Federated Learning Platforms

While there are federated learning systems used in production, the complexity of deploying and maintaining such a solution is still very high.

To date, there are very few enterprise-grade solutions that apply this concept, one of them being Apheris.

In this context, we have made an interesting observation: People might have heard about federated learning, but they still have little knowledge of how to assess tools and platforms within this emerging discipline.

To support the evaluation process and to accelerate the industry-wide adoption of federated learning, we created the "Buyer's Guide to Secure Multi-Partner Data Collaboration". The guide contains a list of must-have features for any enterprise-grade platform, so you can make the best possible selection for your company and your use cases.

Download your copy of the Buyer's Guide here! (Source: Apheris)

Myth #5: Federated learning doesn’t work on heterogeneous data and data must be highly standardized

This is only partly true.

Common data models (CDM) such as CDISC for clinical trials, OMOP for Electronic Medical Records, or OPC-UA for Industry 4.0 are definitely essential to be able to collaborate on data. Companies also need to have a sufficiently high data governance maturity, ensuring that data is of high quality and suitable for machine learning.

But reality shows that even with a CDM, data from multiple parties is rarely clean and harmonized.

Enterprise-grade platforms like Apheris open up an entirely new discipline within federated & privacy-preserving data science.

Apheris supports the entire data science workflow, from data exploration to individual preprocessing pipelines for each dataset, running statistical analysis, testing, and validation of models - and all of that in a privacy-preserving manner.

Enterprise-grade Federated Learning platforms like Apheris support the full data science workflow and the integration of any Python library to build federated computations. (Source: Apheris)

Myth #6: Federated learning requires more computational resources than centralized learning

Recent studies have shown that training large models in conventional data centers can cause a significant increase in CO2eq emissions. Depending on the setup, federated learning can be a more carbon-friendly way to train neural networks and can have a positive impact on reducing carbon emissions.

The website ML CO2 Impact does a great job of raising awareness around this topic and even allows you to calculate your GPU's emissions.

A calculator to compute your ML carbon impact (Source: https://mlco2.github.io/)

In comparison, a team of researchers at the University of Cambridge published a paper in July 2021 which helps to quantify the CO2eq emissions of training deep learning models either in data centers or on the edge.
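The arithmetic behind such estimates is straightforward: energy consumed multiplied by the carbon intensity of the electricity used. The sketch below shows that calculation as a back-of-the-envelope comparison. It is not the methodology of the paper or of the calculator mentioned above, every number in it is a placeholder rather than a measurement, and it ignores factors a real study would include, such as communication overhead and idle power. The function name and the `overhead_factor` parameter (for example, data center cooling) are assumptions made for this example.

```python
def training_emissions_kg(power_watts: float, hours: float,
                          carbon_intensity_kg_per_kwh: float,
                          overhead_factor: float = 1.0) -> float:
    """Estimate CO2eq in kg as energy (kWh) times grid carbon intensity."""
    energy_kwh = power_watts / 1000 * hours * overhead_factor
    return energy_kwh * carbon_intensity_kg_per_kwh


# Placeholder scenario 1: one data center GPU running for 10 hours,
# with an assumed overhead factor of 1.5 for cooling and infrastructure.
data_center_kg = training_emissions_kg(
    power_watts=300, hours=10,
    carbon_intensity_kg_per_kwh=0.4, overhead_factor=1.5)

# Placeholder scenario 2: 1,000 edge devices, each drawing 5 W
# for half an hour of local training, with no cooling overhead.
edge_kg = 1000 * training_emissions_kg(
    power_watts=5, hours=0.5, carbon_intensity_kg_per_kwh=0.4)

print(f"data center: {data_center_kg:.2f} kg CO2eq")
print(f"edge devices: {edge_kg:.2f} kg CO2eq")
```

Which side comes out ahead depends entirely on the inputs: hardware efficiency, the number of training rounds, and where the electricity comes from.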

On a cross-device level, the massive increase in smart IoT devices will have a significant impact on carbon emissions. Federated learning is a more sustainable way to apply artificial intelligence to the Internet of Things.

Statista estimates that there will be 75 billion connected devices in 2025 (Source: Statista.com)

In cross-silo scenarios, at Apheris we are currently working with manufacturers and their supply chain partners to help them achieve their long-term sustainability goals. With federated and privacy-preserving data science, multiple partners can securely leverage data from production and machine settings, which results in increased product sustainability and a lower CO2 footprint.

Federated learning allows securely connecting the data dots along the supply chain to optimize each stage, increase transparency and reduce waste. (Source: Apheris)

Myth #7: Open-Source frameworks allow companies to build up secure multi-partner data collaborations

Bringing federated learning to life and into production requires much more than just technology or frameworks, especially because it operates on data, one of the most important, sensitive, and yet somehow intangible assets that companies have today.

Federated learning is an amazing tool that enables data collaborations - but you have to see it as one of many building blocks.

There are many other requirements that enterprise customers expect:

  • Secure Federated Architecture that ensures only computation results move between isolated and confidential environments

  • Support of the full data science workflow and integration of any Python library

  • Enterprise-grade security, such as access management, traceability & auditability, data encryption, and the highest of standards in development and staff security

  • Additional state-of-the-art privacy-enhancing technologies that protect data privacy and IP

  • A privacy approval process that enables optimal model utility while preserving data privacy

  • Legal & contractual support in the form of comprehensive compliance frameworks and streamlined processes

Only in combination with enterprise-grade features and contractual frameworks can cross-company collaboration thrive and leverage federated learning to its full potential.

One Fact: Federated learning is a key technology of our future

Last but not least, we're going to close this article with a fact.

Mona Flores, Global Head of Medical AI at NVIDIA, recently published an article that covers the results of a great paper on federated learning in healthcare.

The headline says it all: "Medical AI Needs Federated Learning, So Will Every Industry"

At Apheris, we are convinced that all projects that involve federated learning are extremely valuable. We are proud to contribute to a more ethical and sustainable future and are looking forward to further developments in the field.

Do you want to discuss how you can securely collaborate on data with partners? Let us know.
