vertically distributed federated learning

Analysis of vertically distributed genomics and EHR data

Healthcare data is distributed across several organizations. Imagine a genomics research provider and their collection of people’s genomes and the data of the exact same people stored in their electronic health records.
The whole dataset contains the same people but multiple organizations hold different features of it. This data distribution pattern is called "vertical distribution". Bringing the data together and analyzing the sum of its parts is important to gain valuable insights. In this specific case it would be disease progression patterns based on genomics.

horizontally and vertically distributed data
At Apheris, we have developed a unique solution to make use of vertically distributed data without giving up on data privacy. We combine cryptographic techniques and data privacy technologies to build an end-to-end privacy-preserving analytics system that is composed of three major steps:

  • STEP 1 - Privacy preserving record linkage: Specialized cryptographic tools allow to link the data that “belongs together” across the parties without revealing any information beyond the fact that these data samples belong together.
  • STEP 2 - Federated and privacy-preserving analytics: The data resides at the data owner and the computation is split across them. Further privacy mechanisms such as adding differentially private noise guarantee that no data is leaked to any other party. Simple analysis tasks as well as complex tasks such as training a machine learning model can be carried out.
  • STEP 3 - New scientific insights lead to optimized treatment options tailored to each patient: With privacy-preserving access to a diverse set of genomics and EHR data, entirely new scientific insights can be unlocked. Models that predict the disease progression based on patients’ unique molecular profile are the foundation to identify the right treatment options tailored to each patient. With the Apheris Platform, the full value of distributed data is unlocked and precision medicine becomes a reality. 

Our approach is generic and can be applied to all kinds of industries in which joint computations of vertically distributed data are necessary. This includes use cases like money laundering and production data in manufacturing. The workflow as well as the privacy techniques are explained in detail in our whitepaper, both in general and using the example of a healthcare scenario with vertically distributed genomic and electronic health record (EHR) data.

Get access to our full whitepaper

If you are interested in reading the full use case and there is collaboration potential, we are happy to send you our whitepaper. Just fill out the form below.

Get more info!

If you are interested in our open source technology, make sure to check out our GitHub repository we co-developed, and read our accompanying blogpost we co-authored with OpenMined.

You have a similar use case?

Contact us if you want to find out how we can improve your models via a privacy-preserving federated learning setup.