Anomaly Detection in Medical Images via Federated Machine Learning

Machine learning models can support physicians in the detection and characterization of anomalies (e.g., tumors) in medical images. Integrated and optimized structure recognition algorithms can therefore add significant value to imaging devices. Recognition models improve markedly when they are trained on large, diverse, labelled image datasets. With the Apheris Platform for federated and privacy-preserving data science, machine learning models can be trained securely on datasets from multiple sources (e.g., hospitals) without sharing any patient data.

MedTech Company

Situation

A MedTech company wants to continuously improve its medical devices by deploying AI features and services. Its products offer clinical decision support by automatically detecting anomalies in medical images during the diagnostic process. Incorporating highly accurate anomaly detection software into the MedTech company’s products will increase their value.

Problem

Datasets containing medical images are highly sensitive and are siloed across multiple healthcare institutions (e.g., two different hospitals). These institutions cannot share sensitive patient data because of compliance and security constraints.

Apheris solution

By using the Apheris Platform, the MedTech company can train AI models on the medical image datasets of the hospitals. Each hospital’s data remains under its full control and patients’ privacy is preserved. Because the models are trained on large and diverse data from multiple hospitals, they tend to be more robust and to generalize better. The improved AI models are integrated into the MedTech company’s products and services. Via the Apheris Platform, the MedTech company can continuously develop and deploy advanced AI features and services, such as anomaly detection in medical images, increasing the value of its products.
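
To illustrate how federated training can keep images on site, here is a minimal sketch of one common aggregation step, federated averaging: each hospital trains locally and sends only model parameters and a sample count, which a coordinator combines into a weighted mean. The source does not state which algorithm the Apheris Platform uses; the function name and the example numbers are hypothetical.

```python
def federated_average(updates):
    """Combine locally trained parameters into one global parameter vector.

    `updates` is a list of (weights, num_samples) pairs, one per hospital.
    Only these parameters leave a site; the images themselves never do.
    """
    total_samples = sum(num_samples for _, num_samples in updates)
    if total_samples <= 0:
        raise ValueError("at least one site must contribute samples")

    dimension = len(updates[0][0])
    global_weights = [0.0] * dimension
    for weights, num_samples in updates:
        if len(weights) != dimension:
            raise ValueError("all sites must use the same model shape")
        # Sites with more training images get proportionally more influence.
        for i, weight in enumerate(weights):
            global_weights[i] += weight * num_samples / total_samples
    return global_weights


# Hypothetical parameters from two hospitals after one local training round.
hospital_a = ([1.0, 2.0], 100)
hospital_b = ([3.0, 4.0], 300)
global_model = federated_average([hospital_a, hospital_b])
```

In practice this step would be repeated over many rounds, and the parameter updates themselves may need additional protection, since model updates can leak information about the training data.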

Advantages of using Apheris

Make superior products

The MedTech company can continuously deploy new AI features and services to improve their products

Save time and money

The MedTech company avoids inefficiencies of complex legal and compliance reviews, thereby saving time and money

Improve diagnostics

The hospitals improve their processes and diagnostic success

Support clinical decision-making

Through federated and privacy-preserving training on diverse data from multiple hospitals, the anomaly detection models become more accurate and reliable, leading to clinical decision support systems suitable for precision medicine implementations

Analysis of Vertically Distributed Genomics and EHR Data

Essential features that are relevant for training AI models are vertically distributed across healthcare institutions (e.g., several hospitals and laboratories hold data on the same patient), and the data cannot be shared between them. The Apheris Platform for federated and privacy-preserving data science allows for training AI models on the combined data. Consequently, new scientific insights and novel data products can be generated.

Situation

Two large healthcare organizations separately collect data from the same patients. A genetics lab has genomics data while a hospital owns the Electronic Health Records (EHRs) of its patients. Essential features that are relevant for training AI models are distributed across the genetics lab and the hospital. This is known as vertically distributed data. A joint database would enable developing strong machine learning models that unveil the relationships between genetic variations and the corresponding disease progression of patients who have been treated with a certain drug.
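
The term "vertically distributed" can be made concrete with a small sketch. The two dictionaries below hold different feature columns for overlapping sets of patients, and the join shows what a joint database would contain. All patient IDs, feature names, and values are hypothetical, and the plaintext join is shown only to illustrate the data layout: it is exactly the kind of sharing that is not possible in practice.

```python
# Hypothetical records held by the genetics lab: one genomic feature per patient.
genomics_lab = {
    "p01": {"variant_a": 1},
    "p02": {"variant_a": 0},
    "p03": {"variant_a": 1},
}

# Hypothetical records held by the hospital: EHR features for its patients.
hospital_ehr = {
    "p02": {"drug_dose_mg": 50, "responded": True},
    "p03": {"drug_dose_mg": 100, "responded": False},
    "p04": {"drug_dose_mg": 50, "responded": True},
}


def vertical_join(left, right):
    """Merge feature columns for the patients that appear in both datasets."""
    shared_ids = sorted(left.keys() & right.keys())
    return {pid: {**left[pid], **right[pid]} for pid in shared_ids}


# Only patients known to both organizations end up with a complete feature row.
joint_view = vertical_join(genomics_lab, hospital_ehr)
```

Each organization sees only its own columns. A model relating genetic variation to treatment response needs the combined rows, which is why the data has to be analyzed jointly without either side revealing its records.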

Problem

The datasets (genomic and EHR data) are protected due to their inherent value and sensitive information. Sharing these datasets is prevented by data protection regulations and intellectual property concerns.

Apheris solution

To make use of vertically distributed data without running into data privacy issues, the Apheris Platform uses a combination of cryptographic techniques, data privacy technologies, and federated machine learning. Our Platform allows the genetics lab and the hospital to collaborate on data and generate new scientific insights, while both parties’ data is fully protected and patients’ privacy is fully preserved.

Advantages of using Apheris

Improve treatment of patients

The Apheris Platform enables a more holistic patient view, allowing for a personalized treatment approach (e.g., a medication choice and dosing scheme that depend on the patient’s genetic profile)

Unlock new scientific insights

Analyzing a broad set of EHR and genomic data combinations can reveal new scientific insights potentially leading to new therapies

Increase value of data

Institutions can leverage the value of their data while fully preserving patients’ privacy rights

Our Technological Developments and Commitment in the COVID-19 Outbreak

EXECUTIVE SUMMARY

At the beginning of 2020, a novel virus causing the respiratory tract disease COVID-19 was observed in China. This event is fundamentally changing life as we know it: a worldwide pandemic with still no foreseeable end. Disease models of the spread of COVID-19 show that high-precision isolation is the most effective measure in a virus pandemic. The spread of the deadly coronavirus needs to be slowed down as quickly as possible, while minimizing its economic impact.

COVID-19 mobile contact tracing apps are an important factor for high-precision isolation, but they need to be adopted by most people to fulfil their purpose. Because the apps capture sensitive and private data such as contact traces and infection status, it is indispensable that they are trusted, and data privacy is crucial for that: privacy drives trust and trust drives adoption. Cryptographic technologies allow computations in a privacy-preserving manner and are therefore among the key technologies. Private set intersection (PSI) is a powerful cryptographic technique which allows two parties to compare data with one another without exposing their raw data to the other party.

To help reduce the spread of the coronavirus, we have developed a private set intersection library for contact tracing initiatives to incorporate in their COVID-19 apps. Why privacy preservation is necessary for COVID-19 contact tracing apps, and how it is achieved, is also outlined in this article, which we published in collaboration with OpenMined.

Continue to read the full story here:

THE SPREAD OF A VIRUS

At the beginning of 2020, reports emerged of an outbreak in China of a previously unknown respiratory tract disease caused by a virus. This event is fundamentally changing life as we know it in almost all parts of the world: a pandemic with still no foreseeable end. SARS-CoV-2 is the causative agent of the pandemic. It is a newly encountered member of the coronavirus family, which belongs to the RNA viruses, and its behaviour is comparable to that of influenza viruses or SARS-CoV, the causative agent of the 2002/03 outbreak. As soon as virus particles get into a host (human), they start invading cells (in this case predominantly respiratory tract cells), and the host’s cells replicate the virus’s genome. Virus particles get into the host’s saliva, and the virus spreads between people through conversation with infected individuals, hand contact, and close face-to-face interaction.

Governments, among others, need to learn when and how the virus is spreading in order to decide on appropriate measures. So-called COVID-19 contact tracing apps are of high importance in limiting the spread of the disease.

ON THE IMPORTANCE OF CONTACT TRACING APPS

Disease models of the spread of COVID-19 show that high-precision isolation is the most effective measure in a virus pandemic. The spread of the deadly coronavirus needs to be slowed down as quickly as possible, while minimizing its economic impact.

COVID-19 mobile contact tracing apps are an important factor in achieving that. Several countries have shown that monitoring and tracking the collective movement of millions of people is necessary to cope with a pandemic. Multiple institutions and companies have developed COVID-19 smartphone apps that serve the needs of the individual, including symptom analysis and exposure alerts on COVID-19 hotspots. The collected data can be used for statistical insights on symptoms, and it also enables high-precision self-isolation as well as rapid identification of those exposed to COVID-positive people. These applications are necessary to minimize the impact on public health and the economy.

Quite a few COVID-19 apps are available on the market; many of them are completely open source. These apps have different functionalities, but in general their core workflow is similar and can be described like this: Every user collects tracing data on their mobile phone. The generated contact IDs are stored on the users' phones only. If the health authorities diagnose a user with the coronavirus, the user can (but usually doesn't have to!) share their data and transfer it to a server. Any other user of the app can now learn whether they have potentially been in contact with a user who tested positive by comparing their data with the updated data on the server.
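
The workflow above can be sketched in a few lines. This is a deliberately naive simulation and does not reproduce the protocol of any specific app: the class and function names are hypothetical, and it assumes that a diagnosed user uploads the contact IDs their own phone generated. The comparison at the end works on plaintext IDs, which is what gives rise to the privacy questions.

```python
import secrets


class Phone:
    """Holds tracing data locally, as described in the workflow."""

    def __init__(self):
        self.own_ids = []      # contact IDs this phone generated and handed out
        self.seen_ids = set()  # contact IDs received from nearby phones

    def new_contact_id(self):
        contact_id = secrets.token_hex(16)  # random and unlinkable to the user
        self.own_ids.append(contact_id)
        return contact_id


class Server:
    """Stores only the IDs voluntarily uploaded by diagnosed users."""

    def __init__(self):
        self.positive_ids = set()

    def upload(self, contact_ids):
        self.positive_ids.update(contact_ids)


def meet(a, b):
    """Two phones in proximity exchange freshly generated contact IDs."""
    b.seen_ids.add(a.new_contact_id())
    a.seen_ids.add(b.new_contact_id())


def possibly_exposed(phone, server):
    # Naive plaintext comparison: one side has to see the other side's IDs.
    return bool(phone.seen_ids & server.positive_ids)


alice, bob, carol = Phone(), Phone(), Phone()
server = Server()
meet(alice, bob)               # Alice and Bob were in contact; Carol was not
server.upload(alice.own_ids)   # Alice is diagnosed and chooses to share
bob_alert = possibly_exposed(bob, server)
carol_alert = possibly_exposed(carol, server)
```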


This workflow raises two privacy issues that need to be considered when data is shared and compared: the privacy of the diagnosed patient and the privacy of the other users' tracing data.

COVID-19 CONTACT TRACING APP PRIVACY CONCERNS

It is critical that COVID-19 contact tracing apps are adopted by most people to fulfil their purpose. Because the apps capture sensitive and private data such as contact traces and infection status, it is indispensable that they are trusted, and data privacy is crucial for that: privacy drives trust and trust drives adoption. Cryptographic technologies allow computations in a privacy-preserving manner and are therefore among the key technologies to help end the pandemic:

Private set intersection (PSI) is a powerful cryptographic technique which allows two parties to compare data with one another without exposing their raw data to the other party.

For COVID-19 apps, PSI allows a user to check whether the tracing data they collected matches the traces of diagnosed patients, without revealing their private tracing data to the server. Depending on the type of PSI protocol, the client then learns only the matching traces themselves, or only the count of matching traces. This prevents data from becoming publicly available and being exploited or abused. Additionally, the central server does not have to collect all users’ data but only the contact traces of infected users.
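
One classic way to build PSI relies on commutative blinding in the style of a Diffie-Hellman exchange: each party raises hashed items to its own secret exponent, and because the order of exponentiation does not matter, doubly blinded values match exactly when the underlying items match. The sketch below illustrates that idea only. It is not necessarily how the library described in this article works, both parties are simulated in one process, all names are hypothetical, and the group parameters are toy-sized and not secure. A real deployment would use a vetted library.

```python
import hashlib
import math
import secrets

# Toy modulus (the Mersenne prime 2**127 - 1). NOT secure; illustration only.
P = 2**127 - 1


def hash_to_group(item):
    """Map an item to an element of the multiplicative group modulo P."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent():
    """Pick a secret exponent that is invertible modulo the group order, so
    blinding is a bijection and distinct items never collide after blinding."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind_items(items, key):
    return [pow(hash_to_group(item), key, P) for item in items]


def blind_again(values, key):
    return [pow(value, key, P) for value in values]


def psi_intersection(client_items, server_items):
    """Simulate both roles; return the client items that the server also holds."""
    client_items = list(client_items)
    client_key = random_exponent()
    server_key = random_exponent()

    # Client -> server: items blinded with the client's key only.
    client_blinded = blind_items(client_items, client_key)
    # Server -> client: the client's values blinded a second time (same order),
    # plus the server's own items blinded with the server's key.
    client_double = blind_again(client_blinded, server_key)
    server_blinded = blind_items(server_items, server_key)
    # Client: finish blinding the server's values, then compare.
    server_double = set(blind_again(server_blinded, client_key))
    return {item for item, value in zip(client_items, client_double)
            if value in server_double}


# Hypothetical contact IDs: the phone's observed traces vs. uploaded positive traces.
matches = psi_intersection(["id-17", "id-42", "id-99"], ["id-42", "id-99", "id-07"])
```

Neither side ever sees the other side's unblinded items. In a real protocol the server would also shuffle its blinded set before sending it, and a variant that reveals only the number of matches requires further changes to what the client is allowed to learn.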

Differential Privacy is another privacy-preserving technique, one which enables private data analysis. It can be used by organizations to learn statistical information about a dataset while ensuring that the statistical results do not allow any individual’s data to be reverse engineered and identified. Differential privacy is relevant for governments that want to make use of the app data without exposing the individual user. For more information on the functionality of Differential Privacy, check out our blog post here.
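
As a rough illustration of the idea, the sketch below applies the Laplace mechanism, a standard building block of differential privacy, to a counting query. Adding or removing one person changes a count by at most 1, so noise drawn from a Laplace distribution with scale 1/epsilon is added to the true count. The function names and the example data are hypothetical, and the noise generator is for illustration only; a production system would rely on a vetted differential privacy library.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0  # one individual changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical app data: whether each user reported a fever.
reported_fever = [True, False, True, True, False]
noisy_total = dp_count(reported_fever, lambda fever: fever, epsilon=0.5)
```

The published value is close to the true count on average, but no single user's answer can be inferred from it with confidence.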

OUR CONTRIBUTION TO FIGHT COVID-19

To help reduce the spread of the coronavirus, we have developed a private set intersection library for contact tracing initiatives to incorporate in their COVID-19 apps. Why privacy preservation is necessary for COVID-19 contact tracing apps, and how it is achieved, is also outlined in this article, which we published in collaboration with OpenMined. Our code is open source and you can clone the repository from GitHub here.

You can get more details on the importance of a COVID-19 app from our whitepaper, which we published early in the crisis. Its executive summary: epidemiological modelling of the spread of the disease shows that high-precision self-isolation is the best approach to stop the pandemic and thus to minimize its economic damage. Read more about this and our call to action for the development of a COVID-19 contact tracing app at https://www.covid-app.io/.

We partner with OpenMined, the largest open source community around privacy-preserving artificial intelligence. We are also actively collaborating with the TCN Coalition, a global coalition for privacy-first digital contact tracing protocols to fight COVID-19. We have offered our algorithmic privacy core to support COVID-19 contact tracing apps and are actively supporting several initiatives in Europe and the US with our technology. Furthermore, together with Microsoft, Amazon, Facebook, IBM, HP and Intel, we are one of the ten founding adopters of the Open Covid Pledge. You can read more about our commitment in the COVID-19 outbreak on our website.

BEYOND COVID-19

PSI is versatile and not bound to a scenario in which a mobile phone talks to a central server, as in our COVID-19 app development efforts. It is applicable whenever two or more parties have an interest in comparing their data without disclosing the full data itself. Common use cases where a PSI protocol is useful include:

  • Private Contact Discovery: Users can find which of their private contacts also have a certain communication app (server),
  • DNA testing and pattern matching: A user who got her DNA sequenced can find out about sequences linked to genetic diseases which are stored on a database (server),
  • Remote diagnostics: A medical diagnostic program assigns a status (sick or not sick with a certain disease) to the vectorized electronic health record of a patient (client). While the client learns about her sickness, the program itself remains secret and the program owner (server) does not learn anything about the client’s data,
  • Private record linkage: Two data owners hold different types of information for the same customer. To make data mining possible, both records must be linked together and made available without giving away any other private data stored,
  • Chemical compound comparison: In a federated learning setup where two companies jointly train a model (e.g., a QSAR model) on their data, the data can be pre-processed with PSI to find and eliminate duplicates and thereby streamline the model training.