Privacy-preserving data ecosystem in support of drug discovery

Deep learning is being employed throughout the drug discovery pipeline. Models are being developed for tasks ranging from generating novel molecular structures and predicting compound activity against diseases or toxicological effects to analysing medical imaging and drug interactions. Deep learning requires sufficiently large and diverse datasets for training, which is problematic for typical models based on physicochemical properties: the cost of acquiring data is high, and the data is often biased towards a certain area of 'chemical space'. As a result, the models do not generalise well. Intellectual property (IP) and related financial interests are a major obstacle to inter-institutional data sharing: data is the core competitive advantage when filing patents, so keeping it secret is indispensable for future success. One way to increase data size and diversity without exposing the data itself is federated learning, a data-private collaborative learning method in which data held by multiple institutions is used to train a single model.

With Apheris, we offer a solution to build and orchestrate a privacy-preserving data ecosystem in support of drug discovery. The distributed data is used jointly for model training, and all data holders benefit from significantly increased training data volume and diversity as well as improved predictive models. Privacy-preserving data sharing encourages standardisation of data and methods, while the data itself remains private and cannot be accessed by any other party.

Training occurs in a decentralized manner, and only encrypted model parameters are shared. Proprietary data never leaves the individual companies' environments and is never revealed to any other company or to Apheris as an intermediary. Our privacy core ensures that it is impossible for attackers to reverse-engineer data from the trained model.
Figure: Federated learning model updates.
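
The decentralized training described above can be illustrated with a minimal sketch of federated averaging. This is not the Apheris implementation: it assumes a toy linear model in place of a deep network, three hypothetical institutions with synthetic data, and plain (unencrypted) parameter exchange. The function names `local_update`, `federated_average` and `make_site` are illustrative only. What it shows is the core idea that raw data stays with each institution and only model parameters travel to the coordinator.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient-descent steps on one institution's private data.

    Only the updated parameters and the sample count are returned;
    the data itself never leaves this function.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error for the model y = w * x + b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n

def federated_average(updates):
    """Combine local parameters, weighting each site by its sample count."""
    total = sum(n for _, n in updates)
    w = sum(params[0] * n for params, n in updates) / total
    b = sum(params[1] * n for params, n in updates) / total
    return (w, b)

def make_site(rng, low, high, n):
    """Hypothetical private dataset: y = 3x + 1 plus noise, with x drawn
    from a site-specific range to mimic bias in 'chemical space'."""
    xs = [rng.uniform(low, high) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.05)) for x in xs]

rng = random.Random(0)
sites = [
    make_site(rng, -1.0, 0.0, 40),
    make_site(rng, 0.0, 1.0, 60),
    make_site(rng, -0.5, 0.5, 50),
]

global_weights = (0.0, 0.0)
for _ in range(200):
    # Each site trains locally, starting from the current global model
    updates = [local_update(global_weights, data) for data in sites]
    # The coordinator sees only parameters and sample counts
    global_weights = federated_average(updates)

print(global_weights)
```

In a production setting the parameters would additionally be protected in transit and during aggregation, for example through encryption as described above; that layer is deliberately omitted from this sketch.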
Quantitative Structure-Activity Relationship (QSAR) models are one example of models that can be developed on top of such a data-sharing ecosystem. This use case demonstrates how we at Apheris empower our customers to use distributed data to its fullest potential for the benefit of drug discovery and, ultimately, patients' lives.

Get access to our full whitepaper

If you are interested in reading the full use case and there is collaboration potential, we are happy to send you our whitepaper. Just fill out the form below.

Do you have a similar use case?

Contact us if you want to find out how we can improve your models via a privacy-preserving federated learning setup.