Good ADMET prediction shortens DMTA cycles, surfaces liabilities before they compound, and cuts avoidable experimental iteration. The value is largest on novel chemistry where internal data coverage is thin. But in those cases uncertainty is high, and model performance degrades as a result. Addressing this requires more diverse, decision-relevant data combined with models that quantify their applicability domain, propagate uncertainty, and support active experimental follow-up.
Pre-competitive federated data networks provide the structural mechanism for this. These are partnerships in which organizations contribute proprietary chemistry and assay measurements to a shared model without raw data leaving their environment. In this piece I walk through the principles behind this argument, the published evidence for each one, and the design choices they imply. Demonstrating that federation improves ADMET models is not sufficient; it matters more to understand why it does so. Comparing model behavior, endpoint characteristics, assay quality, modelability, and reproducibility across partners allows a mature federated network to attribute uplift to its structural sources and feed those findings back into the next model-building cycle.
ADMET liabilities account for roughly 40–45% of clinical attrition (Sun et al., Acta Pharm. Sin. B, 2022), yet the most consequential ADMET decisions are made much earlier. Long before a compound reaches the clinic, medicinal chemists have already decided which scaffolds to push, which series to drop, and which liabilities to design around. Those decisions increasingly rely on predictive models, and their reliability on novel chemistry sets a practical limit on how much downstream attrition computational triage can prevent.
Modern ADMET models perform well on the chemistry they have seen and degrade quickly on the chemistry they have not. In a prospective benchmark of seven ML algorithms on 120 internal Biogen test sets across six endpoints, Fang et al. (J. Chem. Inf. Model., 2023) showed that prospective predictivity does not always improve as the training set grows, and that the loss in predictivity tracks a measurable decline in train–test chemical similarity (Sørensen–Dice 0.79 at 1-month retraining, 0.68 at 10-month).
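The train-test similarity drift described above can be measured in a few lines. The following is a minimal sketch, assuming each molecule is already represented as a set of "on" fingerprint bits. It does not reproduce how Fang et al. computed their fingerprints, and the function names and toy bit sets are illustrative only.

```python
from statistics import mean


def dice(a: set[int], b: set[int]) -> float:
    """Sørensen-Dice coefficient between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def mean_nearest_train_similarity(train, test):
    """Average, over test molecules, of the similarity to the closest training molecule."""
    return mean(max(dice(t, tr) for tr in train) for t in test)


# Toy fingerprints (hypothetical bit indices, not real molecules).
train = [{1, 2, 3, 4}, {2, 3, 5, 8}]
test_near = [{1, 2, 3, 5}]    # shares most bits with the training set
test_far = [{9, 10, 11, 2}]   # mostly unseen bits

near_score = mean_nearest_train_similarity(train, test_near)
far_score = mean_nearest_train_similarity(train, test_far)
print(near_score, far_score)
```

A falling value of this score between retraining dates is one way to detect that new project chemistry is drifting away from what the model was trained on.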
Wang et al. (J. Cheminform., 2025) reproduce the pattern on Caco-2 permeability. Their model performed well on held-out public compounds but showed near-zero predictivity on internal proprietary scaffolds. Proprietary chemistry is rarely well-represented in any single public source. Most ADMET models are reliable precisely where their reliability matters least.
The Apheris ADMET Network was built around the conclusion that this is structurally a data problem, not an architecture one.
Public ADMET data is biased in chemical space and heterogeneous at the assay level. Even the most carefully curated public releases are several times smaller than a single pharma company's internal collection, and they do not cover the proprietary scaffolds that are relevant for decision-making. Assay protocols differ in ways that are rarely documented well enough for proper normalization. And random train/test splits leave high-similarity molecules on both sides, inflating reported performance.
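To illustrate why the choice of split matters, here is a minimal sketch of a scaffold-grouped split, in which every scaffold lands entirely on one side. It assumes a scaffold identifier (for example a Bemis-Murcko scaffold SMILES) has already been computed for each record upstream. The function name and toy labels are hypothetical, and this is not the Polaris splitting procedure.

```python
import random
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2, seed=0):
    """Split record indices so that no scaffold appears in both train and test.

    `scaffolds[i]` is a precomputed scaffold identifier for record i.
    The realised test fraction is approximate because groups are kept whole.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    keys = sorted(groups)              # deterministic order before shuffling
    random.Random(seed).shuffle(keys)

    target = test_fraction * len(scaffolds)
    train, test = [], []
    for key in keys:
        # Whole scaffold groups go to one side only.
        (test if len(test) < target else train).extend(groups[key])
    return train, test


scaffolds = ["A", "A", "A", "B", "B", "C", "D", "D", "E", "F"]
train_idx, test_idx = scaffold_split(scaffolds, test_fraction=0.3)
train_scaffolds = {scaffolds[i] for i in train_idx}
test_scaffolds = {scaffolds[i] for i in test_idx}
print(sorted(test_scaffolds), "overlap:", train_scaffolds & test_scaffolds)
```

A random split over the same records would routinely place members of scaffold "A" on both sides, which is the leakage that inflates reported performance.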
Polaris addresses the last issue with stricter scaffold-based and diversity-based splits. On the resulting harder benchmark, Goossens et al. (Galapagos, ChemRxiv 2025) ranked 2nd of 39 with carefully curated public data plus multi-task ChemProp, beaten only by an entry that used proprietary data. Their leaderboard MAEs sit within 0.01–0.07 log units of the winner across every endpoint.
That is the public-data ceiling: very close to the top, never at it. None of this is fixable by scraping more PubChem records. The chemistry, assays, and metadata required to improve ADMET prediction on proprietary scaffolds live inside pharmaceutical companies, governed by IP and regulatory constraints that make centralized aggregation a non-starter.
Once you can pool data across organizations, the obvious assumption is that total compound count is what drives predictive accuracy. The evidence, however, points the other way. In MELLODDY (Heyndrickx et al., JCIM 2023), the largest cross-pharma federated learning study to date, ADME and panel assays saw the clearest gains from federation. I saw this directly while leading AstraZeneca’s contribution to the project: the value was not that any one partner contributed more data, but that multiple pharma companies measured similar endpoints, creating overlapping assay structures and correlated chemical signal that the federated model could learn across organizations. In Figure 4B, the largest uplift appears where cross-partner assay and chemical similarity are highest. The driver is complementarity (the chemistry and biology the network covers collectively), not raw data volume.
The ChemMedChem study by Schliephacke, Kuhn and Friedrich (Merck KGaA, 2025) makes the endpoint-level point. The authors integrated Merck's internal datasets with Fang's public Biogen release across six standard ADME endpoints. On MDR1-MDCK efflux, where the public data added genuinely new chemistry, overconfident predictions halved. On endpoints where the public compounds largely overlapped the internal set, adding them gave no benefit and occasionally degraded calibration. Where added data is structurally redundant, the gains disappear.
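One simple way to count overconfident predictions is sketched below. It assumes the model returns a predicted standard deviation alongside each prediction, and it flags a prediction as overconfident when the absolute error exceeds `z` predicted standard deviations. This definition and the numbers are assumptions for illustration; Schliephacke et al. may define overconfidence differently.

```python
def overconfident_fraction(y_true, y_pred, y_std, z=1.96):
    """Fraction of predictions whose error exceeds z predicted standard deviations."""
    flags = [abs(t - p) > z * s for t, p, s in zip(y_true, y_pred, y_std)]
    return sum(flags) / len(flags)


# Hypothetical values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.0, 4.5, 4.2]
narrow = [0.1, 0.1, 0.1, 0.05]   # model claims tight uncertainty everywhere
wider = [0.1, 0.1, 1.0, 0.2]     # model admits more uncertainty on the hard cases

frac_narrow = overconfident_fraction(y_true, y_pred, narrow)
frac_wider = overconfident_fraction(y_true, y_pred, wider)
print(frac_narrow, frac_wider)
```

The point estimates are identical in both cases; only the stated uncertainty changes, which is why calibration has to be tracked separately from MAE.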
The subtler finding is that the multi-task model's applicability domain often expanded even when headline MAE did not change. Schliephacke et al.'s similarity-stratified curves show the multi-task MAE sitting consistently below the single-source curve across nearest-neighbour Tanimoto bins, not only where any model performs well. For chemists working on novel scaffolds, that is what matters operationally. The model behaves less erratically in the early-stage chemical space where predictive models are actually used.
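A similarity-stratified curve of this kind can be computed as in the following minimal sketch. It assumes that each test compound comes with its nearest-neighbour similarity to the training set and its absolute prediction error. The bin edges, function name, and toy values are illustrative, not those used by Schliephacke et al.

```python
def mae_by_similarity_bin(similarity, abs_error, edges=(0.0, 0.4, 0.6, 0.8, 1.0)):
    """Mean absolute error per nearest-neighbour similarity bin.

    Bins are [lo, hi), except the last, which also includes hi.
    Empty bins are omitted from the result.
    """
    result = {}
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        last = i == len(edges) - 2
        errs = [e for s, e in zip(similarity, abs_error)
                if lo <= s < hi or (last and s == hi)]
        if errs:
            result[(lo, hi)] = sum(errs) / len(errs)
    return result


# Hypothetical test compounds: similarity to nearest training compound, and |error|.
similarity = [0.9, 0.85, 0.5, 0.3, 1.0]
abs_error = [0.2, 0.4, 0.6, 1.0, 0.0]

curve = mae_by_similarity_bin(similarity, abs_error)
for bin_range, mae in sorted(curve.items()):
    print(bin_range, round(mae, 3))
```

Running the same function on the errors of two models gives two curves that can be compared bin by bin, rather than through a single headline MAE.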
Two principles follow. First, the right contribution metric for a federated ADMET network is coverage of chemical and assay space, not warehouse size. Second, the right success metric is behaviour across operationally relevant regions, evaluated under scaffold- and similarity-stratified protocols rather than random splits.
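As a rough illustration of the first principle, the sketch below scores a contribution by the share of its scaffolds that the network does not already cover, independent of record count. It assumes scaffold identifiers are available as strings. The metric and its name are hypothetical, not a description of how the network scores contributions.

```python
def marginal_coverage(network_scaffolds: set[str], partner_scaffolds: set[str]) -> float:
    """Fraction of a partner's scaffolds that the network does not already cover."""
    if not partner_scaffolds:
        return 0.0
    return len(partner_scaffolds - network_scaffolds) / len(partner_scaffolds)


network = {"A", "B", "C"}
large_redundant = {"A", "B", "C"}      # many records, but no new scaffolds
small_novel = {"C", "X", "Y", "Z"}     # few records, mostly new scaffolds

redundant_score = marginal_coverage(network, large_redundant)
novel_score = marginal_coverage(network, small_novel)
print(redundant_score, novel_score)
```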
The Apheris ADMET Network currently comprises five pharma companies, with two additional partners onboarding, training collaboratively across their proprietary datasets without any raw data or structures leaving their environments.
The network covers more than 100 ADMET endpoints, spanning solubility, permeability, metabolic stability, CYP inhibition, transporter liabilities, hERG and in vivo PK, with safety and Cell Painting readouts being added as the network grows.
The architecture is driven by three constraints:
Partner assays are heterogeneous
Endpoint definitions are partner-private and cannot be forced into a shared schema
Training has to converge in reasonable wall-clock time across geographically distributed compute
The ADMET Network uses a multi-task directed message-passing neural network built on ChemProp, with a shared trunk and partner-specific heads. The trunk learns a molecular representation across all partners' proprietary chemistry (a much broader chemical space than any individual partner's library) with updates aggregated across partners and no raw structures leaving any partner's environment. Heads are trained on each partner's local endpoints and remain private, avoiding forced label harmonization onto endpoints that disagree at the protocol level.
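The split between a shared trunk and private heads can be sketched as follows. This is a toy illustration that assumes a FedAvg-style weighted average of trunk parameters. The text above does not specify the aggregation rule or how partners are weighted, so `aggregate_trunk`, the weights, and the parameter values are all placeholders.

```python
def aggregate_trunk(partner_trunks, partner_weights):
    """Weighted average of trunk parameter vectors (FedAvg-style; an assumption)."""
    total = sum(partner_weights)
    n_params = len(partner_trunks[0])
    return [
        sum(w * trunk[i] for trunk, w in zip(partner_trunks, partner_weights)) / total
        for i in range(n_params)
    ]


# Each partner holds a copy of the shared trunk plus a private head.
partners = {
    "partner_a": {"trunk": [0.0, 1.0], "head": [0.5], "weight": 1.0},
    "partner_b": {"trunk": [1.0, 3.0], "head": [-0.2], "weight": 3.0},
}

new_trunk = aggregate_trunk(
    [p["trunk"] for p in partners.values()],
    [p["weight"] for p in partners.values()],
)
for p in partners.values():
    p["trunk"] = list(new_trunk)   # only trunk parameters are synchronised
    # p["head"] is never sent or overwritten

print(new_trunk)
```

Only parameter vectors cross the partner boundary in this sketch; structures, labels, and head weights stay local.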
ChemProp’s D-MPNN was chosen over transformer-based foundation models for two reasons:
First, it remains highly competitive on ADMET benchmarks. In Fang et al.’s prospective study, ChemProp with RDKit descriptors outperformed Random Forest in 113 of 120 splits and was joint-best overall with LightGBM. On Polaris, Goossens et al. showed that multi-task ChemProp outperformed XGBoost, PyBoost, and Random Forest on four of five endpoints.
Second, it is a better operational fit for federation. Federated training adds synchronisation, aggregation, and communication overhead on top of standard model training, and those costs increase with model size. A model that delivers strong ADMET performance on a single mid-tier GPU per partner is therefore more practical than a foundation model that requires each organisation to provision and maintain multi-node infrastructure.
If complementarity rather than volume drives gains, contribution rules need to be designed around chemical and assay-space diversity, endpoint comparability, and assay quality, not raw record counts.
Harmonize at the protocol level, not the label level. Units are aligned (e.g., transforming public solubility from log₁₀(μg/mL) to log₁₀(mol/L)), assay variants are catalogued, and incompatible variants stay separate. Poor harmonization is worse than no harmonization.
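The solubility unit alignment mentioned above is a one-line transformation once the molecular weight is known. Here is a minimal sketch; the function name is illustrative, and the molecular weight is assumed to be supplied in g/mol.

```python
import math


def log_ug_per_ml_to_log_molar(log_ug_per_ml: float, mol_weight: float) -> float:
    """Convert log10(solubility in µg/mL) to log10(solubility in mol/L).

    1 µg/mL = 1e-3 g/L, and mol/L = (g/L) / (g/mol), so
    log10(mol/L) = log10(µg/mL) - 3 - log10(MW).
    """
    return log_ug_per_ml - 3.0 - math.log10(mol_weight)


# 100 µg/mL of a compound with molecular weight 100 g/mol is 1e-3 mol/L.
converted = log_ug_per_ml_to_log_molar(2.0, 100.0)
print(converted)  # → -3.0
```

Because the shift depends on each compound's molecular weight, the conversion has to be applied per compound rather than as a single constant offset.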
Assess contribution quality before training. Data is sliced by scaffold, assay, and activity cliffs to establish modelability. Sanity and consistency checks (duplicates, censored values, label distribution shifts) are non-negotiable. A federated round that converges on poorly curated data does not produce a useful model, regardless of architecture.
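Two of the checks named above, censored values and duplicates, can be sketched as follows. The sketch assumes raw assay values arrive as strings with optional qualifiers such as ">" or "<=". The record layout, the tolerance, and the function names are hypothetical, and label distribution shift is not covered here.

```python
from collections import defaultdict

CENSOR_PREFIXES = ("<=", ">=", "<", ">")  # two-character qualifiers first


def parse_label(raw: str):
    """Split a raw assay value into (qualifier, numeric value)."""
    raw = raw.strip()
    for prefix in CENSOR_PREFIXES:
        if raw.startswith(prefix):
            return prefix, float(raw[len(prefix):])
    return "=", float(raw)


def curation_report(records, tolerance=0.5):
    """Flag censored values and duplicate compounds with conflicting labels.

    `records` is a list of (compound_id, raw_value) pairs.
    """
    by_compound = defaultdict(list)
    censored = []
    for compound_id, raw in records:
        qualifier, value = parse_label(raw)
        if qualifier != "=":
            censored.append(compound_id)
        else:
            by_compound[compound_id].append(value)
    duplicated = sorted(cid for cid, vals in by_compound.items() if len(vals) > 1)
    conflicting = sorted(
        cid for cid, vals in by_compound.items() if max(vals) - min(vals) > tolerance
    )
    return {"censored": censored, "duplicated": duplicated, "conflicting": conflicting}


records = [("c1", "1.2"), ("c1", "1.3"), ("c2", ">10"),
           ("c3", "0.1"), ("c3", "2.0"), ("c4", "<= 0.5")]
report = curation_report(records)
print(report)
```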
Evaluate with scaffold-stratified, multi-seed protocols. Following Ash et al. (2025) on practically significant comparisons, models are evaluated across folds and seeds with repeated-measures ANOVA and Tukey HSD applied to the resulting distributions. A single Pearson correlation on a single split tells you almost nothing about generalisation.
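The core of a repeated-measures comparison is that every model is scored on the same fold and seed combinations, so split-to-split variation can be removed before models are compared. The sketch below computes only the one-way repeated-measures F statistic in plain Python. It is not the full protocol of Ash et al.: the p-value (for example from the F distribution via `scipy.stats.f.sf`) and the post hoc Tukey HSD step are not shown. The scores are toy numbers.

```python
def repeated_measures_f(scores):
    """F statistic of a one-way repeated-measures ANOVA.

    `scores[i][j]` is the metric for split i (a fold/seed combination)
    and model j. Every model must be evaluated on every split.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    model_means = [sum(row[j] for row in scores) / n for j in range(k)]
    split_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_model = n * sum((m - grand) ** 2 for m in model_means)
    ss_split = k * sum((s - grand) ** 2 for s in split_means)
    ss_error = ss_total - ss_model - ss_split   # residual after removing split effects

    df_model, df_error = k - 1, (k - 1) * (n - 1)
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return f_stat, df_model, df_error


# Toy metric values: three splits (rows) by two models (columns).
scores = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
f_stat, df_model, df_error = repeated_measures_f(scores)
print(f_stat, df_model, df_error)
```

With two models this reduces to the square of a paired t statistic, which is a convenient sanity check on the implementation.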
Assess privacy empirically. Every federated model undergoes membership inference and data reconstruction evaluations before release. This is a precondition for highly regulated environments, and for partner trust.
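One common form of membership inference evaluation is a loss-threshold attack: if the model's loss is systematically lower on compounds it was trained on, an attacker can guess membership from the loss alone. The sketch below scores such an attack by its AUC. It is an assumed, simplified example; which attacks the network actually runs is not specified here, and the loss values are made up.

```python
def attack_auc(member_losses, non_member_losses):
    """AUC of a loss-threshold membership inference attack.

    The attacker guesses "member" when the model's loss on a compound is low.
    0.5 means the attacker does no better than chance; 1.0 means members
    are perfectly separable from non-members.
    """
    wins = 0.0
    for m in member_losses:
        for n in non_member_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_losses) * len(non_member_losses))


# Hypothetical per-compound losses.
leaky_auc = attack_auc([0.01, 0.02, 0.03], [0.5, 0.6, 0.4])   # model memorised its training set
private_auc = attack_auc([0.1, 0.4], [0.2, 0.3])              # losses are indistinguishable

print(leaky_auc, private_auc)
```

An AUC close to 0.5 on held-out probes is the outcome a release gate of this kind would look for.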
Predictions on novel scaffolds, where most early-stage triage decisions are made, degrade more gracefully than they would for a model trained on a single organization's chemistry. The applicability domain is broader and more uniform.
Uncertainty is also better calibrated on novel chemistry. Schliephacke et al.'s multi-task MDR1-MDCK ER model halved overconfident predictions versus the internal-only baseline.
ADMET risk surfaces earlier in the DMTA cycle. A model whose predictive reliability extends into the chemistry a discovery team is actively exploring lets the team drop weak series before they consume synthesis capacity, and flag liability profiles before downstream optimization locks them in.
The literature establishes that federation can improve ADMET models when partners contribute complementary chemistry and related assay signal. But the field has largely stopped at the performance delta: the model improved, the applicability domain expanded, and calibration became more reliable. That is where most published analyses end.
The next progression is to explain why uplift occurs. That has not been done systematically in federated drug-discovery model building. MELLODDY demonstrated that cross-pharma federation can generate uplift, especially where partners share related assay structure and chemical signal, but it was not designed to provide partner-specific, mechanism-level explanations of why individual models improved or how those insights should influence the next training round.
This is where the Apheris ADMET Network is different. As the trusted model provider operating the privacy-preserving network, Apheris is uniquely positioned to interrogate federation across partners without exposing proprietary data. We can compare each partner model with analogous models from other partners and relate performance changes to the data characteristics behind them (endpoint difficulty, assay depth, scaffold coverage, reproducibility, modelability, censoring, and related-task structure).
That is what makes explainable federation possible for the first time in this setting. Some tasks may improve because difficult endpoints learn from easier related tasks. Others may benefit because adjacent ADMET readouts act as implicit data augmentation, shared chemistry improves manifold-level representation learning, or federated regularization makes partner-specific models less brittle. The important point is that these are no longer abstract hypotheses; the network gives us the privacy-preserving evidence base to test them.
The ADMET Network trains better models across distributed proprietary data, and it tells you why those models improved. That feedback informs each subsequent round: how endpoints are grouped, how assays are handled, how training is structured. Performance uplift becomes something we can act on instead of just reporting on it.
This is why no single organization can solve the problem alone. Explainable federation depends on comparable, complementary, partner-distributed evidence: not only more data, but the right cross-partner structure to reveal when, where, and why transfer occurs. ADMET modelling is moving away from a paradigm where progress follows from larger internal datasets and more expressive architectures. The binding constraint is access to complementary chemistry and assay coverage, contributed under conditions that keep IP and commercial constraints intact. Pre-competitive federated data networks are the structural mechanism that makes that contribution possible. Organizations whose proprietary chemistry is rarely complementary enough to justify bilateral data sharing find that pooling across a broader consortium crosses the complementarity threshold the evidence demands.
If you'd like to discuss participation in the ADMET network, get insights into the early results, or explore what data contributions could look like for your organisation, get in touch.
Ash, J. R. et al. (2025). Practically Significant Method Comparison Protocols for Machine Learning in Small-Molecule Drug Discovery. J. Chem. Inf. Model. https://pubs.acs.org/doi/full/10.1021/acs.jcim.5c01609
Fang, C. et al. (2023). Prospective Validation of Machine Learning Algorithms for Absorption, Distribution, Metabolism, and Excretion Prediction: An Industrial Perspective. J. Chem. Inf. Model. 63(11), 3263–3274. https://pubmed.ncbi.nlm.nih.gov/37216672/
Goossens, K., Tricarico, G., Hofmans, J., Dréanic, M.-P., de Cesco, S. & Lenselink, E. B. (2025). ChemProp multi-task models for predicting ADME properties in the Polaris challenge. ChemRxiv (CC BY 4.0). https://chemrxiv.org/engage/chemrxiv/article-details/684883a31a8f9bdab5a99e10
Heyndrickx, W. et al. (2023). MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J. Chem. Inf. Model. 63(7), 2331–2344. https://pubs.acs.org/doi/10.1021/acs.jcim.3c00799
Schliephacke, P., Kuhn, D. & Friedrich, L. (2025). Improving Absorption, Distribution, Metabolism, and Excretion Property Predictions by Integrating Public and Proprietary Data. ChemMedChem. https://chemistry-europe.onlinelibrary.wiley.com/doi/10.1002/cmdc.202500713
Sun, D., Gao, W., Hu, H. & Zhou, S. (2022). Why 90% of clinical drug development fails and how to improve it? Acta Pharmaceutica Sinica B 12(7), 3049–3062. https://doi.org/10.1016/j.apsb.2022.02.002
Wang, Y. et al. (2025). ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction. J. Cheminform. 17, 5. https://jcheminf.biomedcentral.com/articles/10.1186/s13321-025-00947-z