The goal of this experiment was to assess whether fine-tuning the public OpenFold3 model on a very small set of protein–ligand structures can meaningfully improve prediction quality for a specific drug discovery context, without degrading general performance. We focused on human phosphodiesterase 10A (PDE10A), a target where pose accuracy and SAR alignment are critical for design decisions. This short blog documents that fine-tuning experiment. The goals are to:
show that small numbers of liganded structures can materially increase the capabilities of co-folding models, and
provide enough detail that a structural modelling team could reproduce or extend the experiment in their own stack.
The practical question was:
Can we correct systematic pose errors of OpenFold3 for a specific target and chemotype using only a handful of protein–ligand complexes?
We chose to evaluate OpenFold3 (OF3) fine-tuning on a subset of 27 structures from the PDE10A dataset published by Roche in 2022, as it met two key requirements:
Held out from the OF3 training set (published after the 2021 training cutoff)
The base OF3 model performed poorly on these structures
We treated this as a realistic “low-n” fine-tuning scenario:
Target: human PDE10A
Model: OpenFold3 (public weights as of late 2025)
Hardware: single NVIDIA H100 GPU
Total wall-clock for fine-tuning: ~20 hours
Training set (10 complexes, used for fine-tuning) PDB IDs:
5SDY, 5SIQ, 5SI7, 5SIG, 5SI5, 5SI8, 5SIY, 5SG5, 5SGL, 5SIH
Evaluation set (17 held-out complexes, no gradient updates) PDB IDs:
5SH0, 5SE0, 5SHR, 5SJL, 5SH8, 5SF4, 5SFG, 5SE5, 5SHK, 5SEE, 5SFL, 5SJU, 5SKE, 5SKU, 5SKO, 5SEA, 5SKR
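For reproducibility, the split can be written down directly in code. The variable names below are illustrative and not tied to any OpenFold3 API:

import sys

# PDB IDs for the PDE10A fine-tuning experiment (subset of the Roche 2022 dataset).
TRAIN_PDB_IDS = [
    "5SDY", "5SIQ", "5SI7", "5SIG", "5SI5",
    "5SI8", "5SIY", "5SG5", "5SGL", "5SIH",
]

EVAL_PDB_IDS = [
    "5SH0", "5SE0", "5SHR", "5SJL", "5SH8", "5SF4", "5SFG", "5SE5", "5SHK",
    "5SEE", "5SFL", "5SJU", "5SKE", "5SKU", "5SKO", "5SEA", "5SKR",
]

# Sanity checks: 10 training / 17 evaluation complexes, with no overlap.
assert len(TRAIN_PDB_IDS) == 10 and len(EVAL_PDB_IDS) == 17
assert not set(TRAIN_PDB_IDS) & set(EVAL_PDB_IDS), "train/eval sets must be disjoint"
print("split OK", file=sys.stderr)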
We followed a minimal, single-target fine-tuning setup:
Starting weights: public OpenFold3
Objective: standard OpenFold3 structure loss on protein–ligand complexes
Steps: ~350 gradient steps over the 10 training complexes
Optimiser / schedule: EMA decay factor reduced from the default 0.999 to 0.99 so the averaged weights can change meaningfully within a small number of gradient steps; learning-rate warmup reduced from 1000 to 50 steps; learning rate reduced from 0.0018 to 0.0003 (see the config sketch below)
MSAs / templates: MSAs as in baseline OpenFold3; templates disabled
All training was run via ApherisFold in a private environment, with the PDE10A complexes staying inside the organisation’s own infrastructure.
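A minimal sketch of these hyperparameter overrides, written as a plain Python dict. The key names (and the checkpoint path) are illustrative only; they would need to be mapped onto whatever config schema your OpenFold3 training stack or ApherisFold job definition actually uses:

# Overrides for low-n fine-tuning, relative to the public OpenFold3 training
# defaults. Key names and the checkpoint path are illustrative, not an API.
FINETUNE_OVERRIDES = {
    "init_weights": "openfold3_public.ckpt",   # start from public OF3 weights (path is hypothetical)
    "max_steps": 350,                          # ~350 gradient steps over the 10 training complexes
    "learning_rate": 3e-4,                     # reduced from the default 1.8e-3
    "warmup_steps": 50,                        # reduced from the default 1000
    "ema_decay": 0.99,                         # reduced from 0.999 so EMA weights move in few steps
    "use_templates": False,                    # MSAs as in baseline OF3; templates disabled
    "loss": "openfold3_structure_loss",        # standard OF3 structure loss, unchanged
}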
For each complex in the 17-structure evaluation set we:
Ran 5 independent co-folding samples with the baseline OpenFold3 model.
Ran 5 independent co-folding samples with the fine-tuned model.
For each model, selected the highest-confidence sample according to the model's internal confidence (pLDDT).
Computed the following metrics against the experimental structure (a minimal scoring sketch follows the metric list):
global GDT
intra-protein lDDT
intra-ligand lDDT
protein–ligand interface lDDT
DockQ (interface-focused composite score)
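The selection-and-scoring loop can be sketched as follows. This is a minimal illustration assuming two hypothetical helpers, predict_complex (runs one co-folding sample and returns coordinates plus a mean pLDDT) and score_vs_reference (wraps the GDT/lDDT/DockQ calculations, e.g. via OpenStructure or the DockQ package); neither function comes from the OpenFold3 codebase.

from statistics import mean

N_SAMPLES = 5
METRICS = ("gdt", "protein_lddt", "ligand_lddt", "interface_lddt", "dockq")

def evaluate(model, pdb_ids, reference_dir):
    """Best-of-5 evaluation: keep the highest-confidence sample per complex.

    `predict_complex` and `score_vs_reference` are hypothetical helpers
    standing in for your own inference and scoring wrappers.
    """
    per_complex = {}
    for pdb_id in pdb_ids:
        # Run 5 independent co-folding samples and rank them by mean pLDDT.
        samples = [predict_complex(model, pdb_id) for _ in range(N_SAMPLES)]
        best = max(samples, key=lambda s: s.mean_plddt)
        # Score the selected sample against the experimental structure.
        per_complex[pdb_id] = score_vs_reference(best, f"{reference_dir}/{pdb_id}.cif")
    # Aggregate each metric over the evaluation set.
    return {m: mean(scores[m] for scores in per_complex.values()) for m in METRICS}

Running evaluate(baseline_model, EVAL_PDB_IDS, ...) and evaluate(finetuned_model, EVAL_PDB_IDS, ...) gives the per-metric averages summarised in the bar plot below.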
The bar plot (baseline vs. fine-tuned on these metrics, with error bars over the 17 structures) summarises the results.
Qualitatively:
For 5SH8, the baseline model places the ligand in an incorrect pose relative to the pocket; key ring systems are mis-registered and several interactions are missing.
The fine-tuned model produces a pose that overlays well with the crystal structure and recovers the expected interaction pattern.
Inspection shows that the fine-tuned model appears to combine elements of two training complexes (notably 5SDY and 5SI7) to achieve the correct binding mode.
The complex 5SH8 (shown in purple) serves as the central qualitative example:
Pink: baseline OpenFold3 prediction (incorrect pose)
Yellow: fine-tuned model prediction (significantly more accurate)
Purple: experimental reference structure of 5SH8
Green and grey: the two training structures most similar to 5SH8 by spatial overlap, 5SDY and 5SI7. These two complexes are particularly close analogues and illustrate how the fine-tuned model “interpolates” between known binding modes when predicting 5SH8.
Quantitatively (across all 17 held-out complexes):
All structural metrics showed a clear shift in favour of the fine-tuned model.
Improvements were most pronounced at the protein–ligand interface (interface lDDT and DockQ), which is exactly where medicinal chemists care about accuracy.
Global protein metrics (GDT, intra-protein lDDT) improved slightly but were already high; the main gain was correcting ligand orientation and local interactions.
Overall, fine-tuning on 10 carefully chosen complexes was enough to move predictions for this chemotype from qualitatively unreliable to experimentally plausible for decision support.