Protenix-v1, a fully open-source co-folding model reported to reach AlphaFold3-level performance, is now available in ApherisFold. Teams can benchmark it side-by-side against OpenFold3 and Boltz-2 on in-house targets and datasets, and decide where it is reliable enough to support structure-informed choices within DMTA cycles.
Protenix-v1 is presented as the first fully open-source biomolecular structure prediction model to reach AlphaFold3-level performance under a matched training data cutoff, model size, and inference budget. This matters because algorithmic performance can only be meaningfully evaluated under comparable training conditions: observed differences are then less likely to be driven by expanded or newer training datasets. Across multiple benchmarks and modalities, Protenix-v1 improves over representative public baselines (e.g., Boltz-1, Chai-1, HF3, and prior Protenix releases). This is shown across:
Protein–protein docking
Antibody–antigen interface prediction
Protein–ligand co-folding
The paper also reports several concrete comparisons against AlphaFold3 across key structural tasks:

| Task | Metric | Protenix-v1 | AlphaFold3 |
| --- | --- | --- | --- |
| Antibody–antigen interfaces | DockQ success rate | 52.31% | 48.75% |
| Protein–protein docking | DockQ success rate | 72.70% | 71.73% |
| Protein–RNA complexes | DockQ success rate | 68.46% | 65.22% |
| RNA monomer prediction | lDDT | 0.6547 | 0.6140 |
| Protein–ligand prediction | DockQ success rate | 62.54% | 62.59% |
| Protein–DNA docking | DockQ success rate | 69.13% | 75.91% |

Protein–ligand performance is essentially tied, and Protenix-v1 slightly exceeds AlphaFold3 when using four inference seeds. Protein–DNA docking is the one area where AlphaFold3 remains clearly stronger.
These results are reported using the expanded PXM-22to25 benchmark suites, which increase target coverage and reduce sensitivity to single-year dataset effects, providing a more stable basis for cross-model comparison in practical drug discovery settings.
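As a point of reference for reading these numbers, a "DockQ success rate" is simply the fraction of benchmark targets whose prediction clears a DockQ quality threshold. The sketch below assumes the common convention of counting a target as solved when DockQ ≥ 0.23 (CAPRI "acceptable" quality or better); the scores are illustrative, not values from the paper.

```python
# Toy sketch: DockQ success rate over a benchmark set.
# Assumes the common DockQ >= 0.23 ("acceptable") success convention;
# the example scores below are made up for illustration.

def dockq_success_rate(scores, threshold=0.23):
    """Fraction of targets whose prediction clears the DockQ threshold."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)

example_scores = [0.81, 0.12, 0.45, 0.30, 0.05]  # one score per target
rate = dockq_success_rate(example_scores)
print(f"DockQ success rate: {rate:.2%}")  # 3 of 5 targets clear 0.23
```

Aggregating per-target best predictions this way is what makes the percentages in the table above directly comparable across models evaluated on the same target set.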
1) Inference-time scaling behavior
One of the most relevant aspects for DMTA cycles is the documented inference-time scaling behavior. The paper shows that increasing the sampling budget leads to consistent improvements in DockQ success rate and lDDT, particularly for antibody–antigen complexes.
This establishes a predictable compute–accuracy trade-off. For difficult targets or ambiguous interfaces, teams can allocate additional sampling budget and obtain measurable improvements rather than relying on single-run stochastic outputs. In a program context, that makes structural evaluation more controllable and less dependent on chance variation between runs.
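The intuition behind this trade-off can be sketched with a toy best-of-N simulation: if each diffusion sample's score is a stochastic draw and the top-ranked sample is kept, the expected best score rises predictably with the sampling budget. This is an illustration of the statistical effect, not the paper's evaluation procedure, and the uniform score distribution is an arbitrary stand-in.

```python
# Toy illustration of the compute-accuracy trade-off from best-of-N
# sampling: keeping the best of n i.i.d. stochastic samples.
import random

def best_of_n(n_samples, rng):
    """Best score among n stochastic samples (toy uniform distribution)."""
    return max(rng.uniform(0.0, 1.0) for _ in range(n_samples))

def expected_best(n_samples, n_trials=2000, seed=0):
    """Monte Carlo estimate of the expected best-of-n score."""
    rng = random.Random(seed)
    return sum(best_of_n(n_samples, rng) for _ in range(n_trials)) / n_trials

for budget in (1, 4, 16):
    print(f"budget={budget:>2}  expected best score ~ {expected_best(budget):.3f}")
```

The gains shrink as the budget grows, which matches the practical guidance: spend extra seeds on difficult targets or ambiguous interfaces, where the tail of the sample distribution matters most.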
2) Year-stratified and expanded benchmarks
The authors introduce year-based evaluation suites (PXM-2024, PXM-2025, PXM-22to25-Antibody, PXM-22to25-Ligand) to reduce dataset bias and increase statistical power. Antibody–antigen evaluation is particularly sensitive to cluster sparsity and inconsistent subset reporting. The expanded antibody benchmark and bootstrapped variance reporting address this directly.
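Bootstrapped variance reporting of this kind can be sketched in a few lines: resample the per-target outcomes with replacement and read off a percentile confidence interval for the success rate. The outcome counts below are invented for illustration and are not taken from the benchmarks.

```python
# Sketch: percentile-bootstrap confidence interval for a benchmark
# success rate. The per-target outcomes are made up for illustration.
import random

def bootstrap_ci(outcomes, n_boot=5000, alpha=0.05, seed=0):
    """95% percentile bootstrap CI for the mean of binary outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    means = sorted(
        sum(rng.choice(outcomes) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

outcomes = [1] * 26 + [0] * 24  # e.g. 26 of 50 targets solved
lo, hi = bootstrap_ci(outcomes)
print(f"success rate 52.0% (95% CI {lo:.1%} - {hi:.1%})")
```

On sparse antibody–antigen clusters the resulting intervals are wide, which is exactly why single point estimates on small subsets can be misleading and why the expanded benchmark matters.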
3) Practical feature parity
Beyond benchmark performance, Protenix-v1 includes components relevant for applied co-folding workloads and brings its inputs and training data construction closer to AlphaFold3-style pipelines while remaining fully open-source. Protenix-v1 integrates:
Protein template features
RNA MSA support
Expanded disorder-focused distillation
Large-scale monomer distillation (MGnify-based)
Together, these additions result in a model configuration that can leverage template information, RNA sequence context, and distillation-derived structural signal to improve robustness across diverse biomolecular inputs, without relying on proprietary components.
In addition to the strict cutoff model used for controlled comparison, the authors release Protenix-v1-20250630, trained on a more recent dataset to improve performance on newly released targets. This separation between benchmark-aligned and application-oriented variants reflects a practical distinction relevant for drug discovery programs: controlled evaluation versus maximum performance on contemporary structural space.
With Protenix-v1 now available alongside OpenFold3 and Boltz-2, teams can:
Benchmark models side-by-side on proprietary targets
Re-run the same evaluation setup as new checkpoints are released
Assess applicability across specific chemotypes and interface classes
Integrate predictions directly into generative design, screening, and prioritization workflows
Inspect and compare structural outputs within the same environment
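The re-runnable part of this workflow reduces to a fixed evaluation harness: the same targets and the same scorer applied to every model or checkpoint. The sketch below is hypothetical; the model callables, names, and scorer are stand-ins and do not reflect the ApherisFold API.

```python
# Hypothetical sketch of side-by-side benchmarking on a fixed target set.
# The predictors and scorer below are dummies, not a real co-folding API.

def evaluate(models, targets, score_fn):
    """Run every model on the same targets; return per-model mean scores."""
    results = {}
    for name, predict in models.items():
        scores = [score_fn(predict(t), t) for t in targets]
        results[name] = sum(scores) / len(scores)
    return results

# Stand-in predictors: each maps a target to a "prediction".
models = {
    "protenix-v1": lambda t: t.upper(),
    "openfold3": lambda t: t,
}
# Stand-in scorer: rewards an exact match against a toy reference.
score = lambda pred, target: 1.0 if pred == target.upper() else 0.5

print(evaluate(models, ["t1", "t2"], score))
```

Because the harness is fixed, swapping in a new checkpoint changes one entry in `models` and nothing else, which is what makes results comparable across model versions over time.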
For organizations operating under tight DMTA timelines, the objective is not simply higher benchmark scores. It is the ability to generate structure-informed decisions that are reproducible, comparable across model versions, and defensible when they influence compound selection or make/no-make calls. Protenix-v1 expands the model portfolio in ApherisFold with a high-performing open-source option that can now be evaluated and deployed under full IP control within real drug programs.