MSA Service

The MSA (Multiple Sequence Alignment) service is an optional service of the Apheris Hub that automatically generates .a3m alignment files for protein chains.

What the MSA Service Does

When a protein chain needs an MSA and none has been provided, the Hub generates one automatically, provided that automatic MSA usage is enabled and an active MSA server is available. You can always supply pre-computed alignment files instead; the MSA service is then not involved for those chains.

The service:

- Detects protein chains missing an MSA (chains where `molecule_type` is protein, a sequence is present, and no alignment file is attached)
- Creates one MSA job per chain, sends the sequence to the active MSA server, and stores the resulting .a3m file on the owning submission record
- Updates submission execution state based on MSA job completion or failure
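The detection rule can be expressed as a simple filter. This is a minimal sketch, assuming a hypothetical `Chain` record whose `molecule_type`, `sequence`, and `msa` fields mirror the prose; it is not the Hub's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chain:
    """Hypothetical chain record; field names are illustrative only."""
    chain_id: str
    molecule_type: str
    sequence: Optional[str] = None
    msa: Optional[str] = None  # contents of an attached .a3m file, if any


def chains_needing_msa(chains: list[Chain]) -> list[Chain]:
    """Return protein chains that have a sequence but no attached alignment."""
    return [
        c for c in chains
        if c.molecule_type == "protein" and c.sequence and not c.msa
    ]


chains = [
    Chain("A", "protein", "MKTAYIAKQR"),                       # needs an MSA
    Chain("B", "protein", "GSHMLEDPV", msa=">q\nGSHMLEDPV\n"),  # already has one
    Chain("C", "dna", "ACGTACGT"),                             # not a protein
    Chain("D", "protein", ""),                                 # no sequence
]
print([c.chain_id for c in chains_needing_msa(chains)])  # ['A']
```

Each chain returned by the filter would then get its own MSA job.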

The MSA server is determined when jobs are created, based on the active server at that time. Changing the active server after submission does not affect jobs that are already in progress.
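A minimal sketch of why switching servers leaves in-flight jobs untouched: each job copies the active server when it is created. The `MsaServer`, `MsaJob`, and `MsaJobFactory` names and the example URLs are hypothetical placeholders, not Hub APIs.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class MsaServer:
    """Hypothetical server definition."""
    name: str
    server_type: str  # "colabfold" or "nvidia-colabfold"
    url: str


@dataclass
class MsaJob:
    """One job per chain; the server is bound when the job is created."""
    job_id: int
    chain_id: str
    sequence: str
    server: MsaServer


class MsaJobFactory:
    def __init__(self, active_server: MsaServer):
        self.active_server = active_server
        self._ids = count(1)

    def create_jobs(self, chains: dict[str, str]) -> list[MsaJob]:
        """Create one job per chain ID -> sequence entry, bound to the current server."""
        server = self.active_server  # snapshot taken at creation time
        return [MsaJob(next(self._ids), cid, seq, server) for cid, seq in chains.items()]


public = MsaServer("public", "colabfold", "https://colabfold.example.org")
nim = MsaServer("nim", "nvidia-colabfold", "http://nim.example.internal:8000")

factory = MsaJobFactory(public)
jobs = factory.create_jobs({"A": "MKTAYIAKQR", "B": "GSHMLEDPV"})
factory.active_server = nim  # changing the active server afterwards...
print([j.server.name for j in jobs])  # ['public', 'public']
```

Only jobs created after the switch would target the new server.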

Configuration Paths and User Control

- Helm chart (operators): Configure and enable deployment-managed global MSA servers via `hub.msa.*`; see MSA Server Configuration.
- Docker deployment script (operators): Configure the same `hub.msa.*` values in `config.yaml`, with optional `APH_HUB_MSA_*` environment variable overrides; see MSA Server Configuration.
- Hub UI (users): Users do not create or edit server definitions. They can choose one of the administrator-configured servers, or disable MSA server usage and provide alignment files manually; see MSA Usage.

Supported Providers

The MSA service supports two server types:

| Provider | Type identifier | Notes |
| --- | --- | --- |
| ColabFold | `colabfold` | Supports self-hosted deployments and public servers |
| NVIDIA NIM ColabFold | `nvidia-colabfold` | Requires a deployed NVIDIA NIM MSA Search service; see NIM MSA Server Setup |

Both providers produce exactly one .a3m alignment file per protein chain.

- ColabFold: Extracts the first .a3m file from the result archive. If the archive contains no .a3m file, the job fails.
- NVIDIA NIM ColabFold: Prefers the pre-merged "colabfold" alignment key in the response. If that key is missing or empty, it falls back to a single alignment key, provided only one candidate is present. If the response contains several candidate keys, the choice is ambiguous and the job fails.
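The two selection rules can be sketched as follows. This is an illustration under stated assumptions, not the Hub's implementation: the ColabFold result is assumed to be a gzip-compressed tar archive, and the NIM response is assumed to have already been parsed into a flat mapping from alignment key to A3M text (the real NIM response schema is not reproduced here). `MsaResultError` and both function names are hypothetical.

```python
import io
import tarfile


class MsaResultError(Exception):
    """Raised when a response cannot be reduced to exactly one alignment."""


def extract_colabfold_a3m(archive_bytes: bytes) -> str:
    """Return the first .a3m member of a gzip-compressed tar archive (assumed format)."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():  # archive order decides which file is "first"
            if member.isfile() and member.name.endswith(".a3m"):
                return tar.extractfile(member).read().decode("utf-8")
    raise MsaResultError("no .a3m file found in ColabFold result archive")


def select_nim_alignment(alignments: dict[str, str]) -> str:
    """Pick one alignment from an already-parsed {key: A3M text} mapping (assumed shape)."""
    merged = alignments.get("colabfold")
    if merged:  # the pre-merged key is present and non-empty
        return merged
    # Assumption: empty entries are ignored when looking for a fallback key.
    candidates = {k: v for k, v in alignments.items() if k != "colabfold" and v}
    if len(candidates) == 1:
        return next(iter(candidates.values()))
    if not candidates:
        raise MsaResultError("response contains no alignment")
    raise MsaResultError(f"ambiguous alignment keys: {sorted(candidates)}")


# Usage: build a small archive in memory (file names are illustrative).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, text in [("notes.txt", "hit\n"), ("chain_A.a3m", ">101\nMKTAYIAKQR\n")]:
        data = text.encode("utf-8")
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

print(extract_colabfold_a3m(buf.getvalue()))
print(select_nim_alignment({"colabfold": "", "uniref90": ">q\nMKT\n"}))
```

Raising an exception in every non-unique case keeps the "exactly one .a3m per chain" guarantee: a job either yields one alignment or fails.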

MSA in the Prediction Flow

When a prediction is submitted with protein chains that still need alignments, the Hub creates MSA jobs for those chains if automatic MSA usage is enabled and an active server is available. The request stays queued, shown as Pending on the Results page, until all of its MSA jobs complete; it then proceeds automatically. If automatic MSA usage is disabled or no active server is available, you must supply the required alignment files manually. If any MSA job fails, the prediction fails with a message that identifies the chain whose MSA preparation failed.
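A minimal sketch of how per-chain MSA job states could map to the request status described above. `JobState`, the status strings, and the error message format are hypothetical; `"ready"` stands in for "the request proceeds automatically".

```python
from enum import Enum
from typing import Optional


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def prediction_status(msa_jobs: dict[str, JobState]) -> tuple[str, Optional[str]]:
    """Map MSA job states, keyed by chain ID, to (request status, error message)."""
    failed = [chain for chain, state in msa_jobs.items() if state is JobState.FAILED]
    if failed:
        return "failed", f"MSA preparation failed for chain {failed[0]}"
    if all(state is JobState.SUCCEEDED for state in msa_jobs.values()):
        return "ready", None  # hypothetical label: the request moves on to inference
    return "pending", None  # shown as Pending on the Results page


print(prediction_status({"A": JobState.SUCCEEDED, "B": JobState.RUNNING}))  # ('pending', None)
print(prediction_status({"A": JobState.SUCCEEDED, "B": JobState.FAILED}))
```

A single failed chain is enough to fail the whole request, which matches the all-or-nothing behavior described above.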

MSA in the Benchmark Flow

When a benchmark is submitted with structures that still need protein-chain alignments, the Hub creates the required MSA jobs if automatic MSA usage is enabled and an active server is available. The benchmark enters a preparing status while those jobs run. Once all MSA jobs complete successfully, the benchmark transitions to running and prediction requests are created for each structure. If automatic MSA usage is disabled or no active server is available, the required alignment files must already be attached to the benchmark structures. If any MSA job fails, the benchmark transitions to failed.
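The benchmark transitions can be sketched as a small state update. This assumes a hypothetical `Benchmark` record and encodes each MSA job outcome as `None` (still running), `True` (succeeded), or `False` (failed); the prediction request IDs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Benchmark:
    """Hypothetical benchmark record."""
    structure_ids: list[str]
    status: str = "preparing"
    prediction_requests: list[str] = field(default_factory=list)


def advance_benchmark(bench: Benchmark, msa_outcomes: dict[str, Optional[bool]]) -> None:
    """Apply the preparing -> running / failed transitions from MSA job outcomes."""
    if bench.status != "preparing":
        return  # only a preparing benchmark reacts to MSA job updates
    if any(outcome is False for outcome in msa_outcomes.values()):
        bench.status = "failed"
    elif all(outcome is True for outcome in msa_outcomes.values()):
        bench.status = "running"
        # One prediction request per structure (IDs are hypothetical).
        bench.prediction_requests = [f"req-{sid}" for sid in bench.structure_ids]


bench = Benchmark(["s1", "s2"])
advance_benchmark(bench, {"job-1": True, "job-2": None})
print(bench.status)  # preparing
advance_benchmark(bench, {"job-1": True, "job-2": True})
print(bench.status, bench.prediction_requests)  # running ['req-s1', 'req-s2']
```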

For benchmark structures that fail before a prediction request is created, the benchmark details show an explicit pre-request failure entry that remains visible after a page reload. The entry identifies the failed query and chain and shows the reported error message. It does not link to a prediction result page, because no request exists for that structure. For benchmark structures that do produce a prediction request, the metrics and failure details shown on the benchmark come from that request's execution, once the request has persisted its inference-response data.

MSA in the Fine-tuning Flow

Fine-tuning uses the MSA service during dataset generation, not when the fine-tuning run itself is created. When you start dataset generation, the Hub first checks the linked training and validation structures for protein chains that still need alignments.

- If automatic MSA preparation is enabled and one or more protein chains are missing alignment content, the Hub creates structure-targeted MSA jobs for those chains and moves the fine-tuning run to preparing.
- If all required alignments are already available, the Hub skips MSA preparation and starts dataset generation immediately.

When all fine-tuning MSA jobs complete successfully, the Hub starts dataset generation automatically and moves the fine-tuning run from preparing to validating. If any MSA job fails, the fine-tuning run transitions to failed. The fine-tuning details keep the most recent MSA preparation failure visible after a page reload, so you can see which query and chain failed along with the reported error.
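The fine-tuning lifecycle can be sketched as two handlers. This is a minimal sketch under several assumptions: automatic MSA preparation is enabled, the `FineTuningRun` and `MsaFailure` records and the `"created"` initial status are hypothetical, and skipping MSA preparation is assumed to lead to the same validating status as the MSA path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MsaFailure:
    query: str
    chain: str
    error: str


@dataclass
class FineTuningRun:
    """Hypothetical fine-tuning run record."""
    status: str = "created"  # hypothetical initial status
    pending_jobs: int = 0
    last_msa_failure: Optional[MsaFailure] = None


def start_dataset_generation(run: FineTuningRun, missing: list[tuple[str, str]]) -> None:
    """`missing` lists (query, chain) pairs that still lack alignment content."""
    if missing:
        run.pending_jobs = len(missing)  # one structure-targeted job per chain
        run.status = "preparing"
    else:
        run.status = "validating"  # assumed: dataset generation starts immediately


def on_msa_job_finished(run: FineTuningRun, query: str, chain: str,
                        error: Optional[str]) -> None:
    if run.status != "preparing":
        return
    if error is not None:
        run.status = "failed"
        run.last_msa_failure = MsaFailure(query, chain, error)  # kept for the details view
        return
    run.pending_jobs -= 1
    if run.pending_jobs == 0:
        run.status = "validating"  # dataset generation starts automatically


run = FineTuningRun()
start_dataset_generation(run, [("q1", "A"), ("q1", "B")])
on_msa_job_finished(run, "q1", "A", None)
on_msa_job_finished(run, "q1", "B", None)
print(run.status)  # validating
```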