Apheris Hub Kubernetes Deployment🔗

This guide covers deploying and configuring Apheris Hub on Kubernetes using Helm.

Prerequisites🔗

  • A Kubernetes cluster running a recent version (>= 1.30)
  • Helm CLI (>= v3)
  • A PostgreSQL database
  • A Storage Provisioner that supports ReadWriteMany and ReadWriteOnce accessModes
  • An Ingress or Gateway Controller that enables network access to the Kubernetes cluster
  • NVIDIA GPU support for GPU workloads on the Kubernetes cluster (required to run OpenFold3, Boltz-2, and Protenix)

1. Create a Namespace for the Apheris Hub🔗

kubectl create namespace apheris-hub

2. Request the Apheris Hub API Key🔗

You need the Apheris Hub API Key to pull model images from the Apheris image registry.

You can skip this step if you host the Apheris model images in private repositories.

Request your Apheris Hub API Key from https://www.apheris.com/applications/apherisfold or contact support@apheris.com, and set it in your Helm values file:

apherisApiKey: "your-apheris-api-key"
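
If you instead mirror the model images into a private registry, you can point the chart at it via its documented pull settings. A sketch, in which the registry hostname and secret name are placeholders you would replace with your own:

```yaml
# Hypothetical private mirror -- substitute your registry and pull secret.
models:
  imagePullRegistry: "registry.example.com/apheris-mirror"
  imagePullSecrets:
    - name: private-registry-credentials

hub:
  imagePullSecrets:
    - name: private-registry-credentials
```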

3. Add a secret with a PostgreSQL DSN🔗

Store the DSN of an existing PostgreSQL database in a secret:

kubectl create secret generic hub-db-dsn --from-literal=dsn=<existing_dsn> \
  --namespace=apheris-hub

We recommend using the managed database offering of your cloud provider in its PostgreSQL flavor, for instance Amazon RDS for PostgreSQL (AWS), Google Cloud SQL for PostgreSQL (GCP) or Azure Database for PostgreSQL (Microsoft Azure).
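
The dsn value is a standard PostgreSQL connection URI. As an illustration (all connection details below are placeholders), it can be assembled like this before being stored in the secret:

```shell
# Hypothetical connection details -- substitute your own.
DB_USER="hub"
DB_PASSWORD="s3cret"
DB_HOST="hub-db.example.internal"
DB_NAME="apheris_hub"

# Standard PostgreSQL connection URI, with TLS required.
DSN="postgres://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:5432/${DB_NAME}?sslmode=require"
echo "${DSN}"
```

Pass the resulting string as <existing_dsn> in the kubectl create secret command above.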

4. Create apheris-hub-values.yaml with values for the Helm release🔗

Find the complete values reference at Helm Chart Values Reference.

The following are lightly annotated values with placeholders:

# refer to section `2.` for the apherisApiKey
apherisApiKey: "your-apheris-api-key"

hub:
  postgresDsnSecretName: hub-db-dsn

  ingress:
    className: <ingress_class_name>
    hostname: <ingress_hostname>

    # TLS Termination at the ingress controller level.
    tls:
      enabled: <true|false>

      # Name of a secret that contains the certificate material in the
      # format documented in https://kubernetes.io/docs/concepts/services-networking/ingress/#tls
      secretName: <tls_secret_name>

models:
  persistence:

    # The provisioner for this `storageClass` needs to
    # support `ReadWriteMany` `accessMode`.
    #
    storageClass: <storage_class_that_supports_read_write_many>

    # This is the space available for
    # persisting prediction results.
    #
    # `50Gi` is the minimum (and default) size;
    # we recommend `500Gi` if you can provision it.
    #
    size: 50Gi

  # The `mock` model is the only default model that the chart
  # deploys with the default values.
  #
  # You can enable deployment of other default models by setting
  # `deploy.enabled=true`, so for instance
  # `models.instances.boltz2.deploy.enabled=true` or
  # `models.instances.openfold3.deploy.enabled=true` or
  # `models.instances.protenix.deploy.enabled=true`.
  #
  instances:
    boltz2:
      deploy:
        enabled: <true|false>
    mock:
      deploy:
        enabled: <true|false>
    openfold3:
      deploy:
        enabled: <true|false>
    protenix:
      deploy:
        enabled: <true|false>

Capabilities and scopes🔗

models.instances.<name>.deploy.capabilities sets the scopes available for that model deployment. Supported values are inference, which covers prediction and benchmarking, and finetuning.

OpenFold3 supports both scopes, while Boltz-2 and Protenix support inference only. For custom weights, set model_scope on each weight entry so the Hub can determine whether that weight supports inference (prediction and benchmarking), finetuning, or both.

Deploying different instances of a model with different scopes🔗

You can deploy multiple instances of the same model with different scopes. This separates concerns and prevents fine-tuning runs from blocking prediction or benchmarking jobs.

To do that, add a new entry under models.instances that points to an existing model. For example, if you would like to have an instance of OpenFold 3 for fine-tuning and another for inference, do:

models:
  # ...
  instances:
    openfold3:
      deploy:
        enabled: true
        capabilities:
          - inference
    openfold3-ft:
      id: openfold3-ft
      model: openfold3 # same value as models.instances.openfold3.model
      deploy:
        enabled: true
        port: 8000 # same value as models.instances.openfold3.deploy.port, unless you choose otherwise
        capabilities:
          - finetuning
        image: ... # same value as models.instances.openfold3.deploy.image, unless you choose otherwise
      # ... also include the other properties that are set by default for models.instances.openfold3 (see Helm Chart Values Reference)

Authentication and Identity Providers🔗

Set the hub.auth.* values to match your identity provider and frontend configuration. The Authentication Setup guide explains the requirements for Auth0, Microsoft Entra, and Dex and shows how those settings map back to Helm values.
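
As an orientation, these are the relevant keys; all values below are placeholders, and what each field must contain depends on your provider (see the Authentication Setup guide):

```yaml
hub:
  auth:
    enabled: true
    providerType: "auth0"        # "auth0", "forgerock", or "" for generic OIDC
    domain: "<idp_domain>"
    issuer: "<oidc_issuer_url>"
    clientId: "<oidc_client_id>"
    audience: "<api_audience>"
    browserUrl: "<frontend_url>"
    extraScopes: ""
```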

Custom CA Certificates🔗

If your identity provider or external services use TLS certificates signed by a custom Certificate Authority, configure the Hub to trust those certificates. The certificate file is mounted at /etc/ssl/certs/custom-ca.crt in the Hub container and automatically trusted alongside the system CAs.

Create the ConfigMap🔗

Create or update a ConfigMap with your CA certificate (safe to re-run):

kubectl create configmap custom-ca-certs \
  --from-file=ca.crt=/path/to/your-ca.crt \
  -n apheris-hub --dry-run=client -o yaml | kubectl apply -f -

Mount the certificate🔗

Add to your values file to mount the custom CA into the Hub container:

hub:
  extraVolumes:
    - name: custom-ca
      configMap:
        name: custom-ca-certs

  extraVolumeMounts:
    - name: custom-ca
      mountPath: /etc/ssl/certs/custom-ca.crt
      subPath: ca.crt
      readOnly: true

Apply the Helm upgrade🔗

Re-run your helm upgrade command so the pod picks up the new mount.

Verify the ConfigMap🔗

The Hub Docker image is based on scratch, so kubectl exec and kubectl cp will not work, but you can still validate the CA data stored in the ConfigMap:

# Read the CA data from the ConfigMap into a local file
kubectl get configmap custom-ca-certs -n apheris-hub -o jsonpath='{.data.ca\.crt}' > /tmp/custom-ca.crt
# Inspect the certificate content
openssl x509 -in /tmp/custom-ca.crt -noout -subject -issuer

This confirms the CA content that will be mounted into the Hub pod.

Verify the live mount (optional)🔗

If your cluster allows ephemeral debug containers, you can examine the mounted file without changing the pod:

# Print container name(s) in the Hub pod (needed for --target)
kubectl get pod -n apheris-hub <hub-pod-name> -o jsonpath='{.spec.containers[*].name}'
# Start a debug container and read the mounted CA from the target container's root
kubectl debug -n apheris-hub -it pod/<hub-pod-name> --image=alpine:3.19 --target=<container-name> -- \
  sh -c "cat /proc/1/root/etc/ssl/certs/custom-ca.crt" > /tmp/custom-ca.crt
# Inspect the certificate content copied from the pod
openssl x509 -in /tmp/custom-ca.crt -noout -subject -issuer

If debug containers are blocked by policy, rely on the ConfigMap check above and look for certificate-related errors in logs:

# Print container name(s) in the Hub pod (needed for -c)
kubectl get pod -n apheris-hub <hub-pod-name> -o jsonpath='{.spec.containers[*].name}'
# Print Hub container logs and filter for TLS/certificate errors
kubectl logs -n apheris-hub deployment/<hub-deployment-name> -c <container-name> | \
  grep -i "certificate\|tls\|x509"

MSA Server Configuration🔗

MSA servers are deployment-managed and global. Administrators define them in Helm values, and users can only select one of the configured servers (or opt out and upload .a3m files manually).

Supported MSA server types:

| Provider | Type identifier | Notes |
| --- | --- | --- |
| ColabFold | colabfold | Supports self-hosted deployments and public servers |
| NVIDIA NIM ColabFold | nvidia-colabfold | Requires a deployed NVIDIA NIM MSA Search service |

The hub.msa.* timeout values only affect ColabFold and NVIDIA NIM ColabFold deployments:

hub:
  msa:
    enabled: true
    # How often to check if the job is done (PENDING → RUNNING → COMPLETE)
    pollInterval: "10s"   # Lower = faster feedback, more API calls
    # Per-request HTTP timeout for submit/status/download calls
    requestTimeout: "10m" # Increase for slow networks or large downloads

When hub.msa.enabled=true, you must configure hub.msa.servers with at least one server. Use defaultActive: true on exactly one server to define the deployment-level fallback server:

hub:
  msa:
    enabled: true
    servers:
      - name: "Public ColabFold"
        type: colabfold
        url: "https://api.colabfold.com"
        defaultActive: true
        config: {}
      - name: "NVIDIA ColabFold"
        type: nvidia-colabfold
        url: "https://api.nim.example.com"
        defaultActive: false
        config:
          numberOfSequences: "500"
          eValue: "0.0001"
          databases:
            - "Uniref30_2302"
        headers:
          - name: "X-Api-Key"
            valueFrom:
              secretKeyRef:
                name: "msa-auth"
                key: "api-key"
          - name: "X-Client-Id"
            valueFrom:
              configMapKeyRef:
                name: "msa-shared-config"
                key: "client-id"
          - name: "X-Source"
            value: "hub"

defaultActive is not a per-user preference. It is used by default for new users, and as fallback when a stored active selection cannot be resolved (for example after server removal or URL-identity change during deployment sync).

If a user explicitly disabled MSA usage, fallback is not applied for that user.

MSA Server Headers🔗

Use hub.msa.servers[].headers to send provider-specific headers (for example API keys, client IDs, or metadata) with every request to that server.

When possible, source sensitive values from Kubernetes Secrets:

hub:
  msa:
    enabled: true
    servers:
      - name: "NVIDIA ColabFold"
        type: nvidia-colabfold
        url: "https://api.nim.example.com"
        config:
          numberOfSequences: "500"
        headers:
          - name: "X-Api-Key"
            valueFrom:
              secretKeyRef:
                name: "msa-auth"
                key: "api-key"
          - name: "X-Client-Id"
            valueFrom:
              configMapKeyRef:
                name: "msa-shared-config"
                key: "client-id"
          - name: "X-Source"
            value: "hub"

Troubleshooting ColabFold:

  • "Failed to check job status" errors → Increase requestTimeout
  • Want faster progress updates → Decrease pollInterval (minimum ~3s recommended)

5. Install the Helm release🔗

helm install apheris-hub oci://quay.io/apheris/hub-chart \
  --namespace=apheris-hub \
  --values=apheris-hub-values.yaml \
  --wait \
  --timeout=15m
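
Before moving on, you can confirm that the release is healthy. The commands below assume the release name and namespace used throughout this guide:

```shell
# Check the release status reported by Helm
helm status apheris-hub --namespace=apheris-hub

# Confirm that the Hub and enabled model pods are Running
kubectl get pods --namespace=apheris-hub
```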

6. Access the Apheris Hub installation🔗

You can now access your Apheris Hub installation via the configured ingress. For most setups, the external hostname will be the value you configured under hub.ingress.hostname.

Please do not hesitate to contact support@apheris.com if you encounter any problems.

Helm Chart Values Reference🔗

Key Type Default Description
apherisApiKey string nil Apheris API key for queries to Apheris hosted MSA servers and access to Apheris hosted container images
hub.affinity object {} Affinity rules
hub.auth object {"audience":"","browserUrl":"","clientId":"","domain":"","enabled":false,"extraScopes":"","issuer":"","providerType":""} Authentication configuration (OIDC/Auth0/ForgeRock)
hub.auth.providerType string "" Provider type (supported values: "auth0", "forgerock", or empty string for generic OIDC)
hub.enabled bool true Enable Hub deployment (set to false for models-only release)
hub.env list [] Additional environment variables
hub.extraVolumeMounts list [] Extra volume mounts (e.g., for custom CA certificates)
hub.extraVolumes list [] Extra volumes (e.g., for custom CA certificates)
hub.finetuningHeartbeatTimeout string "5m" Finetuning heartbeat timeout. Example values: "5m", "300s".
hub.image.digest string nil Image digest (sha256).
hub.image.pullPolicy string "IfNotPresent" Image pull policy
hub.image.repository string "quay.io/apheris/hub" Container image repository
hub.image.tag string nil Overrides the image tag whose default is the chart appVersion
hub.imagePullSecrets list [] Image pull secrets for private registries
hub.ingress object {"annotations":{},"className":"","enabled":true,"existingGatewayName":"","gatewayNamespace":"","hostname":"","ingressPath":"/","tls":{"enabled":false,"secretName":""},"type":"ingress"} Ingress configuration (common for both Gateway API and Ingress resources)
hub.ingress.annotations object {} Additional annotations
hub.ingress.className string "" Ingress/Gateway class name
hub.ingress.enabled bool true Enable ingress (Gateway API or Ingress resource)
hub.ingress.existingGatewayName string "" Existing gateway name (if not set, a new gateway will be created)
hub.ingress.gatewayNamespace string "" Gateway namespace (if different from release namespace)
hub.ingress.hostname string "" Hostname for ingress
hub.ingress.ingressPath string "/" Ingress path
hub.ingress.tls object {"enabled":false,"secretName":""} TLS configuration
hub.ingress.tls.enabled bool false Enable TLS
hub.ingress.tls.secretName string "" TLS certificate secret name
hub.ingress.type string "ingress" Networking type (gateway, ingress)
hub.msa object {"enabled":false,"pollInterval":"5s","requestTimeout":"5m","servers":[]} MSA server configuration
hub.msa.enabled bool false Enable MSA server configuration
hub.msa.pollInterval string "5s" How frequently the application checks the status of a submitted MSA job on the ColabFold server (e.g., "5s", "10s").
hub.msa.requestTimeout string "5m" The timeout for each individual HTTP request made to the ColabFold server (e.g., "5m", "10m").
hub.msa.servers list [] Globally configured MSA servers. At least one server is required when MSA is enabled. defaultActive: true marks the deployment-level fallback server used for new users and when a stored active selection no longer resolves (for example after server removal or URL-identity change). At most one server can be marked defaultActive: true.
hub.nodeSelector object {} Node selector
hub.persistence object {"accessMode":"ReadWriteOnce","annotations":{},"enabled":true,"existingVolumeName":null,"size":"5Gi","storageClass":""} Persistence configuration
hub.persistence.accessMode string "ReadWriteOnce" Access mode for state PVC
hub.persistence.annotations object {} Annotations for state PVC
hub.persistence.enabled bool true Enable state persistence
hub.persistence.existingVolumeName string nil Existing PersistentVolume to bind to. If null, a new one will be dynamically created.
hub.persistence.size string "5Gi" Size of state PVC
hub.persistence.storageClass string "" Storage class for state PVC
hub.podAnnotations object {} Pod annotations
hub.podLabels object {} Pod labels
hub.podSecurityContext object {"fsGroup":65534,"runAsGroup":65534,"runAsNonRoot":true,"runAsUser":65534,"seccompProfile":{"type":"RuntimeDefault"}} Pod security context
hub.postgresDsnSecretName string nil Name of a kubernetes secret containing a postgres DSN, needs a key dsn
hub.replicaCount int 1 Number of replicas for the Hub deployment
hub.securityContext object {"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"readOnlyRootFilesystem":true,"runAsGroup":65534,"runAsNonRoot":true,"runAsUser":65534} Container security context
hub.service object {"annotations":{},"port":8080,"type":"ClusterIP"} Service configuration
hub.service.annotations object {} Service annotations
hub.service.port int 8080 Service port
hub.service.type string "ClusterIP" Service type
hub.terminationGracePeriodSeconds int 30 Termination grace period in seconds
hub.tolerations list [] Tolerations
labels object {} Additional labels for every object created by the chart
models.imagePullRegistry string "quay.io/apheris" Registry for image pulls
models.imagePullSecrets list [] Secrets for image pulls
models.instances.boltz2 object {"deploy":{"affinity":{},"capabilities":["inference"],"enabled":false,"env":[],"extraVolumeMounts":[],"extraVolumes":[],"image":"quay.io/apheris/hub-apps:0.49.0-boltz2-by-file","nodeSelector":{},"podSecurityContext":{"fsGroup":65534,"runAsGroup":65534,"runAsNonRoot":true,"runAsUser":65534,"seccompProfile":{"type":"RuntimeDefault"}},"port":8000,"resources":{"limits":{"cpu":"8","memory":"64Gi","nvidia.com/gpu":1},"requests":{"cpu":"8","memory":"64Gi","nvidia.com/gpu":1}},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"readOnlyRootFilesystem":true,"runAsGroup":65534,"runAsNonRoot":true,"runAsUser":65534},"shmSize":"16Gi","tolerations":[{"effect":"NoSchedule","key":"nvidia.com/gpu","operator":"Equal","value":"true"}]},"id":"boltz2","model":"boltz2"} boltz2 requires a GPU with 40GB of memory, not enabled by default
models.instances.boltz2.deploy.capabilities list ["inference"] Model scopes for this deployment. Supported values: inference.
models.instances.boltz2.deploy.enabled bool false enable boltz2 with enabled: true
models.instances.boltz2.deploy.env list [] Additional environment variables for custom weights configuration
models.instances.boltz2.deploy.extraVolumeMounts list [] Extra volume mounts for custom weights
models.instances.boltz2.deploy.extraVolumes list [] Extra volumes for custom weights
models.instances.mock object {"deploy":{"affinity":{},"capabilities":["inference","finetuning"],"enabled":true,"env":[],"extraVolumeMounts":[],"extraVolumes":[],"image":"quay.io/apheris/hub-apps:0.49.0-mock-by-file","nodeSelector":{},"podSecurityContext":{"fsGroup":65534,"runAsGroup":65534,"runAsNonRoot":true,"runAsUser":65534,"seccompProfile":{"type":"RuntimeDefault"}},"port":8000,"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"readOnlyRootFilesystem":true,"runAsGroup":65534,"runAsNonRoot":true,"runAsUser":65534}},"id":"mock","model":"mock"} lightweight mock model that does not require a GPU, enabled by default
models.instances.mock.deploy.capabilities list ["inference","finetuning"] Model scopes for this deployment. Supported values: inference, finetuning.
models.instances.mock.deploy.env list [] Additional environment variables for custom weights configuration
models.instances.mock.deploy.extraVolumeMounts list [] Extra volume mounts for custom weights
models.instances.mock.deploy.extraVolumes list [] Extra volumes for custom weights
models.instances.openfold3 object {"deploy":{"affinity":{},"capabilities":["inference","finetuning"],"enabled":false,"env":[],"extraVolumeMounts":[],"extraVolumes":[],"image":"quay.io/apheris/hub-apps:0.49.0-openfold3-by-file","nodeSelector":{},"podSecurityContext":{"fsGroup":65534,"runAsGroup":65534,"runAsNonRoot":true,"runAsUser":65534,"seccompProfile":{"type":"RuntimeDefault"}},"port":8000,"resources":{"limits":{"cpu":"8","memory":"64Gi","nvidia.com/gpu":1},"requests":{"cpu":"8","memory":"64Gi","nvidia.com/gpu":1}},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"readOnlyRootFilesystem":true,"runAsGroup":65534,"runAsNonRoot":true,"runAsUser":65534},"shmSize":"16Gi","tolerations":[{"effect":"NoSchedule","key":"nvidia.com/gpu","operator":"Equal","value":"true"}]},"id":"openfold3","model":"openfold3"} openfold3 requires a GPU with 40GB of memory, not enabled by default
models.instances.openfold3.deploy.capabilities list ["inference","finetuning"] Model scopes for this deployment. Supported values: inference, finetuning.
models.instances.openfold3.deploy.enabled bool false enable openfold3 with enabled: true
models.instances.openfold3.deploy.env list [] Additional environment variables for custom weights configuration
models.instances.openfold3.deploy.extraVolumeMounts list [] Extra volume mounts for custom weights
models.instances.openfold3.deploy.extraVolumes list [] Extra volumes for custom weights
models.instances.protenix object {"deploy":{"affinity":{},"capabilities":["inference"],"enabled":false,"env":[],"extraVolumeMounts":[],"extraVolumes":[],"image":"quay.io/apheris/hub-apps:0.49.0-protenix-by-file","nodeSelector":{},"podSecurityContext":{"fsGroup":65534,"runAsGroup":65534,"runAsNonRoot":true,"runAsUser":65534,"seccompProfile":{"type":"RuntimeDefault"}},"port":8000,"resources":{"limits":{"cpu":"8","memory":"64Gi","nvidia.com/gpu":1},"requests":{"cpu":"8","memory":"64Gi","nvidia.com/gpu":1}},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"readOnlyRootFilesystem":true,"runAsGroup":65534,"runAsNonRoot":true,"runAsUser":65534},"shmSize":"16Gi","tolerations":[{"effect":"NoSchedule","key":"nvidia.com/gpu","operator":"Equal","value":"true"}]},"id":"protenix","model":"protenix"} protenix requires a GPU with 40GB of memory, not enabled by default
models.instances.protenix.deploy.capabilities list ["inference"] Model scopes for this deployment. Supported values: inference.
models.instances.protenix.deploy.enabled bool false enable protenix with enabled: true
models.instances.protenix.deploy.env list [] Additional environment variables for custom weights configuration
models.instances.protenix.deploy.extraVolumeMounts list [] Extra volume mounts for custom weights
models.instances.protenix.deploy.extraVolumes list [] Extra volumes for custom weights
models.persistence object {"accessMode":"ReadWriteMany","annotations":{},"enabled":true,"existingVolumeName":null,"size":"50Gi","storageClass":""} Artifacts persistence (input/output)
models.persistence.accessMode string "ReadWriteMany" Access mode for artifacts PVC
models.persistence.annotations object {} Annotations for artifacts PVC
models.persistence.enabled bool true Enable artifacts persistence via PVC
models.persistence.existingVolumeName string nil Existing PersistentVolume to bind to. If null, a new one will be dynamically created.
models.persistence.size string "50Gi" Size of artifacts PVC
models.persistence.storageClass string "" Storage class for artifacts PVC