Deployment

Synapse publishes three Docker images to Docker Hub. Choose the deployment pattern that fits your infrastructure — from a single docker run command to a fully autoscaled Kubernetes cluster.

Docker images

Image	Purpose
`synapseorchai/synapse-ai`	Management plane — combined UI + API server. Run exactly one of these. Manages definitions, hosts the UI, syncs to Postgres, enqueues jobs.
`synapseorchai/synapse-ai-worker`	Execution tier — no UI, no frontend. Scale this horizontally. Pulls jobs from Redis and executes orchestrations and agent chats.
`synapseorchai/synapse-ai-api-server`	API tier replica — same API as the main instance, no UI. Use behind a load balancer when the API tier itself becomes the bottleneck.

All three images are published together on every release. Each is tagged with the release's semantic version as well as latest, for example synapseorchai/synapse-ai-worker:1.6.6 and synapseorchai/synapse-ai-worker:latest.
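All three images share a release tag, so a small helper can print the full set of references for one version. This makes it easy to pin every tier to the same release instead of mixing tags. This is a sketch only: synapse_images is a hypothetical helper name, not part of Synapse.

```shell
# Print the three Synapse image references for one release tag so every
# tier can be pinned to the same version. Hypothetical helper.
synapse_images() {
  version="${1:?usage: synapse_images VERSION}"
  for name in synapse-ai synapse-ai-worker synapse-ai-api-server; do
    printf 'synapseorchai/%s:%s\n' "$name" "$version"
  done
}

# Example: pre-pull all three images for release 1.6.6.
# for image in $(synapse_images 1.6.6); do docker pull "$image"; done
```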

Option 1 — Docker Run

The fastest path to adding a worker to an existing Synapse install. Run this on any machine that can reach your Redis and Postgres instances.

Prerequisites

Docker installed
A running Redis instance (see quick start below)
A running Postgres instance

Start Redis (if you don't have one)

docker run -d \
  --name synapse-redis \
  -p 6379:6379 \
  -v synapse-redis-data:/data \
  redis:7-alpine \
  redis-server --appendonly yes

The --appendonly yes flag enables AOF persistence — your queue survives a Redis restart.
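You can confirm AOF is actually enabled on the running container by querying the live setting with redis-cli. When its output is piped rather than printed to a terminal, redis-cli CONFIG GET prints the key on one line and the value on the next. The hypothetical aof_enabled helper below checks that value. It assumes the container is named synapse-redis, as in the command above.

```shell
# Succeeds when the output of `redis-cli CONFIG GET appendonly`, read on
# stdin, reports the value "yes". Hypothetical helper, not part of Synapse.
aof_enabled() {
  # Line 1 is the key ("appendonly"), line 2 is its value.
  [ "$(sed -n '2p')" = "yes" ]
}

# Usage against the container started above:
# docker exec synapse-redis redis-cli CONFIG GET appendonly | aof_enabled \
#   && echo "AOF persistence is on"
```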

Start the worker

docker run -d \
  --name synapse-worker-1 \
  -e REDIS_URL="redis://your-redis-host:6379/0" \
  -e SCALE_POSTGRES_URL="postgresql://user:pass@your-pg-host:5432/synapse" \
  -e WORKER_CONCURRENCY=10 \
  -e WORKER_JOB_TIMEOUT=3600 \
  -e WORKER_MAX_RETRIES=3 \
  -p 9000:9000 \
  --restart unless-stopped \
  synapseorchai/synapse-ai-worker:latest

Verify the worker is healthy

curl http://localhost:9000/health

{
  "status": "ok",
  "worker_id": "worker-a1b2c3",
  "hostname": "my-server-01",
  "address": "http://10.0.0.5:9000",
  "active_jobs": 0,
  "uptime": 18.3
}

The worker appears in the Workers panel in Settings → Scale within 30 seconds of starting.
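In a provisioning script, you may want to wait for the health endpoint instead of checking it once. A minimal sketch, assuming curl is installed and the response contains "status": "ok" as in the sample above. wait_for_worker is a hypothetical helper.

```shell
# Poll a worker's /health endpoint until it reports "status": "ok".
# Hypothetical helper; assumes curl is installed where it runs.
wait_for_worker() {
  url="${1:-http://localhost:9000/health}"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; any failure leaves body empty.
    body=$(curl -fsS --max-time 2 "$url" 2>/dev/null) || body=""
    if printf '%s\n' "$body" | grep -q '"status": *"ok"'; then
      echo "worker healthy after $i attempt(s)"
      return 0
    fi
    # Sleep only if another attempt remains.
    if [ "$i" -lt "$attempts" ]; then sleep 1; fi
    i=$((i + 1))
  done
  echo "worker at $url not healthy after $attempts attempt(s)" >&2
  return 1
}

# Usage: wait_for_worker http://localhost:9000/health 30
```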

Use an env file

For cleaner deployments, put all configuration in a .env file:

# .env
REDIS_URL=redis://your-redis:6379/0
SCALE_POSTGRES_URL=postgresql://user:pass@your-pg:5432/synapse
WORKER_CONCURRENCY=10
WORKER_JOB_TIMEOUT=3600
WORKER_MAX_RETRIES=3
S3_BUCKET=my-synapse-bucket
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

docker run -d \
  --name synapse-worker-1 \
  --env-file .env \
  -p 9000:9000 \
  --restart unless-stopped \
  synapseorchai/synapse-ai-worker:latest
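The .env file holds credentials, and a missing value usually surfaces only after the container starts. A minimal sketch of a pre-flight check: check_env_file is a hypothetical helper that fails if any listed key is absent or empty. The chmod 600 in the usage line restricts the file to its owner.

```shell
# Fail if any required KEY is missing from (or empty in) an env file.
# Hypothetical helper; pass the variable names your deployment needs.
check_env_file() {
  file="$1"
  shift
  status=0
  for key in "$@"; do
    # Require a line of the form KEY=<non-empty value>.
    if ! grep -q "^${key}=." "$file"; then
      echo "missing or empty in $file: $key" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage before starting the worker:
# check_env_file .env REDIS_URL SCALE_POSTGRES_URL && chmod 600 .env
```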

Scale to multiple workers

Run the same command on each additional machine, giving each container a unique name. The host port (-p 9000:9000) only needs to change if you run more than one worker on the same machine:

# Machine 2
docker run -d --name synapse-worker-2 --env-file .env \
  -p 9000:9000 --restart unless-stopped \
  synapseorchai/synapse-ai-worker:latest

# Machine 3
docker run -d --name synapse-worker-3 --env-file .env \
  -p 9000:9000 --restart unless-stopped \
  synapseorchai/synapse-ai-worker:latest

All workers share the same Redis queue. Jobs are distributed automatically — no additional configuration is needed.
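If you do run several workers on one machine, each container needs a unique name and a unique host port, because only one container can bind a given host port. The minimal sketch below starts COUNT workers with host ports 9000, 9001, and so on, each mapped to the worker's health port 9000 inside the container. start_workers and its DRY_RUN switch are hypothetical, and the sketch assumes the .env file from the previous step.

```shell
# Start COUNT workers on one host: synapse-worker-1..N, with host ports
# BASE_PORT..BASE_PORT+N-1 mapped to each container's health port 9000.
# Hypothetical helper. Set DRY_RUN=1 to print the commands instead.
start_workers() {
  count="${1:?usage: start_workers COUNT [BASE_PORT]}"
  base_port="${2:-9000}"
  i=1
  while [ "$i" -le "$count" ]; do
    port=$((base_port + i - 1))
    ${DRY_RUN:+echo} docker run -d --name "synapse-worker-$i" --env-file .env \
      -p "$port:9000" --restart unless-stopped \
      synapseorchai/synapse-ai-worker:latest
    i=$((i + 1))
  done
}

# Usage: start_workers 3   (workers on host ports 9000, 9001, 9002)
```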

Option 2 — Docker Compose

The docker-compose.yml in the project root supports a scale profile that starts Redis and a worker alongside the existing backend and frontend services.

Prerequisites

Docker and Docker Compose v2+
The Synapse AI repository cloned locally
A Postgres instance (external — not provisioned by Docker Compose)

Configure environment

Copy the environment template and fill in the required values:

cp .env.docker .env

Open .env and set at minimum:

# Required for all deployments
SYNAPSE_INTERNAL_TOKEN=<output of: openssl rand -hex 32>

# Required for scale mode
SCALE_POSTGRES_URL=postgresql://user:pass@your-pg-host:5432/synapse

Start with the scale profile

docker compose --profile scale up -d

This starts four services:

Service	Purpose	Default port
`synapse-backend`	API server	`8765`
`synapse-frontend`	Web UI	`3000`
`synapse-redis`	Redis job queue (auto-provisioned with persistence)	`6379`
`synapse-worker`	ARQ worker	health on `9000`

Scale to multiple workers

docker compose --profile scale up -d --scale worker=4

note

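When scaling beyond 1 replica, remove the container_name: synapse-worker line from docker-compose.yml. Docker Compose requires unique container names; with --scale, it generates names automatically (for example synapse-worker-1, synapse-worker-2). For the same reason, if the worker service publishes a fixed host port for its health endpoint (such as 9000:9000), remove that mapping or change it to a host port range, because only one container can bind a given host port.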

Stop all services

docker compose --profile scale down

To also remove the Redis volume (deletes queued jobs):

docker compose --profile scale down -v

Enterprise profile — observability stack

docker compose --profile enterprise up -d

Adds full observability infrastructure:

Service	URL	Purpose
Jaeger	`http://localhost:16686`	Distributed tracing UI
Prometheus	`http://localhost:9090`	Metrics collection and alerting
Grafana	`http://localhost:3001`	Dashboards and visualisations
PgBouncer	`localhost:6432`	Postgres connection pooling proxy

After starting the enterprise profile, set OTLP_ENDPOINT=http://jaeger:4317 in your .env and restart to enable distributed tracing.

Option 3 — Kubernetes

Pre-built Kubernetes manifests live in infra/k8s/ in the repository. They are designed for production deployments with KEDA-based autoscaling and Postgres connection pooling.

Prerequisites

Kubernetes cluster 1.24+
kubectl configured and targeting the correct cluster
KEDA installed in the cluster (for worker autoscaling)
External Redis and Postgres instances accessible from inside the cluster

Install KEDA

kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.13.0/keda-2.13.0.yaml

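Each KEDA release supports a specific range of Kubernetes versions. Check the KEDA compatibility matrix and substitute a release that matches your cluster if v2.13.0 does not. Then verify KEDA is running before proceeding: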

kubectl get pods -n keda
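In a script, you can check the result programmatically instead of reading it by eye. A minimal sketch: keda_ready is a hypothetical helper. It succeeds only when the keda namespace has at least one pod and every pod is Running with all containers ready. A built-in alternative that blocks until the pods are ready is kubectl wait --for=condition=Ready pod --all -n keda --timeout=120s.

```shell
# Succeed only if the keda namespace has at least one pod and every pod is
# Running with all containers ready. Hypothetical helper, not part of KEDA.
keda_ready() {
  # --no-headers output columns: NAME READY STATUS RESTARTS AGE
  kubectl get pods -n keda --no-headers 2>/dev/null | awk '
    { n++; split($2, ready, "/") }
    $3 != "Running" || ready[1] != ready[2] { bad = 1 }
    END { exit (n == 0 || bad) }'
}

# Usage: keda_ready && echo "KEDA is up"
```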

Create secrets

kubectl create secret generic synapse-secrets \
  --from-literal=redis-url="redis://your-redis:6379/0" \
  --from-literal=postgres-url="postgresql://user:pass@your-pg:5432/synapse"

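If you use S3 for shared vault storage, include the S3 credentials when you create the secret. Run this command instead of the one above, because kubectl create fails if synapse-secrets already exists: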

kubectl create secret generic synapse-secrets \
  --from-literal=redis-url="redis://your-redis:6379/0" \
  --from-literal=postgres-url="postgresql://user:pass@your-pg:5432/synapse" \
  --from-literal=s3-access-key-id="AKIAIOSFODNN7EXAMPLE" \
  --from-literal=s3-secret-access-key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
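Because kubectl create secret fails when the secret already exists, you cannot re-run it to add or change a value. A common pattern is to render the secret with --dry-run=client -o yaml and pipe it to kubectl apply, which creates or updates it. The sketch below reads the values from environment variables (for example, after loading the .env file from Option 1) instead of putting credentials on the command line. apply_synapse_secret is a hypothetical helper, and the secret key names match the commands above.

```shell
# Create or update the synapse-secrets Secret from environment variables.
# Hypothetical helper; S3 keys are included only when S3_ACCESS_KEY_ID is set.
apply_synapse_secret() {
  set -- --from-literal=redis-url="${REDIS_URL:?REDIS_URL is not set}" \
    --from-literal=postgres-url="${SCALE_POSTGRES_URL:?SCALE_POSTGRES_URL is not set}"
  if [ -n "${S3_ACCESS_KEY_ID:-}" ]; then
    set -- "$@" \
      --from-literal=s3-access-key-id="$S3_ACCESS_KEY_ID" \
      --from-literal=s3-secret-access-key="${S3_SECRET_ACCESS_KEY:?S3_SECRET_ACCESS_KEY is not set}"
  fi
  # Render the Secret locally, then let `kubectl apply` create or update it.
  kubectl create secret generic synapse-secrets "$@" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Usage: export the variables from .env, then apply.
# set -a; . ./.env; set +a
# apply_synapse_secret
```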

Apply the manifests

kubectl apply -f infra/k8s/

This creates:

Manifest	What it deploys
`api-deployment.yaml`	`synapse-api` Deployment (3 replicas) + ClusterIP Service + HPA (scales 2–20 replicas at 70% CPU)
`worker-deployment.yaml`	`synapse-worker` Deployment (initial 2 replicas, overridden by KEDA ScaledObject) + Service
`worker-scaledobject.yaml`	KEDA ScaledObject — scales workers 1–100 based on Redis queue depth
`pgbouncer-deployment.yaml`	PgBouncer connection pooler Deployment + Service — proxies Postgres connections

Verify the deployment

kubectl get deployments
kubectl get pods
kubectl get scaledobjects

Check worker logs to confirm successful registration:

kubectl logs -l app=synapse-worker --tail=50

Check API server logs:

kubectl logs -l app=synapse-api --tail=50

Expose the API with an Ingress

The synapse-api Service is a ClusterIP on port 8765. Add an Ingress to expose it externally:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: synapse-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx   # class name created by a default ingress-nginx install
  rules:
    - host: api.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: synapse-api
                port:
                  number: 8765

tip

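The proxy-read-timeout and proxy-send-timeout annotations must be set to at least 3600 seconds. These timeouts limit the gap between two successive reads or writes, not the total lifetime of a connection. SSE connections for long-running orchestrations stay open for minutes or hours and can go quiet between events, so the ingress-nginx default of 60 seconds would close them prematurely.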
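To confirm the annotations are present on the live Ingress, you can read them back with a JSONPath query. Dots inside the annotation key must be escaped with a backslash. A minimal sketch, assuming the Ingress is named synapse-ingress as above; check_ingress_timeouts is a hypothetical helper.

```shell
# Succeed if both nginx proxy timeouts on the Ingress are >= MIN seconds.
# Hypothetical helper; defaults match the Ingress example above.
check_ingress_timeouts() {
  ingress="${1:-synapse-ingress}"
  min="${2:-3600}"
  for name in proxy-read-timeout proxy-send-timeout; do
    value=$(kubectl get ingress "$ingress" \
      -o jsonpath="{.metadata.annotations.nginx\\.ingress\\.kubernetes\\.io/$name}")
    # Empty or non-numeric values fail the check.
    case "$value" in
      ''|*[!0-9]*) echo "$name is not set to a number: '$value'" >&2; return 1 ;;
    esac
    if [ "$value" -lt "$min" ]; then
      echo "$name is $value, below $min" >&2
      return 1
    fi
  done
  return 0
}
```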

Customising the manifests

Pin workers to a specific image version:

kubectl set image deployment/synapse-worker worker=synapseorchai/synapse-ai-worker:1.6.6

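Temporarily pin the worker count. Setting minReplicaCount and maxReplicaCount to the same value keeps KEDA from scaling the Deployment up or down. ScaledObject is a custom resource, so kubectl patch needs --type merge; the default strategic merge patch is not supported for custom resources:

kubectl patch scaledobject synapse-worker-scaledobject --type merge \
  -p '{"spec":{"minReplicaCount":3,"maxReplicaCount":3}}'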
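When you are done, patch the bounds back to restore autoscaling. Re-applying infra/k8s/worker-scaledobject.yaml with kubectl apply has the same effect. A minimal sketch, assuming the 1 to 100 range listed for the provided worker-scaledobject.yaml in the manifest table; adjust the range if you have changed the manifest. restore_worker_autoscaling is a hypothetical helper.

```shell
# Restore the ScaledObject's replica bounds after pinning the worker count.
# Hypothetical helper; defaults assume the 1-100 range from the provided
# worker-scaledobject.yaml.
restore_worker_autoscaling() {
  min="${1:-1}"
  max="${2:-100}"
  # ScaledObject is a custom resource, so use a JSON merge patch.
  kubectl patch scaledobject synapse-worker-scaledobject --type merge \
    -p "{\"spec\":{\"minReplicaCount\":$min,\"maxReplicaCount\":$max}}"
}
```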

Watch a rolling update in progress:

kubectl rollout status deployment/synapse-worker

Production checklist

Work through this list before going live with a scale deployment.

Infrastructure

Redis persistence — enable AOF (appendonly yes) on your Redis instance. Without persistence, a Redis restart drops all queued jobs.
Redis high availability — use Redis Sentinel or Redis Cluster for production. A single Redis node is a single point of failure for your entire job queue.
Postgres backups — the orchestration_runs and chat_sessions tables grow over time. Configure daily backups and set RUNS_RETENTION_DAYS to clean up old records automatically.
Postgres connection pooling — deploy PgBouncer in front of Postgres when running more than ~10 worker instances. Each worker holds up to WORKER_CONCURRENCY Postgres connections simultaneously.
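A rough worked example shows why pooling matters. If every worker can hold WORKER_CONCURRENCY connections at once, the worker tier alone can open workers × WORKER_CONCURRENCY connections. With 10 workers at the WORKER_CONCURRENCY=10 used in the examples above, that is 100 connections. This already equals PostgreSQL's default max_connections of 100, before counting the API servers. The helper below is a hypothetical sketch of that arithmetic. The API term is an estimate you supply, because this page does not state how many connections an API replica holds.

```shell
# Estimate peak Postgres connections opened without a pooler.
# Hypothetical helper. Assumes each worker can hold up to
# WORKER_CONCURRENCY connections at once; API_CONNECTIONS is your own
# estimate for the API tier (default 0).
pg_conn_budget() {
  workers="${1:?usage: pg_conn_budget WORKERS CONCURRENCY [API_CONNECTIONS]}"
  concurrency="${2:?usage: pg_conn_budget WORKERS CONCURRENCY [API_CONNECTIONS]}"
  api_connections="${3:-0}"
  echo $((workers * concurrency + api_connections))
}

# Compare the estimate with the server limit, e.g.:
# psql "$SCALE_POSTGRES_URL" -Atc 'SHOW max_connections'
```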

Workers

At least 2 worker replicas — prevents a complete execution outage if one worker instance crashes.
KEDA autoscaling — queue-depth-based autoscaling is far more responsive than CPU-based for this workload. Use the provided worker-scaledobject.yaml.
Sufficient memory — workers load LLM model contexts, tool code, and MCP server processes into memory. Start with the 2Gi limit in the manifest and increase if you see OOM kills.
Graceful termination period — ensure terminationGracePeriodSeconds: 60 is set in the worker pod spec (it is in the provided manifest). This allows in-flight jobs to complete before the pod is terminated.

Shared storage

S3 for vault files — if your orchestrations read from or write to vault files, configure S3_BUCKET so all workers share the same storage. Without it, vault writes from one worker are invisible to the next.

Observability

OTLP tracing — set OTLP_ENDPOINT so every run generates a distributed trace. This is the most effective tool for debugging slow steps and cross-worker issues.
Prometheus metrics — configure METRICS_TOKEN and scrape GET /api/v2/metrics (API server) and GET /metrics on port 9000 (workers). The infra/prometheus.yml file has the scrape configuration.
Grafana dashboards — import the provided dashboards from infra/grafana/ to monitor queue depth, worker throughput, error rates, cost per run, and per-tenant usage.

Security

imagePullPolicy: Always — set on all Deployments (already set in the provided manifests) so new image versions roll out automatically on pod restart.
Network policy — restrict worker pods so they can only reach Redis, Postgres, S3, and external LLM API endpoints. Workers do not need to accept inbound connections except on port 9000.
Rotate SYNAPSE_INTERNAL_TOKEN — the internal token authenticates frontend-to-backend requests. Generate a new one (openssl rand -hex 32) and update it in your secrets on a regular schedule.
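A NetworkPolicy along these lines is one way to express the network policy item above. It is a sketch under several assumptions:

- Worker pods carry the label app=synapse-worker, the label used by the kubectl logs command earlier.
- Redis, Postgres, and PgBouncer listen on their default ports (6379, 5432, 6432).
- S3 and the LLM providers are reached over HTTPS on port 443.

Those external endpoints have no fixed IPs, so the egress rules filter by port only. Add further rules for anything else your tools or MCP servers call. NetworkPolicy is enforced only if your cluster's CNI plugin supports it. apply_worker_network_policy and the synapse-worker-netpol name are hypothetical.

```shell
# Restrict worker pods to inbound traffic on 9000 and outbound DNS,
# Redis, Postgres/PgBouncer, and HTTPS. Hypothetical sketch: adjust the
# label and ports to match your cluster before applying.
apply_worker_network_policy() {
  kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: synapse-worker-netpol
spec:
  podSelector:
    matchLabels:
      app: synapse-worker
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - ports:              # health and metrics endpoint
        - protocol: TCP
          port: 9000
  egress:
    - ports:              # DNS lookups
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - ports:              # Redis, Postgres, PgBouncer
        - protocol: TCP
          port: 6379
        - protocol: TCP
          port: 5432
        - protocol: TCP
          port: 6432
    - ports:              # HTTPS: S3 and external LLM APIs
        - protocol: TCP
          port: 443
EOF
}
```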

Upgrading workers

Docker Run

# Pull the new image
docker pull synapseorchai/synapse-ai-worker:latest

# Stop and remove the old container
docker stop synapse-worker-1 && docker rm synapse-worker-1

# Start a new container with the same configuration
docker run -d --name synapse-worker-1 --env-file .env \
  -p 9000:9000 --restart unless-stopped \
  synapseorchai/synapse-ai-worker:latest

Jobs that were in-flight when the old container stopped remain in the Redis ARQ queue with status queued and are picked up automatically by the new worker.
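You can wrap the three steps in a small function. This sketch pulls first, so the container is down only for the stop and start. It also passes --time 60 to docker stop, allowing up to 60 seconds for in-flight work before Docker sends SIGKILL (the default is 10 seconds). This mirrors the terminationGracePeriodSeconds: 60 used in the Kubernetes manifest. upgrade_worker is a hypothetical helper, and it assumes the .env file and port mapping from Option 1.

```shell
# Replace a worker container with a fresh one from the given tag.
# Hypothetical helper; assumes the .env file and port mapping from Option 1.
upgrade_worker() {
  name="${1:-synapse-worker-1}"
  tag="${2:-latest}"
  port="${3:-9000}"
  image="synapseorchai/synapse-ai-worker:$tag"
  # Pull before stopping so the container is down only for the restart.
  docker pull "$image" || return 1
  # Allow up to 60 s for in-flight work before Docker sends SIGKILL.
  docker stop --time 60 "$name" || return 1
  docker rm "$name" || return 1
  docker run -d --name "$name" --env-file .env \
    -p "$port:9000" --restart unless-stopped "$image"
}

# Usage: upgrade_worker synapse-worker-1 1.7.0
```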

Docker Compose

docker compose --profile scale pull worker
docker compose --profile scale up -d worker

Docker Compose performs a stop-then-start replacement of the worker container. In-flight jobs are requeued automatically.

Kubernetes

# Update to a specific version tag
kubectl set image deployment/synapse-worker worker=synapseorchai/synapse-ai-worker:1.7.0

# Or, to pull the latest tag (requires imagePullPolicy: Always)
kubectl rollout restart deployment/synapse-worker

Watch the rolling update progress:

kubectl rollout status deployment/synapse-worker

Kubernetes performs a rolling update by default. It replaces pods in batches bounded by the Deployment's maxSurge and maxUnavailable settings, and waits for each new pod to pass its readiness probe before continuing. The remaining workers keep consuming the queue throughout the upgrade. Terminating pods get up to terminationGracePeriodSeconds to finish in-flight jobs before they are force-killed.
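In a script, you can bound the wait and roll back automatically if the new pods never become ready. A minimal sketch using kubectl rollout status --timeout and kubectl rollout undo. upgrade_k8s_workers is a hypothetical helper, and the 10-minute timeout is an arbitrary example value. It assumes the container in the Deployment is named worker, as in the set image commands above.

```shell
# Update the worker image; roll back if the rollout does not finish in time.
# Hypothetical helper; the 10m timeout is an example value.
upgrade_k8s_workers() {
  version="${1:?usage: upgrade_k8s_workers VERSION}"
  kubectl set image deployment/synapse-worker \
    worker="synapseorchai/synapse-ai-worker:$version" || return 1
  if ! kubectl rollout status deployment/synapse-worker --timeout=10m; then
    echo "rollout of $version did not complete; rolling back" >&2
    kubectl rollout undo deployment/synapse-worker
    return 1
  fi
}

# Usage: upgrade_k8s_workers 1.7.0
```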
