Deployment

Synapse publishes three Docker images to Docker Hub. Choose the deployment pattern that fits your infrastructure — from a single docker run command to a fully autoscaled Kubernetes cluster.


Docker images

| Image | Purpose |
| --- | --- |
| synapseorchai/synapse-ai | Management plane: combined UI + API server. Run exactly one of these. Manages definitions, hosts the UI, syncs to Postgres, enqueues jobs. |
| synapseorchai/synapse-ai-worker | Execution tier: no UI, no frontend. Scale this horizontally. Pulls jobs from Redis and executes orchestrations and agent chats. |
| synapseorchai/synapse-ai-api-server | API tier replica: same API as the main instance, no UI. Use behind a load balancer when the API tier itself becomes the bottleneck. |

All three images are published together on every release and follow semantic versioning. Each image is available under a version tag and under latest, for example synapseorchai/synapse-ai-worker:1.6.6 and synapseorchai/synapse-ai-worker:latest.


Option 1 — Docker Run

The fastest path to adding a worker to an existing Synapse install. Run this on any machine that can reach your Redis and Postgres instances.

Prerequisites

  • Docker installed
  • A running Redis instance (see quick start below)
  • A running Postgres instance

Start Redis (if you don't have one)

docker run -d \
--name synapse-redis \
-p 6379:6379 \
-v synapse-redis-data:/data \
redis:7-alpine \
redis-server --appendonly yes

The --appendonly yes flag enables AOF persistence — your queue survives a Redis restart.

Start the worker

docker run -d \
--name synapse-worker-1 \
-e REDIS_URL="redis://your-redis-host:6379/0" \
-e SCALE_POSTGRES_URL="postgresql://user:pass@your-pg-host:5432/synapse" \
-e WORKER_CONCURRENCY=10 \
-e WORKER_JOB_TIMEOUT=3600 \
-e WORKER_MAX_RETRIES=3 \
-p 9000:9000 \
--restart unless-stopped \
synapseorchai/synapse-ai-worker:latest

Verify the worker is healthy

curl http://localhost:9000/health

A healthy worker returns a response like this:

{
"status": "ok",
"worker_id": "worker-a1b2c3",
"hostname": "my-server-01",
"address": "http://10.0.0.5:9000",
"active_jobs": 0,
"uptime": 18.3
}

The worker appears in the Workers panel in Settings → Scale within 30 seconds of starting.
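
The health check above can be scripted so that a deployment step waits for the worker before continuing. The following is a minimal sketch, not part of Synapse: the function names is_healthy and wait_for_worker, the attempt count, and the delay are all assumptions chosen for illustration. It relies only on the "status": "ok" field shown in the sample response.

```shell
# Succeeds when a /health JSON body (read from stdin) reports "status": "ok".
is_healthy() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Polls a worker health endpoint until it reports ok or attempts run out.
# Usage: wait_for_worker URL [ATTEMPTS] [DELAY_SECONDS]
# (hypothetical helper; the defaults are illustrative, not Synapse settings)
wait_for_worker() {
  url="$1"
  attempts="${2:-15}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # curl -f turns HTTP errors into a non-zero exit and an empty body.
    if curl -fsS "$url" 2>/dev/null | is_healthy; then
      echo "worker healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "worker did not become healthy: $url" >&2
  return 1
}
```

After defining the functions, run wait_for_worker http://localhost:9000/health to block until the worker answers.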

Use an env file

For cleaner deployments, put all configuration in a .env file:

# .env
REDIS_URL=redis://your-redis:6379/0
SCALE_POSTGRES_URL=postgresql://user:pass@your-pg:5432/synapse
WORKER_CONCURRENCY=10
WORKER_JOB_TIMEOUT=3600
WORKER_MAX_RETRIES=3
S3_BUCKET=my-synapse-bucket
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Then pass the file to docker run with --env-file:

docker run -d \
--name synapse-worker-1 \
--env-file .env \
-p 9000:9000 \
--restart unless-stopped \
synapseorchai/synapse-ai-worker:latest

Scale to multiple workers

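Run the same command on additional machines, changing the container name for each worker. The health port can stay at 9000 because each machine has its own port space; change the host side of -p only when several workers share one machine: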

# Machine 2
docker run -d --name synapse-worker-2 --env-file .env \
-p 9000:9000 --restart unless-stopped \
synapseorchai/synapse-ai-worker:latest

# Machine 3
docker run -d --name synapse-worker-3 --env-file .env \
-p 9000:9000 --restart unless-stopped \
synapseorchai/synapse-ai-worker:latest

All workers share the same Redis queue. Jobs are distributed automatically — no additional configuration is needed.
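
If several workers run on one machine instead of separate machines, each container needs its own host port for the health endpoint. The sketch below generates the docker run commands with host ports 9000, 9001, 9002 mapped to container port 9000. The helper name worker_run_cmd and the port numbering scheme are assumptions for illustration only.

```shell
# Prints the docker run command for worker number IDX (1-based) on one host.
# The host port is BASE_PORT + IDX - 1; the container port stays 9000.
# (hypothetical helper; the numbering scheme is an illustrative choice)
worker_run_cmd() {
  idx="$1"
  base_port="${2:-9000}"
  host_port=$((base_port + idx - 1))
  printf 'docker run -d --name synapse-worker-%s --env-file .env -p %s:9000 --restart unless-stopped synapseorchai/synapse-ai-worker:latest\n' \
    "$idx" "$host_port"
}

# Print the commands for three workers so they can be reviewed first.
for n in 1 2 3; do
  worker_run_cmd "$n"
done
```

Printing the commands rather than running them keeps the step easy to review; pipe the output to sh once it looks right.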


Option 2 — Docker Compose

The docker-compose.yml in the project root supports a scale profile that starts Redis and a worker alongside the existing backend and frontend services.

Prerequisites

  • Docker and Docker Compose v2+
  • The Synapse AI repository cloned locally
  • A Postgres instance (external — not provisioned by Docker Compose)

Configure environment

Copy the environment template and fill in the required values:

cp .env.docker .env

Open .env and set at minimum:

# Required for all deployments
SYNAPSE_INTERNAL_TOKEN=<output of: openssl rand -hex 32>

# Required for scale mode
SCALE_POSTGRES_URL=postgresql://user:pass@your-pg-host:5432/synapse
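
The two values above can also be written from a script instead of by hand. This is a minimal sketch under stated assumptions: set_env_var is a hypothetical helper, not something shipped with Synapse, and it assumes each variable sits on its own KEY=VALUE line in .env.

```shell
# Sets KEY=VALUE in an env file, replacing an existing KEY line or adding one.
# (hypothetical helper; the new line is always placed at the end of the file)
set_env_var() {
  file="$1"
  key="$2"
  value="$3"
  [ -f "$file" ] || : > "$file"
  tmp="$(mktemp)"
  # Keep every line that does not define KEY. grep exits non-zero when it
  # selects nothing, which is fine here.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

For example, set_env_var .env SYNAPSE_INTERNAL_TOKEN "$(openssl rand -hex 32)" generates and stores a fresh token in one step. Because the file is replaced by one created with mktemp, it ends up readable only by its owner, which suits a file that holds secrets.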

Start with the scale profile

docker compose --profile scale up -d

This starts four services:

| Service | Purpose | Default port |
| --- | --- | --- |
| synapse-backend | API server | 8765 |
| synapse-frontend | Web UI | 3000 |
| synapse-redis | Redis job queue (auto-provisioned with persistence) | 6379 |
| synapse-worker | ARQ worker | health on 9000 |

Scale to multiple workers

docker compose --profile scale up -d --scale worker=4

Note: When scaling beyond 1 replica, remove the container_name: synapse-worker line from docker-compose.yml. Docker Compose requires unique container names, and with --scale it generates them automatically (synapse-worker-1, synapse-worker-2, etc.).
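
Removing the container_name line can be done without editing the tracked file. The sketch below writes a filtered copy and leaves docker-compose.yml untouched. The function name and the output filename docker-compose.scale.yml are assumptions for illustration, and the pattern only matches a line that reads exactly container_name: synapse-worker (with optional surrounding whitespace).

```shell
# Prints a compose file with the fixed worker container name removed, so
# that --scale can assign generated names. Other container_name lines
# (for example synapse-redis) are left in place.
# (hypothetical helper, not part of the Synapse repository)
strip_worker_container_name() {
  sed '/^[[:space:]]*container_name:[[:space:]]*synapse-worker[[:space:]]*$/d' "$1"
}
```

A possible usage: strip_worker_container_name docker-compose.yml > docker-compose.scale.yml, followed by docker compose -f docker-compose.scale.yml --profile scale up -d --scale worker=4.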

Stop all services

docker compose --profile scale down

To also remove the Redis volume (deletes queued jobs):

docker compose --profile scale down -v

Enterprise profile — observability stack

docker compose --profile enterprise up -d

Adds full observability infrastructure:

| Service | URL | Purpose |
| --- | --- | --- |
| Jaeger | http://localhost:16686 | Distributed tracing UI |
| Prometheus | http://localhost:9090 | Metrics collection and alerting |
| Grafana | http://localhost:3001 | Dashboards and visualisations |
| PgBouncer | localhost:6432 | Postgres connection pooling proxy |

After starting the enterprise profile, set OTLP_ENDPOINT=http://jaeger:4317 in your .env and restart to enable distributed tracing.


Option 3 — Kubernetes

Pre-built Kubernetes manifests live in infra/k8s/ in the repository. They are designed for production deployments with KEDA-based autoscaling and Postgres connection pooling.

Prerequisites

  • Kubernetes cluster 1.24+
  • kubectl configured and targeting the correct cluster
  • KEDA installed in the cluster (for worker autoscaling)
  • External Redis and Postgres instances accessible from inside the cluster

Install KEDA

kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.13.0/keda-2.13.0.yaml

Verify KEDA is running before proceeding:

kubectl get pods -n keda

Create secrets

kubectl create secret generic synapse-secrets \
--from-literal=redis-url="redis://your-redis:6379/0" \
--from-literal=postgres-url="postgresql://user:pass@your-pg:5432/synapse"

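If you use S3 for shared vault storage, include the credentials in the same secret. Run this command instead of the one above, because kubectl create fails if synapse-secrets already exists: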

kubectl create secret generic synapse-secrets \
--from-literal=redis-url="redis://your-redis:6379/0" \
--from-literal=postgres-url="postgresql://user:pass@your-pg:5432/synapse" \
--from-literal=s3-access-key-id="AKIAIOSFODNN7EXAMPLE" \
--from-literal=s3-secret-access-key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

Apply the manifests

kubectl apply -f infra/k8s/

This creates:

| Manifest | What it deploys |
| --- | --- |
| api-deployment.yaml | synapse-api Deployment (3 replicas) + ClusterIP Service + HPA (scales 2–20 replicas at 70% CPU) |
| worker-deployment.yaml | synapse-worker Deployment (initial 2 replicas, overridden by KEDA ScaledObject) + Service |
| worker-scaledobject.yaml | KEDA ScaledObject: scales workers 1–100 based on Redis queue depth |
| pgbouncer-deployment.yaml | PgBouncer connection pooler Deployment + Service: proxies Postgres connections |
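
To get a feel for queue-depth scaling, the replica count can be approximated as the queue depth divided by a per-worker target, rounded up and clamped to the 1–100 range listed above. The sketch below is an approximation only. The per-worker target of 10 used in the examples is a hypothetical value (the real one is whatever worker-scaledobject.yaml sets), and actual KEDA behaviour also depends on polling and cooldown settings.

```shell
# Approximates the replica count requested for a given queue depth:
# ceil(depth / per_worker), clamped to [min, max].
# (illustrative arithmetic only; per_worker must be a positive integer)
desired_workers() {
  depth="$1"
  per_worker="$2"
  min="${3:-1}"
  max="${4:-100}"
  want=$(( (depth + per_worker - 1) / per_worker ))
  if [ "$want" -lt "$min" ]; then want="$min"; fi
  if [ "$want" -gt "$max" ]; then want="$max"; fi
  echo "$want"
}

desired_workers 250 10    # prints 25
desired_workers 0 10      # prints 1 (never below the minimum)
desired_workers 5000 10   # prints 100 (capped at the maximum)
```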

Verify the deployment

kubectl get deployments
kubectl get pods
kubectl get scaledobjects

Check worker logs to confirm successful registration:

kubectl logs -l app=synapse-worker --tail=50

Check API server logs:

kubectl logs -l app=synapse-api --tail=50

Expose the API with an Ingress

The synapse-api Service is a ClusterIP on port 8765. Add an Ingress to expose it externally:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: synapse-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  rules:
    - host: api.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: synapse-api
                port:
                  number: 8765

Tip: The proxy-read-timeout and proxy-send-timeout annotations must be set to at least 3600 seconds. SSE connections for long-running orchestrations stay open for minutes or hours, and the default nginx timeout of 60 seconds would close them prematurely.

Customising the manifests

Pin workers to a specific image version:

kubectl set image deployment/synapse-worker worker=synapseorchai/synapse-ai-worker:1.6.6

Temporarily fix worker count (disables KEDA autoscaling):

kubectl patch scaledobject synapse-worker-scaledobject --type merge \
-p '{"spec":{"minReplicaCount":3,"maxReplicaCount":3}}'

Watch a rolling update in progress:

kubectl rollout status deployment/synapse-worker

Production checklist

Work through this list before going live with a scale deployment.

Infrastructure

  1. Redis persistence — enable AOF (appendonly yes) on your Redis instance. Without persistence, a Redis restart drops all queued jobs.
  2. Redis high availability — use Redis Sentinel or Redis Cluster for production. A single Redis node is a single point of failure for your entire job queue.
  3. Postgres backups — the orchestration_runs and chat_sessions tables grow over time. Configure daily backups and set RUNS_RETENTION_DAYS to clean up old records automatically.
  4. Postgres connection pooling — deploy PgBouncer in front of Postgres when running more than ~10 worker instances. Each worker holds up to WORKER_CONCURRENCY Postgres connections simultaneously.
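
The connection pooling guidance follows from simple arithmetic: in the worst case, every worker uses all of its concurrency slots at once. The sketch below multiplies the two numbers; pg_connections is a hypothetical helper name. For comparison, a stock Postgres install defaults to max_connections = 100.

```shell
# Worst-case Postgres connections: every worker uses all concurrency slots.
# (illustrative helper; ignores connections held by the API server)
pg_connections() {
  workers="$1"
  concurrency="$2"
  echo $(( workers * concurrency ))
}

pg_connections 10 10   # prints 100
```

With WORKER_CONCURRENCY=10, ten workers can already reach the default Postgres limit, which is why a pooler is recommended at about that scale.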

Workers

  1. At least 2 worker replicas — prevents a complete execution outage if one worker instance crashes.
  2. KEDA autoscaling — queue-depth-based autoscaling is far more responsive than CPU-based for this workload. Use the provided worker-scaledobject.yaml.
  3. Sufficient memory — workers load LLM model contexts, tool code, and MCP server processes into memory. Start with the 2Gi limit in the manifest and increase if you see OOM kills.
  4. Graceful termination period — ensure terminationGracePeriodSeconds: 60 is set in the worker pod spec (it is in the provided manifest). This allows in-flight jobs to complete before the pod is terminated.

Shared storage

  1. S3 for vault files — if your orchestrations read from or write to vault files, configure S3_BUCKET so all workers share the same storage. Without it, vault writes from one worker are invisible to the next.

Observability

  1. OTLP tracing — set OTLP_ENDPOINT so every run generates a distributed trace. This is the most effective tool for debugging slow steps and cross-worker issues.
  2. Prometheus metrics — configure METRICS_TOKEN and scrape GET /api/v2/metrics (API server) and GET /metrics on port 9000 (workers). The infra/prometheus.yml file has the scrape configuration.
  3. Grafana dashboards — import the provided dashboards from infra/grafana/ to monitor queue depth, worker throughput, error rates, cost per run, and per-tenant usage.

Security

  1. imagePullPolicy: Always — set on all Deployments (already set in the provided manifests) so new image versions roll out automatically on pod restart.
  2. Network policy — restrict worker pods so they can only reach Redis, Postgres, S3, and external LLM API endpoints. Workers do not need to accept inbound connections except on port 9000.
  3. Rotate SYNAPSE_INTERNAL_TOKEN — the internal token authenticates frontend-to-backend requests. Generate a new one (openssl rand -hex 32) and update it in your secrets on a regular schedule.

Upgrading workers

Docker Run

# Pull the new image
docker pull synapseorchai/synapse-ai-worker:latest

# Stop and remove the old container
docker stop synapse-worker-1 && docker rm synapse-worker-1

# Start a new container with the same configuration
docker run -d --name synapse-worker-1 --env-file .env \
-p 9000:9000 --restart unless-stopped \
synapseorchai/synapse-ai-worker:latest

Jobs that were in-flight when the old container stopped remain in the Redis ARQ queue with status queued and are picked up automatically by the new worker.
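
The three steps above can be wrapped in one function so that every worker is upgraded the same way. This is a sketch under stated assumptions: upgrade_worker and the DOCKER and IMAGE variables are hypothetical names, and the 60-second stop timeout is an illustrative choice (docker stop waits 10 seconds by default before killing the container).

```shell
# DOCKER can be overridden, for example DOCKER=echo, to preview the commands.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-synapseorchai/synapse-ai-worker:latest}"

# Replaces one worker container with a fresh one running IMAGE.
# Usage: upgrade_worker NAME [HOST_PORT]
# (hypothetical helper; assumes the .env file is in the current directory)
upgrade_worker() {
  name="$1"
  host_port="${2:-9000}"
  "$DOCKER" pull "$IMAGE" || return 1
  # Allow up to 60 seconds for a graceful stop before Docker kills it.
  "$DOCKER" stop -t 60 "$name" || return 1
  "$DOCKER" rm "$name" || return 1
  "$DOCKER" run -d --name "$name" --env-file .env \
    -p "${host_port}:9000" --restart unless-stopped "$IMAGE"
}
```

Setting DOCKER=echo before calling upgrade_worker synapse-worker-1 prints the four commands without touching any container. Pulling first means the old worker keeps running if the download fails.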

Docker Compose

docker compose --profile scale pull worker
docker compose --profile scale up -d worker

Docker Compose performs a stop-then-start replacement of the worker container. In-flight jobs are requeued automatically.

Kubernetes

# Update to a specific version tag
kubectl set image deployment/synapse-worker worker=synapseorchai/synapse-ai-worker:1.7.0

# Or, to pull the latest tag (requires imagePullPolicy: Always)
kubectl rollout restart deployment/synapse-worker

Watch the rolling update progress:

kubectl rollout status deployment/synapse-worker

Kubernetes performs a rolling update by default: new pods start and pass their readiness probe before old pods are terminated, so the update causes no downtime. In-flight jobs on terminating pods have up to terminationGracePeriodSeconds to finish before the pod is force-killed.
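
For unattended upgrades, the update and the status check can be combined so that a failed rollout is reverted automatically. The following is a minimal sketch: rollout_worker, the KUBECTL variable, and the 300-second timeout are assumptions for illustration. The kubectl subcommands are standard, and the deployment and container names are the ones used above.

```shell
# KUBECTL can be overridden, for example KUBECTL=echo, to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Sets the worker image, waits for the rollout, and rolls back if it fails.
# Usage: rollout_worker IMAGE [TIMEOUT]
# (hypothetical helper; the default timeout is an illustrative value)
rollout_worker() {
  image="$1"
  timeout="${2:-300s}"
  "$KUBECTL" set image deployment/synapse-worker "worker=${image}" || return 1
  if ! "$KUBECTL" rollout status deployment/synapse-worker --timeout="$timeout"; then
    echo "rollout failed, reverting to the previous revision" >&2
    "$KUBECTL" rollout undo deployment/synapse-worker
    return 1
  fi
}
```

A call such as rollout_worker synapseorchai/synapse-ai-worker:1.7.0 returns non-zero after a rollback, so a CI job can fail on it.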