Deployment
Synapse publishes three Docker images to Docker Hub. Choose the deployment pattern that fits your infrastructure — from a single docker run command to a fully autoscaled Kubernetes cluster.
Docker images
| Image | Purpose |
|---|---|
| `synapseorchai/synapse-ai` | Management plane: combined UI + API server. Run exactly one of these. Manages definitions, hosts the UI, syncs to Postgres, enqueues jobs. |
| `synapseorchai/synapse-ai-worker` | Execution tier: no UI or frontend. Scale this horizontally. Pulls jobs from Redis and executes orchestrations and agent chats. |
| `synapseorchai/synapse-ai-api-server` | API tier replica: same API as the main instance, no UI. Use behind a load balancer when the API tier itself becomes the bottleneck. |
All three images are published together on every release. Version tags follow semantic versioning (for example, `synapseorchai/synapse-ai-worker:1.6.6`), and the `latest` tag tracks the most recent release.
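Pinning an explicit version rather than tracking `latest` requires knowing which release tag is the newest. A minimal sketch, assuming you already have the tag names (for example from your registry); the function `newest_release` is hypothetical, compares versions numerically, and skips anything that is not a plain `MAJOR.MINOR.PATCH` tag:

```python
from __future__ import annotations

import re

# Plain MAJOR.MINOR.PATCH tags such as "1.6.6"; tags with pre-release suffixes are skipped.
SEMVER_TAG = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def newest_release(tags: list[str]) -> str | None:
    """Return the highest MAJOR.MINOR.PATCH tag, skipping floating tags like 'latest'."""
    candidates = []
    for tag in tags:
        match = SEMVER_TAG.match(tag)
        if match:
            # Compare numerically so that 1.10.0 ranks above 1.9.9.
            version = tuple(int(part) for part in match.groups())
            candidates.append((version, tag))
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    tags = ["latest", "1.6.6", "1.10.0", "1.9.9"]
    print(f"synapseorchai/synapse-ai-worker:{newest_release(tags)}")
```

A plain string sort would rank `1.9.9` above `1.10.0`, which is why the version parts are converted to integers before comparing.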
Option 1 — Docker Run
The fastest path to adding a worker to an existing Synapse install. Run this on any machine that can reach your Redis and Postgres instances.
Prerequisites
- Docker installed
- A running Redis instance (see quick start below)
- A running Postgres instance
Start Redis (if you don't have one)
```bash
docker run -d \
  --name synapse-redis \
  -p 6379:6379 \
  -v synapse-redis-data:/data \
  redis:7-alpine \
  redis-server --appendonly yes
```
The --appendonly yes flag enables AOF persistence — your queue survives a Redis restart.
Start the worker
```bash
docker run -d \
  --name synapse-worker-1 \
  -e REDIS_URL="redis://your-redis-host:6379/0" \
  -e SCALE_POSTGRES_URL="postgresql://user:pass@your-pg-host:5432/synapse" \
  -e WORKER_CONCURRENCY=10 \
  -e WORKER_JOB_TIMEOUT=3600 \
  -e WORKER_MAX_RETRIES=3 \
  -p 9000:9000 \
  --restart unless-stopped \
  synapseorchai/synapse-ai-worker:latest
```
Verify the worker is healthy
```bash
curl http://localhost:9000/health
```
Example response:
```json
{
  "status": "ok",
  "worker_id": "worker-a1b2c3",
  "hostname": "my-server-01",
  "address": "http://10.0.0.5:9000",
  "active_jobs": 0,
  "uptime": 18.3
}
```
The worker appears in the Workers panel in Settings → Scale within 30 seconds of starting.
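For scripted deployments you may want to gate on the health endpoint instead of reading `curl` output by eye. A minimal standard-library sketch, assuming the response shape shown above; the helper names are hypothetical and only the `status` field is checked:

```python
import json
import urllib.request


def validate_health(payload: dict) -> dict:
    """Raise if a /health payload does not report status 'ok'; otherwise return it."""
    if payload.get("status") != "ok":
        raise RuntimeError(f"worker unhealthy: {payload!r}")
    return payload


def check_worker_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/health and validate the JSON body.

    Connection errors, timeouts, and non-2xx responses propagate as urllib exceptions.
    """
    url = f"{base_url.rstrip('/')}/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return validate_health(json.load(resp))
```

For example, `check_worker_health("http://localhost:9000")["active_jobs"]` returns the worker's current job count, and a non-`ok` status raises instead of passing silently.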
Use an env file
For cleaner deployments, put all configuration in a .env file:
```bash
# .env
REDIS_URL=redis://your-redis:6379/0
SCALE_POSTGRES_URL=postgresql://user:pass@your-pg:5432/synapse
WORKER_CONCURRENCY=10
WORKER_JOB_TIMEOUT=3600
WORKER_MAX_RETRIES=3
S3_BUCKET=my-synapse-bucket
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```
Then pass the file to the worker:
```bash
docker run -d \
  --name synapse-worker-1 \
  --env-file .env \
  -p 9000:9000 \
  --restart unless-stopped \
  synapseorchai/synapse-ai-worker:latest
```
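A typo in `.env` usually only surfaces once the container starts and fails to connect. A minimal pre-flight check, assuming `REDIS_URL` and `SCALE_POSTGRES_URL` are the mandatory keys (an assumption based on the examples above) and that the file uses the plain `KEY=VALUE` form shown; quotes are not stripped, mirroring how `--env-file` passes values through:

```python
from __future__ import annotations

from urllib.parse import urlparse

# Assumed minimum for a worker; adjust to your deployment.
REQUIRED_KEYS = ("REDIS_URL", "SCALE_POSTGRES_URL")
INTEGER_KEYS = ("WORKER_CONCURRENCY", "WORKER_JOB_TIMEOUT", "WORKER_MAX_RETRIES")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse plain KEY=VALUE lines, skipping blank lines and # comments.

    Quotes are kept as part of the value and nothing is expanded.
    """
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            env[key.strip()] = value
    return env


def validate_worker_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the file looks usable."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not env.get(key)]
    redis_url = env.get("REDIS_URL")
    if redis_url and urlparse(redis_url).scheme not in ("redis", "rediss"):
        problems.append("REDIS_URL must start with redis:// or rediss://")
    pg_url = env.get("SCALE_POSTGRES_URL")
    if pg_url and urlparse(pg_url).scheme not in ("postgresql", "postgres"):
        problems.append("SCALE_POSTGRES_URL must start with postgresql://")
    for key in INTEGER_KEYS:
        if key in env and not env[key].isdigit():
            problems.append(f"{key} must be a whole number")
    return problems
```

Usage: `validate_worker_env(parse_env_file(pathlib.Path(".env").read_text()))` returns an empty list when the file passes these checks.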
Scale to multiple workers
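Run the same command on each additional machine, changing only the container name. Because each worker runs on its own host, every one can publish health port 9000; if you run several workers on the same host, give each a different host port (for example `-p 9001:9000`).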
```bash
# Machine 2
docker run -d --name synapse-worker-2 --env-file .env \
  -p 9000:9000 --restart unless-stopped \
  synapseorchai/synapse-ai-worker:latest

# Machine 3
docker run -d --name synapse-worker-3 --env-file .env \
  -p 9000:9000 --restart unless-stopped \
  synapseorchai/synapse-ai-worker:latest
```
All workers share the same Redis queue. Jobs are distributed automatically — no additional configuration is needed.
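This is the competing-consumers pattern: every worker pulls from the same queue, and each job goes to whichever worker takes it first. The sketch below models the idea in-process, with Python's `queue.Queue` standing in for Redis; it is a conceptual illustration, not how ARQ stores or dispatches jobs:

```python
from __future__ import annotations

import queue
import threading


def run_workers(job_ids, worker_count: int = 3) -> list[tuple[str, int]]:
    """Conceptual model: several workers draining one shared queue.

    get_nowait() hands each job to exactly one caller, so no job is processed twice.
    """
    jobs: queue.Queue = queue.Queue()
    for job_id in job_ids:
        jobs.put(job_id)

    processed: list[tuple[str, int]] = []  # (worker name, job id)
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            try:
                job_id = jobs.get_nowait()
            except queue.Empty:
                return  # drained; a real worker would keep polling for new jobs
            with lock:
                processed.append((name, job_id))

    threads = [
        threading.Thread(target=worker, args=(f"worker-{n}",))
        for n in range(1, worker_count + 1)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed


if __name__ == "__main__":
    results = run_workers(range(12))
    for name in sorted({name for name, _ in results}):
        print(name, sum(1 for n, _ in results if n == name), "jobs")
```

Adding a worker only adds another consumer on the same queue, which is why no extra coordination is needed.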
Option 2 — Docker Compose
The docker-compose.yml in the project root supports a scale profile that starts Redis and a worker alongside the existing backend and frontend services.
Prerequisites
- Docker and Docker Compose v2+
- The Synapse AI repository cloned locally
- A Postgres instance (external — not provisioned by Docker Compose)
Configure environment
Copy the environment template and fill in the required values:
```bash
cp .env.docker .env
```
Open .env and set at minimum:
```bash
# Required for all deployments
SYNAPSE_INTERNAL_TOKEN=<output of: openssl rand -hex 32>

# Required for scale mode
SCALE_POSTGRES_URL=postgresql://user:pass@your-pg-host:5432/synapse
```
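If `openssl` is not available, Python's `secrets` module produces an equivalent value: `secrets.token_hex(32)` returns 32 cryptographically random bytes encoded as 64 lowercase hex characters, the same shape as `openssl rand -hex 32`. A minimal sketch (the function name is illustrative):

```python
import secrets


def new_internal_token() -> str:
    """Return 32 random bytes as 64 hex characters, like `openssl rand -hex 32`."""
    return secrets.token_hex(32)


if __name__ == "__main__":
    print(f"SYNAPSE_INTERNAL_TOKEN={new_internal_token()}")
```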
Start with the scale profile
```bash
docker compose --profile scale up -d
```
This starts four services:
| Service | Purpose | Default port |
|---|---|---|
| `synapse-backend` | API server | 8765 |
| `synapse-frontend` | Web UI | 3000 |
| `synapse-redis` | Redis job queue (auto-provisioned with persistence) | 6379 |
| `synapse-worker` | ARQ worker | health on 9000 |
Scale to multiple workers
```bash
docker compose --profile scale up -d --scale worker=4
```
When scaling beyond 1 replica, remove the `container_name: synapse-worker` line from `docker-compose.yml`. Container names must be unique, so a fixed `container_name` prevents scaling; without it, Compose generates a unique name for each replica (for example `<project>-worker-1`, `<project>-worker-2`).
Stop all services
```bash
docker compose --profile scale down
```
To also remove the named volumes, including the Redis volume (this deletes queued jobs):
```bash
docker compose --profile scale down -v
```
Enterprise profile — observability stack
```bash
docker compose --profile enterprise up -d
```
Adds full observability infrastructure:
| Service | URL | Purpose |
|---|---|---|
| Jaeger | http://localhost:16686 | Distributed tracing UI |
| Prometheus | http://localhost:9090 | Metrics collection and alerting |
| Grafana | http://localhost:3001 | Dashboards and visualisations |
| PgBouncer | localhost:6432 | Postgres connection pooling proxy |
After starting the enterprise profile, set OTLP_ENDPOINT=http://jaeger:4317 in your .env and restart to enable distributed tracing.
Option 3 — Kubernetes
Pre-built Kubernetes manifests live in infra/k8s/ in the repository. They are designed for production deployments with KEDA-based autoscaling and Postgres connection pooling.
Prerequisites
- Kubernetes cluster 1.24+
- `kubectl` configured and targeting the correct cluster
- KEDA installed in the cluster (for worker autoscaling)
- External Redis and Postgres instances accessible from inside the cluster
Install KEDA
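```bash
# --server-side is needed because KEDA's CRDs are too large for client-side apply
kubectl apply --server-side -f https://github.com/kedacore/keda/releases/download/v2.13.0/keda-2.13.0.yaml
```
Verify KEDA is running before proceeding:
```bash
kubectl get pods -n keda
```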
Create secrets
```bash
kubectl create secret generic synapse-secrets \
  --from-literal=redis-url="redis://your-redis:6379/0" \
  --from-literal=postgres-url="postgresql://user:pass@your-pg:5432/synapse"
```
If you use S3 for shared vault storage, include the S3 credentials when you create the secret. `kubectl create secret` fails if `synapse-secrets` already exists, so run this command instead of the one above (or delete the existing secret first):
```bash
kubectl create secret generic synapse-secrets \
  --from-literal=redis-url="redis://your-redis:6379/0" \
  --from-literal=postgres-url="postgresql://user:pass@your-pg:5432/synapse" \
  --from-literal=s3-access-key-id="AKIAIOSFODNN7EXAMPLE" \
  --from-literal=s3-secret-access-key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
```
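If you manage secrets declaratively (for example with GitOps), it helps to know what these commands store: a `v1` `Secret` of type `Opaque` with each literal base64-encoded under `data`. `kubectl create secret generic ... --dry-run=client -o yaml` prints that object without creating it; the sketch below builds the equivalent in Python (function name is illustrative). Base64 is an encoding, not encryption, so never commit the output unencrypted:

```python
from __future__ import annotations

import base64
import json


def secret_manifest(name: str, literals: dict[str, str]) -> dict:
    """Build the Opaque Secret that `kubectl create secret generic --from-literal` creates.

    Each value is base64-encoded under `data`; treat the result as sensitive.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in literals.items()
        },
    }


if __name__ == "__main__":
    manifest = secret_manifest(
        "synapse-secrets",
        {
            "redis-url": "redis://your-redis:6379/0",
            "postgres-url": "postgresql://user:pass@your-pg:5432/synapse",
        },
    )
    # kubectl apply -f accepts JSON as well as YAML.
    print(json.dumps(manifest, indent=2))
```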
Apply the manifests
```bash
kubectl apply -f infra/k8s/
```
This creates:
| Manifest | What it deploys |
|---|---|
| `api-deployment.yaml` | `synapse-api` Deployment (3 replicas) + ClusterIP Service + HPA (scales 2–20 replicas at 70% CPU) |
| `worker-deployment.yaml` | `synapse-worker` Deployment (initial 2 replicas, overridden by the KEDA ScaledObject) + Service |
| `worker-scaledobject.yaml` | KEDA ScaledObject: scales workers 1–100 based on Redis queue depth |
| `pgbouncer-deployment.yaml` | PgBouncer connection pooler Deployment + Service: proxies Postgres connections |
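The two autoscalers follow different rules. As a rough sketch of the Kubernetes HPA arithmetic (KEDA hands the queue length to an HPA as an external metric with a per-replica average target): queue-based scaling asks for `ceil(queue_depth / target_per_replica)` replicas, while CPU scaling multiplies the current replica count by the ratio of observed to target utilisation and skips changes within a 10% tolerance. The per-replica queue target (`jobs_per_replica`) is an assumption; check the value configured in `worker-scaledobject.yaml`. Stabilisation windows and scaling-rate policies are ignored here:

```python
import math


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def queue_based_replicas(queue_depth: int, jobs_per_replica: int,
                         min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Average-value target: ceil(depth / per-replica target), clamped to the bounds."""
    desired = math.ceil(queue_depth / jobs_per_replica)
    return clamp(desired, min_replicas, max_replicas)


def cpu_based_replicas(current_replicas: int, cpu_utilization: float,
                       target_utilization: float = 70.0,
                       min_replicas: int = 2, max_replicas: int = 20,
                       tolerance: float = 0.1) -> int:
    """Utilisation target: ceil(current * observed / target), unchanged within tolerance."""
    ratio = cpu_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    return clamp(math.ceil(current_replicas * ratio), min_replicas, max_replicas)
```

Queue depth maps directly to demand, while CPU only rises after workers are already busy, which is one reason the worker tier scales on the queue and the API tier on CPU.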
Verify the deployment
```bash
kubectl get deployments
kubectl get pods
kubectl get scaledobjects
```
Check worker logs to confirm successful registration:
```bash
kubectl logs -l app=synapse-worker --tail=50
```
Check API server logs:
```bash
kubectl logs -l app=synapse-api --tail=50
```
Expose the API with an Ingress
The synapse-api Service is a ClusterIP on port 8765. Add an Ingress to expose it externally:
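```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: synapse-ingress
  annotations:
    # Long idle timeouts keep SSE streams for long-running orchestrations open
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx  # assumes the ingress-nginx controller; adjust to your class
  rules:
    - host: api.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: synapse-api
                port:
                  number: 8765
```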
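Set the `proxy-read-timeout` and `proxy-send-timeout` annotations to at least 3600 seconds. nginx applies these timeouts between two successive reads or writes rather than to the whole response, so with the 60-second default an SSE stream for a long-running orchestration is closed as soon as it stays quiet for more than a minute. Raise the values further if a single step can run without emitting events for longer than an hour.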
Customising the manifests
Pin workers to a specific image version:
```bash
kubectl set image deployment/synapse-worker worker=synapseorchai/synapse-ai-worker:1.6.6
```
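Temporarily pin the worker count. KEDA stays active, but equal bounds leave it no room to scale, so restore the original `minReplicaCount` and `maxReplicaCount` afterwards. Custom resources do not support kubectl's default strategic merge patch, so pass `--type merge`:
```bash
kubectl patch scaledobject synapse-worker-scaledobject --type merge \
  -p '{"spec":{"minReplicaCount":3,"maxReplicaCount":3}}'
```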
Watch a rolling update in progress:
```bash
kubectl rollout status deployment/synapse-worker
```
Production checklist
Work through this list before going live with a scale deployment.
Infrastructure
- Redis persistence: enable AOF (`appendonly yes`) on your Redis instance. Without persistence, a Redis restart drops all queued jobs.
- Redis high availability: use Redis Sentinel or Redis Cluster for production. A single Redis node is a single point of failure for your entire job queue.
- Postgres backups: the `orchestration_runs` and `chat_sessions` tables grow over time. Configure daily backups and set `RUNS_RETENTION_DAYS` to clean up old records automatically.
- Postgres connection pooling: deploy PgBouncer in front of Postgres when running more than ~10 worker instances. Each worker holds up to `WORKER_CONCURRENCY` Postgres connections simultaneously.
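Because each worker can hold up to `WORKER_CONCURRENCY` connections, the worst case grows linearly with replica count. A back-of-the-envelope check, assuming a hypothetical per-replica API pool size (`api_pool_size`); PostgreSQL's defaults are `max_connections = 100` with 3 of those reserved for superusers:

```python
def worst_case_connections(worker_replicas: int, worker_concurrency: int,
                           api_replicas: int = 0, api_pool_size: int = 0) -> int:
    """Upper bound on simultaneous Postgres connections without a pooler.

    Workers: up to WORKER_CONCURRENCY each. api_pool_size is an assumption;
    set it to whatever your API replicas actually open.
    """
    return worker_replicas * worker_concurrency + api_replicas * api_pool_size


def needs_pooler(connections: int, max_connections: int = 100,
                 superuser_reserved: int = 3) -> bool:
    """True when ordinary roles would exceed the non-reserved connection slots."""
    return connections > max_connections - superuser_reserved
```

With ten workers at the example `WORKER_CONCURRENCY=10`, workers alone can open 100 connections, already past the 97 slots the default configuration leaves for ordinary roles, which matches the ~10-worker threshold above.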
Workers
- At least 2 worker replicas: prevents a complete execution outage if one worker instance crashes. The provided ScaledObject allows scaling down to 1 replica, so raise its `minReplicaCount` to 2 to enforce this.
- KEDA autoscaling: queue-depth-based autoscaling is far more responsive than CPU-based autoscaling for this workload, because a growing backlog of queued jobs does not show up as CPU load. Use the provided `worker-scaledobject.yaml`.
- Sufficient memory: workers load LLM model contexts, tool code, and MCP server processes into memory. Start with the `2Gi` limit in the manifest and increase it if you see OOM kills.
- Graceful termination period: ensure `terminationGracePeriodSeconds: 60` is set in the worker pod spec (it is in the provided manifest). This allows in-flight jobs to complete before the pod is terminated.
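The grace period only helps if the process reacts to SIGTERM by draining rather than exiting at once. A generic asyncio sketch of that drain pattern, assuming jobs are coroutines; `DrainingWorker` is hypothetical and is not how Synapse or ARQ implement shutdown. The drain timeout is kept a few seconds below the 60-second grace period so stragglers are cancelled before Kubernetes sends SIGKILL:

```python
from __future__ import annotations

import asyncio
import signal
from collections.abc import Coroutine
from typing import Any


class DrainingWorker:
    """Hypothetical drain-on-SIGTERM pattern; not Synapse's or ARQ's actual code.

    Once draining starts, new jobs are refused and in-flight jobs get up to
    `grace_seconds` to finish; anything still running is then cancelled.
    """

    def __init__(self, grace_seconds: float = 55.0) -> None:
        # Keep this a few seconds below terminationGracePeriodSeconds (60).
        self.grace_seconds = grace_seconds
        self.accepting = True
        self.in_flight: set[asyncio.Task[Any]] = set()
        self.drain_task: asyncio.Task[int] | None = None

    def submit(self, job: Coroutine[Any, Any, Any]) -> bool:
        """Schedule a job coroutine; return False if the worker is draining."""
        if not self.accepting:
            job.close()  # discard the coroutine without running it
            return False
        task = asyncio.create_task(job)
        self.in_flight.add(task)
        task.add_done_callback(self.in_flight.discard)
        return True

    async def drain(self) -> int:
        """Stop accepting jobs, wait for in-flight ones, return how many were cancelled."""
        self.accepting = False
        if not self.in_flight:
            return 0
        _, pending = await asyncio.wait(set(self.in_flight), timeout=self.grace_seconds)
        for task in pending:
            task.cancel()
        return len(pending)

    def install_sigterm_handler(self) -> None:
        """Start draining on SIGTERM (Unix only; call from inside a running loop)."""
        loop = asyncio.get_running_loop()

        def on_sigterm() -> None:
            # Keep a reference so the drain task is not garbage-collected.
            self.drain_task = loop.create_task(self.drain())

        loop.add_signal_handler(signal.SIGTERM, on_sigterm)
```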
Shared storage
- S3 for vault files: if your orchestrations read from or write to vault files, configure `S3_BUCKET` so all workers share the same storage. Without it, vault writes from one worker are invisible to the others.
Observability
- OTLP tracing: set `OTLP_ENDPOINT` so every run generates a distributed trace. This is the most effective tool for debugging slow steps and cross-worker issues.
- Prometheus metrics: configure `METRICS_TOKEN` and scrape `GET /api/v2/metrics` (API server) and `GET /metrics` on port 9000 (workers). The `infra/prometheus.yml` file has the scrape configuration.
- Grafana dashboards: import the provided dashboards from `infra/grafana/` to monitor queue depth, worker throughput, error rates, cost per run, and per-tenant usage.
Security
- `imagePullPolicy: Always`: set on all Deployments (already set in the provided manifests) so that pods using a mutable tag such as `latest` pull the newest image whenever they are recreated, for example by `kubectl rollout restart`.
- Network policy: restrict worker pods so they can only reach Redis, Postgres, S3, and external LLM API endpoints. Workers do not need to accept inbound connections except on port 9000.
- Rotate `SYNAPSE_INTERNAL_TOKEN`: the internal token authenticates frontend-to-backend requests. Generate a new one (`openssl rand -hex 32`) and update it in your secrets on a regular schedule.
Upgrading workers
Docker Run
```bash
# Pull the new image
docker pull synapseorchai/synapse-ai-worker:latest

# Stop and remove the old container
docker stop synapse-worker-1 && docker rm synapse-worker-1

# Start a new container with the same configuration
docker run -d --name synapse-worker-1 --env-file .env \
  -p 9000:9000 --restart unless-stopped \
  synapseorchai/synapse-ai-worker:latest
```
Jobs that were in-flight when the old container stopped remain in the Redis ARQ queue with status queued and are picked up automatically by the new worker.
Docker Compose
```bash
docker compose --profile scale pull worker
docker compose --profile scale up -d worker
```
Docker Compose performs a stop-then-start replacement of the worker container. In-flight jobs are requeued automatically.
Kubernetes
# Update to a specific version tag
kubectl set image deployment/synapse-worker worker=synapseorchai/synapse-ai-worker:1.7.0
# Or, to pull the latest tag (requires imagePullPolicy: Always)
kubectl rollout restart deployment/synapse-worker
Watch the rolling update progress:
```bash
kubectl rollout status deployment/synapse-worker
```
Kubernetes performs a rolling update by default: it replaces pods incrementally within the Deployment's `maxSurge` and `maxUnavailable` limits, and a new pod only counts as available once it passes its readiness probe, so the remaining workers keep draining the queue throughout. In-flight jobs on terminating pods have up to `terminationGracePeriodSeconds` to finish before the pod is force-killed.
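Whether old pods are removed before their replacements are ready depends on how `maxSurge` and `maxUnavailable` (both 25% by default) resolve to pod counts. A sketch of the Kubernetes rule: a percentage `maxSurge` rounds up, a percentage `maxUnavailable` rounds down, and if both come out as zero, `maxUnavailable` falls back to 1 so the rollout can progress. The function name is illustrative, and it assumes the worker manifest keeps the default strategy:

```python
from __future__ import annotations

import math


def resolve_rollout_bounds(replicas: int, max_surge: str | int = "25%",
                           max_unavailable: str | int = "25%") -> tuple[int, int]:
    """Convert RollingUpdate settings into absolute (surge, unavailable) pod counts."""

    def resolve(value: str | int, round_up: bool) -> int:
        if isinstance(value, int):
            return value
        fraction = replicas * int(value.rstrip("%")) / 100
        return math.ceil(fraction) if round_up else math.floor(fraction)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    if surge == 0 and unavailable == 0:
        unavailable = 1  # Kubernetes falls back to 1 so the rollout is not blocked
    return surge, unavailable
```

With the manifest's initial 2 worker replicas, the defaults resolve to one surge pod and zero unavailable pods, so each replacement becomes ready before an old pod is terminated; at 10 replicas they resolve to 3 and 2, so some old pods can go first.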