ECS Deployment
Developer Reference
This page covers internal implementation details. It is not included in the User Guide.
Deploy the Escher backend on AWS ECS Fargate using the v2-deployment-ecs shell scripts and the v3-escher-deployment CI/CD pipeline.
Verified vs illustrative
Authoritative commands and topology in this page come from v2-deployment-ecs/README.md. Specifics:
- The end-to-end deploy script is `./deploy-all.sh` (not `deploy.sh`). `./deploy.sh setup-cluster` exists for cluster-only setup.
- Default region is `us-west-1` or `us-west-2` depending on environment.
- Internal Route53 zone is `escher-prod.internal`.
- Topology includes API Gateway (Lambda) for `cloudwatch-logs` and `feedback` services in addition to the ECS Fargate tasks for Gateway / UI Agent / Playbook Store.
The exact YAML task definitions, Cognito UserPoolArn, and security-group rules below are illustrative — they reflect the design pattern in the README but the precise values for your tenant come from v2-deployment-ecs/configs/. Confirm against that repo before applying.
Architecture
```
Internet
  │
  ▼
ALB (HTTPS :443)
  Cognito OIDC listener rule — validates access_token before forwarding
  │
  ▼
Gateway (ECS Fargate — us-west-1 — escher-v2 cluster)
  Port 8080, arm64, 2048 CPU / 4096 MB
  Health: GET /q/health → 200
  │ private subnet ────────────────────────────────────────────────┐
  ▼                                                                │
Analysis Agent              Playbook Agent                         │
  Port 8081                   Port 8082                            │
  arm64                       arm64                                │
  2048 CPU / 4096 MB          2048 CPU / 4096 MB                   │
                                                                   │
Context Engine (Port 8001) ─────▶ Asset Store (Port 8000)          │
  arm64, 1024 CPU / 2048 MB         arm64, 1024 CPU / 2048 MB      │
  │                                                                │
  ▼                                                                │
SurrealDB                                                          │
  (ECS or managed) ◀───────────────────────────────────────────────┘
```

All backend services run in private subnets (no direct internet access). Only the ALB is internet-facing. Services discover each other via AWS Cloud Map DNS (`*.escher.internal`).
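Because Cloud Map gives every service the same `<name>.escher.internal:<port>` naming scheme, the internal base URLs can be derived mechanically. A small sketch of the addresses the diagram implies (names and ports taken from above):

```shell
# Build the internal base URLs implied by the topology above.
# Pattern: http://<service>.escher.internal:<port> (Cloud Map private DNS)
for entry in gateway:8080 analysis-agent:8081 playbook-agent:8082 context-engine:8001 asset-store:8000; do
  name="${entry%:*}"   # service name before the colon
  port="${entry#*:}"   # port after the colon
  echo "http://${name}.escher.internal:${port}"
done
```

These are exactly the values wired into the task-definition environment variables below (`ANALYSIS_AGENT_URL`, `CONTEXT_ENGINE_URL`, and so on).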
Prerequisites
- AWS account with ECS, ECR, ALB, IAM, and Route53 Cloud Map permissions
- Docker images built and pushed to ECR (done by `v3-escher-deployment` CI/CD)
- VPC with private subnets for ECS tasks and a public subnet pair for the ALB
- Cognito User Pool and App Client created (for ALB auth rule)
- Secrets Manager entries for `ANTHROPIC_API_KEY`
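The Secrets Manager entries can be created up front. An illustrative sketch — the secret names follow the `escher/*` ARNs used in the task definitions on this page, and `run` echoes each call so this is a dry run (drop it to execute for real):

```shell
# Dry run: `run` echoes each AWS CLI call instead of executing it.
run() { echo "+ $*"; }

# Secret names match the escher/* ARNs referenced by the task definitions.
run aws secretsmanager create-secret \
  --name escher/anthropic-api-key \
  --secret-string "REDACTED" \
  --region us-west-1
run aws secretsmanager create-secret \
  --name escher/cognito-jwks-url \
  --secret-string "https://cognito-idp.us-west-1.amazonaws.com/POOL_ID/.well-known/jwks.json" \
  --region us-west-1
```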
Cluster and VPC configuration
| Setting | Value |
|---|---|
| Cluster name | escher-v2 |
| Region | us-west-1 |
| VPC CIDR | 10.0.0.0/16 |
| Capacity provider | FARGATE |
| Container runtime | arm64 (Graviton) |
```shell
# Create cluster (one-time)
aws ecs create-cluster \
  --cluster-name escher-v2 \
  --capacity-providers FARGATE \
  --region us-west-1
```

Service task definitions
Each service has a task definition in v2-deployment-ecs/services/. Key parameters per component:
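The YAML files are ultimately registered via `aws ecs register-task-definition --cli-input-json`, which takes the JSON equivalent. A minimal, hedged sketch — the file path and field subset here are illustrative; the full definitions live in `v2-deployment-ecs/services/`:

```shell
# Write a minimal Fargate task definition and check it is well-formed JSON.
cat > /tmp/gateway-task.json <<'EOF'
{
  "family": "gateway",
  "cpu": "2048",
  "memory": "4096",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "runtimePlatform": {"operatingSystemFamily": "LINUX", "cpuArchitecture": "ARM64"},
  "containerDefinitions": [
    {"name": "gateway",
     "image": "ACCOUNT.dkr.ecr.us-west-1.amazonaws.com/escher-gateway:TAG",
     "portMappings": [{"containerPort": 8080, "protocol": "tcp"}]}
  ]
}
EOF
python3 -m json.tool /tmp/gateway-task.json > /dev/null && echo "task definition JSON OK"

# Then register it (requires AWS credentials):
# aws ecs register-task-definition --cli-input-json file:///tmp/gateway-task.json --region us-west-1
```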
Gateway
```yaml
# services/gateway-service.yaml
family: gateway
cpu: "2048"
memory: "4096"
requiresCompatibilities: [FARGATE]
networkMode: awsvpc
runtimePlatform:
  operatingSystemFamily: LINUX
  cpuArchitecture: ARM64
containerDefinitions:
  - name: gateway
    image: ACCOUNT.dkr.ecr.us-west-1.amazonaws.com/escher-gateway:TAG
    portMappings:
      - containerPort: 8080
        protocol: tcp
    healthCheck:
      command: ["CMD-SHELL", "curl -f http://localhost:8080/q/health || exit 1"]
      interval: 30
      timeout: 5
      retries: 3
    environment:
      - name: ANALYSIS_AGENT_URL
        value: http://analysis-agent.escher.internal:8081
      - name: PLAYBOOK_AGENT_URL
        value: http://playbook-agent.escher.internal:8082
      - name: CONTEXT_ENGINE_URL
        value: http://context-engine.escher.internal:8001
    secrets:
      - name: COGNITO_JWKS_URL
        valueFrom: arn:aws:secretsmanager:us-west-1:ACCOUNT:secret:escher/cognito-jwks-url
```

Analysis Agent
```yaml
family: analysis-agent
cpu: "2048"
memory: "4096"
requiresCompatibilities: [FARGATE]
networkMode: awsvpc
runtimePlatform:
  cpuArchitecture: ARM64
containerDefinitions:
  - name: analysis-agent
    image: ACCOUNT.dkr.ecr.us-west-1.amazonaws.com/escher-analysis-agent:TAG
    portMappings:
      - containerPort: 8081
    environment:
      - name: CONTEXT_ENGINE_URL
        value: http://context-engine.escher.internal:8001
    secrets:
      - name: ANTHROPIC_API_KEY
        valueFrom: arn:aws:secretsmanager:us-west-1:ACCOUNT:secret:escher/anthropic-api-key
```

Context Engine + Asset Store
```yaml
# context-engine: 1024 CPU / 2048 MB
# asset-store: 1024 CPU / 2048 MB
# Both connect to SurrealDB at SURREAL_URL
```

Load balancer configuration
```shell
# Create ALB target group for Gateway
aws elbv2 create-target-group \
  --name escher-gateway-tg \
  --protocol HTTP \
  --port 8080 \
  --vpc-id vpc-XXXXXX \
  --target-type ip \
  --health-check-path /q/health \
  --health-check-interval-seconds 30 \
  --region us-west-1
```
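The target group only receives targets once the ECS service is created with a matching load-balancer attachment; the repo's scripts handle this. A hedged sketch of the `create-service` call — subnet IDs, security-group IDs, target-group ARN, and the desired count of 2 are placeholders/assumptions — with the network configuration validated locally:

```shell
# awsvpc network configuration for the Gateway service (placeholder IDs).
netcfg='{"awsvpcConfiguration":{"subnets":["subnet-PRIVATE-A","subnet-PRIVATE-B"],"securityGroups":["sg-GATEWAY"],"assignPublicIp":"DISABLED"}}'
echo "$netcfg" | python3 -m json.tool > /dev/null && echo "network configuration OK"

# Create the service behind the target group (requires AWS credentials):
# aws ecs create-service --cluster escher-v2 --service-name escher-gateway \
#   --task-definition gateway --desired-count 2 --launch-type FARGATE \
#   --network-configuration "$netcfg" \
#   --load-balancers targetGroupArn=TG_ARN,containerName=gateway,containerPort=8080 \
#   --region us-west-1
```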
```shell
# Add Cognito auth rule on HTTPS listener
aws elbv2 create-rule \
  --listener-arn arn:aws:elasticloadbalancing:... \
  --priority 1 \
  --conditions '[{"Field":"path-pattern","Values":["/*"]}]' \
  --actions '[
    {
      "Type": "authenticate-cognito",
      "AuthenticateCognitoConfig": {
        "UserPoolArn": "arn:aws:cognito-idp:us-west-1:ACCOUNT:userpool/POOL_ID",
        "UserPoolClientId": "CLIENT_ID",
        "UserPoolDomain": "escher-auth.auth.us-west-1.amazoncognito.com",
        "OnUnauthenticatedRequest": "deny"
      },
      "Order": 1
    },
    {
      "Type": "forward",
      "TargetGroupArn": "arn:aws:elasticloadbalancing:...",
      "Order": 2
    }
  ]'
```

IAM roles
Each ECS task uses a dedicated task role and a shared task execution role.
Task execution role (shared)
Allows ECS agent to pull images from ECR and fetch secrets:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "*"
    }
  ]
}
```

Task roles (per service)
| Service | Extra permissions needed |
|---|---|
| Gateway | None (routes internally only) |
| Analysis Agent | None (calls Anthropic via internet egress) |
| Context Engine | None (calls Asset Store internally) |
| Asset Store | None (talks to SurrealDB internally) |
No service requires AWS API permissions — all cloud estate operations run on the user's machine via the desktop app's local Tauri plugins, not on the backend.
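Both the shared execution role and the per-service task roles also need a trust policy allowing ECS tasks to assume them. The standard statement uses AWS's `ecs-tasks` service principal (illustrative):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

In a hardened setup, also consider scoping `secretsmanager:GetSecretValue` in the execution role to the `escher/*` secret ARNs instead of `"Resource": "*"`.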
Security groups
```
escher-alb-sg
  Ingress: 0.0.0.0/0 → 443/tcp
  Egress:  escher-gateway-sg → 8080/tcp

escher-gateway-sg
  Ingress: escher-alb-sg → 8080/tcp
  Egress:  escher-agent-sg → 8081-8082/tcp
           escher-ce-sg → 8001/tcp

escher-agent-sg
  Ingress: escher-gateway-sg → 8081-8082/tcp
  Egress:  escher-ce-sg → 8001/tcp
           0.0.0.0/0 → 443/tcp (Anthropic API egress)

escher-ce-sg
  Ingress: escher-gateway-sg → 8001/tcp
           escher-agent-sg → 8001/tcp
  Egress:  escher-asset-sg → 8000/tcp

escher-asset-sg
  Ingress: escher-ce-sg → 8000/tcp
  Egress:  escher-surrealdb-sg → 4000/tcp
```

Deployment
The v2-deployment-ecs scripts handle image tagging, task definition registration, and service updates. Per the README, `./deploy-all.sh` is the end-to-end deploy script, while `./deploy.sh` handles single-service deploys (and `setup-cluster`):

```shell
cd v2-deployment-ecs

# Deploy a single service
./deploy.sh gateway v2.15.0

# Deploy all services (ordered by dependency)
./deploy-all.sh v2.15.0
```

Each deploy:

- Tags the ECR image with the release version
- Registers a new task definition revision
- Calls `aws ecs update-service --force-new-deployment`
- Waits for the service to reach steady state
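The four steps above can be sketched per service as follows. This is a dry run (`run` echoes each call rather than executing it), and the repository path and flag details are assumptions — the authoritative logic lives in `v2-deployment-ecs`:

```shell
# Dry run of a per-service deploy: `run` echoes each command instead of executing it.
run() { echo "+ $*"; }

deploy_service() {
  local service="$1" version="$2" region="us-west-1" cluster="escher-v2"
  local repo="ACCOUNT.dkr.ecr.${region}.amazonaws.com/escher-${service}"
  run docker tag "escher-${service}:latest" "${repo}:${version}"                    # 1. tag release image
  run docker push "${repo}:${version}"
  run aws ecs register-task-definition \
    --cli-input-json "file://services/${service}-service.json" --region "$region"   # 2. new task def revision
  run aws ecs update-service --cluster "$cluster" --service "escher-${service}" \
    --task-definition "$service" --force-new-deployment --region "$region"          # 3. roll the service
  run aws ecs wait services-stable --cluster "$cluster" \
    --services "escher-${service}" --region "$region"                               # 4. wait for steady state
}

deploy_service gateway v2.15.0
```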
In production, deployments are triggered automatically by the CI/CD pipeline. See CI/CD.
Health checking all services
```shell
# From inside the VPC (bastion or SSM session)
for svc in gateway:8080/q/health analysis-agent:8081/health playbook-agent:8082/health context-engine:8001/health asset-store:8000/health; do
  host="${svc%:*}"
  path="${svc#*:}"
  echo "$host: $(curl -sf "http://$host.escher.internal:$path" || echo FAILED)"
done

# From the internet (ALB endpoint, after auth)
curl -H "Authorization: Bearer $TOKEN" https://api.escher.tessell.com/q/health
```

Rollback
```shell
# Roll back a service to the previous task definition revision
aws ecs update-service \
  --cluster escher-v2 \
  --service escher-gateway \
  --task-definition gateway:PREVIOUS_REVISION \
  --force-new-deployment \
  --region us-west-1
```

Next steps
- CI/CD — Automated deployments via `v3-escher-deployment`
- Docker Compose — Local development setup
- Authentication — Cognito and security group model