A2A Relay — Deployment Guide
Overview
The relay runs on Google Cloud Run (project: alanblount-demo, region: us-central1).
Container image is stored in Artifact Registry and built via Cloud Build.
Service URLs
| Environment | URL |
|---|---|
| Cloud Run (prod) | https://a2a-relay-462816930018.us-central1.run.app |
| Local dev | http://localhost:8765 |
GCP Setup (one-time)
SA Permissions Required
The service account general-agent-access@alanblount-demo.iam.gserviceaccount.com has these roles (confirmed working):
roles/aiplatform.user
roles/artifactregistry.reader
roles/artifactregistry.writer
roles/cloudbuild.builds.editor
roles/iam.securityReviewer ← read own IAM policy
roles/run.admin ← deploy + set allUsers invoker
roles/run.developer
roles/secretmanager.admin
roles/secretmanager.secretAccessor
roles/storage.objectAdmin
roles/storage.objectCreator
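Grant the essential roles. Run this as a user who can edit project IAM (e.g. a project Owner). The loop skips roles/aiplatform.user and the roles already covered by a broader role in the list (artifactregistry.reader, run.developer, storage.objectCreator):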
SA="general-agent-access@alanblount-demo.iam.gserviceaccount.com"
PROJECT="alanblount-demo"
for role in \
roles/artifactregistry.writer \
roles/cloudbuild.builds.editor \
roles/iam.securityReviewer \
roles/run.admin \
roles/secretmanager.admin \
roles/secretmanager.secretAccessor \
roles/storage.objectAdmin; do
gcloud projects add-iam-policy-binding $PROJECT \
--member=serviceAccount:$SA \
--role=$role
done
roles/iam.securityReviewer is the minimum role that lets the SA read its own IAM policy (get-iam-policy). Without it, the SA cannot inspect its own permissions, so every IAM diagnosis requires someone to run gcloud manually with another account.
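With roles/iam.securityReviewer in place, the SA can check its own bindings. The sketch below compares the roles granted by the loop above against what get-iam-policy reports. The function names (granted_roles, missing_roles) are illustrative, and it only sees roles bound directly to the SA at project level, not roles inherited through groups, folders or the organization.

```shell
# Sketch: list expected project-level roles that the SA does not hold.
# Function names are illustrative, not part of the relay repo.
SA="general-agent-access@alanblount-demo.iam.gserviceaccount.com"
PROJECT="alanblount-demo"

# Roles granted by the loop in "SA Permissions Required".
EXPECTED_ROLES="roles/artifactregistry.writer
roles/cloudbuild.builds.editor
roles/iam.securityReviewer
roles/run.admin
roles/secretmanager.admin
roles/secretmanager.secretAccessor
roles/storage.objectAdmin"

# Prints the roles bound directly to $SA on the project, one per line.
granted_roles() {
  gcloud projects get-iam-policy "$PROJECT" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$SA" \
    --format="value(bindings.role)"
}

# Reads granted roles on stdin; prints each expected role that is absent.
missing_roles() {
  granted=$(cat)
  for role in $EXPECTED_ROLES; do
    printf '%s\n' "$granted" | grep -qxF "$role" || printf '%s\n' "$role"
  done
}
```

Usage: granted_roles | missing_roles prints nothing when every expected role is present.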
The Cloud Run service agent also needs access to the secret. Running the deploy command in Build & Deploy grants this automatically; to grant it explicitly:
# Cloud Run service agent format: service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com
# Project number for alanblount-demo: 462816930018
gcloud secrets add-iam-policy-binding RELAY_ADMIN_KEY \
--project=alanblount-demo \
--member=serviceAccount:service-462816930018@serverless-robot-prod.iam.gserviceaccount.com \
--role=roles/secretmanager.secretAccessor
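The project number in the service agent address is specific to alanblount-demo. A small sketch that derives it with gcloud projects describe instead of hardcoding 462816930018, which helps if the relay is ever deployed to another project. The function names are illustrative; the address format is the one in the comment above.

```shell
# Sketch: build the Cloud Run service agent address from the project number.
PROJECT="alanblount-demo"

project_number() {
  gcloud projects describe "$PROJECT" --format='value(projectNumber)'
}

# $1 = numeric project number
cloud_run_service_agent() {
  printf 'service-%s@serverless-robot-prod.iam.gserviceaccount.com\n' "$1"
}

# Same binding as above, with the member address computed at run time.
grant_secret_access() {
  gcloud secrets add-iam-policy-binding RELAY_ADMIN_KEY \
    --project="$PROJECT" \
    --member="serviceAccount:$(cloud_run_service_agent "$(project_number)")" \
    --role=roles/secretmanager.secretAccessor
}
```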
APIs to Enable
gcloud services enable \
run.googleapis.com \
cloudbuild.googleapis.com \
secretmanager.googleapis.com \
artifactregistry.googleapis.com \
--project=alanblount-demo
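To confirm the four APIs are enabled (for example before a first build), a sketch that reports any that are still disabled; missing_apis is an illustrative name:

```shell
# Sketch: print each required API that is not yet enabled on the project.
PROJECT="alanblount-demo"
REQUIRED_APIS="run.googleapis.com cloudbuild.googleapis.com secretmanager.googleapis.com artifactregistry.googleapis.com"

missing_apis() {
  enabled=$(gcloud services list --enabled --project="$PROJECT" --format='value(config.name)')
  for api in $REQUIRED_APIS; do
    printf '%s\n' "$enabled" | grep -qxF "$api" || printf '%s\n' "$api"
  done
}
```

An empty result means the gcloud services enable command above has taken effect.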
Artifact Registry Repo
gcloud artifacts repositories create a2a-relay \
--repository-format=docker \
--location=us-central1 \
--project=alanblount-demo
Secret Manager
# Create the admin key secret (store a strong random value)
echo -n "your-strong-admin-key-here" | \
gcloud secrets create RELAY_ADMIN_KEY \
--data-file=- \
--project=alanblount-demo
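The placeholder above needs a real random value. One way to generate it, assuming a Unix-like shell with /dev/urandom. gen_admin_key is an illustrative name, and 32 random bytes (64 hex characters) is a choice, not a relay requirement.

```shell
# Sketch: 32 random bytes rendered as 64 lowercase hex characters, with no
# trailing newline (like echo -n, so the secret does not end in "\n").
gen_admin_key() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Create the secret from a fresh key:
#   gen_admin_key | gcloud secrets create RELAY_ADMIN_KEY --data-file=- --project=alanblount-demo
# Rotate later by adding a new version (the deploy references :latest):
#   gen_admin_key | gcloud secrets versions add RELAY_ADMIN_KEY --data-file=- --project=alanblount-demo
```

Secrets exposed as environment variables are resolved when an instance starts, so redeploy after rotating to make every instance pick up the new version.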
Build & Deploy
Local Auth Setup
# Copy SA key to node-readable path and activate
cp /secrets/credentials/alanblount-demo-bf573405ac1f.json \
/home/node/.config/gcloud/alanblount-demo-sa.json
gcloud auth activate-service-account \
--key-file=/home/node/.config/gcloud/alanblount-demo-sa.json
gcloud config set project alanblount-demo
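Before building, it can help to confirm gcloud is really using the SA and project above. A sketch (check_gcloud_context is an illustrative name):

```shell
# Sketch: fail loudly if the active account or default project is not the expected one.
EXPECTED_ACCOUNT="general-agent-access@alanblount-demo.iam.gserviceaccount.com"
EXPECTED_PROJECT="alanblount-demo"

check_gcloud_context() {
  account=$(gcloud auth list --filter=status:ACTIVE --format='value(account)')
  project=$(gcloud config get-value project 2>/dev/null)
  if [ "$account" != "$EXPECTED_ACCOUNT" ]; then
    echo "active account is '$account', expected '$EXPECTED_ACCOUNT'" >&2
    return 1
  fi
  if [ "$project" != "$EXPECTED_PROJECT" ]; then
    echo "project is '$project', expected '$EXPECTED_PROJECT'" >&2
    return 1
  fi
  echo "ok: $account on $project"
}
```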
Build Image
cd /shared/workspace/open-source/a2a-relay
gcloud builds submit \
--project=alanblount-demo \
--tag=us-central1-docker.pkg.dev/alanblount-demo/a2a-relay/relay:latest \
--suppress-logs
Build takes ~3–5 minutes. Image is ~340MB (Python 3.11-slim + sklearn).
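Every build above overwrites :latest, which makes it hard to tell which commit is running or to roll back. One option, assuming the build runs from a git checkout of the repo root, is to tag each image with the short commit SHA as well. image_uri and build_tagged are illustrative helpers, not part of the repo.

```shell
# Sketch: build and tag the image with the current short git commit.
REGION="us-central1"
PROJECT="alanblount-demo"
REPO="a2a-relay"
IMAGE="relay"

# $1 = tag, e.g. "latest" or a commit SHA
image_uri() {
  printf '%s-docker.pkg.dev/%s/%s/%s:%s\n' "$REGION" "$PROJECT" "$REPO" "$IMAGE" "$1"
}

# Builds the image and prints its URI (build output goes to stderr).
build_tagged() {
  tag=$(git rev-parse --short HEAD) || return 1
  gcloud builds submit --project="$PROJECT" --tag="$(image_uri "$tag")" --suppress-logs >&2 || return 1
  image_uri "$tag"
}
```

The printed URI can then be passed to gcloud run deploy via --image instead of the :latest path.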
Deploy to Cloud Run
gcloud run deploy a2a-relay \
--project=alanblount-demo \
--region=us-central1 \
--image=us-central1-docker.pkg.dev/alanblount-demo/a2a-relay/relay:latest \
--platform=managed \
--allow-unauthenticated \
--set-secrets=RELAY_ADMIN_KEY=RELAY_ADMIN_KEY:latest \
--set-env-vars=RELAY_DB_PATH=/tmp/relay.db,RELAY_LOG_LEVEL=INFO \
--memory=512Mi \
--cpu=1 \
--min-instances=0 \
--max-instances=3 \
--port=8080
Note: Cloud Run injects a PORT environment variable equal to the container port (--port=8080 above; 8080 is also the default). The Dockerfile does not read ${PORT:-8080}; its CMD hardcodes the port:
CMD ["uvicorn", "relay.main:app", "--host", "0.0.0.0", "--port", "8080"]
(Hardcoding is fine as long as the deploy command keeps --port=8080.)
Verify Health
curl https://a2a-relay-462816930018.us-central1.run.app/health
# Expected: {"status": "ok", "agents": N, "messages": M}
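Right after a deploy the new revision may still be starting. A sketch that retries /health until it reports "ok", assuming the JSON shape shown above; wait_for_health, is_healthy and the 3-second interval are illustrative choices.

```shell
# Sketch: poll /health until it reports "ok" or attempts run out.
RELAY_URL="${RELAY_URL:-https://a2a-relay-462816930018.us-central1.run.app}"

# $1 = response body; succeeds if it contains "status": "ok"
is_healthy() {
  printf '%s' "$1" | grep -q '"status": *"ok"'
}

# $1 = max attempts (default 10), 3 seconds apart
wait_for_health() {
  attempts=${1:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    body=$(curl -fsS --max-time 10 "$RELAY_URL/health" 2>/dev/null) || body=""
    if is_healthy "$body"; then
      echo "healthy after $i attempt(s): $body"
      return 0
    fi
    sleep 3
    i=$((i + 1))
  done
  echo "relay not healthy after $attempts attempts" >&2
  return 1
}
```

Setting RELAY_URL=http://localhost:8765 points the same check at a local relay.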
Local Development
cd /shared/workspace/open-source/a2a-relay
# Install deps (uv required)
uv sync --extra dev
# Copy and edit env
cp .env.example .env
# Set RELAY_ADMIN_KEY in .env
# Start relay
uv run uvicorn relay.main:app --port 8765 --reload
# Or use the CLI
uv run python -m relay.cli --help
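A minimal .env for local work might look like this. Only RELAY_ADMIN_KEY is required by this guide; the other two variables mirror the --set-env-vars in the Cloud Run deploy, and their local values (including DEBUG as a log level) are assumptions, so check .env.example for the keys the relay actually reads.

```shell
# .env (local only; do not commit)
RELAY_ADMIN_KEY=local-dev-key-change-me
RELAY_DB_PATH=./relay.db
RELAY_LOG_LEVEL=DEBUG
```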
Running Tests
uv run pytest # all tests
uv run pytest -k test_mailbox # specific module
uv run pytest -v --tb=short # verbose
Test status: 129 passed, 1 skipped, 1 xfail (test_callback_delivery — callback push not yet implemented, documented as xfail).
Running the Demo
# Against local relay
uv run python demo/run_demo.py
# Against Cloud Run
uv run python demo/run_demo.py \
--relay-url https://a2a-relay-462816930018.us-central1.run.app \
--admin-key $RELAY_ADMIN_KEY
Demo scenarios:
- Echo agent round-trip — register, send, poll, verify echo
- Counter agent state — send multiple increments, verify count
- Multi-agent routing — two agents exchange messages via relay
- Concurrent delivery — parallel sends, verify all delivered
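For the Cloud Run run, $RELAY_ADMIN_KEY must hold the value stored in Secret Manager. A sketch that reads it with gcloud secrets versions access rather than pasting it into the shell (fetch_admin_key and run_cloud_demo are illustrative names; the SA already holds roles/secretmanager.secretAccessor):

```shell
# Sketch: load the admin key from Secret Manager, then run the Cloud Run demo.
fetch_admin_key() {
  gcloud secrets versions access latest \
    --secret=RELAY_ADMIN_KEY --project=alanblount-demo
}

run_cloud_demo() {
  RELAY_ADMIN_KEY=$(fetch_admin_key) || return 1
  uv run python demo/run_demo.py \
    --relay-url https://a2a-relay-462816930018.us-central1.run.app \
    --admin-key "$RELAY_ADMIN_KEY"
}
```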
Architecture Notes
See ARCHITECTURE.md for full design.
Key decisions:
- Store-and-forward: Messages are queued in SQLite (aiosqlite); agents poll /mailbox/{agent_id}
- Per-agent auth: Each agent gets a Bearer token; admin approves registrations
- Unified registry: the approve endpoint auto-creates the agent record (prevents “not registered” errors)
- No callback push yet: Documented as future work; agents must poll
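Because there is no callback push, each agent runs a poll loop against /mailbox/{agent_id} with its Bearer token. The sketch below is hypothetical: it assumes a plain GET returns pending messages and that an empty body or [] means none (check ARCHITECTURE.md for the real contract), and it backs off exponentially up to 30 seconds while the mailbox stays empty.

```shell
# Sketch of an agent-side poll loop (request/response details are assumptions).
RELAY_URL="${RELAY_URL:-http://localhost:8765}"

# $1 = agent id
mailbox_url() {
  printf '%s/mailbox/%s\n' "$RELAY_URL" "$1"
}

# Doubles the delay $1, capped at $2 seconds.
next_delay() {
  d=$(( $1 * 2 ))
  if [ "$d" -gt "$2" ]; then
    d=$2
  fi
  echo "$d"
}

# $1 = agent id, $2 = Bearer token; prints each non-empty response.
poll_mailbox() {
  agent_id=$1
  token=$2
  delay=1
  while :; do
    body=$(curl -fsS -H "Authorization: Bearer $token" "$(mailbox_url "$agent_id")") || body=""
    if [ -n "$body" ] && [ "$body" != "[]" ]; then
      printf '%s\n' "$body"
      delay=1                            # messages arrived: poll again soon
    else
      delay=$(next_delay "$delay" 30)    # empty or error: back off
    fi
    sleep "$delay"
  done
}
```

Backing off while idle keeps an always-polling agent from hammering the relay, at the cost of up to 30 seconds of extra latency for the first message after a quiet period.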
Dockerfile Pitfall (RESOLVED)
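python -m relay.main exits immediately. The -m flag does run relay/main.py as __main__, including any if __name__ == "__main__": block (__main__.py only matters for python -m relay), so the likely cause is that main.py defines app but never starts a server itself. Always use: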
CMD ["uvicorn", "relay.main:app", "--host", "0.0.0.0", "--port", "8080"]
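A quick way to check how python -m treats __main__ blocks, assuming python3 is on PATH. The throwaway package demo_pkg is hypothetical: a module run with -m does execute its if __name__ == "__main__": block, while a module that only defines an app object exits silently, which matches the symptom above.

```shell
# Sketch: does `python -m pkg.module` run the module's __main__ block?
tmp=$(mktemp -d)
mkdir "$tmp/demo_pkg"
: > "$tmp/demo_pkg/__init__.py"

# Module WITH a __main__ block (stand-in for a main.py that starts a server).
cat > "$tmp/demo_pkg/with_block.py" <<'EOF'
app = object()
if __name__ == "__main__":
    print("__main__ block ran")
EOF

# Module that only defines `app` (stand-in for an ASGI app module).
cat > "$tmp/demo_pkg/no_block.py" <<'EOF'
app = object()
EOF

with_block=$(cd "$tmp" && python3 -m demo_pkg.with_block)
without_block=$(cd "$tmp" && python3 -m demo_pkg.no_block)
echo "with block:    '$with_block'"
echo "without block: '$without_block'"
rm -rf "$tmp"
```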
Cloud Run Env Var Pitfall
${PORT:-8080} is not expanded in the exec form of CMD (the JSON array), because Docker runs that form without a shell. Use the shell form or hardcode the port:
# ❌ Does NOT expand PORT in exec form
CMD ["uvicorn", "relay.main:app", "--host", "0.0.0.0", "--port", "${PORT:-8080}"]
# ✅ Works: shell form runs via /bin/sh -c, which expands $PORT; exec lets uvicorn replace the shell so it receives SIGTERM
CMD exec uvicorn relay.main:app --host 0.0.0.0 --port ${PORT:-8080}
# ✅ Also works: hardcode (matches --port=8080 in the deploy command)
CMD ["uvicorn", "relay.main:app", "--host", "0.0.0.0", "--port", "8080"]
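The difference comes down to whether a shell ever sees the string. A sketch that reproduces both cases with plain sh; it only illustrates the expansion rules, Docker itself is not involved.

```shell
# Sketch: exec form vs shell form, reproduced with plain sh.
# Exec form passes argv straight to the program, so uvicorn would get this literal:
exec_form_arg='${PORT:-8080}'
# Shell form runs the line through /bin/sh -c, which expands the parameter:
shell_form_set=$(PORT=9000 sh -c 'echo "${PORT:-8080}"')
shell_form_unset=$(env -u PORT sh -c 'echo "${PORT:-8080}"')

echo "exec form:             $exec_form_arg"
echo "shell form, PORT=9000: $shell_form_set"
echo "shell form, unset:     $shell_form_unset"
```

This prints ${PORT:-8080}, 9000 and 8080 respectively; uvicorn would reject the first as a port number.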
Troubleshooting
Build PERMISSION_DENIED
ERROR: (gcloud.builds.submit) PERMISSION_DENIED
SA needs roles/cloudbuild.builds.editor AND roles/storage.objectAdmin (not just objectCreator).
Cloud Run TCP probe fails / no logs
Usually means the container exited immediately. Check:
- Is the Dockerfile CMD correct? (uvicorn, not python -m relay.main)
- Is PORT hardcoded or using shell form?
- Check logs:
gcloud run services logs read a2a-relay --project=alanblount-demo --region=us-central1 --limit=50
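If the tail of the log is noisy, Cloud Logging can filter to errors only. A sketch using gcloud logging read; the helper names are illustrative, and the filter uses the standard Cloud Logging fields for Cloud Run revisions.

```shell
# Sketch: show only ERROR-and-above log entries for the a2a-relay service.
PROJECT="alanblount-demo"

# $1 = Cloud Run service name
cloud_run_error_filter() {
  printf 'resource.type="cloud_run_revision" AND resource.labels.service_name="%s" AND severity>=ERROR\n' "$1"
}

read_relay_errors() {
  gcloud logging read "$(cloud_run_error_filter a2a-relay)" \
    --project="$PROJECT" --limit=50 --freshness=1d
}
```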
“Agent not registered” despite valid auth
Fixed in relay/routers/admin.py — the approve endpoint now calls db.register_agent. If you see this on a clean deploy, check that the admin approval step was completed (not just token creation).
SQLite on Cloud Run
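Cloud Run storage is ephemeral: /tmp/relay.db lives on each instance's in-memory filesystem (it counts against the 512Mi memory limit). It is wiped whenever an instance stops, including every new revision deployment and each scale-to-zero under --min-instances=0, and each instance keeps its own separate copy. For production:
- Use a single instance (--min-instances=1 --max-instances=1) to avoid split-brain
- Or migrate to Cloud SQL / Firestore for true multi-instance support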
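To apply the single-instance option to the running service without rebuilding, gcloud run services update accepts the same scaling flags. A sketch (pin_single_instance is an illustrative name; set DRY_RUN=1 to print the command instead of running it). An always-on minimum instance is billed while idle, and a new revision still starts with an empty /tmp/relay.db.

```shell
# Sketch: pin the running service to exactly one instance.
pin_single_instance() {
  set -- gcloud run services update a2a-relay \
    --project=alanblount-demo --region=us-central1 \
    --min-instances=1 --max-instances=1
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```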
Git Workflow
cd /shared/workspace/open-source/a2a-relay
# Always commit as Alan Blount
git -c user.name="Alan Blount" -c user.email="alan@zeroasterisk.com" \
commit -m "feat: your message"
git push origin master
Remote: https://github.com/zeroasterisk/a2a-relay
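If every commit in this clone should use that identity, setting it once in the repository's local config avoids repeating -c on each commit. A sketch (set_repo_identity is an illustrative name; it changes only this clone, not your global git config):

```shell
# Sketch: store the commit identity in the clone's .git/config once,
# so plain `git commit` uses it. $1 = repo path (default: current dir).
set_repo_identity() {
  git -C "${1:-.}" config user.name "Alan Blount"
  git -C "${1:-.}" config user.email "alan@zeroasterisk.com"
}
```

Usage: set_repo_identity /shared/workspace/open-source/a2a-relay, after which git commit -m "feat: your message" and git push origin master work without the -c flags.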