A2A Relay — Architecture Decision Record
Date: 2026-05-27
Status: Accepted
Author: Hermes (Alan Blount, sponsor)
Problem Statement
We need a relay/mailbox service so that any AI agent (Hermes, OpenClaw, Claude Code, Antigravity, etc.) can send and receive A2A messages regardless of NAT, firewalls, or whether the target agent is online at the time of sending. The service must:
- Allow agents behind NAT to receive inbound A2A tasks via polling or push
- Allow any agent to send A2A tasks to named agents without knowing their IP
- Be deployable to Cloud Run (zero-ops, scales to zero)
- Be AI Catalog / Agent Finder compliant for federated discovery
- Be testable with mocked LLM responses (BDD + TDD, record/replay)
- Have a clean path to add chat platform bridges later
Decision: Build a Thin Python Service — Do Not Adopt Any Existing Repo As-Is
Options Evaluated
❌ agentgateway/agentgateway — SKIP as primary
- Rust binary, no Python extensibility
- Production-grade proxy, 2.9k stars, actively maintained
- Right answer for enterprise proxy layer — keep as a stretch goal or sidecar option
- Does not do mailbox/async queuing — it’s a synchronous proxy
- Cloud Run compatible but overkill for our initial need
- Use later: once we have multiple agents and need auth/policy/observability at the edge
⚠️ eliasecchig/a2a-gateway — Adopt patterns, not codebase
-
Python FastAPI, correct A2A usage, Cloud Run-native, elegant
/pushoutbound API - Excellent reference architecture — we will copy its channel adapter pattern
- But: alpha (7 stars, 1 contributor, CI failing), no durable queue, all channel SDKs always installed
- Missing: async mailbox — agents must be synchronously reachable
- Action: Fork mentally; use its A2A client code and push-API design as reference
❌ s-hiraoku/synapse-a2a — SKIP
- Local-only PTY-wrapping tool for developer machines
- No NAT relay, no Cloud Run, no cloud deployment path
- Wrong scope entirely
✅ Build: a2a-relay — Thin Python FastAPI mailbox + relay
- ~800 lines of Python (our own, no bus-factor risk)
- Async SQLite mailbox (aiosqlite) for durable message queuing
- Agents poll or receive webhook callbacks when messages arrive
-
A2A protocol compliant (uses
a2a-sdk— same SDK as a2a-gateway) - Cloud Run native from day one
- Agent Finder / AI Catalog compliant from day one
- BDD+TDD from day one, aiomock-style record/replay for LLM responses
- Chat platform bridges (Telegram/Slack/Discord) added later as optional adapters — stealing directly from a2a-gateway’s clean channel adapter pattern
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ A2A RELAY SERVICE │
│ (Cloud Run · public HTTPS URL) │
│ │
│ ┌────────────────┐ ┌─────────────────┐ ┌──────────────┐ │
│ │ A2A Inbound │ │ Message Store │ │ Agent Finder │ │
│ │ POST /tasks │───▶│ (SQLite/PgSQL) │ │ Catalog │ │
│ │ (from senders)│ │ per-agent │ │ /.well-known│ │
│ └────────────────┘ │ mailbox │ │ POST /search│ │
│ └────────┬────────┘ └──────────────┘ │
│ ┌────────────────┐ │ │
│ │ Agent Poll │◀────────────┘ │
│ │ GET /mailbox │ (pull: agent polls when online) │
│ │ /{agent_id} │ │
│ └────────────────┘ │
│ │
│ ┌────────────────┐ ┌─────────────────────────────────────┐ │
│ │ Push Webhook │───▶│ Outbound delivery to agent webhook │ │
│ │ (if agent has │ │ (A2A message/send to callback URL) │ │
│ │ registered │ │ with exponential backoff retry │ │
│ │ callback URL) │ └─────────────────────────────────────┘ │
│ └────────────────┘ │
│ │
│ ┌────────────────┐ │
│ │ Agent Registry│ POST /agents/register │
│ │ (who exists, │ GET /agents/{agent_id}/card │
│ │ what caps, │ (stores A2A agent card + callback URL) │
│ │ callback URL) │ │
│ └────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
▲ ▲
│ send task │ poll/receive
┌────┴─────┐ ┌───┴──────┐
│ Hermes │ │ OpenClaw │
│ (NAT OK) │ │ (NAT OK) │
└──────────┘ └──────────┘
Key Design Choices
1. Mailbox Pattern (Async, Durable)
- Every registered agent has a mailbox in SQLite
-
Messages are stored with status:
PENDING → DELIVERED → ACKNOWLEDGED -
Agents can pull (poll
GET /mailbox/{agent_id}?since=cursor) — NAT-safe - Agents can optionally register a callback URL for push delivery (server-initiated)
- TTL: messages expire after 7 days if unacknowledged
- Retry: webhook delivery uses exponential backoff (3 attempts, 5s/30s/120s)
2. Agent Registration
-
POST /agents/register— agent submits its A2A agent card + optional callback URL -
Agent card is cached and served at
GET /agents/{agent_id}/.well-known/agent.json -
Registration is scoped by
agent_id(caller-specified URN or auto-assigned) - Optional API key auth on registration endpoint (env-var configured)
3. A2A Protocol Compliance
-
Relay exposes its OWN agent card at
/.well-known/agent.json -
POST /a2a— relay’s A2A endpoint (accepts tasks for routing to registered agents) -
Task routing:
metadata.relay_target_agent_idin incoming message → mailbox lookup -
Uses
a2a-sdktypes throughout — no hand-rolled A2A JSON
4. Agent Finder Compliance
-
GET /.well-known/ai-catalog.json— publishes relay + all registered agents -
POST /search— semantic search over registered agents (TF-IDF over descriptions + representativeQueries for MVP; vector search later) -
Federation:
referralsmode supported (relay can point to other registries)
5. Message Store: SQLite (Cloud Run friendly)
- SQLite with WAL mode for MVP (single Cloud Run instance)
- Schema: agents, messages, deliveries
- Swap to PostgreSQL (Cloud SQL) when horizontal scale needed
- aiosqlite for async I/O, no blocking
6. Testing Strategy
- Unit tests: pytest-asyncio, all business logic tested in isolation
-
Mock A2A agents:
respxfor HTTP mocking +pytest-recording(cassette-style record/replay — same concept as npm aimock but Python-native) -
BDD scenarios:
pytest-bddwith Gherkin.featurefiles -
Integration tests: spin up the FastAPI app via
httpx.AsyncClient(app=app)— no real network needed -
E2E test:
docker-composewith a real relay + two stub agents, run in CI
API Surface (MVP)
# Agent Management
POST /agents/register Register agent card + optional callback
GET /agents/{id}/card Fetch stored agent card
DELETE /agents/{id} Deregister
# Mailbox (NAT-safe polling)
GET /mailbox/{agent_id} Poll: returns pending messages (cursor-based)
POST /mailbox/{agent_id}/ack Acknowledge message(s)
# A2A Relay Endpoint
POST /a2a A2A JSON-RPC (relay's own A2A endpoint)
GET /.well-known/agent.json Relay's A2A agent card
# Agent Finder / AI Catalog
GET /.well-known/ai-catalog.json AI Catalog manifest
POST /search Semantic search over registered agents
# Health
GET /health Liveness
GET /ready Readiness (DB connected)
GET /metrics Prometheus metrics
Non-Goals (MVP)
- No WebSocket/SSE streaming relay (synchronous task send is enough to start)
- No multi-tenant auth beyond simple API key
- No web UI (a2a-gateway has one; we don’t need it yet)
- No Kubernetes deployment (Cloud Run only for now)
- No direct chat platform bridges (first release; add as adapters in v2 using a2a-gateway patterns)
- No vector semantic search (TF-IDF is sufficient for MVP catalog search)
Tech Stack
| Component | Choice | Rationale |
|---|---|---|
| Language | Python 3.12 | ADK ecosystem, a2a-sdk, team familiarity |
| Web framework | FastAPI 0.115+ | Async-native, Pydantic v2, OpenAPI free |
| A2A SDK |
a2a-sdk>=1.0 |
Google’s official; same as a2a-gateway |
| Message store | SQLite (aiosqlite) | Zero-ops, Cloud Run compatible, swap to PG later |
| HTTP client |
httpx (async) |
Standard async HTTP; same as a2a-gateway |
| Testing | pytest-asyncio + respx + pytest-bdd + pytest-recording | Record/replay, BDD features, async |
| Container | Python 3.12 slim | Minimal, Cloud Run native |
| CI/CD | GitHub Actions → GHCR → Cloud Run | Free, automated |
| Config | Pydantic Settings (env vars + .env) | 12-factor, Cloud Run compatible |
File Structure
a2a-relay/
├── relay/
│ ├── __init__.py
│ ├── main.py # FastAPI app factory
│ ├── config.py # Pydantic settings
│ ├── models.py # DB models (SQLite schema)
│ ├── db.py # aiosqlite connection + migrations
│ ├── api/
│ │ ├── agents.py # /agents/* endpoints
│ │ ├── mailbox.py # /mailbox/* endpoints
│ │ ├── a2a.py # /a2a relay endpoint + agent card
│ │ └── catalog.py # /.well-known/* + /search
│ ├── services/
│ │ ├── router.py # Route incoming A2A task → target mailbox
│ │ ├── delivery.py # Webhook push delivery + retry
│ │ └── search.py # Catalog search (TF-IDF)
│ └── a2a_client.py # Outbound A2A calls (wraps httpx)
├── tests/
│ ├── conftest.py
│ ├── features/ # Gherkin .feature files (BDD)
│ │ ├── relay.feature
│ │ ├── mailbox.feature
│ │ └── catalog.feature
│ ├── unit/
│ │ ├── test_router.py
│ │ ├── test_delivery.py
│ │ └── test_search.py
│ ├── integration/
│ │ ├── test_agents_api.py
│ │ ├── test_mailbox_api.py
│ │ ├── test_a2a_api.py
│ │ └── test_catalog_api.py
│ └── cassettes/ # respx recorded HTTP cassettes
├── docs/
│ ├── ARCHITECTURE.md # This file
│ └── DEPLOYMENT.md
├── .github/
│ └── workflows/
│ ├── test.yml
│ └── deploy.yml
├── Dockerfile
├── Makefile
├── pyproject.toml
└── README.md
Deployment: Cloud Run
# One-time setup
gcloud run deploy a2a-relay \
--image ghcr.io/<owner>/a2a-relay:main \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars "RELAY_API_KEY=<secret>,DATABASE_URL=:memory:" \
--min-instances 0 \
--max-instances 3 \
--memory 512Mi \
--port 8080
The Cloud Run URL becomes the relay’s public address.
Agents register with POST /agents/register including their callback URL (if any).
Agents behind NAT poll GET /mailbox/{agent_id} as often as they want.
Phase 2 (After MVP)
- Chat bridges (Telegram, Slack, Discord) as optional channel adapters — steal from a2a-gateway
- agentgateway as optional proxy in front of the relay for auth/rate limiting
- Vector semantic search for catalog (swap TF-IDF for sentence-transformers or Qdrant)
- PostgreSQL (Cloud SQL) for horizontal scale
- WebSocket/SSE streaming relay endpoint
- Multi-tenant API key management