Architecture

Ezra has two layers: a platform layer that agents and operators interact with, and Ezra Core, the runtime substrate that handles tiers, memory, beliefs, federation, and replay.

The dual-layer model

flowchart TB
  subgraph Platform["Platform layer"]
    Fleet["Dynamic agent fleet<br/>ADK · LangGraph · custom"]
    Dash["Operations dashboard"]
  end
  subgraph Core["Ezra Core — runtime substrate"]
    Router["8-step router"]
    Tiers["Hot / Warm / Cold tiers"]
    Belief["Belief store + reconciler<br/>+ branching replay"]
    Mesh["Federated query planner"]
    Meta["Two meta-agents"]
  end
  subgraph Ext["External sources"]
    Mongo["MongoDB Atlas"]
    WH["Snowflake · BigQuery"]
    Rest["REST APIs"]
    Streams["Atlas Stream Processing"]
  end
  Platform -- "EzraService · REST API" --> Core
  Core -- "MCP / SDK / native" --> Ext

Agents hold an EzraService (a scope-bound handle) and talk to the runtime through it. Non-Python or external systems use the REST API. The runtime itself is framework-agnostic.

Session graph

A session graph binds one root session (an operations period — an incident window, a quarter close, a race weekend) to N concurrent agent contexts. Agents spawn and terminate during its lifetime; the count is emergent, not designed.

Each agent has its own view into shared memory and beliefs, filtered by its permission scope. Commitments are scoped to one graph-wide belief store, where the reconciler runs continuously. New graphs can inherit core semantic memory from prior graphs (inherits_from=[…]), so institutional memory compounds across operations.
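To make the idea of a scope-filtered view concrete, here is a minimal sketch in plain Python. It is not Ezra's actual API: `MemoryRecord`, `agent_view`, and the scope labels are hypothetical names chosen for illustration, and a real deployment would push this filter down into the warm and cold stores rather than filter in process.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MemoryRecord:
    """Hypothetical shared-memory record; field names are assumptions."""
    session_graph_id: str
    scope: str  # hypothetical permission label, e.g. "strategy"
    text: str


def agent_view(
    records: Iterable[MemoryRecord],
    session_graph_id: str,
    allowed_scopes: frozenset[str],
) -> list[MemoryRecord]:
    """Return only the records this agent may see within one session graph."""
    return [
        r
        for r in records
        if r.session_graph_id == session_graph_id and r.scope in allowed_scopes
    ]


shared = [
    MemoryRecord("graph-1", "strategy", "Plan a two-stop race"),
    MemoryRecord("graph-1", "finance", "Budget cap headroom is tight"),
    MemoryRecord("graph-2", "strategy", "Record from a different graph"),
]

# An agent scoped to "strategy" in graph-1 sees one of the three records.
strategy_view = agent_view(shared, "graph-1", frozenset({"strategy"}))
```

Every agent reads the same shared records; only the filter arguments differ per agent.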

A graph moves through three states: active (at least one agent alive), closed (all agents terminated recently; the hot tier is cleared, warm and cold are retained), and archived (idle past a threshold; episodic and warm memory are compacted, belief history is kept verbatim).
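The three states can be read as a function of two inputs: how many agents are alive, and how long the graph has been idle. The sketch below assumes that reading. `GraphState`, `graph_state`, and the 30-day `ARCHIVE_AFTER` threshold are illustrative assumptions; the document does not state the real threshold or how the runtime tracks state.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class GraphState(Enum):
    ACTIVE = "active"
    CLOSED = "closed"
    ARCHIVED = "archived"


# Hypothetical idle threshold; the actual value is not given in this document.
ARCHIVE_AFTER = timedelta(days=30)


def graph_state(
    live_agents: int,
    last_terminated_at: Optional[datetime],
    now: datetime,
) -> GraphState:
    """Derive a session graph's state from agent count and idle time."""
    if live_agents >= 1:
        return GraphState.ACTIVE
    idle_long_enough = (
        last_terminated_at is not None
        and now - last_terminated_at >= ARCHIVE_AFTER
    )
    return GraphState.ARCHIVED if idle_long_enough else GraphState.CLOSED


now = datetime(2025, 6, 1)
state_running = graph_state(3, None, now)
state_recent = graph_state(0, now - timedelta(hours=2), now)
state_old = graph_state(0, now - timedelta(days=90), now)
```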

How performance stays flat as N grows

The central claim — adding agents does not slow per-agent decision time — rests on four properties:

Per-agent hot tier is O(1). Each agent has its own Redis hash. Adding agents adds rows, not iterations.
Warm and cold are shared but filtered. Qdrant and MongoDB queries filter by session_graph_id and permission scope; cost doesn't scale with agent count.
The reconciler is O(active commitments on topic), not O(N). Only agents that committed on a contested topic enter its working set.
Federated queries are independent. Each agent's fetch is a separate downstream call; concurrency scales with the source.
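The reconciler property above can be illustrated with a topic-keyed index: if commitments are grouped by topic, the working set for a contested topic is a single lookup whose size depends only on how many agents committed on that topic. This is a sketch under that assumption, not the reconciler's actual data structure; `Commitment`, `BeliefIndex`, and the method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    agent_id: str
    topic: str
    value: str


class BeliefIndex:
    """Hypothetical index of the latest commitment per agent, grouped by topic."""

    def __init__(self) -> None:
        self._by_topic: dict[str, dict[str, Commitment]] = defaultdict(dict)

    def commit(self, c: Commitment) -> None:
        # A newer commitment from the same agent on the same topic replaces the old one.
        self._by_topic[c.topic][c.agent_id] = c

    def working_set(self, topic: str) -> list[Commitment]:
        # Size is the number of agents committed on this topic, not the fleet size.
        return list(self._by_topic.get(topic, {}).values())

    def is_contested(self, topic: str) -> bool:
        # A topic is contested when committed agents disagree on the value.
        return len({c.value for c in self.working_set(topic)}) > 1


index = BeliefIndex()
# 100 agents each commit on an unrelated, private topic.
for i in range(100):
    index.commit(Commitment(f"agent-{i}", f"topic-{i}", "ok"))
# Two agents disagree on one shared topic.
index.commit(Commitment("agent-0", "tyre_strategy", "one-stop"))
index.commit(Commitment("agent-1", "tyre_strategy", "two-stop"))

contested_set = index.working_set("tyre_strategy")
```

Here the fleet has 100 agents, but the working set for `tyre_strategy` holds only the two agents that committed on it.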

The result is constant per-agent latency and total throughput that grows linearly with agent count. The calibrated target is p50 < 40 ms and p99 < 100 ms of per-agent router overhead (excluding the LLM call) at 5 to 100 agents.

Package boundary

ezra_core/ is the generic platform. The Formula 1 reference demo lives entirely under demo/f1_race_weekend/ and is explicitly not part of the platform package. The marketing/docs site lives under web/.

Next

The 8-Step Router: the per-agent pipeline.
Three-Tier Memory: hot / warm / cold.
