Three-Tier Memory

Ezra manages memory across three tiers modelled on the hardware memory hierarchy. Each tier covers a different time scale and is backed by a different store, and all three are scope-filtered per agent.

Tier	Analog	Latency	Holds	Backing store
Hot	CPU cache	~0 ms	Active turn, pinned beliefs, fetched mesh result	Redis (per-agent hash)
Warm	RAM	5–15 ms	Compressed prior turns, topic summaries	Qdrant (graph + scope filtered)
Cold	SSD	20–50 ms	Episodic, semantic, procedural, belief history	MongoDB Atlas

Hot — Redis

One Redis hash per agent holds the recent turns (a capped ring buffer), pinned beliefs, and the latest mesh result. Because each agent owns its own key, a hot-tier read is a single keyed lookup whose cost does not grow with the number of agents: adding agents adds keys, not iterations.
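To make the per-agent layout concrete, here is a minimal in-memory sketch of the same shape. It is a stand-in, not Ezra's Redis code: `HotTier`, `HotEntry`, and the `MAX_TURNS` cap are hypothetical names and values chosen for illustration, and a plain dict plays the role of the Redis keyspace.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict, Optional

MAX_TURNS = 4  # hypothetical cap; the real ring-buffer size is not stated here


@dataclass
class HotEntry:
    """In-memory stand-in for one agent's Redis hash (assumed field names)."""
    turns: Deque[str] = field(default_factory=lambda: deque(maxlen=MAX_TURNS))
    pinned_beliefs: Dict[str, Any] = field(default_factory=dict)
    mesh_result: Optional[Any] = None


class HotTier:
    """One entry per agent, addressed directly by agent id."""

    def __init__(self) -> None:
        self._by_agent: Dict[str, HotEntry] = {}

    def read(self, agent_id: str) -> HotEntry:
        # A single keyed lookup: no scan over other agents' entries.
        if agent_id not in self._by_agent:
            self._by_agent[agent_id] = HotEntry()
        return self._by_agent[agent_id]

    def push_turn(self, agent_id: str, turn: str) -> None:
        # deque(maxlen=...) silently drops the oldest turn once the cap is hit.
        self.read(agent_id).turns.append(turn)


hot = HotTier()
for i in range(6):
    hot.push_turn("strategist", f"turn-{i}")
```

After six pushes with a cap of four, only the four most recent turns remain for `strategist`, and other agents' entries are untouched.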

Warm — Qdrant

The warm tier holds compressed prior-turn summaries and topic summaries. They are recalled by semantic similarity to the current input, filtered server-side by session_graph_id, and then filtered in Python by scope and TTL. Summaries past their TTL are evicted by the lifecycle meta-agent.
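The Python-side pass after the similarity search might look roughly like the sketch below. The `Summary` shape, the `filter_candidates` name, and the rule that a summary is "in scope" when its topics overlap the agent's scope are all assumptions made for illustration; the actual scope semantics are not spelled out here.

```python
import time
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Summary:
    """Hypothetical shape of a warm-tier hit after the server-side filter."""
    text: str
    topics: FrozenSet[str]
    created_at: float   # epoch seconds
    ttl_seconds: float


def filter_candidates(
    candidates: Iterable[Summary],
    agent_scope: Set[str],
    now: Optional[float] = None,
) -> List[Summary]:
    """Keep summaries that are in scope and not past their TTL."""
    now = time.time() if now is None else now
    kept: List[Summary] = []
    for s in candidates:
        in_scope = bool(s.topics & agent_scope)        # assumed: any topic overlap
        fresh = (now - s.created_at) < s.ttl_seconds   # assumed: TTL measured from creation
        if in_scope and fresh:
            kept.append(s)
    return kept


candidates = [
    Summary("in scope, fresh", frozenset({"tyres"}), created_at=0.0, ttl_seconds=100.0),
    Summary("out of scope", frozenset({"fuel"}), created_at=0.0, ttl_seconds=100.0),
    Summary("in scope, expired", frozenset({"tyres"}), created_at=0.0, ttl_seconds=10.0),
]
kept = filter_candidates(candidates, {"tyres", "strategy"}, now=50.0)
```

Expired summaries are only skipped here; deleting them stays with the lifecycle meta-agent.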

Cold — MongoDB Atlas

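Four sub-stores (episodic, semantic, procedural, and belief history) share one Atlas client. The semantic store is split into core and archival, so the table has five rows: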

Sub-store	Loading	Notes
Episodic	by session-graph / timestamp	Time-windowed event memory
Semantic — core	always, at agent spawn, for the agent's scope	The always-needed facts
Semantic — archival	per turn, by vector similarity	Atlas Vector Search
Procedural	by intent pattern	Inferred behavioural rules
Belief history	append-only, queryable	The compliance artifact — never compacts

Core vs. archival semantic memory

Core facts load at every spawn for the agent's scope (institutional knowledge it always needs). Archival facts are the long tail: on every turn, the router embeds the input and pulls the most-similar archival facts via Atlas $vectorSearch, scope-filtered. Each recall bumps the fact's access count; once it crosses a threshold the learning meta-agent promotes it to core, so frequently-needed knowledge graduates into the always-loaded set.

# Per-turn archival recall (run inside router step 4):
facts = await semantic_store.recall_archival(
    "how long do the medium tyres last?",
    user_id="team-ezra",
    scope_topics={"tyres", "strategy"},
    limit=3,
)
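The access-count promotion described above can be sketched as follows. This is an illustration under stated assumptions: `Fact`, `record_recall`, `promote_hot_facts`, and the threshold value are hypothetical, and "crossing" the threshold is read here as reaching it.

```python
from dataclasses import dataclass
from typing import Iterable, List

PROMOTION_THRESHOLD = 3  # hypothetical; the real threshold is not given here


@dataclass
class Fact:
    """Hypothetical semantic-fact record."""
    fact_id: str
    text: str
    tier: str = "archival"   # "archival" or "core"
    access_count: int = 0


def record_recall(fact: Fact) -> None:
    """Called once each time an archival recall returns this fact."""
    fact.access_count += 1


def promote_hot_facts(
    facts: Iterable[Fact], threshold: int = PROMOTION_THRESHOLD
) -> List[str]:
    """Promote archival facts whose access count has reached the threshold."""
    promoted: List[str] = []
    for fact in facts:
        if fact.tier == "archival" and fact.access_count >= threshold:
            fact.tier = "core"
            promoted.append(fact.fact_id)
    return promoted


often = Fact("f1", "example fact recalled often")
rarely = Fact("f2", "example fact recalled once")
for _ in range(3):
    record_recall(often)
record_recall(rarely)
promoted_ids = promote_hot_facts([often, rarely])
```

Running the promotion pass a second time is a no-op for facts already in core, so the pass can be repeated safely.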

The vector index is created with MongoSemanticStore.ensure_vector_index(num_dimensions) at wiring/ingest time (sized to the embedder — gemini-embedding-001 is 3072-dim; Vertex text-embedding-005 is 768-dim).
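For orientation, the sketch below builds the two Atlas documents involved as plain dictionaries: a vector search index definition and a `$vectorSearch` aggregation pipeline. The outer structure follows the Atlas Vector Search format, but the field paths (`embedding`, `user_id`, `scope_topics`, `text`), the index name, the cosine similarity choice, and the `numCandidates` heuristic are assumptions, not the definitions `MongoSemanticStore` actually uses.

```python
from typing import Any, Dict, Iterable, List, Sequence


def vector_index_definition(num_dimensions: int) -> Dict[str, Any]:
    """Index definition sized to the embedder (field paths are assumed)."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": num_dimensions,
                "similarity": "cosine",
            },
            # Filter fields must be indexed to be usable in $vectorSearch.filter.
            {"type": "filter", "path": "user_id"},
            {"type": "filter", "path": "scope_topics"},
        ]
    }


def archival_recall_pipeline(
    query_vector: Sequence[float],
    user_id: str,
    scope_topics: Iterable[str],
    limit: int = 3,
    index_name: str = "semantic_archival_idx",  # hypothetical index name
) -> List[Dict[str, Any]]:
    """Aggregation pipeline for a scope-filtered nearest-neighbour recall."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": "embedding",
                "queryVector": list(query_vector),
                "numCandidates": max(limit * 20, 100),  # assumed oversampling factor
                "limit": limit,
                "filter": {
                    "$and": [
                        {"user_id": {"$eq": user_id}},
                        {"scope_topics": {"$in": sorted(scope_topics)}},
                    ]
                },
            }
        },
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


index_def = vector_index_definition(768)
pipeline = archival_recall_pipeline([0.1] * 768, "team-ezra", {"tyres", "strategy"})
```

The query vector length has to match the index's `numDimensions`, which is why the index is sized to the embedder at wiring time.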

Cross-graph inheritance

A new session graph can hydrate core semantic memory from any number of prior graphs:

graph = await ezra.create_session_graph(
    session_graph_id="race-weekend-imola-2026",
    inherits_from=["race-weekend-monaco-2026", "race-weekend-bahrain-2026"],
)

Inherited core facts load at spawn, scope-filtered. Belief history is never inherited — only semantic core (and, optionally, procedural rules).
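One way to picture the hydration step is the sketch below. `CoreFact` and `hydrate_core` are hypothetical, and the conflict rule (later graphs in `inherits_from` override earlier ones, and the new graph's own facts override both) is an assumption, since the merge order is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class CoreFact:
    """Hypothetical core semantic fact."""
    fact_id: str
    text: str
    topics: FrozenSet[str]


def hydrate_core(
    own: List[CoreFact],
    facts_by_graph: Dict[str, List[CoreFact]],
    inherits_from: List[str],
    agent_scope: Set[str],
) -> List[CoreFact]:
    """Merge inherited and own core facts, then keep only in-scope ones."""
    merged: Dict[str, CoreFact] = {}
    for graph_id in inherits_from:
        for fact in facts_by_graph.get(graph_id, []):
            merged[fact.fact_id] = fact   # assumed: later graphs win on conflict
    for fact in own:
        merged[fact.fact_id] = fact       # assumed: the new graph's facts win overall
    # Scope filter: assumed to mean any overlap between fact topics and agent scope.
    return [f for f in merged.values() if f.topics & agent_scope]


prior = {
    "race-weekend-monaco-2026": [
        CoreFact("c1", "older wording", frozenset({"tyres"})),
        CoreFact("c2", "unrelated", frozenset({"catering"})),
    ],
    "race-weekend-bahrain-2026": [
        CoreFact("c1", "newer wording", frozenset({"tyres"})),
        CoreFact("c3", "strategy fact", frozenset({"strategy"})),
    ],
}
loaded = hydrate_core(
    own=[],
    facts_by_graph=prior,
    inherits_from=["race-weekend-monaco-2026", "race-weekend-bahrain-2026"],
    agent_scope={"tyres", "strategy"},
)
```

Only semantic core passes through this merge; belief history has no equivalent path, which keeps each graph's compliance record self-contained.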