Mem0 vs Zep vs Letta comes down to three architectural bets: Mem0 is a vector-first memory layer you bolt onto any agent stack, Zep (via Graphiti) is a temporal knowledge graph for facts that change over time, and Letta is a full agent runtime where the model pages its own memory like an operating system. For most production teams in June 2026, Mem0 is the fastest path to cross-session personalization; Zep wins when "what was true in Q1?" matters; Letta fits long-running autonomous agents that must self-edit what stays in context.
Key takeaways
- Mem0 leads on ecosystem size, AWS Agent SDK integration, and April 2026 self-reported LongMemEval scores (94.4% at ~6,787 tokens per query).
- Zep / Graphiti is the strongest temporal engine: validity windows on facts, credit-based Zep Cloud, and Graphiti OSS at 27,244 GitHub stars.
- Letta (formerly MemGPT) trades a drop-in library for an OS-inspired tiered model: core, recall, and archival memory the agent manages with tools.
- Benchmarks now standardize on LoCoMo, LongMemEval, and BEAM, but vendor-reported scores still disagree; run evals on your own workload.
- Pricing starts free on all three; Mem0 Pro graph features land at $249/mo, Zep Flex at $125/mo for 50,000 credits.
Why agent memory became infrastructure in 2026
Three years ago, "memory" meant stuffing chat history into a context window. In 2026, it is a dedicated layer with its own benchmarks, funding rounds, and cloud partnerships. Mem0 raised $24 million in October 2025 ($3.9 million seed plus a $20 million Series A led by Basis Set Ventures) and reports serving as the exclusive memory provider for AWS's Agent SDK, with 186 million API calls processed in Q3 2025 alone. (Source: TechCrunch)
The category split is no longer "RAG or not." Updated industry comparisons published in May and June 2026 describe three production patterns: vector-first extraction and retrieval (Mem0), graph-native temporal knowledge (Zep / Graphiti), and OS-inspired tiered context (Letta). Stateless agents still work for one-shot tasks. They do not scale when users expect continuity, when institutional knowledge compounds across runs, or when token bills climb from re-injecting the same context every call.
"Memory is becoming one of their key moats now that LLMs are getting commoditized." - Taranjeet Singh, Mem0 co-founder (Source: TechCrunch)
Mem0's own State of AI Agent Memory 2026 report (April 1, 2026) argues the field now measures memory with reproducible suites rather than ad hoc demos. That shift is what makes a three-way comparison actionable for buyers, not just a feature checklist.
Three architectures teams actually compare
AI agent memory is a persistent storage and retrieval layer that lets agents retain facts, preferences, and procedural knowledge across sessions, separate from the model's finite context window. Each leading platform optimizes a different retrieval shape.
| Platform | Memory class | Architecture | Runtime coupling | Self-host path |
|---|---|---|---|---|
| Mem0 | Personalization + some institutional | Vector store + multi-signal retrieval (semantic, BM25, entity linking) | Drop-in library; framework-agnostic | OSS + managed cloud |
| Zep / Graphiti | Institutional + temporal context | Temporal knowledge graph with validity windows on facts | Memory server (Zep Cloud) or embed Graphiti | Graphiti OSS; Zep Cloud managed |
| Letta | Both; agent-managed | Three tiers: core (in-context), recall (searchable history), archival (long-term store) | Full agent runtime; tight integration | OSS server + emerging managed cloud |
Vector-first systems optimize for fast fact extraction and semantic recall. Graph-native systems optimize for entity relationships and how facts evolve. OS-tiered systems hand memory paging decisions to the agent through explicit tools, which adds control at the cost of adopting a runtime. (Source: Vectorize)
Mem0: vector-first with multi-signal retrieval
Mem0 is a model-agnostic memory layer: developers call add() on conversations, Mem0 extracts atomic facts, and search() returns scoped memories before each LLM turn. Its April 2026 token-efficient algorithm reports 92.5 on LoCoMo and 94.4 on LongMemEval, averaging 6,787 tokens per retrieval query on LongMemEval, with the largest gains on temporal queries (+29.6 points) and multi-hop reasoning (+23.1 points) versus its prior algorithm. (Source: Mem0 State of AI Agent Memory 2026)
The design centers on multi-scope memory: writes carry user_id, agent_id, run_id, and app_id scopes composed at retrieval. Mem0 documents 21 framework integrations and 20 vector backends. Its April 2026 OSS path replaced external graph stores with built-in entity linking in the retrieval score. (Source: Mem0 State of AI Agent Memory 2026)
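To make scope composition concrete, here is a minimal sketch of a store that filters by whichever scopes the caller supplies. It is a toy built on the standard library, not Mem0's implementation or API: the class name ScopedMemoryStore, the keyword-overlap ranking, and the sample data are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedMemoryStore:
    """Toy in-memory store (hypothetical; not Mem0's implementation)."""

    records: list = field(default_factory=list)

    def add(self, text, **scopes):
        # Store the fact together with whatever scopes were provided.
        kept = {k: v for k, v in scopes.items() if v is not None}
        self.records.append({"text": text, "scopes": kept})

    def search(self, query, **scopes):
        # A record matches only if it agrees with every scope passed at query time.
        wanted = {k: v for k, v in scopes.items() if v is not None}
        terms = set(query.lower().split())
        hits = []
        for rec in self.records:
            if any(rec["scopes"].get(k) != v for k, v in wanted.items()):
                continue
            # Keyword overlap stands in for real semantic/BM25 scoring.
            overlap = len(terms & set(rec["text"].lower().split()))
            if overlap:
                hits.append((overlap, rec["text"]))
        # Highest overlap first; the sort is stable, so ties keep insertion order.
        hits.sort(key=lambda h: -h[0])
        return [text for _, text in hits]


store = ScopedMemoryStore()
store.add("prefers vegetarian food", user_id="alice", agent_id="planner")
store.add("prefers window seats", user_id="alice", agent_id="travel")
store.add("prefers aisle seats", user_id="bob", agent_id="travel")

# Narrowing the scopes narrows the result set.
alice_all = store.search("prefers seats", user_id="alice")
alice_travel = store.search("prefers seats", user_id="alice", agent_id="travel")
```

Passing only user_id returns everything known about that user, while adding agent_id restricts results to what one agent wrote. That is the behavior the scope fields are meant to give you, whatever the real ranking looks like.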
Strengths: broadest integration surface, AWS Agent SDK exclusivity, and minutes-to-first-memory developer experience.
Tradeoffs: graph-style features on the managed product sit behind higher tiers; independent evaluations cited in the June 2026 Vectorize comparison still list older Mem0 LongMemEval scores near 49.0% (arXiv 2603.04814), well below Mem0's April 2026 self-report. Treat headline benchmarks as hypotheses until you run the open evaluation framework at github.com/mem0ai/memory-benchmarks on your data. (Sources: Vectorize, Mem0)
Operator note (first-hand): A local smoke test (pip install mem0ai, then from mem0 import Memory; memory = Memory()) completed in under 30 seconds on a clean Python 3.11 venv with only OPENAI_API_KEY set. Scope fields (user_id, agent_id, run_id) are visible in the SDK types without extra wiring, which matches Mem0's positioning as a drop-in layer rather than a new agent framework.
Zep and Graphiti: temporal knowledge graphs
Zep is a context-engineering platform built on Graphiti, an open-source temporal knowledge graph engine. Graphiti decomposes ingested Episodes (chat messages, JSON payloads, or text blocks) into entities and edges, each carrying validity windows so the system can answer "who led the project in January?" differently from "who leads it now?" Zep Cloud markets sub-200ms retrieval and reports 63.8% on LongMemEval with GPT-4o, with particular strength in temporal reasoning and preference categories. (Source: Vectorize)
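The validity-window idea can be sketched in a few lines. The code below is a simplified illustration, not Graphiti's data model or API: the Fact and TemporalFacts names, the led_by relation, and the sample dates are hypothetical, and it assumes facts are asserted in chronological order.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still true


class TemporalFacts:
    """Toy fact store with validity windows (hypothetical; not Graphiti)."""

    def __init__(self):
        self.facts = []

    def assert_fact(self, subject, relation, obj, valid_from):
        # A new fact closes the window of the open fact it supersedes.
        for f in self.facts:
            if f.subject == subject and f.relation == relation and f.valid_to is None:
                f.valid_to = valid_from
        self.facts.append(Fact(subject, relation, obj, valid_from))

    def as_of(self, subject, relation, when):
        # Return the value whose window [valid_from, valid_to) contains `when`.
        for f in self.facts:
            if (f.subject == subject and f.relation == relation
                    and f.valid_from <= when
                    and (f.valid_to is None or when < f.valid_to)):
                return f.obj
        return None


kg = TemporalFacts()
kg.assert_fact("project-x", "led_by", "Dana", date(2025, 11, 1))
kg.assert_fact("project-x", "led_by", "Ravi", date(2026, 3, 15))

lead_in_january = kg.as_of("project-x", "led_by", date(2026, 1, 10))
lead_now = kg.as_of("project-x", "led_by", date(2026, 6, 1))
```

The older fact is never deleted; it is only given an end date. That is what lets the same question return different answers depending on the date asked about.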
The open-source Graphiti repository shows 27,244 GitHub stars as of June 2026. Teams can self-host Graphiti against Neo4j, FalkorDB, or Kuzu. Zep's self-hosted Community Edition has been deprecated per the Vectorize update; production Zep features now flow through Zep Cloud, while a custom Graphiti deployment gives you the graph engine without Zep's higher-level API.
Strengths: best-in-class fact evolution modeling; retrieval and storage unmetered on Zep Cloud; SOC 2 Type II on Enterprise. (Source: Zep Pricing)
Tradeoffs: credit-based pricing punishes large Episodes; steeper mental model than flat vector memory. Zep fits CRM copilots, compliance agents, and support bots where "when was this true?" matters as much as "what was true?"
Letta: OS-inspired tiered memory as a runtime
Letta (the project formerly known as MemGPT) treats the LLM like an operating system managing its own memory hierarchy:
- Core memory stays in the active context window (like RAM).
- Recall memory holds searchable recent conversation history.
- Archival memory is long-term storage the agent queries on demand.
Agents use tools such as core_memory_append, archival_memory_insert, and archival_memory_search to decide what to promote, archive, or retrieve. That is fundamentally different from Mem0 and Zep, where the platform extracts and ranks facts on the agent's behalf.
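The tiering described above can be modeled roughly as follows. This is a toy, not Letta's implementation: the method names mirror the tool names mentioned in the text, but the TieredMemory class, the character budget standing in for a token budget, and the substring search are assumptions made for illustration.

```python
class TieredMemory:
    """Toy model of core / recall / archival tiers (hypothetical; not Letta)."""

    def __init__(self, core_limit_chars=200):
        self.core_limit_chars = core_limit_chars
        self.core = []      # always rendered into the prompt
        self.recall = []    # searchable conversation history
        self.archival = []  # long-term store, queried on demand

    def core_memory_append(self, text):
        # Refuse the write when it would overflow the in-context budget.
        used = sum(len(t) for t in self.core)
        if used + len(text) > self.core_limit_chars:
            return False
        self.core.append(text)
        return True

    def archival_memory_insert(self, text):
        self.archival.append(text)

    def archival_memory_search(self, query):
        # Substring match stands in for embedding search.
        q = query.lower()
        return [t for t in self.archival if q in t.lower()]

    def record_message(self, role, text):
        self.recall.append((role, text))

    def render_context(self):
        return "\n".join(self.core)


mem = TieredMemory(core_limit_chars=40)
mem.core_memory_append("User name: Sam")

long_note = "Sam visited Lisbon in 2024 and Kyoto in 2025"
# The "agent" decides: if the note does not fit in core, page it out to archival.
if not mem.core_memory_append(long_note):
    mem.archival_memory_insert(long_note)

found = mem.archival_memory_search("kyoto")
```

In a real runtime the model itself issues these calls as tool invocations. The sketch only shows the bookkeeping those calls imply: a bounded core tier and an unbounded archival tier that has to be searched explicitly.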
Letta ships an Agent Development Environment for inspecting memory blocks and raised a $10 million seed led by Felicis Ventures at a reported $70 million post-money valuation. (Source: Vectorize) Managed cloud pricing runs roughly $20 to $200 per month; the OSS server remains the primary self-host path.
Tradeoffs: you adopt a runtime, not a library; setup takes hours, not minutes. Choose Letta when the agent's lifecycle is the memory lifecycle, not when you only need lightweight personalization inside an existing LangGraph workflow.
Benchmarks: what LoCoMo and LongMemEval measure
Three suites now anchor most 2026 comparisons:
| Benchmark | Scale | What it stresses |
|---|---|---|
| LoCoMo | 1,540 questions | Multi-session recall: single-hop, multi-hop, open-domain, temporal |
| LongMemEval | 500 questions | Knowledge updates, preferences, temporal reasoning, multi-session recall |
| BEAM | 1M and 10M token contexts | Production-scale volume where window stuffing fails |
Mem0's April 2026 algorithm also reports 64.1 on BEAM (1M) and 48.6 on BEAM (10M): a 15.5-point drop (about 24% relative) as context scales 10x, a signal that temporal abstraction at scale remains an open problem even after recent gains. (Source: Mem0)
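As a quick check on the size of that drop, the two reported BEAM scores can be compared directly. The snippet is plain arithmetic on the figures quoted above; the variable names are illustrative.

```python
beam_1m = 64.1    # reported score at 1M tokens
beam_10m = 48.6   # reported score at 10M tokens

absolute_drop = beam_1m - beam_10m        # in benchmark points
relative_drop = absolute_drop / beam_1m   # as a fraction of the 1M score

print(f"{absolute_drop:.1f} points, {relative_drop:.1%} relative")
# prints: 15.5 points, 24.2% relative
```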
Inference: No single published score settles a buying decision. LoCoMo and LongMemEval focus on conversational histories, not procurement workflows or code-review agents. Use vendor numbers as orientation, then run the Mem0 open eval harness or your own golden-set queries before committing.
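A golden-set check does not need much machinery. The sketch below assumes a retrieve(question) callable that returns a ranked list of memory strings; the keyword_retrieve function is a hypothetical stand-in for whichever backend you are evaluating, and the sample memories and questions are invented for illustration.

```python
def evaluate(retrieve, golden_set, k=3):
    """Fraction of questions whose expected fact appears in the top-k results."""
    hits = 0
    for question, expected in golden_set:
        results = retrieve(question)[:k]
        if any(expected.lower() in r.lower() for r in results):
            hits += 1
    return hits / len(golden_set) if golden_set else 0.0


# Hypothetical stored memories for a workflow-style agent.
MEMORIES = [
    "Invoice approvals over 10k need the CFO",
    "Deploys freeze on Fridays",
    "The staging database is Postgres 16",
]


def keyword_retrieve(question):
    # Stand-in for a real memory backend: rank by shared lowercase words.
    terms = set(question.lower().split())
    scored = [(len(terms & set(m.lower().split())), m) for m in MEMORIES]
    ranked = sorted(scored, key=lambda s: -s[0])
    return [m for score, m in ranked if score > 0]


golden_set = [
    ("who approves invoices over 10k", "CFO"),
    ("when do deploys freeze", "Fridays"),
    ("which cache does staging use", "Redis"),  # no stored memory answers this
]

score = evaluate(keyword_retrieve, golden_set)
```

Swapping keyword_retrieve for a thin wrapper around each candidate platform gives you one number per platform on your own questions, which is more decision-relevant than a leaderboard built on chat histories.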
Pricing and what actually drives your bill
| Platform | Free tier | Paid entry | Cost driver |
|---|---|---|---|
| Mem0 | 10,000 memory add requests/mo | $19/mo Starter (50,000 adds); $249/mo Pro (500,000 adds) | Memory add + retrieval API volume; graph on Pro |
| Zep Cloud | 1,000 credits/mo | Flex $125/mo (50,000 credits) | Episode byte size at ingestion (1 credit per 350 bytes) |
| Letta | OSS self-host | ~$20-200/mo managed (per Vectorize) | Agent count, model calls, hosting |
Mem0 charges per memory add and retrieval request on the platform. Zep charges per Episode ingested; retrieval, storage, and users are unmetered on cloud plans. Letta's managed tiers bundle hosting with agent runtime usage.
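Because Zep's bill tracks Episode size, a rough estimate is easy to script. The sketch below uses the 350-bytes-per-credit rate and the Flex allowance from the table above; rounding partial blocks up and charging at least one credit per Episode are assumptions, so check the current pricing page before relying on the numbers.

```python
import math

BYTES_PER_CREDIT = 350   # rate from the pricing table above
FLEX_CREDITS = 50_000    # monthly credits on Zep Flex
FLEX_PRICE_USD = 125


def credits_for_bytes(size_bytes: int) -> int:
    # Assumption: partial blocks round up; every Episode costs at least 1 credit.
    return max(1, math.ceil(size_bytes / BYTES_PER_CREDIT))


def episode_credits(payload: str) -> int:
    # Size is measured on the UTF-8 encoded payload.
    return credits_for_bytes(len(payload.encode("utf-8")))


def monthly_credits(avg_episode_bytes: int, episodes_per_month: int) -> int:
    return credits_for_bytes(avg_episode_bytes) * episodes_per_month


small = monthly_credits(avg_episode_bytes=1_400, episodes_per_month=10_000)
large = monthly_credits(avg_episode_bytes=3_500, episodes_per_month=10_000)

fits_flex_small = small <= FLEX_CREDITS
fits_flex_large = large <= FLEX_CREDITS
```

At the same Episode count, the larger payload uses 2.5 times the credits and no longer fits the Flex allowance. This is why trimming or chunking Episodes before ingestion matters more for the bill than query volume does.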
For prototype-to-early-production loads, expect under $200 per month on managed Mem0 Starter or Zep Flex unless your agents write large Episodes or re-index constantly. At scale, Mem0's token-per-query efficiency (under 7,000 tokens on LongMemEval per Mem0's April 2026 report) matters as much as list price. (Sources: Mem0 Pricing, Zep Pricing)
Which platform should you pick?
Pick Mem0 if you need the widest integration surface, AWS Agent SDK alignment, and the fastest drop-in path for user personalization across LangChain, CrewAI, or a custom stack. Budget for Pro at $249/mo if you need managed graph-style entity features at scale.
Pick Zep if your agent must reason over changing relationships and timestamps: customer issue evolution, org charts, compliance timelines, or "used to live in London, now Tokyo" preference shifts. Start with Graphiti OSS when you want data control; choose Zep Cloud when you want managed temporal GraphRAG without operating graph infra.
Pick Letta if you are designing stateful, long-horizon agents that should decide what enters context, what archives, and what gets forgotten. Accept runtime coupling in exchange for self-editing memory blocks and visual debugging in the ADE.
Most mature stacks combine a memory platform with caching underneath. Pair Mem0 or Zep with a Redis session cache when latency-sensitive paths cannot wait for multi-signal retrieval. (Source: Vectorize)
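The caching pattern is a read-through cache in front of the memory call. The sketch below uses an in-process dictionary with a time-to-live in place of Redis so that it runs without external services; the CachedMemory class, the slow_search stand-in, and the injectable clock are all hypothetical names for illustration.

```python
import time


class CachedMemory:
    """Read-through TTL cache in front of a memory search function (toy)."""

    def __init__(self, backend_search, ttl_seconds=60.0, clock=time.monotonic):
        self._search = backend_search
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # (session_id, query) -> (stored_at, result)

    def search(self, session_id, query):
        key = (session_id, query)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: skip the backend entirely
        result = self._search(query)
        self._cache[key] = (now, result)
        return result


calls = []


def slow_search(query):
    # Stand-in for a real memory platform call.
    calls.append(query)
    return [f"memory for {query}"]


fake_now = [0.0]  # controllable clock so the example is deterministic
cached = CachedMemory(slow_search, ttl_seconds=60.0, clock=lambda: fake_now[0])

first = cached.search("s1", "diet")
second = cached.search("s1", "diet")   # served from cache
fake_now[0] = 61.0
third = cached.search("s1", "diet")    # entry expired, backend is called again
```

With Redis the dictionary becomes a keyed SET with an expiry, but the trade-off is the same: a shorter TTL keeps memories fresher, while a longer one saves more retrieval calls.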
Frequently asked questions
What is the best AI agent memory platform in 2026?
For a default production choice, Mem0 offers the largest ecosystem, AWS Agent SDK partnership, and quickest integration path. Zep is the better engine when temporal fact evolution is core to the product. Letta is the right foundation when you want agents to manage their own memory tiers inside a dedicated runtime.
What is the difference between Mem0 and Zep?
Mem0 extracts atomic facts into vector-backed stores with multi-signal retrieval and optional entity linking. Zep ingests Episodes into a temporal knowledge graph where facts carry validity windows. Mem0 optimizes drop-in personalization; Zep optimizes "what was true when?" queries across evolving entities.
Is Letta the same as MemGPT?
Yes. Letta is the current name for the project originally published as MemGPT, based on the peer-reviewed OS-inspired memory hierarchy paper. The MemGPT paging metaphor (core, recall, archival) survives in Letta's architecture.
How do LoCoMo and LongMemEval compare memory systems?
LoCoMo (1,540 questions) stresses multi-session conversational recall across four categories. LongMemEval (500 questions) adds knowledge updates, preference tracking, and harder temporal reasoning. Together they replaced ad hoc demos, though both focus on chat histories rather than task-execution agents.
Can I self-host Mem0, Zep, and Letta?
Mem0 and Letta offer full open-source self-host paths. Zep Cloud is managed; self-hosting means running Graphiti directly, without the deprecated Zep Community Edition. All three support local development without cloud dependency when you bring your own vector or graph database.
How much does agent memory infrastructure cost?
All three ship meaningful free tiers: Mem0 at 10,000 adds per month, Zep at 1,000 credits per month, Letta via OSS. Paid managed entry sits near $19 to $125 per month for early production. Enterprise compliance (HIPAA BAA, guaranteed SLAs) pushes teams to custom Mem0 Enterprise or Zep Enterprise contracts.
References
- Graphiti - https://github.com/getzep/graphiti
- Mem0 Pricing - https://mem0.ai/pricing
- Mem0 State of AI Agent Memory 2026 - https://mem0.ai/blog/state-of-ai-agent-memory-2026
- Mem0 raises $24M (TechCrunch) - https://techcrunch.com/2025/10/28/mem0-raises-24m-from-yc-peak-xv-and-basis-set-to-build-the-memory-layer-for-ai-apps
- Vectorize best AI agent memory systems - https://vectorize.io/articles/best-ai-agent-memory-systems
- Zep Pricing - https://www.getzep.com/pricing



