Key Point
AI agent memory is shifting from simple vector retrieval to a layered systems design
Read together, the public documentation available as of April 2026 from OpenAI, Anthropic, Google ADK, LangGraph, Letta, Mem0, Microsoft Azure, and AWS Bedrock / AgentCore, alongside papers such as MemGPT, CoALA, LoCoMo, LongMemEval, A-MEM, MemoryOS, Reflective Memory Management, and MemInsight, shows a clear pattern. Agent memory is no longer just about keeping a longer chat history or attaching a vector database. The center of practical implementation has moved toward a layered design that separates short-term working state, long-term persistent memory, shared memory, and background consolidation or reflection. The most widely adopted pattern today is to separate per-session state from persistent stores and re-inject only what matters. The frontier is adding graph memory, file-backed memory, and asynchronous memory updates on top of that base.
8 stacks
Converging implementation surfaces
OpenAI, Anthropic, Google, LangChain, Letta, Mem0, Microsoft, and AWS now document memory as an explicit systems layer.
34 sources
Public evidence base
Official docs and papers alone are now enough to compare short-term, long-term, shared, and consolidation designs.
4 layers
Emerging default shape
Working memory, persistent memory, shared memory, and consolidation are increasingly separated.
1 conclusion
Layered memory is the mainstream design
The common pattern is not a pure vector database. It is a system that separates state from memory.
What Changed
By spring 2026, memory is being treated as a systems problem rather than a model feature
The clearest signal is that major frameworks no longer explain memory as one box. Google ADK separates sessions, state, memory, context caching, and compaction. LangGraph documents persistence and memory as distinct implementation surfaces, while Deep Agents adds agent-scoped, user-scoped, and organization-level memory plus background consolidation. Letta separates memory blocks, shared memory, and archival memory, and Letta Code adds MemFS as an editable file-backed memory layer. AWS Bedrock summarizes ended sessions into a retained memory keyed by memoryId, and Azure AI Foundry exposes memory stores as part of the agent object. Anthropic Claude Code combines CLAUDE.md with auto memory, while OpenAI's ChatGPT Memory separates saved memories from chat history. The pattern is clear: memory is no longer just a longer context window. It is a system layer that decides where to write, what to keep, and when to retrieve.
Short-term memory is moving into session state
Instead of replaying the full transcript every turn, more systems keep thread, session, checkpoint, and tool state outside the prompt and restore only what is needed.
Long-term memory is moving into persistent stores
User preferences, recurring constraints, learned procedures, and prior session summaries are increasingly stored in retrievable external memory rather than raw chat history.
Shared memory is becoming a separate scope
Project, team, and organization knowledge is increasingly separated from personal memory by namespace, scope, or memory block so retrieval stays stable.
Compression and updates are moving off the hot path
Post-session summarization, context compaction, and reflection agents are increasingly handled asynchronously so token cost and latency do not grow linearly with history.
Implementation Map
Current memory systems are easiest to understand as at least four layers
1. Working memory: active execution state
- This layer holds thread state, checkpoints, tool outputs, and the most recent observations needed for the current task.
- It is the fastest and cheapest layer, but also the shortest-lived.
- Google ADK session state, LangGraph persistence, and Azure thread or run state all fit here.
2. Persistent memory: long-term individual memory
- This layer stores user preferences, recurring constraints, reusable procedures, and important facts from prior work.
- Vector retrieval, profile stores, key-value records, and document collections are the most common implementations.
- Mem0 user / agent / session memory, Letta archival memory, and AWS session summary retention all fit this layer.
3. Shared memory: project or organizational memory
- This layer stores rules, glossaries, and ongoing project state shared across agents or users.
- It needs to stay separate from personal memory or retrieval becomes noisy and inconsistent.
- Deep Agents organization-level memory, Letta shared memory, and Mem0 org scope are examples.
4. Consolidation layer: compression, reflection, forgetting
- This layer decides what to summarize, what to delete, what to promote, and what to rewrite.
- It matters for both cost and accuracy, which is why it is becoming a major differentiator in frontier systems.
- AWS memory summarization, Microsoft compaction, and Deep Agents background consolidation are representative examples.
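The four layers above can be sketched as separate containers with different lifetimes. This is only an illustration: every class, field, and method name below is hypothetical and not taken from any of the frameworks named above, and the "promote the last few tool outputs" rule is a placeholder for a real consolidation policy.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    """Short-lived execution state for one session (hypothetical shape)."""
    session_id: str
    checkpoints: list[dict] = field(default_factory=list)
    tool_outputs: list[str] = field(default_factory=list)


@dataclass
class LayeredMemory:
    """Keeps the layers apart so each can have its own lifetime and rules."""
    working: WorkingMemory
    persistent: dict[str, list[str]] = field(default_factory=dict)  # keyed by user id
    shared: dict[str, list[str]] = field(default_factory=dict)      # keyed by org or project id

    def consolidate(self, user_id: str, keep: int = 2) -> None:
        """Placeholder consolidation: promote the last `keep` outputs, then clear working state."""
        promoted = self.working.tool_outputs[-keep:]
        self.persistent.setdefault(user_id, []).extend(promoted)
        self.working.tool_outputs.clear()
        self.working.checkpoints.clear()


memory = LayeredMemory(working=WorkingMemory(session_id="s-1"))
memory.shared["org-acme"] = ["Refunds over 500 USD need manager approval"]
memory.working.tool_outputs += ["looked up order", "user prefers email", "ticket closed"]
memory.consolidate(user_id="u-42")
```

The point of the split is that working state can be discarded freely while persistent and shared entries survive, each under its own key space.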
Why It Matters
This shift matters because memory is becoming a decision substrate, not a storage feature
Accuracy now diverges at the memory layer, not only at the model layer
Two agents using the same model can still behave very differently depending on whether they remember who the user is, what was approved last week, and which constraints are still active.
Sending everything every turn does not scale
ADK context compaction and AWS session summarization both make the same point in public docs: practical systems are already moving away from replaying full histories on every invocation.
Memory write rules are becoming governance rules
What gets saved automatically, what stays read-only, and who can update shared memory are no longer minor implementation details. They define audit and failure boundaries.
The impressive part is not recall alone, but controlled recall
The frontier value is not having more stored text. It is being able to control scope, priority, update timing, and compression rules. That is the real break from older retrieval-only designs.
Read Path
Strong memory systems are defined more by how they retrieve than by where they store
Scope first
The system first narrows the search to user, agent, project, or organization scope. Deep Agents and Mem0 emphasize scope because retrieval quality drops quickly when scopes are mixed.
Then filter by memory type
Preferences, facts, prior procedures, and shared policies should not be fetched the same way. CoALA's split between episodic, semantic, and procedural memory is useful here.
Inject only what is needed
LangGraph and Deep Agents both point toward selective loading instead of eagerly placing all memory into the prompt. This reduces context pollution and token waste.
Join retrieved memory with live state
Persistent memory alone is not enough. Good systems combine it with the current task state, unfinished tool calls, and approval flags before acting. Retrieval needs a state join.
The practical implication is that teams often get better results by designing scope filtering and memory type selection before they optimize embeddings.
Many failures come less from too little memory than from fetching the wrong memory.
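The read path described above (scope, then type, then selective injection, then a join with live state) might look roughly like the following. The `MemoryItem` shape, the scope strings, and `build_context` are assumptions made for illustration, and the ranking step is reduced to a simple cut-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    scope: str   # e.g. "user:u-42" or "org:acme" (hypothetical naming)
    kind: str    # e.g. "preference", "policy", "procedure"
    text: str


def build_context(store, scopes, kinds, live_state, limit=3):
    """Scope first, then memory type, then cap what is injected, then join live state."""
    in_scope = [m for m in store if m.scope in scopes]
    typed = [m for m in in_scope if m.kind in kinds]
    selected = typed[:limit]  # a real system would rank here, e.g. by embedding similarity
    return {"memories": [m.text for m in selected], "state": dict(live_state)}


store = [
    MemoryItem("user:u-42", "preference", "Prefers email over phone"),
    MemoryItem("user:u-77", "preference", "Prefers phone calls"),
    MemoryItem("org:acme", "policy", "Refunds over 500 USD need approval"),
    MemoryItem("org:acme", "procedure", "Escalate outages to on-call"),
]

context = build_context(
    store,
    scopes={"user:u-42", "org:acme"},
    kinds={"preference", "policy"},
    live_state={"pending_approval": True},
)
```

Another user's preference never reaches the prompt here, no matter how similar its text is, because scope filtering runs before any ranking.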
Write Path
Frontier systems separate not only what to store, but when to store it
Write during the interaction
Confirmed user preferences and durable project rules are often worth updating immediately. This is where writable memory blocks and file-backed memory are useful.
Summarize after the interaction
AWS Bedrock summarizes sessions asynchronously after they end. That is a practical way to retain long interactions without slowing down the active path.
Compact after a threshold
ADK context compaction uses a sliding window and overlap size to summarize older workflow events. This is especially useful in long multi-step tasks where context growth otherwise becomes expensive.
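The general idea of summarizing in overlapping windows can be sketched as below. This does not reproduce ADK's configuration or exact semantics. `compact`, `window`, `overlap`, and the stand-in summarizer are hypothetical names chosen for the example.

```python
def compact(events, window, overlap, summarize):
    """Summarize events in sliding windows of `window` items that share `overlap` items.

    Returns one summary per window. Consecutive windows overlap so that each
    summary keeps some context from the window before it.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    summaries = []
    for start in range(0, len(events), step):
        chunk = events[start:start + window]
        summaries.append(summarize(chunk))
        if start + window >= len(events):
            break  # the last window already reached the end of the history
    return summaries


def join_summary(chunk):
    # Stand-in summarizer: a real system would call a model here.
    return " | ".join(chunk)


events = ["e1", "e2", "e3", "e4", "e5"]
summaries = compact(events, window=3, overlap=1, summarize=join_summary)
```

With a window of 3 and an overlap of 1, event `e3` appears in both summaries, which is how the overlap carries continuity across compaction boundaries.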
Delegate restructuring to a separate agent
Deep Agents background consolidation and Letta-style sleeptime patterns show a newer approach: let another agent reorganize memory between active runs. This is where memory starts to look like agent policy.
The key is not to make every memory write immediate. Immediate writes are responsive but noisy. Delayed writes are cleaner but can miss constraints needed for the next step. Strong systems explicitly separate immediate writes, post-session summaries, and disposable observations.
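One way to make that separation explicit is a small routing function that decides, per candidate memory, whether to write now, defer to the post-session summary, or drop it. The categories and the `route_write` policy below are illustrative assumptions, not a rule set taken from any vendor.

```python
from enum import Enum


class WriteRoute(Enum):
    IMMEDIATE = "immediate"        # write during the interaction
    POST_SESSION = "post_session"  # fold into the end-of-session summary
    DISCARD = "discard"            # disposable observation, never stored


def route_write(kind: str, confirmed: bool) -> WriteRoute:
    """Toy policy: only confirmed durable items are written on the hot path."""
    durable_kinds = {"preference", "project_rule"}
    if kind in durable_kinds and confirmed:
        return WriteRoute.IMMEDIATE
    if kind in durable_kinds or kind == "decision":
        return WriteRoute.POST_SESSION
    return WriteRoute.DISCARD


routes = [
    route_write("preference", confirmed=True),
    route_write("preference", confirmed=False),
    route_write("tool_trace", confirmed=False),
]
```

An unconfirmed preference is deferred rather than dropped, so a later summarization pass can still decide whether it was real.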
Graph Memory
Knowledge-graph style memory is attracting attention because it handles relations and change better
Graph memory matters because semantic similarity is only part of the problem. Vector retrieval is good at finding related text, but weaker at representing who reports to whom, which approvals apply to which workflow, and which fact supersedes an older one. That is why Mem0 highlights graph memory and why newer work such as A-MEM, MemoryOS, and MemInsight puts more emphasis on restructuring and relation-aware recall.
Where it helps
It is strongest when people, projects, deadlines, dependencies, and approval paths all matter at once. Support, sales, procurement, and research workflows are common examples.
Why it can be better
It can retrieve the right connection rather than the closest paragraph. It also makes partial updates easier when one fact changes but the broader structure remains valid.
Why it is harder
It needs entity extraction, normalization, deduplication, and some form of schema discipline. The setup cost is higher than a flat vector store.
How to adopt it safely
Do not graph everything first. Start with areas where relations are explicit, such as approval chains, customer-account links, or project dependency state.
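A narrow starting point such as an approval chain can be modeled with a very small relation store. The sketch below is a toy in-memory structure with hypothetical names (`GraphMemory`, `assert_fact`, `follow`, `escalates_to`). It is not the API of Mem0 or of any graph database. It shows the two properties discussed above: following explicit relations, and letting a newer fact supersede an older one without rebuilding the rest.

```python
from collections import defaultdict


class GraphMemory:
    """Tiny in-memory relation store: (subject, relation) -> current object."""

    def __init__(self):
        self.current = {}
        self.history = defaultdict(list)

    def assert_fact(self, subject, relation, obj):
        """Store a fact; a newer fact for the same key supersedes the older one."""
        key = (subject, relation)
        if key in self.current:
            self.history[key].append(self.current[key])
        self.current[key] = obj

    def follow(self, start, relation):
        """Walk a chain of the same relation, e.g. an escalation path."""
        path, node, seen = [], start, {start}
        while (node, relation) in self.current:
            node = self.current[(node, relation)]
            if node in seen:  # guard against cycles
                break
            seen.add(node)
            path.append(node)
        return path


graph = GraphMemory()
graph.assert_fact("refund_request", "escalates_to", "team_lead")
graph.assert_fact("team_lead", "escalates_to", "finance")
graph.assert_fact("team_lead", "escalates_to", "legal")  # supersedes "finance"
approval_path = graph.follow("refund_request", "escalates_to")
```

Only one edge changed when the escalation target moved from finance to legal, and the superseded value stays available in `history` for audit.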
Blueprints
The best near-term implementations combine multiple memory roles instead of one universal store
Personal assistant: profile memory plus session summaries
Keep user preferences, recurring habits, and standing constraints in a profile store. Keep live state in the session. After the interaction, promote only what will matter next time. This is close to the public split visible in OpenAI's saved memories versus chat history and AWS session summary retention.
Coding agent: repo memory, file memory, and checkpoints
Keep coding rules and review norms in repo-local memory files, hold active task state in checkpoints, and promote only durable learnings into file-backed memory. This makes diffs and audits easier while keeping the live prompt lean.
Support workflow: customer memory, policy memory, and approval graph
Customer history belongs in user memory, refund or legal rules belong in shared policy memory, and exception approval paths fit naturally into graph memory. That keeps personalization and compliance from colliding in one retrieval result.
Research agent: source cache, evidence memory, and contradiction log
Raw documents belong in a cache, validated claims belong in evidence memory, and disagreements belong in a contradiction log. For long investigations, that often works better than one flat retrieval layer.
Operational Risks
Memory systems usually break first on update conflicts and staleness, not on retrieval recall alone
Multiple agents rewrite the same shared memory
As Letta's docs imply, append and targeted replace are relatively safe, but simultaneous full rewrites push systems toward last-writer-wins behavior.
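A common general-purpose mitigation is optimistic version checking on full rewrites. The sketch below is that swapped-in technique, not Letta's mechanism. `SharedBlock`, its methods, and `VersionConflict` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when a full rewrite is based on a stale version."""


class SharedBlock:
    def __init__(self, text=""):
        self.text = text
        self.version = 0

    def append(self, line):
        """Appends never overwrite other writers' content, so no version check is needed."""
        self.text = f"{self.text}\n{line}".strip()
        self.version += 1

    def replace(self, old, new):
        """Targeted replace touches only the first matched fragment."""
        if old not in self.text:
            raise ValueError(f"fragment not found: {old!r}")
        self.text = self.text.replace(old, new, 1)
        self.version += 1

    def rewrite(self, new_text, based_on_version):
        """Full rewrite is rejected if someone else wrote since the caller read."""
        if based_on_version != self.version:
            raise VersionConflict(
                f"block is at v{self.version}, caller read v{based_on_version}"
            )
        self.text = new_text
        self.version += 1


block = SharedBlock("deadline: Friday")
seen_by_agent_b = block.version          # agent B reads at version 0
block.append("owner: agent A")           # agent A appends, version becomes 1
try:
    block.rewrite("deadline: Monday", based_on_version=seen_by_agent_b)
    rewrite_rejected = False
except VersionConflict:
    rewrite_rejected = True
```

Agent B's stale rewrite fails instead of silently erasing agent A's line, which turns last-writer-wins into an explicit retry.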
Old memory overrides newer facts
The more long-term memory grows, the more important timestamps, source tracking, and expiry become. Without them, durable memory can become confidently wrong memory.
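As a minimal illustration of why those three fields matter, the sketch below resolves a key to its newest unexpired fact. The `Fact` record and `current_value` helper are hypothetical, and the sample data is invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Fact:
    key: str
    value: str
    source: str
    recorded_at: datetime
    ttl: Optional[timedelta] = None  # None means the fact never expires


def current_value(facts, key, now):
    """Return the newest unexpired fact for `key`, or None if there is none."""
    live = [
        f for f in facts
        if f.key == key and (f.ttl is None or f.recorded_at + f.ttl > now)
    ]
    return max(live, key=lambda f: f.recorded_at, default=None)


now = datetime(2026, 4, 1, tzinfo=timezone.utc)
facts = [
    Fact("shipping_address", "12 Old Road", "chat", now - timedelta(days=400)),
    Fact("shipping_address", "7 New Street", "profile_form", now - timedelta(days=3)),
    Fact("discount_code", "SPRING", "promo", now - timedelta(days=40), ttl=timedelta(days=30)),
]
address = current_value(facts, "shipping_address", now)
discount = current_value(facts, "discount_code", now)
```

A pure similarity search could return either address. The timestamp decides which one is current, and the source field records where it came from.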
Shared and personal memory are mixed together
Once user-specific preferences and organizational rules live in one namespace, correctness suffers even if retrieval looks convenient. Scope separation is a correctness requirement, not just a performance trick.
No one evaluates the write path
Many teams test retrieval quality but not what the system actually writes into memory. In practice, some of the most serious failures originate in bad writes rather than bad reads.
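A write-path check does not need to be elaborate to be useful. The sketch below scores what a system wrote against a hand-labeled set of items that should have been written. `score_writes` is a hypothetical helper, and exact string matching is a simplifying assumption that a real evaluation would likely replace with fuzzy or model-based matching.

```python
def score_writes(expected, actual):
    """Precision and recall of written memory items against a labeled set."""
    expected, actual = set(expected), set(actual)
    hits = len(expected & actual)
    precision = hits / len(actual) if actual else 1.0
    recall = hits / len(expected) if expected else 1.0
    return precision, recall


expected = {"prefers email", "budget cap 500 USD"}
actual = {"prefers email", "said hello", "asked about weather", "budget cap 500 USD"}
precision, recall = score_writes(expected, actual)
```

Here recall is perfect but precision is only one half, which is the noisy-memory failure: everything important was saved, along with as much clutter.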
Unresolved Problems
The core weakness today is not failure to remember, but immature policies for what to keep and what to forget
The most important pattern in the research is that memory has moved past the binary question of whether it exists. The harder question is now how it is updated. LoCoMo and LongMemEval both suggest that long-term interaction, updated facts, and multi-session reasoning remain unstable even for strong assistants and long-context models. That means memory quality cannot be solved by retrieval volume alone. Systems need explicit policies for what gets written, promoted, expired, and rewritten.
The write path is less mature than the read path
Many systems can retrieve memory, but fewer can decide reliably what deserves to be written. That leads either to noisy memory or to brittle under-memory.
Old memory and new facts still collide
Without timestamps, source tracking, and expiry, long-term memory often returns answers that sound right but are no longer current. This is a structural weakness of flat retrieval.
Shared memory creates update conflicts
When multiple agents rewrite the same shared block or store, duplication, loss, and last-writer-wins behavior become likely. Letta's warnings around heavy rewrites point directly at this issue.
Mixed scopes damage correctness
Once personal preferences, organizational policies, and project state live in one namespace, the system may retrieve something relevant but still use the wrong premise.
Memory evaluation is still thin
Most teams test retrieval quality more than memory write quality. In practice, some of the largest failures come from bad writes that persist and spread.
Vendor Responses
What differs across vendors is not whether memory exists, but which part of the problem each stack tries to solve
OpenAI and Anthropic separate memory types
OpenAI separates saved memories from chat history, while Anthropic separates CLAUDE.md from auto memory. Both moves reduce the pressure to treat raw conversation as the only memory substrate.
Google ADK and AWS turn compression into a system feature
ADK exposes context compaction with thresholds and overlap, while AWS Bedrock runs asynchronous summarization after sessions end. Both are direct responses to cost and latency pressure.
LangGraph and Deep Agents emphasize scope and permissions
Agent-scoped, user-scoped, and organization-level memory, plus read-only versus writable memory and background consolidation, are practical responses to shared-state and write-quality problems.
Letta controls shared-memory contention at the block level
Its split between insert, replace, and rethink operations, together with the idea of a memory owner, is a concrete answer to concurrent edits in multi-agent systems.
Mem0 and newer papers lean toward graph and reflective updates
Mem0's graph memory and papers such as A-MEM, MemoryOS, Reflective Memory Management, and MemInsight all treat memory as something to restructure and govern, not just to retrieve.
The common thread is clear even if the product shapes differ. Type separation, scope separation, compression, asynchronous updates, and control over shared writes are all becoming first-class design concerns.
Use Cases
Memory becomes essential when work depends on continuity, handoff, or compliance rather than on one-shot generation
Personal assistants need continuity
The value is not generic recall. It is not having to renegotiate preferences, standing constraints, or recurring habits every time.
Coding agents need lower rediscovery cost
Repo rules, prior fixes, danger zones, and unfinished work all benefit from durable memory. This is why file-backed memory is so attractive in coding workflows.
Support and ops need continuity plus compliance
Customer context, policy memory, and approval routes all have to coexist without being confused with each other. That is where layered memory has clear economic value.
Multi-agent systems need reliable handoff
Subagents and specialists need to know what has been done, what remains uncertain, and which constraints are active. Here memory matters more for coordination than for personalization.
Research agents need multi-day evidence continuity
Primary sources, validated claims, contradictions, and open questions all need to persist in different forms. In that setting, memory behaves more like an evidence system than a chat feature.
Direction
The next competition is shifting from whether agents can remember to how well teams can control memory
The likely direction is further layering rather than convergence on one universal memory store. Short-term state, long-term profiles, shared memory, graph relations, file-backed memory, and background consolidation are more likely to be combined by use case than collapsed into one mechanism. The center of differentiation will probably move from retrieval quality alone toward write policy, forgetting policy, and memory auditability.
Vector retrieval will remain, but not as the whole story
It will likely stay as a major long-term memory substrate, but increasingly alongside structured profiles, graph memory, and file-based memory.
Forgetting and consolidation will become core features
In durable systems, the ability to drop or compress the right memories matters as much as the ability to keep them.
Memory will expand from personalization into control-plane infrastructure
Memory is likely to support workflow state, approvals, handoffs, and policy enforcement, not just user convenience.
Evaluation will expand from read quality to write quality
Future benchmarks and production monitoring will likely focus more on what gets written, whether it ages well, and whether it should have been retained at all.
Method Comparison
The most common approach and the most attention-grabbing approach are not exactly the same
Context-only and prompt-pinned memory
CLAUDE.md, system prompts, and pinned memory blocks are still widely used. When curated well they can be highly accurate, but capacity is scarce and updates are expensive.
Vector retrieval is the most widespread long-term memory base
Across LangGraph, Mem0, Letta, Azure, and ADK-style setups, retrieving only what matters remains the most common pattern. It is practical and token-efficient, but weaker on chronology, conflict resolution, and relations.
Structured profiles and scoped state are operationally strong
Keeping preferences and project settings as explicit records or state objects makes updates easier to govern and often reduces false retrieval. This is one reason it shows up so often in production-oriented docs.
Graph memory is the rising accuracy bet
Graph memory makes entities and relations explicit, which helps with who-knows-what, dependency chains, and changing constraints. Mem0 and several recent papers push in this direction, though the implementation cost is higher.
File-backed memory is strong for coding agents
Letta Code's MemFS and Claude Code's project memory show why editable file-based memory matters. In coding workflows, auditability and diffability can matter more than embedding similarity.
Shared memory is powerful but fragile
Once multiple agents write to the same memory, drift, duplication, and stale knowledge become more likely. Read-only versus writable separation and audit traces become important quickly.
Research Signal
The paper trail suggests flat RAG alone is not enough for durable memory quality
The research path is fairly consistent. Generative Agents and MemoryBank showed early that long-term memory works better when events are summarized and reflected on before being stored. MemGPT formalized the idea of moving memory outside the context window, while CoALA framed working, episodic, semantic, and procedural memory as an architecture. LoCoMo and LongMemEval then made the weakness of long-horizon memory visible: even commercial assistants and long-context models degrade noticeably when questions depend on time, updated facts, or multiple sessions. Newer work such as A-MEM, MemoryOS, Reflective Memory Management, and MemInsight increasingly treats memory writing, restructuring, and scheduling as agent policies rather than as passive retrieval. That is the important shift. Memory quality is becoming a systems behavior, not just a retrieval feature. Cross-paper numeric comparisons still need caution, because the benchmarks, base models, and write policies differ.
Summarization and reflection become core memory ideas
Generative Agents, MemoryBank, and MemGPT all point away from replaying raw history and toward compressing and managing memory outside the main prompt.
Benchmarks expose long-term memory weakness
LoCoMo and LongMemEval show that time, multi-session interaction, and updated facts remain hard even for strong models and assistants.
Memory update strategy becomes a first-class research target
A-MEM, MemoryOS, Reflective Memory Management, and MemInsight all move toward explicit policies for what gets written, how it gets reorganized, and when it should be recalled.
Adoption Guidance
The most efficient design today is layered memory with clear scope boundaries
Personal assistants
Separate session state from user profile memory, then promote only important items after the interaction ends. Keeping everything forever increases both token cost and false recall.
Coding agents
Keep project memory in files or repo-local rules, use checkpoints for active state, and keep long-term learnings in searchable logs. Human-readable diffs matter here.
Multi-agent workflows
Keep each agent's working memory separate and treat shared memory as mostly read-only unless there is strong coordination logic. Free-form shared writes are hard to audit and hard to trust.
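A mostly read-only shared store can be approximated with an explicit writer list and an audit trail. The `SharedMemory` class and the agent names below are hypothetical and exist only to illustrate the guidance.

```python
class SharedMemory:
    """Shared store that any agent can read but only listed agents can write."""

    def __init__(self, writers):
        self._writers = frozenset(writers)
        self._entries = {}
        self.audit_log = []

    def read(self, key):
        return self._entries.get(key)

    def write(self, agent_id, key, value):
        allowed = agent_id in self._writers
        # Record every attempt, including denied ones, so writes stay auditable.
        self.audit_log.append((agent_id, key, "ok" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{agent_id} may not write shared memory")
        self._entries[key] = value


shared = SharedMemory(writers={"coordinator"})
shared.write("coordinator", "style_guide", "Use ISO 8601 dates")
try:
    shared.write("research_agent", "style_guide", "Use US dates")
except PermissionError:
    pass
```

The denied attempt leaves the stored value untouched but still appears in the audit log.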
High-accuracy workflows
Do not rely on vector retrieval alone. Combine structured profiles, graph relations, and deterministic checks before approval or external action, especially when facts can change over time.
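In code, a deterministic check before action can be as simple as letting a structured profile field make the decision while retrieved text only adds context. The function name, the `auto_approve_limit` field, and the amounts below are assumptions for illustration.

```python
def can_auto_approve(amount, profile, retrieved_notes):
    """A hard rule from a structured profile decides; retrieved text only adds context."""
    limit = profile.get("auto_approve_limit")
    if limit is None:
        return False, "no limit on record, escalate"
    if amount > limit:
        return False, f"amount {amount} exceeds limit {limit}"
    return True, f"within limit {limit}; context: {len(retrieved_notes)} note(s)"


profile = {"auto_approve_limit": 500}
approved, reason = can_auto_approve(120, profile, ["customer is a long-term subscriber"])
blocked, blocked_reason = can_auto_approve(900, profile, [])
```

However persuasive a retrieved note is, it cannot raise the limit, because the limit is read from a governed record rather than inferred from similar text.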
Taking the public evidence together, the most adopted implementation pattern is not a standalone vector database. It is a combination of session state + persistent store + selective retrieval + asynchronous consolidation. If shared memory is added, scope and write permissions need to be designed first. Graph memory and file-backed memory are best introduced where accuracy or auditability requirements are already high.
Takeaway
Agent memory is becoming a competition in state and knowledge management, not just in context length
There is no single universal winner yet, but the common design direction is already visible. Short-term execution state is managed in sessions and checkpoints, long-term knowledge is moved into external stores, shared memory is separated by scope, and compression plus updates are handled asynchronously. That layered memory design currently offers the best balance of cost, speed, accuracy, and governance. The frontier question is no longer whether to add memory. It is how to make graph memory, file-backed memory, and reflective memory updates operationally reliable on top of the layered base.