# Memory Mechanism Analysis
## Overview
nanobot implements a **two-layer memory system** that gives the AI agent persistent recall across conversations. The design balances three competing concerns:
1. **LLM context window limits** — conversations grow unbounded, but LLMs have finite context
2. **LLM prompt cache efficiency** — modifying earlier messages invalidates the cache prefix
3. **Long-term knowledge retention** — the agent should remember facts and events across sessions
The solution: an **append-only session** with a sliding **consolidation pointer**, backed by two persistent Markdown files — one for facts (loaded into every prompt) and one for events (searchable via grep).
## Architecture
```mermaid
graph TB
subgraph "Runtime (Per Request)"
User[User Message]
Session[Session
messages: list]
History[get_history
unconsolidated tail]
Context[ContextBuilder
system prompt]
LLM[LLM Provider]
end
subgraph "Persistent Storage"
JSONL["sessions/{key}.jsonl
Append-only JSONL"]
MEMORY["memory/MEMORY.md
Long-term facts"]
HISTORYMD["memory/HISTORY.md
Event log"]
end
subgraph "Consolidation (Background)"
Trigger{Messages ≥
memory_window?}
Consolidator[MemoryStore.consolidate
LLM-driven summarization]
SaveTool[save_memory tool call]
end
User --> Session
Session --> History
History --> Context
MEMORY --> Context
Context --> LLM
LLM --> Session
Session --> JSONL
Session --> Trigger
Trigger -->|Yes| Consolidator
Consolidator --> SaveTool
SaveTool --> MEMORY
SaveTool --> HISTORYMD
style MEMORY fill:#e8f5e9
style HISTORYMD fill:#fff3e0
style Session fill:#e3f2fd
```
## The Two Memory Layers
### Layer 1: `MEMORY.md` — Long-term Facts
- **Location**: `~/.nanobot/workspace/memory/MEMORY.md`
- **Content**: Structured Markdown with enduring facts — user preferences, project context, relationships, configuration details
- **Update pattern**: **Full overwrite** — the consolidation LLM rewrites the entire file, merging existing facts with new ones
- **Loaded into**: Every system prompt via `ContextBuilder.build_system_prompt()` → `MemoryStore.get_memory_context()`
- **Size**: Grows slowly (facts are deduplicated and merged by the LLM)
Example content:
```markdown
# User Preferences
- Prefers dark mode
- Timezone: UTC+8
# Project Context
- Working on nanobot, an AI assistant framework
- Uses Python 3.11+, pytest for testing
- API key stored in ~/.nanobot/config.json
```
### Layer 2: `HISTORY.md` — Event Log
- **Location**: `~/.nanobot/workspace/memory/HISTORY.md`
- **Content**: Timestamped paragraph summaries of past conversations
- **Update pattern**: **Append-only** — new entries are appended at the end
- **Loaded into**: **NOT loaded into context** — too large; the agent searches it with `grep` via the `exec` tool
- **Size**: Grows continuously (one entry per consolidation cycle)
Example content:
```markdown
[2026-03-10 14:30] User asked about configuring Telegram bot. Discussed
bot token setup, allowFrom whitelist, and proxy configuration. User chose
to use SOCKS5 proxy at 127.0.0.1:1080.
[2026-03-12 09:15] Debugged a session corruption issue. The problem was
orphaned tool_call_id references after a partial consolidation. Fixed by
deleting the session file and restarting.
```
### How They Work Together
| Aspect | MEMORY.md | HISTORY.md |
|--------|-----------|------------|
| Purpose | "What I know" | "What happened" |
| Analogy | A person's knowledge/beliefs | A person's diary |
| In prompt? | Yes (always) | No (too large) |
| Searchable? | Via context (LLM sees it) | Via `grep -i "keyword" memory/HISTORY.md` |
| Update | Overwrite (merge new + old) | Append (new entries at end) |
| Growth | Slow (deduplicated) | Linear (one entry per consolidation) |
## Session Model
### Append-Only Messages
The `Session` dataclass (`nanobot/session/manager.py`) stores all messages in a `list[dict]`:
```python
@dataclass
class Session:
key: str # "channel:chat_id"
messages: list[dict[str, Any]] # Append-only
last_consolidated: int = 0 # Consolidation pointer
```
**Critical design rule**: Messages are **never modified or deleted**. This preserves LLM prompt cache prefixes — if earlier messages change, the entire cache is invalidated.
### The `last_consolidated` Pointer
The `last_consolidated` field is an integer index that tracks how far consolidation has progressed:
```
messages: [m0, m1, m2, ..., m14, m15, ..., m24, m25, ..., m59]
↑ ↑ ↑
0 15 59
│ │
└─ already consolidated ┘ ← last_consolidated = 15
│ │
└── unconsolidated ──────┘
```
- `messages[0:last_consolidated]` — already processed by consolidation (summaries in MEMORY.md/HISTORY.md)
- `messages[last_consolidated:]` — not yet consolidated (sent to LLM via `get_history()`)
### History Retrieval
`Session.get_history()` returns only **unconsolidated** messages, with safety checks:
1. Slice from `last_consolidated` to end
2. Trim to `max_messages` (default: 500) from the tail
3. Align to a user turn (drop leading non-user messages)
4. Remove orphaned tool results (tool_call_id without matching assistant tool_calls)
5. Iteratively remove incomplete tool_call groups (assistant with tool_calls but missing results)
This cleanup is essential because consolidation can advance `last_consolidated` to a point that splits a tool_call/tool_result pair across the boundary.
## Consolidation Process
### Trigger
Consolidation is triggered in `AgentLoop._process_message()` when:
```python
unconsolidated = len(session.messages) - session.last_consolidated
if unconsolidated >= self.memory_window and session.key not in self._consolidating:
# Launch background consolidation task
```
The `memory_window` defaults to **100 messages** (configurable via `agents.defaults.memoryWindow`).
### Execution Flow
```mermaid
sequenceDiagram
participant Loop as AgentLoop
participant Store as MemoryStore
participant LLM as LLM Provider
participant FS as File System
Loop->>Loop: unconsolidated >= memory_window?
Loop->>Loop: asyncio.create_task()
Note over Loop: Background task starts
Loop->>Store: consolidate(session, provider, model)
Store->>Store: keep_count = memory_window // 2
Store->>Store: old = messages[last_consolidated:-keep_count]
Store->>Store: Format old messages as text
Store->>FS: Read MEMORY.md (current facts)
Store->>LLM: chat(system="consolidation agent",
user="current memory + conversation",
tools=[save_memory])
LLM-->>Store: tool_call: save_memory(
history_entry="[2026-03-15] ...",
memory_update="# Updated facts...")
Store->>FS: Append history_entry to HISTORY.md
Store->>FS: Overwrite MEMORY.md with memory_update
Store->>Store: session.last_consolidated = len(messages) - keep_count
Note over Loop: Background task completes
```
### Key Details
1. **`keep_count = memory_window // 2`** — With default `memory_window=100`, consolidation keeps the 50 most recent messages unconsolidated. The range `messages[last_consolidated:-50]` is sent to the consolidation LLM.
2. **LLM-driven consolidation** — A separate LLM call (using the same provider and model) acts as a "consolidation agent". It receives:
- The current `MEMORY.md` content
- The old messages formatted as `[timestamp] ROLE: content`
- A `save_memory` tool with two required parameters
3. **The `save_memory` tool** returns:
- `history_entry`: A 2-5 sentence timestamped paragraph (appended to HISTORY.md)
- `memory_update`: The full updated MEMORY.md content (existing facts + new facts)
4. **Pointer advance**: After successful consolidation, `last_consolidated` advances to `len(messages) - keep_count`, marking the consolidated range as processed.
### Concurrency Guards
The agent loop includes multiple protections against concurrent consolidation:
| Guard | Purpose | Implementation |
|-------|---------|---------------|
| `_consolidating: set[str]` | Prevents duplicate consolidation tasks for the same session | Checked before creating task; set/cleared around execution |
| `_consolidation_locks: WeakValueDictionary[str, Lock]` | Serializes consolidation for a session (normal + `/new` don't overlap) | `asyncio.Lock` per session key |
| `_consolidation_tasks: set[Task]` | Strong references prevent GC of in-flight tasks | Tasks added on create, removed on completion |
### The `/new` Command
The `/new` slash command starts a fresh session:
1. **Wait** for any in-flight consolidation to finish (acquires the consolidation lock)
2. **Archive** remaining unconsolidated messages with `archive_all=True`
3. **Clear** session messages and reset `last_consolidated` to 0
4. **Save** the empty session to disk
If archival fails, the session is **not cleared** — no data loss.
## Memory Skill (Always Active)
The `memory` skill (`nanobot/skills/memory/SKILL.md`) is marked `always: true`, meaning its content is loaded into every system prompt. It instructs the agent:
- **MEMORY.md** is loaded into context — write important facts there immediately
- **HISTORY.md** is NOT in context — search it with `grep -i "keyword" memory/HISTORY.md`
- Auto-consolidation handles old conversations automatically
- The agent can also manually update MEMORY.md via `edit_file` or `write_file`
## Data Flow Diagram
```mermaid
flowchart TD
subgraph "Each Request"
A([User Message]) --> B[Session.add_message]
B --> C{Get History}
C --> D[messages from last_consolidated]
D --> E[+ MEMORY.md via ContextBuilder]
E --> F[Send to LLM]
F --> G[LLM Response]
G --> H[Session.add_message]
end
subgraph "Background Consolidation"
H --> I{unconsolidated
≥ memory_window?}
I -->|No| J([Wait for next message])
I -->|Yes| K[Select old messages]
K --> L[Format as text]
L --> M[LLM: summarize + extract facts]
M --> N{save_memory tool called?}
N -->|No| O([Skip - consolidation failed])
N -->|Yes| P[Append to HISTORY.md]
N -->|Yes| Q[Overwrite MEMORY.md]
P --> R[Advance last_consolidated]
Q --> R
end
subgraph "Manual Access"
S[Agent uses grep on HISTORY.md]
T[Agent uses edit_file on MEMORY.md]
end
style E fill:#e8f5e9
style P fill:#fff3e0
style Q fill:#e8f5e9
```
## Edge Cases and Robustness
### Provider Returns Non-String Arguments
Some LLM providers return `save_memory` arguments as dicts or JSON strings instead of plain strings. The consolidation code handles both:
```python
args = response.tool_calls[0].arguments
if isinstance(args, str):
args = json.loads(args) # JSON string → dict
if entry := args.get("history_entry"):
if not isinstance(entry, str):
entry = json.dumps(entry) # dict → JSON string
```
This was a fix for [issue #1042](https://github.com/HKUDS/nanobot/issues/1042).
### LLM Fails to Call save_memory
If the consolidation LLM returns text instead of a tool call, `consolidate()` returns `False` and the pointer is not advanced. No data is lost — consolidation will retry on the next trigger.
### Consolidation Failure
All exceptions in `consolidate()` are caught and logged. The session pointer is not advanced, so the same messages will be re-processed on the next successful consolidation.
### Orphaned Tool Results After Consolidation
When `last_consolidated` advances mid-tool-call sequence, `get_history()` may encounter tool results without their corresponding assistant messages. The iterative cleanup algorithm in `get_history()` handles this by:
1. Tracking all `tool_call_id`s from assistant messages in the current window
2. Dropping tool results whose `tool_call_id` is not in the tracked set
3. Dropping assistant messages whose tool_calls don't all have results
4. Repeating until stable (cascading cleanup)
### Very Large Sessions
For sessions with 1000+ messages, consolidation processes `messages[last_consolidated:-keep_count]`, which could be hundreds of messages formatted as text. This is sent as a single LLM prompt. The LLM's context window is the practical limit.
## Configuration
| Setting | Path | Default | Effect |
|---------|------|---------|--------|
| Memory window | `agents.defaults.memoryWindow` | 100 | Consolidation triggers when unconsolidated messages reach this count |
| Keep count | (derived) | `memory_window // 2` | Number of recent messages kept unconsolidated after consolidation |
Lower `memory_window` values cause more frequent consolidation (smaller batches, more LLM calls). Higher values delay consolidation but send larger batches.
## File References
| Component | File | Key Functions |
|-----------|------|--------------|
| MemoryStore | `nanobot/agent/memory.py` | `consolidate()`, `get_memory_context()`, `read_long_term()`, `write_long_term()`, `append_history()` |
| Session | `nanobot/session/manager.py` | `add_message()`, `get_history()`, `clear()` |
| SessionManager | `nanobot/session/manager.py` | `get_or_create()`, `save()`, `_load()` |
| ContextBuilder | `nanobot/agent/context.py` | `build_system_prompt()` (injects MEMORY.md) |
| AgentLoop | `nanobot/agent/loop.py` | `_process_message()` (triggers consolidation), `_consolidate_memory()` |
| Memory skill | `nanobot/skills/memory/SKILL.md` | Agent instructions (always loaded) |
| save_memory tool | `nanobot/agent/memory.py:_SAVE_MEMORY_TOOL` | LLM tool schema for consolidation |
## Test Coverage
| Test File | What It Tests |
|-----------|--------------|
| `tests/test_consolidate_offset.py` | `last_consolidated` tracking, persistence, slice logic, boundary conditions, archive_all mode, cache immutability, concurrency guards, `/new` command behavior |
| `tests/test_memory_consolidation_types.py` | String/dict/JSON-string argument handling, no-tool-call fallback, skip-when-few-messages |
## Design Trade-offs
| Decision | Benefit | Cost |
|----------|---------|------|
| Append-only messages | LLM cache efficiency; no data loss | Messages list grows unbounded in memory until session is cleared |
| LLM-driven consolidation | High-quality summaries; fact extraction | Extra LLM API call per consolidation; cost |
| MEMORY.md full overwrite | Deduplication; coherent document | Risk of fact loss if LLM omits existing entries |
| HISTORY.md not in context | Keeps prompt size small | Agent must actively grep; may miss relevant history |
| Background consolidation | Non-blocking; doesn't delay user response | Race conditions require concurrency guards |
## Related Documentation
- [Architecture](02-architecture.md) — System design
- [Data Model](04-data-and-api.md) — Storage formats
- [Workflows](03-workflows.md) — Agent loop and tool execution
---
**Last Updated**: 2026-03-15
**Version**: 1.0