Persistent Memory for LlamaIndex Agents
Free · Open source (MIT) · Works with LangChain, CrewAI, AutoGen · No signup
Your LlamaIndex agent forgets everything between calls. Chat context disappears, personalization resets, and multi-turn conversations break. This happens because QueryEngine and Agent instances are stateless by default — they only remember what you pass in each request. Here's how to add persistent memory that survives restarts and process crashes.
Why LlamaIndex Agents Lose Memory
LlamaIndex agents are designed to be stateless for simplicity and scalability. Each query() or chat() call is independent, which works great for one-shot questions but breaks conversational flows.
Consider this typical pattern:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.agent import ReActAgent
# Load your data
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
# Create agent (from_tools expects a list of tools, so wrap the query engine)
from llama_index.core.tools import QueryEngineTool

query_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="docs",
    description="Answers questions about the loaded documents",
)
agent = ReActAgent.from_tools([query_tool])
# First conversation works
response1 = agent.chat("What's our Q1 revenue?")
print(response1) # "Q1 revenue was $2.1M"
# But context is lost in follow-ups
response2 = agent.chat("What about Q2?")
print(response2) # Agent has no idea what "Q2" refers to
The agent answers the first question but loses context for "What about Q2?" because it doesn't persist the conversation history or learned context between calls.
The Fix: Add Persistent Memory
Install BotWire to add persistent key-value memory that survives across processes:
pip install botwire
Here's the same agent with persistent memory:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.agent import ReActAgent
from botwire import Memory
# Initialize persistent memory with a namespace
memory = Memory("user-session-123")
# Load your data and create agent (the query engine must be wrapped as a tool)
from llama_index.core.tools import QueryEngineTool

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="docs",
    description="Answers questions about the loaded documents",
)
agent = ReActAgent.from_tools([query_tool])
# Store conversation context
def chat_with_memory(user_input):
    # Retrieve previous context
    history = memory.get("chat_history", [])

    # Add the most recent exchanges to the query
    context = "\n".join(f"User: {h['user']}\nAssistant: {h['assistant']}"
                        for h in history[-3:])  # Last 3 exchanges
    full_query = f"Previous context:\n{context}\n\nCurrent question: {user_input}"

    # Get response
    response = agent.chat(full_query if context else user_input)

    # Store this exchange
    history.append({"user": user_input, "assistant": str(response)})
    memory.set("chat_history", history)
    return response
# Now context persists
response1 = chat_with_memory("What's our Q1 revenue?")
response2 = chat_with_memory("What about Q2?") # Agent remembers Q1 context
How It Works
The Memory class provides persistent key-value storage that survives process restarts. Each namespace is isolated, so you can have separate memory spaces for different users or conversations.
Key operations:
from botwire import Memory
memory = Memory("my-namespace")
# Store any JSON-serializable data
memory.set("user_preferences", {"theme": "dark", "language": "en"})
memory.set("conversation_count", 42)
memory.set("last_topic", "revenue analysis")
# Retrieve with optional defaults
prefs = memory.get("user_preferences", {})
count = memory.get("conversation_count", 0)
# List all keys in this namespace
all_keys = memory.list_keys()
# Delete specific keys
memory.delete("old_data")
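Since the botwire client talks to a hosted service, it helps during offline development to have a local substitute with the same four operations. Here is a minimal sketch of such a stand-in, backed by SQLite (the self-hosted service also uses SQLite); this `LocalMemory` class is hypothetical and mirrors the interface above, it is not the real client:

```python
import json
import sqlite3

class LocalMemory:
    """SQLite-backed stand-in mirroring the Memory interface
    (set/get/list_keys/delete) for offline development.
    Hypothetical local substitute, not the real BotWire client."""

    def __init__(self, namespace, db_path="memory.db"):
        self.namespace = namespace
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv "
            "(namespace TEXT, key TEXT, value TEXT, "
            "PRIMARY KEY (namespace, key))"
        )

    def set(self, key, value):
        # Store any JSON-serializable value
        self.conn.execute(
            "INSERT OR REPLACE INTO kv VALUES (?, ?, ?)",
            (self.namespace, key, json.dumps(value)),
        )
        self.conn.commit()

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE namespace = ? AND key = ?",
            (self.namespace, key),
        ).fetchone()
        return json.loads(row[0]) if row else default

    def list_keys(self):
        rows = self.conn.execute(
            "SELECT key FROM kv WHERE namespace = ?", (self.namespace,)
        ).fetchall()
        return [r[0] for r in rows]

    def delete(self, key):
        self.conn.execute(
            "DELETE FROM kv WHERE namespace = ? AND key = ?",
            (self.namespace, key),
        )
        self.conn.commit()
```

Because keys are scoped by namespace in the primary key, two `LocalMemory` instances with different namespaces never see each other's data, matching the isolation behavior described above.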
For production use, add error handling and memory management:
import time

def chat_with_managed_memory(user_input, user_id):
    memory = Memory(f"user-{user_id}")
    try:
        history = memory.get("chat_history", [])

        # Your chat logic here (build_context is your own helper)
        context = build_context(history)
        response = agent.chat(f"{context}\n{user_input}")

        # Store with metadata
        history.append({
            "user": user_input,
            "assistant": str(response),
            "timestamp": time.time(),
        })
        # Limit history to prevent memory bloat
        history = history[-50:]  # Keep the last 50 exchanges
        memory.set("chat_history", history)
        return response
    except Exception as e:
        print(f"Memory error: {e}")
        # Fall back to stateless mode
        return agent.chat(user_input)
Memory persists across processes and machines. Stop your Python script, restart it, and the conversation history is still there.
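The restart behavior is easy to picture with a plain JSON file standing in for the hosted namespace (a hypothetical local stand-in; the file path and helper names below are illustrative, not part of the botwire API):

```python
import json
import os

STATE_FILE = "chat_history.json"  # stands in for a hosted namespace

def load_history():
    # Simulates memory.get("chat_history", []) after a restart:
    # the data lives on disk, not in the Python process
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return []

def save_history(history):
    # Simulates memory.set("chat_history", history)
    with open(STATE_FILE, "w") as f:
        json.dump(history, f)

# "First process": record an exchange, then exit
history = load_history()
history.append({"user": "What's our Q1 revenue?", "assistant": "$2.1M"})
save_history(history)

# "Second process" (imagine a restart here): the exchange is still on disk
restored = load_history()
print(restored[-1]["user"])
```

The hosted version works the same way, except the state lives on BotWire's servers rather than a local file, so it also survives moving to a different machine.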
LlamaIndex Chat Engine Integration
For LlamaIndex's built-in chat engines, you can persist the entire chat history:
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.core.llms import ChatMessage
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.llms.openai import OpenAI
from botwire import Memory

class PersistentChatMemory(ChatMemoryBuffer):
    # ChatMemoryBuffer is a Pydantic model whose internals vary between
    # LlamaIndex versions, so treat this subclass as a sketch
    def __init__(self, namespace, token_limit=3000):
        super().__init__(token_limit=token_limit)
        object.__setattr__(self, "_store", Memory(namespace))
        # Load messages persisted by a previous process
        for msg_data in self._store.get("chat_messages", []):
            super().put(ChatMessage(role=msg_data["role"],
                                    content=msg_data["content"]))

    def put(self, message):
        # ChatMemoryBuffer.put() takes a ChatMessage, not (role, content)
        super().put(message)
        # Persist the full buffer to BotWire
        messages = [{"role": msg.role.value, "content": msg.content}
                    for msg in self.get_all()]
        self._store.set("chat_messages", messages)
# Use with any LlamaIndex chat engine
llm = OpenAI()
persistent_memory = PersistentChatMemory("user-456")
chat_engine = SimpleChatEngine.from_defaults(
llm=llm,
memory=persistent_memory
)
# Conversation state persists across restarts
response = chat_engine.chat("Remember that I prefer technical details")
# Later, in a new process...
response = chat_engine.chat("Explain the architecture") # Remembers preference
When NOT to Use BotWire
BotWire isn't the right choice if you need:
• Vector similarity search — Use Pinecone, Weaviate, or Qdrant for semantic search over embeddings
• High-throughput applications — The HTTP API adds ~50-100ms latency; use Redis for sub-millisecond access
• Complex queries — This is key-value storage, not a database; use PostgreSQL for relational data
FAQ
Why not just use Redis? Redis requires setup, authentication, and infrastructure management. BotWire works immediately with no configuration, and the free tier covers most development needs.
Is this actually free? Yes — 1000 writes per day per namespace, 50MB storage, unlimited reads. No credit card required. You can also self-host the open-source version.
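With writes capped per day, chat loops that call set() on every token or turn can burn through the quota quickly. One common mitigation is to coalesce rapid updates client-side and only write the latest value periodically. A hypothetical debouncing wrapper (the flush interval, the `BatchedMemory` name, and the backend interface are assumptions, not part of the botwire API):

```python
import time

class BatchedMemory:
    """Coalesces rapid set() calls so only the latest value per key is
    written when flush() runs. Hypothetical wrapper; assumes a backend
    exposing set(key, value), e.g. botwire.Memory."""

    def __init__(self, backend, flush_every=5.0):
        self.backend = backend
        self.flush_every = flush_every  # seconds between automatic flushes
        self.pending = {}               # key -> latest unwritten value
        self.last_flush = time.monotonic()

    def set(self, key, value):
        # Overwrite any pending value for this key instead of writing it
        self.pending[key] = value
        if time.monotonic() - self.last_flush >= self.flush_every:
            self.flush()

    def flush(self):
        # One backend write per dirty key, regardless of how many
        # set() calls happened since the last flush
        for key, value in self.pending.items():
            self.backend.set(key, value)
        self.pending.clear()
        self.last_flush = time.monotonic()
```

Call `flush()` once more on shutdown so the final state is not lost; ten rapid updates to the same key then cost one write instead of ten.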
What about data privacy? Data is stored on BotWire servers by default. For sensitive applications, self-host using the MIT-licensed code at github.com/pmestre-Forge/signal-api — it's a single FastAPI service with SQLite.
Get Started
Add persistent memory to your LlamaIndex agents in a few lines of code. Install with pip install botwire and check the docs at https://botwire.dev.