Managing AI Agent Session State in FastAPI
Free · Open source (MIT) · Works with LangChain, CrewAI, AutoGen · No signup
When building AI agents in FastAPI, the biggest headache often isn't the AI logic; it's keeping track of what each user said three messages ago. Nothing carries over between requests by default, in-process state resets on server restarts, and scaling horizontally means each instance sees only part of a conversation. You need persistent memory that survives restarts, deployments, and scaling.
The Problem with Stateless FastAPI Agents
FastAPI is stateless by design. Every request starts fresh with no memory of previous interactions. For simple APIs, this is perfect. For AI agents that need conversation context, it's a nightmare.
Here's what breaks:
# This doesn't work - conversation context is lost
@app.post("/chat")
async def chat(message: str):
    # Where's the conversation history?
    response = llm.generate(message)  # No context!
    return {"response": response}
Your agent can't remember user preferences, previous questions, or maintain context across a multi-turn conversation. Store state in memory variables? Gone on restart. Use global dictionaries? Doesn't scale across multiple processes. Session cookies help with user identification but don't solve persistence.
The result: users get frustrated repeating themselves, agents give inconsistent answers, and your "intelligent" assistant feels pretty dumb.
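To make the multi-process failure concrete, here is a minimal simulation. It assumes a load balancer that alternates requests between two workers, each holding its own in-memory dictionary; `Worker` and `round_robin` are hypothetical names used only for illustration, not part of FastAPI or any library.

```python
from itertools import cycle


class Worker:
    """Stands in for one server process with its own in-memory session dict."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # lives only inside this "process"

    def chat(self, user_id, message):
        # Append to whatever history this worker happens to hold.
        history = self.sessions.setdefault(user_id, [])
        history.append(message)
        return list(history)


def round_robin(workers, user_id, messages):
    """Send each message to the next worker in turn, as a load balancer might."""
    seen = []
    for worker, message in zip(cycle(workers), messages):
        seen.append((worker.name, worker.chat(user_id, message)))
    return seen


if __name__ == "__main__":
    turns = ["hi", "my name is Alice", "what is my name?"]
    for name, history in round_robin([Worker("w1"), Worker("w2")], "alice", turns):
        print(name, history)
```

In this sketch the worker that receives "what is my name?" never saw the message containing the name, which is the inconsistency users experience when state lives inside each process.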
The Fix: Persistent Agent Memory
Install BotWire Memory to add persistent key-value storage to any FastAPI agent:
pip install botwire
from fastapi import FastAPI
from botwire import Memory

app = FastAPI()

@app.post("/chat")
async def chat(user_id: str, message: str):
    # Each user gets isolated memory
    memory = Memory(f"agent-{user_id}")

    # Get conversation history
    history = memory.get("conversation") or []
    history.append({"role": "user", "content": message})

    # Generate response with context
    response = your_llm_call(history)
    history.append({"role": "assistant", "content": response})

    # Persist the updated conversation
    memory.set("conversation", history)
    return {"response": response}
That's it. Your agent now remembers conversations across requests, server restarts, and deployments.
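One thing the handler does not do is bound the history: it appends on every turn, so the stored list keeps growing. A minimal sketch of one way to cap it before persisting, assuming a keep-the-most-recent-N policy suits your agent; `trim_history` and `max_messages` are hypothetical names, not part of BotWire.

```python
def trim_history(history, max_messages=20):
    """Return a new list holding at most the last `max_messages` entries.

    Hypothetical helper (not part of BotWire). The input list is left
    unmodified so the caller decides what to persist.
    """
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    return list(history[-max_messages:])


if __name__ == "__main__":
    turns = [{"role": "user", "content": f"message {i}"} for i in range(5)]
    print(trim_history(turns, max_messages=2))
```

In the handler this would be applied just before `memory.set("conversation", ...)`. The trade-off is that the model no longer sees older turns, so pick a window that matches how far back your agent needs to look.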
How It Works
BotWire creates isolated memory namespaces, one per name you pass in. The Memory(f"agent-{user_id}") call gives each user their own persistent key-value store that survives requests, restarts, and deployments.
from botwire import Memory
# User-specific memory spaces
alice_memory = Memory("agent-alice")
bob_memory = Memory("agent-bob")
# Store any JSON-serializable data
alice_memory.set("preferences", {"language": "spanish", "tone": "formal"})
alice_memory.set("context", "discussing quarterly sales")
# Retrieve across requests/restarts
prefs = alice_memory.get("preferences") # {"language": "spanish", "tone": "formal"}
context = alice_memory.get("context") # "discussing quarterly sales"
Memory operations are synchronous, and each call adds roughly 10-50ms, so inside an async def endpoint they briefly block the event loop. Data persists automatically; no explicit save is needed. Each namespace is isolated, so Alice never sees Bob's data.
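Since the calls are synchronous, an `async def` endpoint waits on each one while the event loop is blocked. One option is to run them in a worker thread with `asyncio.to_thread` (Python 3.9+); another is to declare the endpoint with plain `def`, which FastAPI runs in a thread pool. The sketch below shows the first option using `SlowMemory`, a hypothetical stand-in that sleeps to imitate a blocking round trip. It is not BotWire's client, and `append_message` is likewise an illustrative name.

```python
import asyncio
import time


class SlowMemory:
    """Hypothetical stand-in for a blocking key-value client (not BotWire's API)."""

    _store = {}  # shared by all instances, keyed by (namespace, key)

    def __init__(self, namespace):
        self.namespace = namespace

    def get(self, key):
        time.sleep(0.02)  # imitate a blocking network round trip
        return self._store.get((self.namespace, key))

    def set(self, key, value):
        time.sleep(0.02)
        self._store[(self.namespace, key)] = value


async def append_message(user_id, message):
    """Append one user message, keeping blocking calls off the event loop."""
    memory = SlowMemory(f"agent-{user_id}")
    # Each blocking call runs in a worker thread; the coroutine just awaits it.
    history = (await asyncio.to_thread(memory.get, "conversation")) or []
    history = history + [{"role": "user", "content": message}]
    await asyncio.to_thread(memory.set, "conversation", history)
    return history


async def main():
    # Two users are served concurrently; neither waits on the other's I/O.
    return await asyncio.gather(
        append_message("alice", "hello"),
        append_message("bob", "hi there"),
    )
```

Calling `asyncio.run(main())` returns one history per user. Whether the extra thread hop is worth it depends on your traffic; for a low-volume chat endpoint, a plain `def` handler may be the simpler choice.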
For managing state lifecycle:
memory = Memory("agent-user123")

# List all keys in this user's memory
keys = memory.keys()  # ["conversation", "preferences", "context"]

# Remove specific data
memory.delete("old_context")

# Clear everything for this user
for key in memory.keys():
    memory.delete(key)
Because the state lives outside the FastAPI process, it survives process crashes and deployments, and multiple FastAPI instances can read and write the same user state.
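The clear-everything loop can be wrapped in a small helper, which also makes lifecycle logic easy to test without a network. This is a sketch that assumes only the `get`, `set`, `delete`, and `keys` methods shown above; `DictMemory`, `clear_namespace`, and the `keep` parameter are hypothetical names introduced here, not part of BotWire.

```python
class DictMemory:
    """In-memory test double exposing the methods used in this article.

    Not BotWire's implementation; data vanishes when the process exits.
    """

    def __init__(self, namespace):
        self.namespace = namespace
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

    def keys(self):
        return list(self._data)


def clear_namespace(memory, keep=()):
    """Delete every key in `memory` except those in `keep`; return deleted keys."""
    deleted = []
    # Snapshot the keys first so deleting cannot disturb the iteration.
    for key in list(memory.keys()):
        if key not in keep:
            memory.delete(key)
            deleted.append(key)
    return deleted


if __name__ == "__main__":
    memory = DictMemory("agent-user123")
    memory.set("conversation", [])
    memory.set("preferences", {"language": "spanish"})
    print(clear_namespace(memory, keep=("preferences",)))
```

Keeping a `keep` list is one way to reset a conversation while preserving long-lived settings such as user preferences.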
LangChain Integration
If you're using LangChain, BotWire provides a drop-in chat history store:
from fastapi import FastAPI
from botwire import BotWireChatHistory
from langchain.chains import ConversationChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory  # was missing in the original snippet

app = FastAPI()

@app.post("/chat")
async def chat(user_id: str, message: str):
    # Persistent chat history for this user
    history = BotWireChatHistory(session_id=f"user-{user_id}")

    # LangChain conversation with memory
    conversation = ConversationChain(
        llm=OpenAI(),
        memory=ConversationBufferMemory(chat_memory=history),
    )
    response = conversation.predict(input=message)
    return {"response": response}
The chat history persists automatically. Users can leave and return days later to continue exactly where they left off.
When NOT to Use BotWire
- Vector search or embeddings: BotWire is key-value storage, not a vector database. Use Pinecone or Weaviate for semantic search.
- High-throughput applications: The free tier caps at 1000 writes/day per namespace. The chat handler above does one write per turn, so that is roughly 1000 turns per user per day: fine for user conversations, not for logging every API call.
- Sub-millisecond latency requirements: BotWire adds ~10-50ms per call. Great for chat agents, not for real-time gaming.
FAQ
Why not just use Redis? Redis requires setup, authentication, and infrastructure management. BotWire works immediately with zero configuration. For production apps with existing Redis, stick with Redis.
Is this actually free? Yes. 1000 writes/day per namespace, 50MB storage per namespace, unlimited reads. No API keys, no credit cards, no surprises.
What about data privacy? Data is stored on BotWire's servers. For sensitive applications, self-host the open-source version—it's a single FastAPI service with SQLite.
---
Stop losing conversation context on every restart. Install BotWire Memory and give your FastAPI agents the persistent memory they need: pip install botwire. Get started at botwire.dev.