Streaming LLM Chats With Persistent Memory
Free · Open source (MIT) · Works with LangChain, CrewAI, AutoGen · No signup
When building streaming LLM chats with Server-Sent Events or WebSockets, your conversation memory disappears every time the stream ends. Users lose context between messages, agents forget previous interactions, and your chat feels broken. This happens because streaming responses are stateless — once the connection closes, everything's gone.
The Problem: Streaming Chats Lose Memory
Streaming chat implementations face a fundamental issue: persistent memory for streaming LLM chats doesn't exist by default. Each SSE stream or WebSocket message exists in isolation.
Here's what breaks:
```python
# This doesn't work across streams
chat_history = []  # lost when the stream ends

async def stream_response(message):
    chat_history.append({"role": "user", "content": message})
    # Stream LLM response...
    # Connection closes, chat_history disappears
```
Your LLM sees only the current message, not the conversation history. Users ask "What did I just say?" and the AI responds "I don't have context." Multi-turn conversations become impossible. Chat stream persistence requires external storage that survives process restarts, connection drops, and server deployments.
Without persistent memory, streaming chats are just expensive single-message APIs.
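To see the failure concretely, here is a minimal sketch of a stateless handler. The names (`handle_stream`, the message strings) are illustrative, not a real API: because each "connection" rebuilds its own in-process history, the second turn never sees the first.

```python
# Each call simulates one streaming connection. History lives in a local
# variable, so it is re-created from scratch on every connection.
def handle_stream(message: str) -> list[dict]:
    chat_history = []  # re-created per connection -- nothing carries over
    chat_history.append({"role": "user", "content": message})
    return chat_history  # this is all the LLM would ever see

turn1 = handle_stream("My name is Ada.")
turn2 = handle_stream("What did I just say?")

print(len(turn2))  # 1 -- the first turn is gone
```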
The Fix: Persistent Memory for Streaming Chats
Install BotWire Memory to persist conversation state across streams:
```bash
pip install botwire
```
Here's a working streaming chat with persistent memory:
```python
from botwire import Memory

# Initialize persistent memory
memory = Memory("chat-app")

async def streaming_chat(user_id: str, message: str):
    # Get conversation history from persistent storage
    history_key = f"user:{user_id}:history"
    history = memory.get(history_key) or []

    # Add user message to persistent history
    history.append({"role": "user", "content": message})
    memory.set(history_key, history)

    # Your LLM streaming logic here -- the model now has full conversation context
    response = await stream_llm_response(history)

    # Save assistant response to memory
    history.append({"role": "assistant", "content": response})
    memory.set(history_key, history)
    return response
```
This streaming chat memory persists across connections, server restarts, and deployments.
How It Works
The code above makes LLM streaming state persistent in three steps:
- Retrieve: Load existing conversation history from BotWire's persistent storage
- Stream: Use full history for LLM context during streaming
- Persist: Save the complete conversation back to storage
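The retrieve-stream-persist loop above isn't specific to any one backend. Here is a sketch of the same pattern using `sqlite3` from the standard library as a stand-in persistent key-value store, with a `fake_llm` placeholder where the real streaming call would go (all names here are illustrative, not BotWire's API):

```python
import json
import sqlite3

# ":memory:" keeps this demo self-contained; use a file path (e.g. "chat.db")
# in practice so data actually survives process restarts.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")

def load_history(key: str) -> list:
    row = db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else []

def save_history(key: str, history: list) -> None:
    db.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
               (key, json.dumps(history)))
    db.commit()

def fake_llm(history: list) -> str:
    # Stand-in for the real streaming LLM call
    return f"(reply with {len(history)} message(s) of context)"

def chat_turn(user_id: str, message: str) -> str:
    key = f"user:{user_id}:history"
    history = load_history(key)                            # 1. Retrieve
    history.append({"role": "user", "content": message})
    reply = fake_llm(history)                              # 2. Stream with full context
    history.append({"role": "assistant", "content": reply})
    save_history(key, history)                             # 3. Persist
    return reply
```

On the second call for the same user, `load_history` returns both earlier messages, so the model's context grows turn by turn instead of resetting.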
Memory Management Patterns
Handle memory lifecycle with these patterns:
```python
from botwire import Memory

memory = Memory("chat-sessions")

# Set a TTL for auto-cleanup (optional)
memory.set("user:123:history", history, ttl=86400)  # 24 hours

# List all conversations
user_sessions = memory.list_keys("user:*")

# Clear a specific conversation
memory.delete("user:123:history")

# Check whether a conversation exists
has_history = "user:123:history" in memory
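TTL-based cleanup like the `ttl=86400` call above typically works by storing an expiry timestamp alongside each value and treating expired keys as missing on read. The following is a stand-in sketch of that mechanism, not BotWire's actual implementation:

```python
import time

class TTLStore:
    """Minimal key-value store with lazy TTL expiry."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        # ttl is in seconds; None means the key never expires
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy eviction on read
            return None
        return value
```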
Cross-process memory sharing works automatically. Multiple server instances, background workers, and different services can access the same conversation state. No Redis cluster, no shared databases — just persistent key-value storage.
LLM streaming state remains consistent even during:
- Server deployments (memory persists)
- Connection drops (history intact)
- Process crashes (data survives)
- Horizontal scaling (shared across instances)
The memory namespace isolates different applications. Use Memory("prod-chat") vs Memory("dev-chat") for environment separation.
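One common way namespace isolation works is by prefixing every key with the namespace, so `"prod-chat"` and `"dev-chat"` can never collide even when they share one backing store. This is an illustrative sketch of that idea, not BotWire's internals:

```python
class NamespacedStore:
    """Keys from different namespaces never collide in the shared backing store."""

    _backing: dict = {}  # one store shared across all namespaces

    def __init__(self, namespace: str):
        self.namespace = namespace

    def _qualify(self, key: str) -> str:
        # "user:1:history" in "prod-chat" becomes "prod-chat:user:1:history"
        return f"{self.namespace}:{key}"

    def set(self, key, value):
        NamespacedStore._backing[self._qualify(key)] = value

    def get(self, key):
        return NamespacedStore._backing.get(self._qualify(key))
```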
LangChain Integration
For LangChain streaming implementations, use the built-in adapter:
```python
from botwire import BotWireChatHistory
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Persistent chat history
chat_history = BotWireChatHistory(session_id="user-42")

# LangChain memory backed by persistent storage
memory = ConversationBufferMemory(
    chat_memory=chat_history,
    return_messages=True,
)

# Your streaming chain now remembers everything
chain = LLMChain(
    llm=your_llm,
    memory=memory,
    callbacks=[StreamingStdOutCallbackHandler()],
)
```
The BotWireChatHistory adapter handles conversation persistence automatically. Your streaming responses maintain context without additional code.
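In outline, such an adapter translates LangChain's chat-history interface (append a message, read all messages, clear) onto key-value get/set calls. The sketch below shows that shape against a plain dict; the class name and key format are hypothetical, and a real adapter would subclass LangChain's `BaseChatMessageHistory` and call the storage backend instead:

```python
class KVChatHistory:
    """Chat-history shim over a key-value store (dict used as a stand-in)."""

    def __init__(self, store: dict, session_id: str):
        self.store = store
        self.key = f"session:{session_id}:messages"  # hypothetical key format

    @property
    def messages(self) -> list:
        # Read the full transcript from the store on every access
        return self.store.get(self.key, [])

    def add_message(self, role: str, content: str) -> None:
        msgs = self.messages
        msgs.append({"role": role, "content": content})
        self.store[self.key] = msgs  # write the transcript back

    def clear(self) -> None:
        self.store.pop(self.key, None)
```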
When NOT to Use BotWire
BotWire Memory isn't suitable for:
- Vector search or embeddings — it's key-value storage, not a semantic database
- High-frequency writes — the free tier limits you to 1000 writes/day per namespace
- Sub-millisecond latency — the HTTP API adds ~50ms of overhead versus in-memory caches
FAQ
Why not just use Redis? Redis requires setup, authentication, and infrastructure management. BotWire works instantly with zero configuration — no Redis cluster, no connection strings, no ops overhead.
Is this actually free? Yes, 1000 writes/day per namespace forever. Unlimited reads. No credit card, no trial expiration. You'll hit API rate limits before pricing limits.
What about data privacy? Self-host the open-source version (MIT license) for full control, or use the hosted API for convenience. Data isn't used for training or analytics.
Get Started
BotWire Memory solves streaming chat persistence in one install. Your conversations survive restarts, scale across processes, and work without infrastructure setup.
```bash
pip install botwire
```
Full documentation and examples at https://botwire.dev.