Streaming LLM Chats With Persistent Memory

Free · Open source (MIT) · Works with LangChain, CrewAI, AutoGen · No signup

When building streaming LLM chats with Server-Sent Events or WebSockets, your conversation memory disappears every time the stream ends. Users lose context between messages, agents forget previous interactions, and your chat feels broken. This happens because streaming responses are stateless — once the connection closes, everything's gone.

The Problem: Streaming Chats Lose Memory

Streaming chat implementations face a fundamental issue: persistent memory for streaming LLM chats doesn't exist by default. Each SSE stream or WebSocket message exists in isolation.

Here's what breaks:

# This doesn't work across streams
chat_history = []  # Lost when stream ends

async def stream_response(message):
    chat_history.append({"role": "user", "content": message})
    # Stream LLM response...
    # Connection closes, chat_history disappears

Your LLM sees only the current message, not the conversation history. Users ask "What did I just say?" and the AI responds "I don't have context." Multi-turn conversations become impossible. Chat stream persistence requires external storage that survives process restarts, connection drops, and server deployments.

Without persistent memory, streaming chats are just expensive single-message APIs.

The Fix: Persistent Memory for Streaming Chats

Install BotWire Memory to persist conversation state across streams:

pip install botwire

Here's a working streaming chat with persistent memory:

from botwire import Memory
import asyncio

# Initialize persistent memory
memory = Memory("chat-app")

async def streaming_chat(user_id: str, message: str):
    # Get conversation history from persistent storage
    history_key = f"user:{user_id}:history"
    history = memory.get(history_key) or []
    
    # Add user message to persistent history
    history.append({"role": "user", "content": message})
    memory.set(history_key, history)
    
    # Stream the LLM response (stream_llm_response stands in for your
    # own streaming function); the LLM now sees the full conversation
    response = await stream_llm_response(history)

    # Save the assistant response to persistent memory
    history.append({"role": "assistant", "content": response})
    memory.set(history_key, history)
    
    return response

This streaming chat memory persists across connections, server restarts, and deployments.

How It Works

The code above solves LLM streaming state persistence in three steps:

  1. Retrieve: Load existing conversation history from BotWire's persistent storage
  2. Stream: Use full history for LLM context during streaming
  3. Persist: Save the complete conversation back to storage
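The retrieve/stream/persist loop can be sketched end to end. The snippet below is a stand-in, not BotWire's implementation: a sqlite3 table plays the role of Memory, and fake_llm stubs out the streaming call.

```python
import json
import sqlite3

class KVStore:
    """Minimal persistent key-value store standing in for BotWire Memory."""
    def __init__(self, path: str = "chat.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def get(self, key):
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None

    def set(self, key, value):
        self.conn.execute(
            "INSERT INTO kv (k, v) VALUES (?, ?) "
            "ON CONFLICT(k) DO UPDATE SET v = excluded.v",
            (key, json.dumps(value)),
        )
        self.conn.commit()

def fake_llm(history):
    # Stub for a real streaming LLM call; reports how much context it saw
    return f"I can see {len(history)} message(s) of context."

def chat_turn(store, user_id, message):
    key = f"user:{user_id}:history"
    history = store.get(key) or []                        # 1. Retrieve
    history.append({"role": "user", "content": message})
    reply = fake_llm(history)                             # 2. Stream (stubbed)
    history.append({"role": "assistant", "content": reply})
    store.set(key, history)                               # 3. Persist
    return reply
```

Because the history lives in the store rather than in process memory, a second call to chat_turn sees every earlier turn even if it runs in a different process.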

Memory Management Patterns

Handle memory lifecycle with these patterns:

from botwire import Memory

memory = Memory("chat-sessions")

# Set TTL for auto-cleanup (optional)
memory.set("user:123:history", history, ttl=86400)  # 24 hours

# List all conversations
user_sessions = memory.list_keys("user:*")

# Clear specific conversation
memory.delete("user:123:history")

# Check if conversation exists
has_history = "user:123:history" in memory
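One lifecycle concern the patterns above leave out is unbounded growth: persisted conversations keep growing until they exceed the model's context window. A plain-Python trimming helper (no BotWire-specific API assumed) keeps an optional system prompt plus the most recent turns:

```python
def trim_history(history, max_messages=20):
    """Keep the system prompt (if any) plus the most recent messages."""
    if len(history) <= max_messages:
        return history
    system = [m for m in history if m["role"] == "system"][:1]
    rest = [m for m in history if m["role"] != "system"]
    budget = max_messages - len(system)
    return system + rest[-budget:]
```

Run this on the loaded history before each LLM call, and save the trimmed version back with memory.set so storage stays bounded too.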

Cross-process memory sharing works automatically. Multiple server instances, background workers, and different services can access the same conversation state. No Redis cluster, no shared databases — just persistent key-value storage.

LLM streaming state remains consistent even during connection drops, server restarts, and rolling deployments.

The memory namespace isolates different applications. Use Memory("prod-chat") vs Memory("dev-chat") for environment separation.
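Namespace isolation can be pictured as key prefixing. The toy class below only illustrates the idea, not BotWire's internals:

```python
class Namespace:
    """Toy sketch of namespace isolation via key prefixing."""
    _backend = {}  # shared storage underlying every namespace

    def __init__(self, name: str):
        self.name = name

    def set(self, key, value):
        Namespace._backend[f"{self.name}:{key}"] = value

    def get(self, key):
        return Namespace._backend.get(f"{self.name}:{key}")
```

Two namespaces share the backend but never see each other's keys, which is why prod-chat and dev-chat can use identical key names without colliding.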

LangChain Integration

For LangChain streaming implementations, use the built-in adapter:

from botwire import BotWireChatHistory
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Persistent chat history
chat_history = BotWireChatHistory(session_id="user-42")

# LangChain memory with persistence
memory = ConversationBufferMemory(
    chat_memory=chat_history,
    return_messages=True
)

# Your streaming chain now remembers everything
chain = LLMChain(
    llm=your_llm,
    memory=memory,
    callbacks=[StreamingStdOutCallbackHandler()]
)

The BotWireChatHistory adapter handles conversation persistence automatically. Your streaming responses maintain context without additional code.

When NOT to Use BotWire

BotWire Memory isn't suitable for:

  • Vector search or embeddings — it's key-value storage, not a semantic database
  • High-frequency writes — free tier limits to 1000 writes/day per namespace
  • Sub-millisecond latency — HTTP API adds ~50ms overhead vs in-memory caches

FAQ

Why not just use Redis? Redis requires setup, authentication, and infrastructure management. BotWire works instantly with zero configuration — no Redis cluster, no connection strings, no ops overhead.

Is this actually free? Yes, 1000 writes/day per namespace forever. Unlimited reads. No credit card, no trial expiration. You'll hit API rate limits before pricing limits.

What about data privacy? Self-host the open-source version (MIT license) for full control, or use the hosted API for convenience. Data isn't used for training or analytics.

Get Started

BotWire Memory solves streaming chat persistence in one install. Your conversations survive restarts, scale across processes, and work without infrastructure setup.

pip install botwire

Full documentation and examples at https://botwire.dev.

Start free at botwire.dev