RAG + Memory: When to Use Each (and Both)

Free · Open source (MIT) · Works with LangChain, CrewAI, AutoGen · No signup

You're building an AI agent and can't figure out when to use RAG (vector search) versus persistent memory (key-value storage). Your agent either can't remember context between sessions or drowns in irrelevant retrieved documents. The confusion is real: when does semantic search beat simple state, and when do you need both working together?

The Problem: RAG vs Memory Isn't Either/Or

Most AI engineers treat RAG and memory as competing approaches, but they solve different problems. RAG excels at finding relevant information from large knowledge bases using semantic similarity. Memory handles state persistence — what the user told you last week, current conversation context, user preferences.

The pain hits when you build a customer support agent that can search your docs (RAG) but forgets the user's name between conversations. Or a coding assistant that remembers everything perfectly in one session but starts fresh after a restart. You end up with either stateless agents that feel robotic or stateful ones that lose everything when the process dies.

Vector databases like Pinecone handle embeddings beautifully but are a poor fit for simple key-value lookups. Redis handles state well, but it's memory-first: durable persistence is something you have to configure and operate yourself. You need both capabilities, and you need them to work together without architectural complexity.

The Fix: Hybrid RAG + Persistent Memory

Install BotWire for the memory layer while keeping your existing RAG setup:

pip install botwire

Here's a customer support agent that combines document search with persistent user context:

from botwire import Memory
import openai

# Initialize persistent memory
memory = Memory("support-agent")

def handle_user_query(user_id, query):
    # Get persistent user context
    user_context = memory.get(f"user:{user_id}")
    
    # RAG search for relevant docs (your existing implementation)
    relevant_docs = vector_search(query, top_k=3)
    
    # Combine memory + RAG in prompt
    prompt = f"""
    User context: {user_context or 'New user'}
    Relevant docs: {relevant_docs}
    Query: {query}
    """
    
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    
    # Update persistent memory
    memory.set(f"user:{user_id}", {
        "last_query": query,
        "last_response": response.choices[0].message.content,
        "session_count": (user_context or {}).get("session_count", 0) + 1,
    })
    
    return response.choices[0].message.content

How This Works

The pattern above separates concerns cleanly. RAG handles "what information is relevant to this query" while memory handles "what do I know about this specific user." Your vector database stays focused on semantic search, while BotWire handles simple state that survives restarts.

Memory operations are straightforward key-value pairs. Store user preferences, conversation history, workflow state, or any JSON-serializable data:

from botwire import Memory

memory = Memory("my-agent")

# Store complex state
workflow_state = {
    "current_step": "collecting_requirements",
    "collected_data": {"budget": 50000, "timeline": "Q2"},
    "next_actions": ["schedule_meeting", "send_proposal"]
}
memory.set("workflow:project-123", workflow_state)

# Retrieve and update
state = memory.get("workflow:project-123")
state["current_step"] = "proposal_sent"
memory.set("workflow:project-123", state)

BotWire runs as a hosted service at botwire.dev (no signup required) or self-hosted. Memory persists across process restarts and is shared across machines and team members. The free tier gives you 1000 writes per day per namespace, which covers most development and small production workloads.
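If the write cap is a concern, a thin coalescing wrapper can buffer rapid updates locally and flush only the latest value per key. This is a minimal sketch, not part of BotWire: `CoalescingStore` and `DictBackend` are hypothetical names, and the only assumption about the backend is the `get`/`set` surface shown above.

```python
import time

class CoalescingStore:
    """Buffer writes and flush a key to the backend at most once per
    `interval` seconds, keeping daily write counts well under a cap.
    The backend only needs get/set (the surface used above)."""

    def __init__(self, backend, interval=60.0):
        self.backend = backend
        self.interval = interval
        self.pending = {}      # key -> latest unflushed value
        self.last_flush = {}   # key -> monotonic time of last backend write

    def set(self, key, value):
        self.pending[key] = value
        last = self.last_flush.get(key)
        # First write for a key flushes immediately; later ones wait out the interval
        if last is None or time.monotonic() - last >= self.interval:
            self.flush(key)

    def flush(self, key=None):
        keys = [key] if key is not None else list(self.pending)
        for k in keys:
            if k in self.pending:
                self.backend.set(k, self.pending.pop(k))
                self.last_flush[k] = time.monotonic()

    def get(self, key):
        # Prefer the freshest local value, fall back to the backend
        return self.pending.get(key, self.backend.get(key))


class DictBackend:
    """In-memory stand-in for Memory("...") so the sketch is self-contained."""
    def __init__(self):
        self.data = {}
    def set(self, key, value):
        self.data[key] = value
    def get(self, key):
        return self.data.get(key)
```

Call `flush()` on shutdown (or from a periodic timer) so buffered values aren't lost when the process exits.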

For conversation history, the LangChain integration simplifies chat memory:

from botwire import BotWireChatHistory
from langchain.memory import ConversationBufferWindowMemory

# Persistent chat history that survives restarts
chat_history = BotWireChatHistory(session_id="user-42")
memory = ConversationBufferWindowMemory(
    chat_memory=chat_history,
    return_messages=True,
    k=10  # Keep last 10 exchanges
)

RAG + Memory Integration Patterns

The hybrid approach works best when you route queries intelligently. Use memory for personal/contextual queries and RAG for knowledge-heavy questions:

def smart_query_routing(user_id, query):
    memory = Memory("agent")
    
    # Check if the query needs personal context (naive keyword heuristic;
    # keywords must be lowercase to match the lowered query)
    personal_keywords = ["my", "i", "last time", "remember", "before"]
    lowered = f" {query.lower()} "
    needs_context = any(f" {kw} " in lowered for kw in personal_keywords)
    
    if needs_context:
        # Memory-first approach
        user_data = memory.get(f"user:{user_id}")
        context = f"User history: {user_data}"
        # Still supplement with light RAG if needed
        docs = vector_search(query, top_k=1)
        return generate_response(query, context + str(docs))
    else:
        # RAG-first approach
        docs = vector_search(query, top_k=5)
        # Add minimal user context
        preferences = memory.get(f"user:{user_id}:preferences")
        context = f"User preferences: {preferences}"
        return generate_response(query, context + str(docs))
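Keyword routing like the above is easy to get subtly wrong: naive substring checks let "my" match "myth", and multi-word phrases and punctuation need their own handling. A word-boundary regex keeps the heuristic robust; `needs_personal_context` is a hypothetical helper, not part of BotWire.

```python
import re

# Same routing keywords as above; \b anchors ensure "my" does not match
# "myth", and IGNORECASE handles "I" regardless of how the user types it.
PERSONAL_KEYWORDS = ["my", "i", "last time", "remember", "before"]
PERSONAL_RE = re.compile(
    r"\b(" + "|".join(re.escape(kw) for kw in PERSONAL_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def needs_personal_context(query: str) -> bool:
    """True if the query likely refers to user-specific state."""
    return PERSONAL_RE.search(query) is not None
```

Keyword heuristics are a cheap first pass; if routing accuracy matters, a small classifier or an LLM routing call is the usual next step.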

When NOT to Use BotWire

BotWire isn't right for every use case:

• Vector/semantic search: Use Pinecone, Weaviate, or Chroma for embeddings and similarity search. BotWire only does exact key matches.
• High-throughput applications: The free tier caps at 1000 writes/day. For heavy production loads, consider Redis or self-hosting.
• Sub-millisecond latency: HTTP calls add ~50-200ms. For real-time applications, use in-memory storage with occasional persistence.
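For the latency case, a common pattern is to serve hot reads from process memory and only hit the remote store on a miss or a stale entry. This is a minimal sketch assuming only the `get`/`set` surface shown earlier; `ReadThroughCache` and `DictBackend` are hypothetical names, with a dict standing in for the remote service.

```python
import time

class ReadThroughCache:
    """Serve reads from process memory, hiding the HTTP round-trip on hot
    keys; refetch from the remote store only on a miss or after `ttl`
    seconds. Writes go through immediately so the remote store stays
    authoritative."""

    def __init__(self, backend, ttl=30.0):
        self.backend = backend
        self.ttl = ttl
        self.cache = {}  # key -> (value, fetched_at)

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None and time.monotonic() - entry[1] < self.ttl:
            return entry[0]                 # fast path: local hit
        value = self.backend.get(key)       # slow path: remote read
        self.cache[key] = (value, time.monotonic())
        return value

    def set(self, key, value):
        self.backend.set(key, value)        # write through to the remote store
        self.cache[key] = (value, time.monotonic())


class DictBackend:
    """In-memory stand-in for the remote store; counts reads for illustration."""
    def __init__(self):
        self.data = {}
        self.reads = 0
    def get(self, key):
        self.reads += 1
        return self.data.get(key)
    def set(self, key, value):
        self.data[key] = value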

FAQ

Why not just use Redis? Redis is great but requires setup, authentication, and persistence configuration. BotWire works immediately with zero config and includes built-in persistence.

Is this actually free? Yes, 1000 writes/day per namespace forever. No credit card, no trial expiration. Pay only if you need higher limits.

What about data privacy? Self-host the open source version (MIT license) for sensitive data. The hosted version at botwire.dev is fine for development and non-sensitive production use.

Start Building

RAG and memory aren't competitors — they're complementary. Use vector search for knowledge retrieval and persistent memory for state that matters.

pip install botwire

Get started at https://botwire.dev with your first persistent agent memory in under 60 seconds.

Install in one command:

pip install botwire

Start free at botwire.dev