Testing AI Agents: Memory Fixtures and Replay

Free · Open source (MIT) · Works with LangChain, CrewAI, AutoGen · No signup

Testing AI agents is notoriously hard because they accumulate state over time: conversations, learned facts, decisions that influence future behavior. When your agent "remembers" something from yesterday's conversation, how do you write a deterministic test for that? You need agent test fixtures that let you control exactly what the agent remembers, and replay capabilities to recreate scenarios. Here's how to build an LLM test harness that actually works.

The Testing Problem

AI agents break traditional testing patterns because they're stateful. Your agent learns that "John prefers email over Slack" in conversation A, then uses that preference in conversation B three days later. When you write tests, you can't just mock the LLM calls — you need to control the entire memory state.

Without proper AI agent testing, you get flaky tests that pass locally but fail in CI, or tests that only work when run in isolation. Worse, you can't reproduce bugs, because you can't recreate the exact memory state that triggered the issue.

# This test will fail randomly
def test_agent_prefers_email():
    agent = MyAgent()
    agent.process("John prefers email")
    # ... 50 lines later, different test file ...
    result = agent.decide_notification_method("John")
    assert result == "email"  # Fails if memory was cleared

The Fix

Install BotWire to get persistent, controllable memory for your agent tests:

pip install botwire

from botwire import Memory

class TestAgentMemory:
    def setup_method(self, method):
        # Each test gets isolated memory, namespaced by the test's name
        self.memory = Memory(f"test-{method.__name__}")
        self.memory.clear()  # Clean slate
    
    def test_remembers_user_preference(self):
        # Set up test fixture
        self.memory.set("user:john:preference", "email")
        
        # Test agent behavior
        agent = MyAgent(memory=self.memory)
        result = agent.decide_notification("john", "meeting reminder")
        
        assert result["method"] == "email"
        assert "john" in result["recipient"]

How It Works

The key insight is treating agent memory as a test fixture. Just like you'd mock a database, you pre-populate memory with known state before each test.
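
If you want to prototype this fixture pattern before wiring in a real store, a dict-backed stand-in that mirrors the `set`/`get`/`clear` calls used in this post works with zero dependencies. The `FakeMemory` class below is hypothetical, not part of BotWire; it is just a minimal sketch of the same interface:

```python
class FakeMemory:
    """Dict-backed stand-in for a memory store (hypothetical, not BotWire).

    Mirrors the set/get/clear calls used throughout this post so tests
    can exercise the fixture pattern without any backing service.
    """

    def __init__(self, namespace):
        self.namespace = namespace
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def clear(self):
        self._data.clear()


def test_preference_fixture():
    memory = FakeMemory("test-preferences")
    memory.clear()  # clean slate, same idea as setup_method earlier
    memory.set("user:john:preference", "email")
    assert memory.get("user:john:preference") == "email"
```

Because the interface matches, you can swap `FakeMemory` for the real store in CI without rewriting the tests.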

BotWire gives you a persistent key-value store that survives process restarts — perfect for agent replay scenarios where you need to recreate complex state:

def test_conversation_replay(self):
    # Recreate a specific conversation state
    conversation_state = {
        "messages": ["Hello", "I need help with Python", "Specifically async/await"],
        "user_expertise": "beginner",
        "last_topic": "concurrency",
        "clarification_count": 2
    }
    
    for key, value in conversation_state.items():
        self.memory.set(f"session:abc123:{key}", value)
    
    # Now test how agent handles next message
    agent = MyAgent(memory=self.memory, session="abc123")
    response = agent.process("What's the difference between async and threading?")
    
    # Should give beginner-friendly answer since it knows user_expertise
    assert "beginner" in response.tone_markers
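
To reproduce a production bug, you can serialize the memory state that triggered it and reload that snapshot in a test. Here's a minimal sketch using the standard `json` module and the `session:<id>:<key>` prefix convention from the example above; the `dump_session`/`load_session` helpers are hypothetical, not a BotWire API:

```python
import json

def dump_session(state, path):
    # Persist a captured conversation state to disk for later replay.
    with open(path, "w") as f:
        json.dump(state, f, indent=2)

def load_session(path, session_id):
    # Reload a snapshot as prefixed key/value pairs, ready to feed
    # into memory.set() using the session:<id>:<key> convention.
    with open(path) as f:
        state = json.load(f)
    return {
        f"session:{session_id}:{key}": value
        for key, value in state.items()
    }
```

Check the snapshot file into your repo next to the failing test, and the bug stays reproducible for everyone on the team.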

For cross-process testing (like testing agent handoffs between services), memory persists automatically:

def test_agent_handoff(self):
    # First agent stores research findings
    researcher = ResearchAgent(Memory("handoff-test"))
    researcher.research("climate change impacts")
    
    # Second agent (different process) picks up where first left off  
    writer = WriterAgent(Memory("handoff-test"))  # Same namespace
    article = writer.write_summary()
    
    # Writer should have access to researcher's findings
    assert len(article.sources) > 0

Framework Integration

BotWire integrates directly with popular agent frameworks. For LangChain agents, use the chat history adapter:

from botwire import BotWireChatHistory
from langchain.agents import AgentExecutor

def test_langchain_agent_memory():
    # Controlled chat history for testing
    chat_history = BotWireChatHistory(session_id="test-session-456")
    
    # Pre-populate conversation context
    chat_history.add_user_message("My name is Sarah and I'm a data scientist")
    chat_history.add_ai_message("Nice to meet you Sarah! I'll remember you work in data science.")
    
    # Test agent with this context
    agent = create_langchain_agent(memory=chat_history)
    result = agent.run("What kind of work do I do?")
    
    assert "data scientist" in result.lower()
    assert "sarah" in result.lower()

For CrewAI, you get memory tools that agents can use in tests:

from botwire.memory import memory_tools

def test_crew_shared_memory():
    tools = memory_tools("test-crew-ns")
    
    # Agent 1 stores finding
    remember_tool = tools[0]  # remember tool
    remember_tool.run("customer_pain_point:slow_checkout")
    
    # Agent 2 recalls it
    recall_tool = tools[1]  # recall tool  
    finding = recall_tool.run("customer_pain_point")
    
    assert finding == "slow_checkout"
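
The remember/recall contract shown above (remember at index 0, recall at index 1, with a `key:value` string format) can also be stubbed in plain Python, so crew logic is unit-testable without the BotWire service. The `StubTool` class below is hypothetical, not CrewAI's tool type; it just matches the `.run()` call shape from the example:

```python
class StubTool:
    """Hypothetical stand-in matching the .run() call shape above."""

    def __init__(self, fn):
        self._fn = fn

    def run(self, arg):
        return self._fn(arg)


def stub_memory_tools():
    # Dict-backed remember/recall pair mirroring the memory_tools
    # ordering from the example: index 0 stores, index 1 retrieves.
    store = {}

    def remember(entry):
        key, _, value = entry.partition(":")  # "key:value" format
        store[key] = value
        return f"stored {key}"

    def recall(key):
        return store.get(key)

    return [StubTool(remember), StubTool(recall)]
```
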

When NOT to Use BotWire

BotWire is built for test fixtures and replay, not as a production datastore. If your suite needs more than the free tier's 1,000 writes/day per namespace or 50MB of storage, or your team already runs Redis in CI and is happy with it, a dedicated store may be a better fit.

FAQ

Why not just use Redis for testing? Redis requires running and configuring a server (plus auth for shared access) before your tests can touch it. BotWire works immediately with zero config and gives you 1000 writes/day free forever.

Is this actually free? Yes. 1000 writes/day per namespace, 50MB storage, unlimited reads. No credit card, no signup. Open source MIT license if you want to self-host.

What about data privacy? Data stays in your namespace (isolated by default). Self-host the FastAPI service if you need on-premise. All data is yours.

Get Started

Stop writing flaky agent tests. Get deterministic AI agent testing with proper memory fixtures.

pip install botwire

Full documentation and self-hosting guide at botwire.dev.

Start free at botwire.dev