Cutting AI Agent Costs with Persistent Memory

Free · Open source (MIT) · Works with LangChain, CrewAI, AutoGen · No signup

Your AI agent is burning through tokens by asking the same questions every run. You're watching costs pile up as your agent re-discovers basic facts, re-processes the same data, and repeats expensive API calls that should happen once. The fix is persistent memory caching that survives restarts.

The Problem: Agents with Amnesia

Every time your agent starts up, it's born with complete amnesia. It doesn't remember that it already figured out your database schema. It doesn't recall the user preferences it burned hundreds of tokens discovering. It can't reuse the expensive analysis it performed on that CSV file yesterday.

This hits hardest with repeated factual lookups. Your agent asks "What's the current user's timezone?" and burns tokens getting the answer. The process restarts. Same question, same tokens burned. Multiply this across dozens of facts your agent needs to remember, and you're looking at 10x the LLM cost you should be paying.

The math is brutal: if your agent uses 1,000 tokens per run to rediscover the same 10 facts, and you run it 100 times, that's 100,000 wasted tokens. At GPT-4's $0.03 per 1K input tokens, that's about $3 spent on information you already paid for.
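The arithmetic above can be spelled out directly (the per-token rate is an assumed GPT-4 input price; substitute your own model's rate):

```python
# Cost of rediscovering the same facts on every run
runs = 100
tokens_per_run = 1_000                 # tokens spent re-deriving the same 10 facts
wasted_tokens = runs * tokens_per_run  # 100,000 tokens total

price_per_1k_input = 0.03              # assumed GPT-4 input price, $ per 1K tokens
wasted_dollars = wasted_tokens / 1_000 * price_per_1k_input

print(f"{wasted_tokens} tokens wasted, about ${wasted_dollars:.2f}")
```

Cache those 10 facts once and the per-run rediscovery cost drops to zero.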

The Fix: Persistent Memory That Actually Persists

Install BotWire:

pip install botwire

Then give your agent a memory that survives restarts:

from botwire import Memory

# Create persistent memory for your agent
agent_memory = Memory("my-agent")

# First run: expensive lookup
user_timezone = agent_memory.get("user_timezone")
if user_timezone is None:
    # This costs tokens - but only once
    user_timezone = llm.ask("What timezone is the user in?")
    agent_memory.set("user_timezone", user_timezone)

# Every subsequent run: free
user_timezone = agent_memory.get("user_timezone")  # instant, no LLM call

This simple pattern can reduce LLM cost by 60-90% for agents that rediscover the same information repeatedly.
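The get-or-set pattern generalizes to a small helper that works with any store exposing get and set. Here's a framework-agnostic sketch; the DictMemory class is an in-memory stand-in for illustration, and a persistent store with the same two methods would slot straight in:

```python
class DictMemory:
    """In-memory stand-in for any key-value store exposing get/set."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def get_or_compute(memory, key, compute):
    """Return the cached value for key, running compute() only on a cache miss."""
    value = memory.get(key)
    if value is None:
        value = compute()  # the expensive call, paid exactly once
        memory.set(key, value)
    return value


memory = DictMemory()
tz = get_or_compute(memory, "user_timezone", lambda: "UTC")   # computes
tz = get_or_compute(memory, "user_timezone", lambda: "UTC")   # cached, free
```

Wrapping the pattern in one function keeps every call site down to a single line and makes it hard to forget the write-back step.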

How It Works: Cache Everything Expensive

The core principle for agent cost optimization is caching any information that's expensive to regenerate. Here are the patterns that save the most money:

Cache User Context and Preferences

from botwire import Memory

memory = Memory("user-context")

def get_user_context(user_id):
    context = memory.get(f"user_{user_id}_context")
    if context is None:
        # Expensive: analyze user's history, preferences, etc.
        context = analyze_user_with_llm(user_id)
        memory.set(f"user_{user_id}_context", context)
    return context

Cache Expensive Analysis Results

import hashlib

def analyze_document(document):
    # Key the cache on a content hash so identical documents hit the cache
    doc_hash = hashlib.sha256(document.encode()).hexdigest()
    analysis = memory.get(f"analysis_{doc_hash}")
    if analysis is None:
        # This might cost 500+ tokens
        analysis = llm.analyze_document(document)
        memory.set(f"analysis_{doc_hash}", analysis)
    return analysis

Handle Cache Invalidation

Sometimes cached data goes stale. Set expiration times or manual invalidation:

import time

# Cache with timestamp for manual expiration
def cache_with_ttl(key, value, ttl_seconds=3600):
    data = {
        "value": value,
        "expires_at": time.time() + ttl_seconds
    }
    memory.set(key, data)

def get_with_ttl(key):
    data = memory.get(key)
    if data and time.time() < data["expires_at"]:
        return data["value"]
    return None
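The two TTL helpers above can also be folded into a small class that evicts stale entries on read. This is a dict-backed sketch for illustration; the same set/get shape applies to any persistent backend:

```python
import time


class TTLCache:
    """Dict-backed cache with per-key expiry; stale entries are evicted on read."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds=3600):
        self._data[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._data[key]  # drop the stale entry so it doesn't linger
            return None
        return value
```

Evicting on read keeps the store from slowly filling with expired entries, which matters once you're caching hundreds of keys.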

The memory persists across processes and machines, so your agent keeps its knowledge even when deployed to different servers.

Integration with Agent Frameworks

CrewAI Integration

CrewAI agents can use BotWire through memory tools:

from botwire.memory import memory_tools
from crewai import Agent, Task, Crew

# Get remember/recall/list_memory tools
tools = memory_tools("crew-research")

researcher = Agent(
    role="Research Analyst",
    goal="Research topics without repeating expensive lookups",
    tools=tools,
    backstory="I remember everything I learn to avoid redundant research."
)

task = Task(
    description="Research the company's latest financial data",
    agent=researcher
)

# Agent can now remember("company_revenue_2024", "125M") and recall("company_revenue_2024")
crew = Crew(agents=[researcher], tasks=[task])

LangChain Chat History

from botwire import BotWireChatHistory
from langchain.memory import ConversationBufferMemory

# Persistent conversation history
chat_history = BotWireChatHistory(session_id="user-42")
memory = ConversationBufferMemory(chat_memory=chat_history)

# Conversations survive restarts and deployments

When NOT to Use BotWire

Not every value belongs in the cache. Skip it for data that changes on every run (you'd only be caching staleness) and for workloads that exceed the free tier's limits of 1,000 writes per day per namespace and 50MB of storage.

FAQ

Why not just use Redis? Redis requires setup, hosting, and dies when your server restarts unless you configure persistence. BotWire works instantly with no setup and persists by default.

Is this actually free? Yes. 1000 writes/day per namespace, 50MB storage, unlimited reads. No signup, no credit card, no API keys.

What about data privacy? It's open source (MIT license) and you can self-host the entire thing. It's a single FastAPI + SQLite service you can run anywhere.
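Since the FAQ describes the service as FastAPI + SQLite, the persistence model is easy to picture. Here's a hypothetical standard-library sketch of a namespaced SQLite key-value store in that spirit; it is not BotWire's actual code:

```python
import json
import sqlite3


class SqliteMemory:
    """Minimal namespaced key-value store backed by a SQLite file."""

    def __init__(self, path, namespace):
        self.conn = sqlite3.connect(path)
        self.ns = namespace
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv "
            "(ns TEXT, k TEXT, v TEXT, PRIMARY KEY (ns, k))"
        )

    def set(self, key, value):
        # JSON-encode so any serializable value round-trips
        self.conn.execute(
            "INSERT OR REPLACE INTO kv VALUES (?, ?, ?)",
            (self.ns, key, json.dumps(value)),
        )
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE ns = ? AND k = ?", (self.ns, key)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

Because the data lives in a single SQLite file, values written before a restart are still there afterward, which is the whole point of the persistence claim.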

Start Caching Today

Stop paying for the same tokens twice. Install BotWire and cut your agent costs by caching expensive LLM responses that never change.

pip install botwire

Get started at https://botwire.dev — no signup required.
