Helicone + Memory: Tracking LLM Costs per Agent Namespace
Free · Open source (MIT) · Works with LangChain, CrewAI, AutoGen · No signup
When you're tracking LLM costs across multiple AI agents, you need to group expenses by agent namespace, not just by session or user. That grouping shows you which agents are burning through your OpenAI budget so you can optimize accordingly. Here's how to implement it with persistent agent state using BotWire Memory.
The Problem: Agent Costs Disappear Into the Void
Most LLM observability tools track costs per API call or user session, but miss the bigger picture: which specific agent logic is expensive? Your customer service bot might be cheap while your code review agent burns $50/day on complex prompts.
Without per-agent cost tracking, you're flying blind:
# This tells you nothing about which agent made the call
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Debug this code..."}],
)
# Was this the expensive code-review agent or cheap FAQ bot?
When agents restart, crash, or scale across machines, their context vanishes. You lose the thread between agent identity and LLM spend. Helicone shows you spent $200 yesterday — but on which agents? For what conversations? The observability gap kills cost optimization.
The Fix: Persistent Agent Namespaces
BotWire Memory gives each agent a persistent namespace that survives restarts and process changes. Tag your LLM calls with stable agent identifiers:
pip install botwire
from botwire import Memory
import openai

# Each agent gets persistent memory
agent_memory = Memory("code-reviewer-v2")

# Track agent context before LLM calls
agent_memory.set("current_task", "reviewing user-123/pull-request-456")
agent_memory.set("model_used", "gpt-4")
agent_memory.set("session_start", "2024-01-15T10:30:00Z")

# Make your LLM call with agent context
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": f"Review this PR: {pr_content}"}],
    store=True,  # OpenAI only accepts metadata on stored completions
    metadata={"agent_namespace": "code-reviewer-v2"},
    # If you route through the Helicone proxy, you can tag the call with
    # extra_headers={"Helicone-Property-Agent": "code-reviewer-v2"} instead
)

# Log the call for cost tracking
agent_memory.set(f"call_{response.id}", {
    "tokens": response.usage.total_tokens,
    "cost_estimate": response.usage.total_tokens * 0.00003,  # rough flat rate
    "timestamp": "2024-01-15T10:31:15Z",
})
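The flat 0.00003-per-token multiplier above blends prompt and completion pricing into one number. A sketch of a slightly more faithful estimator that prices the two separately (the rates below are illustrative assumptions, not current OpenAI prices; check your provider's price sheet before trusting the output):

```python
# Illustrative per-1K-token rates; replace with your provider's
# current pricing (these numbers are assumptions for the example)
PRICES_PER_1K = {
    "gpt-4": {"prompt": 0.03, "completion": 0.06},
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    """Estimate the dollar cost of one call from its token usage."""
    rates = PRICES_PER_1K[model]
    return (prompt_tokens / 1000) * rates["prompt"] \
         + (completion_tokens / 1000) * rates["completion"]

# 1200 prompt + 300 completion tokens on gpt-4
cost = estimate_cost("gpt-4", prompt_tokens=1200, completion_tokens=300)
```

You can store the result under the same `call_{id}` key in place of the flat estimate.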
How It Works: Memory That Survives Everything
The namespace "code-reviewer-v2" persists across machine restarts, deployments, and scale events. Your agent can crash at 3am and resume exactly where it left off:
# Agent restarts on different machine
agent_memory = Memory("code-reviewer-v2")
# Context survives — no cold start
current_task = agent_memory.get("current_task")
# Returns: "reviewing user-123/pull-request-456"
# Track cumulative costs per agent
daily_calls = agent_memory.get("daily_call_count") or 0
agent_memory.set("daily_call_count", daily_calls + 1)
total_cost = agent_memory.get("daily_cost") or 0.0
agent_memory.set("daily_cost", total_cost + estimated_call_cost)
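One pitfall with a single `daily_call_count` / `daily_cost` pair is that nothing ever resets it. Keying rollups by date sidesteps that; here's a sketch of the pattern using a plain dict as a stand-in for the Memory API (swap the dict access for `Memory.get`/`Memory.set` in real use):

```python
from datetime import date

store = {}  # stand-in for Memory("code-reviewer-v2")

def record_call_cost(store, namespace, cost, day=None):
    """Accumulate cost under a per-day key so each day starts at zero."""
    day = day or date.today().isoformat()
    key = f"{namespace}:cost:{day}"
    store[key] = store.get(key, 0.0) + cost
    return key

record_call_cost(store, "code-reviewer-v2", 0.054, day="2024-01-15")
record_call_cost(store, "code-reviewer-v2", 0.021, day="2024-01-15")
# store["code-reviewer-v2:cost:2024-01-15"] now holds the day's total
```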
For complex workflows, list all memory keys to debug state:
# Debug what this agent remembers
all_keys = agent_memory.list_keys()
print(f"Agent code-reviewer-v2 has {len(all_keys)} memory entries")

# Clean up old data
if "temp_data" in all_keys:
    agent_memory.delete("temp_data")
The memory is shared across processes — if your agent spawns workers, they all see the same namespace. This makes cost attribution accurate even in distributed setups.
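Once every agent writes under its own namespace, answering "which agent spent what" is just a sort over namespaces. A minimal sketch in pure Python (`costs_by_namespace` stands in for the daily totals you'd read back via `Memory.get`):

```python
def cost_report(costs_by_namespace):
    """Rank agents by spend, most expensive first."""
    ranked = sorted(costs_by_namespace.items(),
                    key=lambda kv: kv[1], reverse=True)
    return [f"{namespace}: ${cost:.2f}/day" for namespace, cost in ranked]

report = cost_report({
    "faq-bot": 1.40,
    "code-reviewer-v2": 48.75,
    "support-agent-prod": 12.10,
})
# report[0] is the code-review agent, the biggest spender
```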
Integration with LangChain Agents
For LangChain-based agents, use the chat history adapter to maintain conversation context alongside cost tracking:
from botwire import BotWireChatHistory, Memory
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain.memory import ConversationBufferMemory

# Persistent chat history per agent instance
chat_history = BotWireChatHistory(session_id="support-agent-prod")
cost_tracker = Memory("support-agent-prod")

# create_openai_functions_agent takes a prompt, not a memory object;
# the persistent history is wired in at the executor level instead
agent = create_openai_functions_agent(llm=llm, tools=tools, prompt=prompt)
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=ConversationBufferMemory(
        chat_memory=chat_history,
        memory_key="chat_history",
        return_messages=True,
    ),
)
# Track costs with full context
def track_llm_call(response):
    cost_tracker.set(f"call_{response.id}", {
        "conversation_id": "support-agent-prod",
        "tokens": response.usage.total_tokens,
        "user_query": chat_history.messages[-2].content,
        "agent_response": chat_history.messages[-1].content,
    })
When NOT to Use BotWire
- Vector search: This isn't a vector database. Use Pinecone/Weaviate for semantic memory
- High-throughput logging: 1000 writes/day limit. Use ClickHouse for heavy telemetry
- Sub-millisecond latency: HTTP-based, not local cache. Use Redis for ultra-fast access
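If an agent is anywhere near the 1000 writes/day ceiling, a client-side guard avoids silently hitting the quota mid-run. A sketch (the quota number comes from the limits above; the `WriteBudget` class itself is a hypothetical helper, not part of BotWire):

```python
class WriteBudget:
    """Client-side guard for a daily write quota (e.g. 1000 writes/day)."""

    def __init__(self, limit_per_day):
        self.limit = limit_per_day
        self.day = None
        self.used = 0

    def try_spend(self, day):
        # Reset the counter when the day rolls over
        if day != self.day:
            self.day, self.used = day, 0
        if self.used >= self.limit:
            return False  # over quota: batch or drop this write
        self.used += 1
        return True

budget = WriteBudget(limit_per_day=2)
first = budget.try_spend("2024-01-15")    # allowed
second = budget.try_spend("2024-01-15")   # allowed
third = budget.try_spend("2024-01-15")    # blocked, quota spent
fresh = budget.try_spend("2024-01-16")    # new day, quota resets
```

Call `try_spend` before each `Memory.set` and batch or skip non-critical writes when it returns False.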
FAQ
Why not just use Redis? Redis requires infrastructure you have to provision and run. BotWire is zero-setup with a forever-free tier. For prototypes and small agent fleets, the operational overhead of Redis isn't worth it.
Is this actually free? Yes. 1000 writes/day per namespace, unlimited reads, 50MB storage. No credit card, no signup, no surprise bills.
What about data privacy? It's open source (MIT license) and self-hostable. The hosted version doesn't log or analyze your data — it's just key-value storage.
Start Tracking Agent Costs
BotWire Memory solves the "which agent spent what" problem with persistent namespaces and zero infrastructure overhead.
pip install botwire
Full docs and self-hosting instructions: https://botwire.dev