Cost Tracking

Track LLM costs per execution, per span, per agent, and across your entire workspace with automatic token usage and pricing calculations.

PromptRails automatically calculates and tracks costs for every LLM call made during agent execution. This gives you full visibility into your AI spending at every level -- from individual LLM calls to workspace-wide summaries.

How Costs Are Calculated

Costs are calculated based on:

  1. Token usage -- The number of prompt (input) and completion (output) tokens consumed
  2. Model pricing -- The per-token price for the specific LLM model used
  3. Automatic aggregation -- Span-level costs are rolled up to execution and workspace levels

The formula is:

cost = (prompt_tokens * input_price_per_token) + (completion_tokens * output_price_per_token)

PromptRails maintains a database of LLM model pricing that is used for automatic cost calculation.
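As a worked example of the formula above (the prices here are illustrative, not values from PromptRails' pricing database):

```python
# Illustrative per-token prices, expressed as dollars per token.
input_price_per_token = 2.50 / 1_000_000    # $2.50 per 1M input tokens
output_price_per_token = 10.00 / 1_000_000  # $10.00 per 1M output tokens

prompt_tokens = 1_200
completion_tokens = 350

# cost = (prompt_tokens * input_price) + (completion_tokens * output_price)
cost = (prompt_tokens * input_price_per_token
        + completion_tokens * output_price_per_token)
print(f"${cost:.6f}")  # → $0.006500
```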

Per-Execution Cost Breakdown

Every execution includes total cost and token usage:

execution = client.executions.get(execution_id="your-execution-id")
 
print(f"Total Cost: ${execution['data']['cost']:.6f}")
print(f"Token Usage: {execution['data']['token_usage']}")
print(f"Duration: {execution['data']['duration_ms']}ms")

For multi-step executions (chain, workflow, multi-agent), the execution cost is the sum of all LLM calls within that execution.
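The rollup itself is a plain sum over the execution's LLM spans. As a sketch, with hypothetical span records standing in for `client.traces.list(...)` results:

```python
# Hypothetical LLM span records with illustrative costs.
spans = [
    {"model_name": "gpt-4o", "cost": 0.004200},
    {"model_name": "gpt-4o-mini", "cost": 0.000310},
    {"model_name": "gpt-4o", "cost": 0.002750},
]

# Execution cost is the sum of all LLM-call costs within the execution.
execution_cost = sum(span.get("cost", 0) for span in spans)
print(f"${execution_cost:.6f}")  # → $0.007260
```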

Per-Span Cost Analysis

Drill down into individual spans to see cost at the LLM call level:

traces = client.traces.list(execution_id="your-execution-id", kind="llm")
 
for span in traces["data"]:
    print(f"Model: {span['model_name']}")
    print(f"  Prompt tokens: {span.get('prompt_tokens', 0)}")
    print(f"  Completion tokens: {span.get('completion_tokens', 0)}")
    print(f"  Cost: ${span.get('cost', 0):.6f}")
    print(f"  Duration: {span['duration_ms']}ms")
    print("---")

This is valuable for identifying which step in a complex pipeline is the most expensive.
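For example, finding the single most expensive call is a one-liner once you have the span list (the span data below is hypothetical, standing in for `client.traces.list(..., kind="llm")` output):

```python
# Hypothetical LLM spans with illustrative costs and durations.
spans = [
    {"model_name": "gpt-4o", "cost": 0.0042, "duration_ms": 850},
    {"model_name": "claude-3-haiku", "cost": 0.0003, "duration_ms": 210},
    {"model_name": "gpt-4o", "cost": 0.0081, "duration_ms": 1400},
]

# Pick the span with the highest cost; missing costs default to 0.
most_expensive = max(spans, key=lambda s: s.get("cost", 0))
print(most_expensive["model_name"], most_expensive["cost"])
```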

Token Usage Tracking

Token usage is tracked with three metrics:

  Metric               Description
  prompt_tokens        Number of tokens in the input sent to the model
  completion_tokens    Number of tokens in the model's response
  total_tokens         Sum of prompt and completion tokens

Token counts are provided by the LLM provider and recorded with each LLM span.
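Aggregating these metrics across spans follows directly (hypothetical span records below; in practice they come from `client.traces.list(...)`):

```python
# Hypothetical LLM spans with token counts as recorded per span.
spans = [
    {"prompt_tokens": 900, "completion_tokens": 150},
    {"prompt_tokens": 1_200, "completion_tokens": 400},
]

prompt_tokens = sum(s.get("prompt_tokens", 0) for s in spans)
completion_tokens = sum(s.get("completion_tokens", 0) for s in spans)
total_tokens = prompt_tokens + completion_tokens  # always the sum of the two
print(prompt_tokens, completion_tokens, total_tokens)  # → 2100 550 2650
```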

Workspace-Wide Cost Summary

Get aggregated cost data across your entire workspace:

summary = client.costs.get_summary()
 
print(f"Total Cost: ${summary.total_cost:.2f}")
print(f"Total Executions: {summary.total_executions}")
print(f"Total Tokens: {summary.total_tokens}")

Per-Agent Cost Analysis

Analyze costs broken down by agent to identify your most expensive workflows:

agent_costs = client.costs.get_agent_summary("your-agent-id")
 
print(f"Agent Total Cost: ${agent_costs.total_cost:.2f}")
print(f"Agent Executions: {agent_costs.total_executions}")
print(f"Agent Tokens: {agent_costs.total_tokens}")

LLM Model Pricing

PromptRails maintains pricing information for all supported LLM models. Prices are stored as cost per token (or per 1K/1M tokens depending on the model) and are used for automatic cost calculation.

Supported providers and their models include:

  Provider     Example Models
  OpenAI       gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
  Anthropic    claude-3.5-sonnet, claude-3-opus, claude-3-haiku
  Google       gemini-pro, gemini-ultra
  DeepSeek     deepseek-chat, deepseek-coder
  Fireworks    Various hosted models
  xAI          grok-1, grok-2
  OpenRouter   Aggregated pricing from multiple providers

Cost Optimization Tips

  • Choose the right model -- Use smaller, cheaper models (GPT-4o-mini, Claude 3 Haiku) for simple tasks and reserve expensive models for complex reasoning
  • Use prompt caching -- Enable cache timeouts on prompts to avoid duplicate LLM calls
  • Monitor per-agent costs -- Identify agents with unexpectedly high costs
  • Set token limits -- Use max_tokens to prevent runaway completions
  • Review chain/workflow costs -- Multi-step agents multiply LLM costs; optimize the number of steps
  • Use data source caching -- Cache database query results to reduce data source query overhead
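A simple way to act on per-agent monitoring is a budget check. The sketch below uses a hypothetical, hard-coded summaries dict; in practice you would populate it from `client.costs.get_agent_summary(agent_id)` for each agent:

```python
# Budget threshold in dollars (an assumption for this sketch).
BUDGET_USD = 50.0

# Hypothetical per-agent total costs; in practice, fetched via
# client.costs.get_agent_summary(agent_id).total_cost.
agent_costs = {
    "support-bot": 12.40,
    "report-writer": 73.15,
    "router": 4.02,
}

# Flag agents whose spend exceeds the budget.
over_budget = {agent: cost for agent, cost in agent_costs.items()
               if cost > BUDGET_USD}
print(over_budget)  # → {'report-writer': 73.15}
```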