# Cost Tracking

Track LLM costs per execution, per span, per agent, and across your entire workspace with automatic token usage and pricing calculations.

PromptRails automatically calculates and tracks costs for every LLM call made during agent execution. This gives you full visibility into your AI spending at every level -- from individual LLM calls to workspace-wide summaries.
## How Costs Are Calculated
Costs are calculated based on:
- Token usage -- The number of prompt (input) and completion (output) tokens consumed
- Model pricing -- The per-token price for the specific LLM model used
- Automatic aggregation -- Span-level costs are rolled up to execution and workspace levels
The formula is:

```
cost = (prompt_tokens * input_price_per_token) + (completion_tokens * output_price_per_token)
```
PromptRails maintains a database of LLM model pricing that is used for automatic cost calculation.
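The formula above can be sketched as a small worked example. The prices here are illustrative placeholders, not values from PromptRails' pricing database:

```python
# Cost = input tokens * input price + output tokens * output price.
def calculate_cost(prompt_tokens, completion_tokens,
                   input_price_per_token, output_price_per_token):
    return (prompt_tokens * input_price_per_token
            + completion_tokens * output_price_per_token)

# 1,000 prompt tokens at $2.50 per 1M and 500 completion tokens at $10.00 per 1M
# (example rates only)
cost = calculate_cost(1_000, 500, 2.50 / 1_000_000, 10.00 / 1_000_000)
print(f"${cost:.6f}")  # → $0.007500
```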
## Per-Execution Cost Breakdown
Every execution includes total cost and token usage:
```python
execution = client.executions.get(execution_id="your-execution-id")

print(f"Total Cost: ${execution['data']['cost']:.6f}")
print(f"Token Usage: {execution['data']['token_usage']}")
print(f"Duration: {execution['data']['duration_ms']}ms")
```

For multi-step executions (chain, workflow, multi-agent), the execution cost is the sum of all LLM calls within that execution.
## Per-Span Cost Analysis
Drill down into individual spans to see cost at the LLM call level:
```python
traces = client.traces.list(execution_id="your-execution-id", kind="llm")

for span in traces["data"]:
    print(f"Model: {span['model_name']}")
    print(f"  Prompt tokens: {span.get('prompt_tokens', 0)}")
    print(f"  Completion tokens: {span.get('completion_tokens', 0)}")
    print(f"  Cost: ${span.get('cost', 0):.6f}")
    print(f"  Duration: {span['duration_ms']}ms")
    print("---")
```

This is valuable for identifying which step in a complex pipeline is the most expensive.
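To find the most expensive model or step, span-level costs can be grouped and summed. A minimal sketch over an already-fetched list of span dicts (the field names mirror the traces response above; the data is illustrative):

```python
from collections import defaultdict

# Sample span records shaped like the traces response (illustrative data).
spans = [
    {"model_name": "gpt-4o", "cost": 0.0120},
    {"model_name": "gpt-4o-mini", "cost": 0.0008},
    {"model_name": "gpt-4o", "cost": 0.0150},
]

# Sum span costs per model.
cost_by_model = defaultdict(float)
for span in spans:
    cost_by_model[span["model_name"]] += span.get("cost", 0)

# Report the most expensive model first.
for model, cost in sorted(cost_by_model.items(), key=lambda kv: -kv[1]):
    print(f"{model}: ${cost:.6f}")
```

The same grouping works per step name or per agent, depending on which span field you key on.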
## Token Usage Tracking
Token usage is tracked with three metrics:
| Metric | Description |
|---|---|
| `prompt_tokens` | Number of tokens in the input sent to the model |
| `completion_tokens` | Number of tokens in the model's response |
| `total_tokens` | Sum of prompt and completion tokens |
Token counts are provided by the LLM provider and recorded with each LLM span.
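The three metrics satisfy a simple invariant that is handy as a sanity check when consuming span data (the field names follow the table above; the numbers are illustrative):

```python
# total_tokens should always equal prompt_tokens + completion_tokens.
usage = {"prompt_tokens": 820, "completion_tokens": 145, "total_tokens": 965}
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```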
## Workspace-Wide Cost Summary
Get aggregated cost data across your entire workspace:
```python
summary = client.costs.get_summary()

print(f"Total Cost: ${summary.total_cost:.2f}")
print(f"Total Executions: {summary.total_executions}")
print(f"Total Tokens: {summary.total_tokens}")
```

## Per-Agent Cost Analysis
Analyze costs broken down by agent to identify your most expensive workflows:
```python
agent_costs = client.costs.get_agent_summary("your-agent-id")

print(f"Agent Total Cost: ${agent_costs.total_cost:.2f}")
print(f"Agent Executions: {agent_costs.total_executions}")
print(f"Agent Tokens: {agent_costs.total_tokens}")
```

## LLM Model Pricing
PromptRails maintains pricing information for all supported LLM models. Prices are stored as cost per token (or per 1K/1M tokens depending on the model) and are used for automatic cost calculation.
Supported providers and their models include:
| Provider | Example Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo |
| Anthropic | claude-3.5-sonnet, claude-3-opus, claude-3-haiku |
| Google | gemini-pro, gemini-ultra |
| DeepSeek | deepseek-chat, deepseek-coder |
| Fireworks | Various hosted models |
| xAI | grok-1, grok-2 |
| OpenRouter | Aggregated pricing from multiple providers |
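Because prices may be quoted per token, per 1K, or per 1M tokens, normalizing to a per-token rate is a useful step before applying the cost formula. A hedged sketch; the pricing entries below are illustrative, not PromptRails' actual database:

```python
# Normalize a price quoted per `unit` tokens (1, 1_000, or 1_000_000)
# to a per-token rate.
def per_token(price, unit):
    return price / unit

# Illustrative entries: (input price, output price, quoted unit).
pricing = {
    "gpt-4o-mini": (0.15, 0.60, 1_000_000),     # quoted per 1M tokens
    "claude-3-haiku": (0.25, 1.25, 1_000_000),
}

# Cost of a call with 2,000 prompt tokens and 300 completion tokens.
price_in, price_out, unit = pricing["gpt-4o-mini"]
cost = 2_000 * per_token(price_in, unit) + 300 * per_token(price_out, unit)
print(f"${cost:.6f}")
```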
## Cost Optimization Tips
- Choose the right model -- Use smaller, cheaper models (GPT-4o-mini, Claude 3 Haiku) for simple tasks and reserve expensive models for complex reasoning
- Use prompt caching -- Enable cache timeouts on prompts to avoid duplicate LLM calls
- Monitor per-agent costs -- Identify agents with unexpectedly high costs
- Set token limits -- Use `max_tokens` to prevent runaway completions
- Review chain/workflow costs -- Multi-step agents multiply LLM costs; optimize the number of steps
- Use data source caching -- Cache database query results to reduce data source query overhead
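Monitoring per-agent costs, as the tips above suggest, can be automated with a simple budget check. A minimal sketch; the summary dicts mirror the `get_agent_summary` fields shown earlier, but the threshold, data, and `over_budget` helper are illustrative, not part of the PromptRails API:

```python
# Flag agents whose accumulated cost exceeds a budget threshold.
def over_budget(agent_summaries, threshold_usd):
    return [a["agent_id"] for a in agent_summaries
            if a["total_cost"] > threshold_usd]

# Illustrative per-agent summaries.
summaries = [
    {"agent_id": "support-bot", "total_cost": 42.10},
    {"agent_id": "summarizer", "total_cost": 3.75},
]

print(over_budget(summaries, threshold_usd=10.0))  # → ['support-bot']
```

Running a check like this on a schedule makes unexpectedly expensive agents visible before the bill arrives.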
## Related Topics
- Executions -- Execution-level cost data
- Tracing -- Span-level cost breakdown
- Billing and Plans -- Execution limits and plan pricing