PromptRails

Cost Tracking

See which agents, models, prompts, and runs are driving AI spend before you optimize or change providers.

PromptRails estimates the cost of each model call and rolls it up from spans to executions and workspace summaries.

For PromptRails-hosted models, usage is also tied to the workspace balance in Billing. Cost Tracking shows what happened inside runs; LLM Gateway explains how hosted-model balance, free allowances, and transactions are managed.

The Costs page shows spend over time and relates it to execution volume and token usage, so model changes can be evaluated against real workspace activity.

How Costs Are Calculated

Cost is based on:

  1. Token usage -- The number of prompt (input) and completion (output) tokens consumed
  2. Model pricing -- The per-token price for the specific LLM model used
  3. Automatic aggregation -- Span-level costs are rolled up to execution and workspace levels

The formula is:

cost = (prompt_tokens * input_price_per_token) + (completion_tokens * output_price_per_token)

PromptRails keeps model pricing metadata for supported providers and uses it to estimate costs automatically.
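
As a worked example of the formula above, here is a minimal sketch in Python. The function name `estimate_cost` and the prices used are illustrative assumptions, not PromptRails pricing data or SDK calls.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_token, output_price_per_token):
    """Apply the formula: input tokens and output tokens are priced separately."""
    return (prompt_tokens * input_price_per_token
            + completion_tokens * output_price_per_token)

# Hypothetical prices: $2.50 per 1M input tokens, $10.00 per 1M output tokens.
INPUT_PRICE = 2.50 / 1_000_000
OUTPUT_PRICE = 10.00 / 1_000_000

cost = estimate_cost(1_200, 350, INPUT_PRICE, OUTPUT_PRICE)
print(f"Estimated cost: ${cost:.6f}")  # prints "Estimated cost: $0.006500"
```

Because output tokens are often priced higher than input tokens, a short completion can cost as much as a much longer prompt, as it does in this example.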

Per-Execution Cost Breakdown

Every execution includes total cost and token usage:

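# Assumes `client` is an already-initialized PromptRails SDK client.
execution = client.executions.get(execution_id="your-execution-id")

# Cost, token usage, and duration are returned under the "data" key.
print(f"Total Cost: ${execution['data']['cost']:.6f}")
print(f"Token Usage: {execution['data']['token_usage']}")
print(f"Duration: {execution['data']['duration_ms']}ms")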

For multi-step executions such as chains, workflows, and multi-agent runs, the execution cost is the sum of the LLM spans inside that execution.
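
The roll-up described above can be sketched as a plain sum over span records. The sample data and the `kind` field on each span are assumptions made for illustration; the actual span schema may differ.

```python
# Made-up spans for one execution; only LLM spans are assumed to carry cost.
spans = [
    {"kind": "llm", "cost": 0.0021},
    {"kind": "tool", "cost": 0.0},
    {"kind": "llm", "cost": 0.0048},
]

def execution_cost(spans):
    """Sum the cost of the LLM spans, treating a missing cost as 0."""
    return sum(s.get("cost", 0.0) for s in spans if s.get("kind") == "llm")

total = execution_cost(spans)
print(f"Execution cost: ${total:.6f}")
```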

Per-Span Cost Analysis

Drill down into individual spans to see cost at the LLM call level:

# Fetch only the LLM spans for one execution (`client` is an initialized SDK client).
traces = client.traces.list(execution_id="your-execution-id", kind="llm")

for span in traces["data"]:
    # Token and cost fields may be absent on some spans, so default to 0.
    print(f"Model: {span['model_name']}")
    print(f"  Prompt tokens: {span.get('prompt_tokens', 0)}")
    print(f"  Completion tokens: {span.get('completion_tokens', 0)}")
    print(f"  Cost: ${span.get('cost', 0):.6f}")
    print(f"  Duration: {span['duration_ms']}ms")
    print("---")

This helps you identify which step in a complex pipeline is the most expensive.
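
One way to find the most expensive step is to sort the spans by cost and compare the top span against the total. The sketch below uses made-up span dictionaries with the same field names as the loop above; `rank_spans_by_cost` is a hypothetical helper, not part of the SDK.

```python
# Sample spans using the field names printed above (values are made up).
spans = [
    {"model_name": "gpt-4o-mini", "cost": 0.0004, "duration_ms": 620},
    {"model_name": "gpt-4o", "cost": 0.0123, "duration_ms": 2150},
    {"model_name": "gpt-4o-mini", "cost": 0.0007, "duration_ms": 840},
]

def rank_spans_by_cost(spans):
    """Return spans sorted from most to least expensive (missing cost counts as 0)."""
    return sorted(spans, key=lambda s: s.get("cost") or 0.0, reverse=True)

ranked = rank_spans_by_cost(spans)
total = sum(s.get("cost") or 0.0 for s in spans)
top = ranked[0]
share = (top.get("cost") or 0.0) / total if total else 0.0

print(f"Most expensive span: {top['model_name']} ({share:.0%} of execution cost)")
```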

Token Usage Tracking

Token usage is tracked with three metrics:

Metric              Description
prompt_tokens       Number of tokens in the input sent to the model
completion_tokens   Number of tokens in the model's response
total_tokens        Sum of prompt and completion tokens

Token counts are provided by the LLM provider and recorded with each LLM span.
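
To illustrate how the three metrics relate, the sketch below aggregates token counts across several span records. The helper `aggregate_token_usage` and the sample values are assumptions for illustration only.

```python
def aggregate_token_usage(spans):
    """Sum prompt and completion tokens across spans; total is their sum."""
    prompt = sum(s.get("prompt_tokens", 0) for s in spans)
    completion = sum(s.get("completion_tokens", 0) for s in spans)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }

# The second span has no completion_tokens field, so it contributes 0 there.
usage = aggregate_token_usage([
    {"prompt_tokens": 1200, "completion_tokens": 350},
    {"prompt_tokens": 800},
])
print(usage)
```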

Workspace-Wide Cost Summary

Get aggregated cost data across your entire workspace:

summary = client.costs.get_summary()
 
print(f"Total Cost: ${summary.total_cost:.2f}")
print(f"Total Executions: {summary.total_executions}")
print(f"Total Tokens: {summary.total_tokens}")

Per-Agent Cost Analysis

Analyze costs broken down by agent to identify your most expensive workflows:

agent_costs = client.costs.get_agent_summary("your-agent-id")
 
print(f"Agent Total Cost: ${agent_costs.total_cost:.2f}")
print(f"Agent Executions: {agent_costs.total_executions}")
print(f"Agent Tokens: {agent_costs.total_tokens}")
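
Totals are easier to compare across agents once they are normalized. The sketch below derives cost per execution and cost per 1K tokens from a summary-like object. `CostSummary` is a hypothetical stand-in that only mirrors the three attributes printed above; it is not the SDK's actual return type.

```python
from dataclasses import dataclass

@dataclass
class CostSummary:
    """Hypothetical stand-in with the same attributes as the summaries above."""
    total_cost: float
    total_executions: int
    total_tokens: int

def unit_costs(summary):
    """Return (cost per execution, cost per 1K tokens), guarding against zero totals."""
    per_execution = (summary.total_cost / summary.total_executions
                     if summary.total_executions else 0.0)
    per_1k_tokens = (summary.total_cost / summary.total_tokens * 1000
                     if summary.total_tokens else 0.0)
    return per_execution, per_1k_tokens

per_exec, per_1k = unit_costs(CostSummary(12.50, 500, 2_500_000))
print(f"Cost per execution: ${per_exec:.4f}")
print(f"Cost per 1K tokens: ${per_1k:.4f}")
```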

LLM Model Pricing

PromptRails maintains pricing information for all supported LLM models. Prices are stored as cost per token (or per 1K/1M tokens depending on the model) and are used for automatic cost calculation.
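
Since prices may be quoted per token, per 1K tokens, or per 1M tokens, they need to be converted to a common unit before the cost formula is applied. A minimal sketch of that conversion follows; the unit labels and the function `per_token_price` are assumptions, not the way PromptRails stores pricing internally.

```python
# Hypothetical unit labels mapped to the number of tokens each price covers.
UNIT_SIZES = {"token": 1, "1k": 1_000, "1m": 1_000_000}

def per_token_price(price, unit):
    """Convert a price quoted per token, per 1K, or per 1M tokens to per-token."""
    if unit not in UNIT_SIZES:
        raise ValueError(f"Unknown pricing unit: {unit!r}")
    return price / UNIT_SIZES[unit]

# $2.50 per 1M tokens and $0.0025 per 1K tokens are the same per-token price.
from_1m = per_token_price(2.50, "1m")
from_1k = per_token_price(0.0025, "1k")
print(f"{from_1m:.10f} {from_1k:.10f}")
```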

Supported providers and their models include:

Provider        Example Models
OpenAI          gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
Anthropic       claude-3.5-sonnet, claude-3-opus, claude-3-haiku
Google          gemini-pro, gemini-ultra
DeepSeek        deepseek-chat, deepseek-coder
Fireworks       Various hosted models
xAI             grok-2, grok-3, grok-4
OpenRouter      Aggregated pricing from multiple providers
Together AI     Hosted open models (Llama, Qwen, …)
Mistral         mistral-large, mistral-small, codestral
Cohere          command-a, command-r
Groq            Llama / Qwen on LPU inference
Perplexity      sonar, sonar-pro
AWS Bedrock     Claude, Llama, Nova, and more on AWS
Cerebras        llama3.1-8b, llama3.3-70b
SambaNova       DeepSeek-V3, Meta-Llama-3.1-70B
Hyperbolic      meta-llama/Meta-Llama-3.1-70B, DeepSeek-V3
DeepInfra       DeepSeek-V4-Pro, Llama-3.3-70B, Gemma-4-31B
Novita AI       DeepSeek-V4-Pro, Qwen3.7-Max, Kimi K2.6
Friendli AI     Llama-3.3-70B, GLM-5.1, MiniMax-M2.5
Chutes AI       DeepSeek-V3, Qwen3-32B, Gemma-4-31B-TEE
Z.AI            glm-5.1, glm-4.7, glm-4.6v
Moonshot        kimi-k2.6, kimi-k2-thinking
DashScope       qwen-max, qwen-plus, qwen3.7-max
Hugging Face    DeepSeek-V4-Pro, Qwen3-Coder-Next (router access)

Cached input tokens (prompt-cache read hits) are billed at the model's discounted cached-input rate when one is configured, so cost estimates reflect the lower price providers charge for cached prompt prefixes.
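
The effect of a cached-input rate on the estimate can be sketched as follows. This assumes that `prompt_tokens` includes the cached tokens and that the regular input price applies when no cached rate is configured; both are assumptions for illustration, as are the function name and the prices.

```python
def estimate_cost_with_cache(prompt_tokens, cached_tokens, completion_tokens,
                             input_price, output_price, cached_input_price=None):
    """Price cached prompt tokens at the discounted rate when one is configured."""
    if cached_input_price is None:
        # No cached rate configured: every prompt token is billed at the input price.
        cached_input_price = input_price
    uncached_tokens = prompt_tokens - cached_tokens
    return (uncached_tokens * input_price
            + cached_tokens * cached_input_price
            + completion_tokens * output_price)

# Hypothetical per-token prices: $3 / $0.30 (cached) / $15 per 1M tokens.
with_cache = estimate_cost_with_cache(10_000, 8_000, 500, 3e-6, 15e-6, 0.3e-6)
without_cache = estimate_cost_with_cache(10_000, 8_000, 500, 3e-6, 15e-6)
print(f"With cached rate: ${with_cache:.4f}, without: ${without_cache:.4f}")
```

In this example most of the prompt is a cache hit, so the discounted rate cuts the estimate by more than half.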

Cost Optimization Tips

  • Choose the right model -- Use smaller, cheaper models (GPT-4o-mini, Claude 3 Haiku) for simple tasks and reserve expensive models for complex reasoning
  • Use prompt caching -- Enable cache timeouts on prompts to avoid duplicate LLM calls
  • Monitor per-agent costs -- Identify agents with unexpectedly high costs
  • Set token limits -- Use max_tokens to prevent runaway completions
  • Review chain/workflow costs -- Multi-step agents multiply LLM costs; optimize the number of steps
  • Use data source caching -- Cache database query results to reduce data source query overhead
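
To weigh the first tip against real traffic, you can project the same workload onto different models before switching. The sketch below does this with hypothetical model names and prices per 1M tokens; replace them with the actual pricing for the models you are comparing.

```python
# Hypothetical (input, output) prices in USD per 1M tokens.
PRICES_PER_1M = {
    "small-model": (0.15, 0.60),
    "large-model": (2.50, 10.00),
}

def workload_cost(model, calls, prompt_tokens, completion_tokens):
    """Estimate total cost for `calls` requests with the given average token counts."""
    input_price, output_price = PRICES_PER_1M[model]
    per_call = (prompt_tokens * input_price
                + completion_tokens * output_price) / 1_000_000
    return calls * per_call

# 10,000 calls averaging 800 prompt tokens and 200 completion tokens each.
projection = {m: workload_cost(m, 10_000, 800, 200) for m in PRICES_PER_1M}
for model, cost in projection.items():
    print(f"{model}: ${cost:.2f}")
```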