Cost Tracking

See which agents, models, prompts, and runs are driving AI spend before you optimize or change providers.

PromptRails estimates the cost of each model call and rolls it up from spans to executions and then to workspace summaries.

For PromptRails-hosted models, usage is also tied to the workspace balance in Billing. Cost Tracking shows what happened inside runs; LLM Gateway explains how hosted-model balance, free allowances, and transactions are managed.

Figure: PromptRails Costs dashboard showing total cost, executions, tokens, daily cost, and model breakdown. The Costs page shows spend over time and relates it to execution volume and token usage, so model changes can be evaluated against real workspace activity.


How Costs Are Calculated

Cost estimation combines three things:

Token usage -- The number of prompt (input) and completion (output) tokens consumed
Model pricing -- The per-token price for the specific LLM model used
Automatic aggregation -- Span-level costs are rolled up to execution and workspace levels

The formula is:

cost = (prompt_tokens * input_price_per_token) + (completion_tokens * output_price_per_token)
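
As a worked example of the formula, the sketch below computes the cost of a single call. The per-token prices are made-up placeholders, not actual PromptRails or provider rates, and `estimate_cost` is a hypothetical helper rather than part of the SDK. `Decimal` is used so that very small per-token prices do not pick up floating-point rounding noise.

```python
from decimal import Decimal


def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_token: Decimal,
    output_price_per_token: Decimal,
) -> Decimal:
    """Apply the cost formula to one model call."""
    return (
        prompt_tokens * input_price_per_token
        + completion_tokens * output_price_per_token
    )


# Placeholder prices (assumptions for illustration only).
INPUT_PRICE = Decimal("0.0000025")   # dollars per input token
OUTPUT_PRICE = Decimal("0.00001")    # dollars per output token

cost = estimate_cost(1_200, 300, INPUT_PRICE, OUTPUT_PRICE)
print(f"Estimated cost: ${cost:.6f}")
```

With these placeholder prices, 1,200 prompt tokens and 300 completion tokens each contribute $0.003 to the total.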

PromptRails keeps model pricing metadata for supported providers and uses it to estimate costs automatically.

Per-Execution Cost Breakdown

Every execution includes total cost and token usage:

# Assumes `client` is an already-initialized PromptRails SDK client.
execution = client.executions.get(execution_id="your-execution-id")
data = execution["data"]

print(f"Total Cost: ${data['cost']:.6f}")
print(f"Token Usage: {data['token_usage']}")
print(f"Duration: {data['duration_ms']}ms")

For multi-step executions such as chains, workflows, and multi-agent runs, the execution cost is the sum of the LLM spans inside that execution.
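
The roll-up described above can be sketched with plain data. The span dictionaries here are hypothetical stand-ins (the `name` and `kind` fields and the sample costs are assumptions, not the SDK's response shape); the point is only that non-LLM spans contribute nothing and LLM span costs are summed.

```python
from decimal import Decimal

# Hypothetical spans from one multi-step execution (illustrative values).
spans = [
    {"name": "plan", "kind": "llm", "cost": "0.004200"},
    {"name": "search", "kind": "tool", "cost": None},
    {"name": "answer", "kind": "llm", "cost": "0.011300"},
]


def execution_cost(spans: list[dict]) -> Decimal:
    """Sum the cost of LLM spans; spans without a cost are skipped."""
    return sum(
        (
            Decimal(span["cost"])
            for span in spans
            if span["kind"] == "llm" and span["cost"] is not None
        ),
        Decimal("0"),
    )


total = execution_cost(spans)
print(f"Execution cost: ${total:.6f}")
```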

Per-Span Cost Analysis

Drill down into individual spans to see cost at the LLM call level:

# Fetch only the LLM spans recorded for this execution.
traces = client.traces.list(execution_id="your-execution-id", kind="llm")

# Print model, token counts, cost, and duration for each LLM call.
for span in traces["data"]:
    print(f"Model: {span['model_name']}")
    print(f"  Prompt tokens: {span.get('prompt_tokens', 0)}")
    print(f"  Completion tokens: {span.get('completion_tokens', 0)}")
    print(f"  Cost: ${span.get('cost', 0):.6f}")
    print(f"  Duration: {span['duration_ms']}ms")
    print("---")

Use this view to identify which step in a complex pipeline is the most expensive.
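
One way to find the most expensive step is to rank spans by cost and look at each span's share of the total. The sketch below works on sample dictionaries rather than a live API response; the `name` field and all values are assumptions for illustration.

```python
# Hypothetical LLM spans from one pipeline run (illustrative values).
spans = [
    {"name": "classify", "model_name": "gpt-4o-mini", "cost": 0.0004},
    {"name": "draft", "model_name": "gpt-4o", "cost": 0.0180},
    {"name": "review", "model_name": "gpt-4o", "cost": 0.0095},
]

total = sum(span.get("cost", 0) for span in spans)

# Most expensive span first.
ranked = sorted(spans, key=lambda span: span.get("cost", 0), reverse=True)

for span in ranked:
    share = span.get("cost", 0) / total if total else 0.0
    print(f"{span['name']:<10} {span['model_name']:<12} "
          f"${span.get('cost', 0):.4f}  {share:.0%}")

most_expensive = ranked[0]
```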

Token Usage Tracking

Token usage is tracked with three metrics:

Metric	Description
`prompt_tokens`	Number of tokens in the input sent to the model
`completion_tokens`	Number of tokens in the model's response
`total_tokens`	Sum of prompt and completion tokens

Token counts are provided by the LLM provider and recorded with each LLM span.
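
The relationship between the three metrics can be sketched as a small data structure. `TokenUsage` and `combine` are hypothetical names used only for illustration; they show `total_tokens` derived from the other two counts and how usage from several spans adds up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        # total_tokens is defined as the sum of the other two metrics.
        return self.prompt_tokens + self.completion_tokens


def combine(usages: list[TokenUsage]) -> TokenUsage:
    """Add up token usage recorded on several LLM spans."""
    return TokenUsage(
        prompt_tokens=sum(u.prompt_tokens for u in usages),
        completion_tokens=sum(u.completion_tokens for u in usages),
    )


# Illustrative per-span usage values.
combined = combine([TokenUsage(1_200, 300), TokenUsage(500, 150)])
print(combined, combined.total_tokens)
```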

Workspace-Wide Cost Summary

Get aggregated cost data across your entire workspace:

summary = client.costs.get_summary()
 
print(f"Total Cost: ${summary.total_cost:.2f}")
print(f"Total Executions: {summary.total_executions}")
print(f"Total Tokens: {summary.total_tokens}")

Per-Agent Cost Analysis

Analyze costs broken down by agent to identify your most expensive workflows:

agent_costs = client.costs.get_agent_summary("your-agent-id")
 
print(f"Agent Total Cost: ${agent_costs.total_cost:.2f}")
print(f"Agent Executions: {agent_costs.total_executions}")
print(f"Agent Tokens: {agent_costs.total_tokens}")
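
Totals alone can hide an agent that is cheap overall but expensive per run. The sketch below ranks agents by cost per execution. `AgentSummary` is a hypothetical local type that mirrors the three fields printed above, and the agent names and figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentSummary:
    agent: str
    total_cost: float
    total_executions: int
    total_tokens: int

    @property
    def cost_per_execution(self) -> float:
        # Guard against agents that have not run yet.
        if self.total_executions == 0:
            return 0.0
        return self.total_cost / self.total_executions


# Illustrative numbers, not real workspace data.
summaries = [
    AgentSummary("support-bot", 42.00, 2_100, 9_500_000),
    AgentSummary("research-agent", 30.00, 150, 4_200_000),
    AgentSummary("idle-agent", 0.00, 0, 0),
]

ranked = sorted(summaries, key=lambda s: s.cost_per_execution, reverse=True)
for s in ranked:
    print(f"{s.agent:<16} ${s.total_cost:>7.2f} total  "
          f"${s.cost_per_execution:.4f} per execution")
```

In this sample, `research-agent` has the lower total but costs ten times more per execution than `support-bot`, which makes it the better optimization target.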

LLM Model Pricing

PromptRails maintains pricing information for all supported LLM models. Prices are stored as cost per token (or per 1K/1M tokens depending on the model) and are used for automatic cost calculation.
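
Because prices may be quoted per token, per 1K tokens, or per 1M tokens, they need to be normalized to a single unit before the cost formula is applied. The sketch below shows one way to do that; the unit labels and the `price_per_token` helper are assumptions for illustration, not PromptRails' internal representation.

```python
from decimal import Decimal

# Number of tokens covered by one quoted price, keyed by a hypothetical unit label.
UNIT_SIZES = {"token": 1, "1k": 1_000, "1m": 1_000_000}


def price_per_token(price: str, unit: str) -> Decimal:
    """Convert a quoted price into dollars per single token."""
    if unit not in UNIT_SIZES:
        raise ValueError(f"unknown pricing unit: {unit!r}")
    return Decimal(price) / UNIT_SIZES[unit]


# The same placeholder price expressed in three units.
print(price_per_token("3.00", "1m"))
print(price_per_token("0.003", "1k"))
print(price_per_token("0.000003", "token"))
```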

Supported providers and their models include:

Provider	Example Models
OpenAI	gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
Anthropic	claude-3.5-sonnet, claude-3-opus, claude-3-haiku
Google	gemini-pro, gemini-ultra
DeepSeek	deepseek-chat, deepseek-coder
Fireworks	Various hosted models
xAI	grok-2, grok-3, grok-4
OpenRouter	Aggregated pricing from multiple providers
Together AI	Hosted open models (Llama, Qwen, …)
Mistral	mistral-large, mistral-small, codestral
Cohere	command-a, command-r
Groq	Llama / Qwen on LPU inference
Perplexity	sonar, sonar-pro
AWS Bedrock	Claude, Llama, Nova, and more on AWS
Cerebras	llama3.1-8b, llama3.3-70b
SambaNova	DeepSeek-V3, Meta-Llama-3.1-70B
Hyperbolic	meta-llama/Meta-Llama-3.1-70B, DeepSeek-V3
DeepInfra	DeepSeek-V4-Pro, Llama-3.3-70B, Gemma-4-31B
Novita AI	DeepSeek-V4-Pro, Qwen3.7-Max, Kimi K2.6
Friendli AI	Llama-3.3-70B, GLM-5.1, MiniMax-M2.5
Chutes AI	DeepSeek-V3, Qwen3-32B, Gemma-4-31B-TEE
Z.AI	glm-5.1, glm-4.7, glm-4.6v
Moonshot	kimi-k2.6, kimi-k2-thinking
DashScope	qwen-max, qwen-plus, qwen3.7-max
Hugging Face	DeepSeek-V4-Pro, Qwen3-Coder-Next (router access)

Cached input tokens (prompt-cache read hits) are billed at the model's discounted cached-input rate when one is configured, so estimated costs reflect the lower price providers charge for cached prompt prefixes.
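
The effect of a cached-input rate on the cost formula can be sketched as follows. This assumes cached tokens are reported as a subset of `prompt_tokens` (providers differ on this), falls back to the regular input price when no cached rate is configured, and uses placeholder prices; `cost_with_cache` is a hypothetical helper.

```python
from decimal import Decimal
from typing import Optional


def cost_with_cache(
    prompt_tokens: int,
    cached_tokens: int,
    completion_tokens: int,
    input_price: Decimal,
    output_price: Decimal,
    cached_input_price: Optional[Decimal] = None,
) -> Decimal:
    """Cost of one call when part of the prompt was served from cache."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    if cached_input_price is None:
        # No cached rate configured: bill cached tokens at the normal rate.
        cached_input_price = input_price
    uncached_tokens = prompt_tokens - cached_tokens
    return (
        uncached_tokens * input_price
        + cached_tokens * cached_input_price
        + completion_tokens * output_price
    )


# Placeholder per-token prices (assumptions for illustration only).
INPUT = Decimal("0.000003")
CACHED = Decimal("0.0000003")
OUTPUT = Decimal("0.000015")

with_cache = cost_with_cache(10_000, 8_000, 500, INPUT, OUTPUT, CACHED)
without_cache = cost_with_cache(10_000, 8_000, 500, INPUT, OUTPUT)
print(f"with cached rate: ${with_cache:.4f}, without: ${without_cache:.4f}")
```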

Cost Optimization Tips

Choose the right model -- Use smaller, cheaper models (GPT-4o-mini, Claude 3 Haiku) for simple tasks and reserve expensive models for complex reasoning
Use prompt caching -- Enable cache timeouts on prompts to avoid duplicate LLM calls
Monitor per-agent costs -- Identify agents with unexpectedly high costs
Set token limits -- Use max_tokens to prevent runaway completions
Review chain/workflow costs -- Multi-step agents multiply LLM costs; optimize the number of steps
Use data source caching -- Cache database query results to reduce data source query overhead
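
Before switching models, it can help to estimate what the same workload would cost on each candidate. The sketch below is a rough projection under stated assumptions: the model names, per-1M-token prices, and workload figures are all placeholders, and it assumes token counts stay the same across models, which is often not true in practice.

```python
from decimal import Decimal

# Placeholder prices in dollars per 1M tokens: (input, output).
PRICES_PER_1M = {
    "large-model": (Decimal("2.50"), Decimal("10.00")),
    "small-model": (Decimal("0.15"), Decimal("0.60")),
}


def workload_cost(model: str, calls: int, prompt_tokens: int,
                  completion_tokens: int) -> Decimal:
    """Projected cost of `calls` identical calls on the given model."""
    input_price, output_price = PRICES_PER_1M[model]
    per_call = (
        prompt_tokens * input_price + completion_tokens * output_price
    ) / 1_000_000
    return calls * per_call


# Hypothetical monthly workload: 50,000 calls, 800 prompt / 200 completion tokens each.
costs = {
    model: workload_cost(model, 50_000, 800, 200) for model in PRICES_PER_1M
}
savings = costs["large-model"] - costs["small-model"]

for model, cost in costs.items():
    print(f"{model:<12} ${cost:.2f}")
print(f"Projected savings: ${savings:.2f}")
```

A projection like this only bounds the potential savings; the per-span and per-agent data described earlier is what confirms whether a cheaper model actually lowers cost for a given workflow.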

Related Topics

Executions -- Execution-level cost data
Tracing -- Span-level cost breakdown
Billing and Plans -- Execution limits and plan pricing
LLM Gateway -- Hosted-model balance, free allowances, and gateway usage