# Cost Tracking

> Track LLM costs per execution, per span, per agent, and across your entire workspace with automatic token usage and pricing calculations.

PromptRails automatically calculates and tracks costs for every LLM call made during agent execution. This gives you full visibility into your AI spending at every level -- from individual LLM calls to workspace-wide summaries.

## How Costs Are Calculated

Costs are calculated and aggregated based on:

1. **Token usage** -- The number of prompt (input) and completion (output) tokens consumed
2. **Model pricing** -- The per-token price for the specific LLM model used
3. **Automatic aggregation** -- Span-level costs are rolled up to execution and workspace levels

The formula is:

```
cost = (prompt_tokens * input_price_per_token) + (completion_tokens * output_price_per_token)
```

PromptRails maintains a database of LLM model pricing that is used for automatic cost calculation.
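As an illustration, here is the formula applied with hypothetical per-token prices. The figures below are placeholders for this sketch, not values from PromptRails' pricing database:

```python
# Hypothetical per-token prices (illustrative only; actual prices come from
# PromptRails' model pricing database).
INPUT_PRICE_PER_TOKEN = 0.15 / 1_000_000   # e.g. $0.15 per 1M prompt tokens
OUTPUT_PRICE_PER_TOKEN = 0.60 / 1_000_000  # e.g. $0.60 per 1M completion tokens

prompt_tokens = 1_200
completion_tokens = 300

cost = (prompt_tokens * INPUT_PRICE_PER_TOKEN
        + completion_tokens * OUTPUT_PRICE_PER_TOKEN)
print(f"${cost:.6f}")  # $0.000360
```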

## Per-Execution Cost Breakdown

Every execution includes total cost and token usage:

```python
execution = client.executions.get(execution_id="your-execution-id")

print(f"Total Cost: ${execution['data']['cost']:.6f}")
print(f"Token Usage: {execution['data']['token_usage']}")
print(f"Duration: {execution['data']['duration_ms']}ms")
```

For multi-step executions (chain, workflow, multi-agent), the execution cost is the sum of all LLM calls within that execution.
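The rollup can be reproduced by summing span-level costs yourself. The span dictionaries below are made-up stand-ins for what `client.traces.list` returns:

```python
# Hypothetical LLM spans from a three-step chain execution.
spans = [
    {"model_name": "gpt-4o-mini", "cost": 0.000210},
    {"model_name": "gpt-4o-mini", "cost": 0.000185},
    {"model_name": "gpt-4o", "cost": 0.004930},
]

# Execution cost is the sum of the costs of its LLM spans.
execution_cost = sum(span.get("cost", 0) for span in spans)
print(f"${execution_cost:.6f}")  # $0.005325
```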

## Per-Span Cost Analysis

Drill down into individual spans to see cost at the LLM call level:

```python
traces = client.traces.list(execution_id="your-execution-id", kind="llm")

for span in traces["data"]:
    print(f"Model: {span['model_name']}")
    print(f"  Prompt tokens: {span.get('prompt_tokens', 0)}")
    print(f"  Completion tokens: {span.get('completion_tokens', 0)}")
    print(f"  Cost: ${span.get('cost', 0):.6f}")
    print(f"  Duration: {span['duration_ms']}ms")
    print("---")
```

This is valuable for identifying which step in a complex pipeline is the most expensive.
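One way to spot that step is to rank spans by cost. The dictionaries below (including the `span_id` field) are illustrative placeholders, not a guaranteed response shape:

```python
spans = [
    {"span_id": "plan", "model_name": "gpt-4o-mini", "cost": 0.000185},
    {"span_id": "draft", "model_name": "gpt-4o", "cost": 0.004930},
    {"span_id": "summarize", "model_name": "gpt-4o-mini", "cost": 0.000210},
]

# Rank LLM spans from most to least expensive.
for span in sorted(spans, key=lambda s: s.get("cost", 0), reverse=True):
    print(f"{span['span_id']:<10} {span['model_name']:<12} ${span['cost']:.6f}")
```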

## Token Usage Tracking

Token usage is tracked with three metrics:

| Metric              | Description                                     |
| ------------------- | ----------------------------------------------- |
| `prompt_tokens`     | Number of tokens in the input sent to the model |
| `completion_tokens` | Number of tokens in the model's response        |
| `total_tokens`      | Sum of prompt and completion tokens             |

Token counts are provided by the LLM provider and recorded with each LLM span.
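The three metrics are related by a simple invariant, which can serve as a sanity check when aggregating usage yourself. The dictionary shape here mirrors the table above, not a guaranteed API response:

```python
def check_usage(usage: dict) -> bool:
    """Verify that total_tokens equals prompt plus completion tokens."""
    return usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]

usage = {"prompt_tokens": 1_200, "completion_tokens": 300, "total_tokens": 1_500}
print(check_usage(usage))  # True
```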

## Workspace-Wide Cost Summary

Get aggregated cost data across your entire workspace:

```python
summary = client.costs.get_summary()

print(f"Total Cost: ${summary.total_cost:.2f}")
print(f"Total Executions: {summary.total_executions}")
print(f"Total Tokens: {summary.total_tokens}")
```

## Per-Agent Cost Analysis

Analyze costs broken down by agent to identify your most expensive workflows:

```python
agent_costs = client.costs.get_agent_summary("your-agent-id")

print(f"Agent Total Cost: ${agent_costs.total_cost:.2f}")
print(f"Agent Executions: {agent_costs.total_executions}")
print(f"Agent Tokens: {agent_costs.total_tokens}")
```

## LLM Model Pricing

PromptRails maintains pricing information for all supported LLM models. Prices are stored as cost per token (or per 1K/1M tokens depending on the model) and are used for automatic cost calculation.

Supported providers and their models include:

| Provider   | Example Models                                   |
| ---------- | ------------------------------------------------ |
| OpenAI     | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo  |
| Anthropic  | claude-3.5-sonnet, claude-3-opus, claude-3-haiku |
| Google     | gemini-pro, gemini-ultra                         |
| DeepSeek   | deepseek-chat, deepseek-coder                    |
| Fireworks  | Various hosted models                            |
| xAI        | grok-1, grok-2                                   |
| OpenRouter | Aggregated pricing from multiple providers       |

## Cost Optimization Tips

- **Choose the right model** -- Use smaller, cheaper models (GPT-4o-mini, Claude 3 Haiku) for simple tasks and reserve expensive models for complex reasoning
- **Use prompt caching** -- Enable cache timeouts on prompts to avoid duplicate LLM calls
- **Monitor per-agent costs** -- Identify agents with unexpectedly high costs
- **Set token limits** -- Use `max_tokens` to prevent runaway completions
- **Review chain/workflow costs** -- Multi-step agents multiply LLM costs; optimize the number of steps
- **Use data source caching** -- Cache database query results to reduce data source query overhead
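As a sketch of the first and fourth tips, a simple router can send short prompts to a cheaper model and cap completion length with `max_tokens`. The word-count heuristic and model names are illustrative placeholders, not a PromptRails feature:

```python
def pick_model(prompt: str) -> dict:
    """Route simple prompts to a cheap model; cap output length everywhere.

    The 50-word threshold and model names are illustrative only.
    """
    simple = len(prompt.split()) < 50
    return {
        "model": "gpt-4o-mini" if simple else "gpt-4o",
        "max_tokens": 256 if simple else 1024,
    }

print(pick_model("Summarize this sentence."))
# {'model': 'gpt-4o-mini', 'max_tokens': 256}
```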

## Related Topics

- [Executions](/docs/executions) -- Execution-level cost data
- [Tracing](/docs/tracing) -- Span-level cost breakdown
- [Billing and Plans](/docs/billing-and-plans) -- Execution limits and plan pricing
