Tracing
Inspect the exact steps behind an agent run: prompts, model calls, tools, data, guardrails, errors, cost, and tokens.
Tracing turns an agent run into a timeline you can debug. Instead of seeing only the final answer, you can inspect the prompt render, model call, tool call, data source query, guardrail scan, memory lookup, media generation step, error, token usage, and cost that led to it.
PromptRails traces platform-managed runs automatically. You can also send spans from your own application code when you want PromptRails to act as a standalone LLM observability backend.
What To Inspect First
Start with the trace when an execution looks wrong, slow, expensive, or surprising:
- Wrong answer -- Open the prompt render and model call to see exactly what the model saw.
- Tool issue -- Inspect the tool or MCP span for arguments, response, latency, and errors.
- Data issue -- Open the data source span to check the query and returned context.
- Safety issue -- Review guardrail spans to see whether a policy blocked the content, redacted it, or only logged it.
- Cost issue -- Look at model spans and token counts before changing prompts or providers.
How Tracing Works
Every agent execution creates a trace. A trace is a tree of spans, and each span represents one operation inside the run. A span captures:
- What happened (name, kind)
- How long it took (duration, start/end timestamps)
- What went in and came out (input/output)
- Whether it succeeded (status, error details)
- How much it cost (token usage, cost)
- Context links (agent, prompt, model, execution, session)
Spans form a parent-child hierarchy that reflects the execution flow. In the UI, that hierarchy becomes the trace tree you use to move from the full run into a specific prompt render, model call, tool call, or guardrail scan.
Example span hierarchy
agent (root span)
├── guardrail (input scan)
├── prompt (template rendering)
├── llm (model call)
│   └── tool (tool call by LLM)
│       └── mcp_call (MCP server invocation)
├── guardrail (output scan)
└── memory (memory update)
Sending Your Own Traces
Traces are created in two ways:
- Automatically -- every agent, prompt, and data source run on PromptRails is traced for you.
- From your own code -- send spans to PromptRails from any application, even one whose prompts and agents are not managed on the platform. This lets you use PromptRails as a standalone LLM observability backend for LangChain, the OpenAI / Anthropic / Google GenAI SDKs, OpenTelemetry, or your own pipelines.
External spans only need an API key with the traces:write scope. They are tagged with a source attribute such as sdk or otlp, so you can separate them from PromptRails-managed runs in the trace list.
From an SDK
The Python, JavaScript, and Go SDKs each ship a tracing module that batches spans and sends them in the background.
Python
from promptrails.tracing import Tracer

tracer = Tracer(api_key="pr_...")
with tracer.span("agent-run", kind="agent") as root:
    root.set_input({"q": "weather?"})
    # The inner span is opened inside the outer one, so it becomes its child.
    with tracer.span("llm-call", kind="llm") as llm:
        llm.set_model("gpt-4o").set_usage(prompt_tokens=120, completion_tokens=30)
tracer.flush()

JavaScript / TypeScript
import { Tracer } from '@promptrails/sdk/tracing'

const tracer = new Tracer({ apiKey: 'pr_...' })
await tracer.span('agent-run', { kind: 'agent' }, async (root) => {
  root.setInput({ q: 'weather?' })
  await tracer.span('llm-call', { kind: 'llm' }, async (llm) => {
    llm.setModel('gpt-4o').setUsage(120, 30)
  })
})
await tracer.flush()

Go
import (
	"context"

	"github.com/promptrails/go-sdk/tracing"
)

// ctx can be any context.Context, for example a request context.
ctx := context.Background()
tracer := tracing.NewTracer("pr_...")
defer tracer.Shutdown()
_ = tracer.Span(ctx, "agent-run", tracing.KindAgent, func(ctx context.Context, root *tracing.Span) error {
	root.SetInput(map[string]any{"q": "weather?"})
	return tracer.Span(ctx, "llm-call", tracing.KindLLM, func(ctx context.Context, llm *tracing.Span) error {
		// SetUsage(promptTokens, completionTokens, totalTokens); -1 = auto-compute total
		llm.SetModel("gpt-4o").SetUsage(120, 30, -1)
		return nil
	})
})

Nested spans automatically share a trace and link to their parent, so the tree renders correctly in the trace viewer.
Framework auto-instrumentation
The Python and JavaScript SDKs include integrations that turn framework calls into spans with no manual instrumentation:
- LangChain -- a callback handler that traces chains, LLMs, tools, and retrievers.
- OpenAI -- wrap an OpenAI-compatible client to auto-trace every chat completion (model, tokens, latency, output).
- Anthropic -- wrap an Anthropic client to auto-trace every messages.create call.
- Google GenAI -- wrap a Google GenAI client to auto-trace every generateContent call.
See the Python SDK and JavaScript SDK pages for setup.
OpenTelemetry (OTLP)
If you already instrument with OpenTelemetry, point an OTLP/HTTP exporter at the PromptRails OTLP endpoint; no custom code is required. GenAI semantic-convention attributes (gen_ai.*) are mapped onto the span model.
POST <base-url>/api/v1/otel/v1/traces
Header: X-API-Key: pr_... # key with the traces:write scope
Content-Type: application/x-protobuf (or application/json)
Configure an OpenTelemetry exporter with the endpoint <base-url>/api/v1/otel and an X-API-Key header; the standard exporter appends /v1/traces.
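One way to apply this configuration without changing instrumentation code is through the standard OpenTelemetry environment variables. The sketch below is illustrative: the helper name otlp_env and the base URL are made up, while the variable names come from the OpenTelemetry exporter specification.

```python
import os


def otlp_env(base_url: str, api_key: str) -> dict:
    """Build standard OTel exporter settings for the PromptRails OTLP endpoint."""
    return {
        # The exporter appends /v1/traces to this base endpoint.
        "OTEL_EXPORTER_OTLP_ENDPOINT": base_url.rstrip("/") + "/api/v1/otel",
        # Headers are comma-separated key=value pairs.
        "OTEL_EXPORTER_OTLP_HEADERS": "X-API-Key=" + api_key,
        # OTLP over HTTP with protobuf payloads.
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    }


# Placeholder values; substitute your own base URL and traces:write key.
settings = otlp_env("https://promptrails.example.com/", "pr_...")
os.environ.update(settings)
```

These variables need to be set before the OpenTelemetry SDK initializes its exporter. The same values can instead be exported from a shell or a deployment manifest.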
Native ingest endpoint
SDKs post to a simple batch endpoint you can also call directly:
POST <base-url>/api/v1/traces/ingest
Header: X-API-Key: pr_... # key with the traces:write scope
{
  "spans": [
    {
      "trace_id": "a1b2c3...",
      "span_id": "1111...",
      "parent_span_id": "0000...",
      "name": "llm-call",
      "kind": "llm",
      "model_name": "gpt-4o",
      "prompt_tokens": 120,
      "completion_tokens": 30,
      "started_at": "2026-06-01T10:00:00Z",
      "ended_at": "2026-06-01T10:00:01Z"
    }
  ]
}
Set parent_span_id to a parent span's span_id to nest spans; omit it (or leave it empty) for a root span. The workspace is taken from the API key. Each request can carry up to 1000 spans.
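If you call the ingest endpoint directly, larger exports have to be split to respect the 1000-span limit. The following is a minimal sketch using only the Python standard library. The helper name build_ingest_requests, the base URL, and the Content-Type header are assumptions, and the sample spans are placeholders rather than complete payloads.

```python
import json
import urllib.request

MAX_SPANS_PER_REQUEST = 1000  # per-request limit for the ingest endpoint


def build_ingest_requests(base_url, api_key, spans):
    """Split spans into batches and build one POST request per batch."""
    url = base_url.rstrip("/") + "/api/v1/traces/ingest"
    requests = []
    for start in range(0, len(spans), MAX_SPANS_PER_REQUEST):
        batch = spans[start:start + MAX_SPANS_PER_REQUEST]
        body = json.dumps({"spans": batch}).encode("utf-8")
        requests.append(
            urllib.request.Request(
                url,
                data=body,
                method="POST",
                # Content-Type is an assumption; the key needs the traces:write scope.
                headers={"X-API-Key": api_key, "Content-Type": "application/json"},
            )
        )
    return requests


# 2500 placeholder spans become three requests (1000 + 1000 + 500).
spans = [
    {"trace_id": "t1", "span_id": f"s{i}", "name": "llm-call", "kind": "llm"}
    for i in range(2500)
]
reqs = build_ingest_requests("https://promptrails.example.com", "pr_...", spans)
# To send a batch: urllib.request.urlopen(req) for each req in reqs.
```

Building the requests separately from sending them keeps the batching logic easy to check and leaves retry behavior up to the caller.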
Span Kinds
PromptRails currently defines these span kinds:
| Kind | Identifier | Description |
|---|---|---|
| Agent | agent | Top-level agent execution span |
| LLM | llm | LLM model call (prompt + completion) |
| Tool | tool | Tool invocation within an agent |
| Data Source | datasource | Database or file query |
| Prompt | prompt | Prompt template rendering |
| Guardrail | guardrail | Input or output guardrail scan |
| Chain | chain | Chain-type agent orchestration |
| Workflow | workflow | Workflow step execution |
| Agent Step | agent_step | Individual step in a multi-agent execution |
| MCP Call | mcp_call | Remote MCP server tool call |
| Preprocessing | preprocessing | Input preprocessing before LLM |
| Postprocessing | postprocessing | Output postprocessing after LLM |
| Memory | memory | Memory retrieval or storage |
| Embedding | embedding | Vector embedding generation |
| Speech | speech | Text-to-speech or speech-to-text operation |
| Image | image | Image generation or editing |
| Video | video | Video generation |
| Storage | storage | Asset storage upload/download |
Span Hierarchy
Spans are organized in a tree structure using trace IDs, span IDs, and parent span IDs:
- trace_id -- Groups all spans belonging to the same execution
- span_id -- Uniquely identifies a single span
- parent_span_id -- Links a span to its parent (empty for root spans)
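These three identifiers are enough to rebuild the tree from a flat list of spans. The sketch below assumes spans arrive as plain dictionaries with the field names used on this page. build_tree and render are illustrative helpers, not SDK functions.

```python
from collections import defaultdict


def build_tree(spans):
    """Group flat span records into {trace_id: [root nodes]} with nested children."""
    nodes = {s["span_id"]: {**s, "children": []} for s in spans}
    roots = defaultdict(list)
    for node in nodes.values():
        parent = nodes.get(node.get("parent_span_id") or "")
        if parent is not None:
            parent["children"].append(node)
        else:
            # Empty or unknown parent_span_id: treat the span as a root.
            roots[node["trace_id"]].append(node)
    return dict(roots)


def render(node, depth=0):
    """Return indented text lines for a node and its descendants."""
    lines = ["  " * depth + f"[{node['kind']}] {node['name']}"]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


spans = [
    {"trace_id": "t1", "span_id": "a", "parent_span_id": "",
     "kind": "agent", "name": "Customer Support Bot"},
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a",
     "kind": "prompt", "name": "Render main prompt"},
    {"trace_id": "t1", "span_id": "c", "parent_span_id": "a",
     "kind": "llm", "name": "gpt-4o call"},
]
tree = build_tree(spans)
print("\n".join(render(tree["t1"][0])))
```

Spans whose parent is missing from the list are treated as roots here, so a partially fetched trace still renders instead of failing.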
Example Trace for a Simple Agent
[agent] Customer Support Bot (15ms total)
  [guardrail] prompt_injection input scan (2ms)
  [prompt] Render main prompt (1ms)
  [llm] gpt-4o call (10ms, 450 prompt + 120 completion tokens, $0.003)
  [guardrail] pii output scan (2ms)
Example Trace for a Chain Agent
[chain] Data Analysis Pipeline (45ms total)
  [agent_step] Step 1: Data Extraction
    [datasource] Query analytics DB (8ms)
    [prompt] Render extraction prompt (1ms)
    [llm] gpt-4o call (15ms)
  [agent_step] Step 2: Analysis
    [memory] Retrieve analysis templates (2ms)
    [prompt] Render analysis prompt (1ms)
    [llm] claude-3.5-sonnet call (18ms)
Span Status and Levels
Status
| Status | Description |
|---|---|
| ok | The span completed successfully |
| error | The span encountered an error |
Level
| Level | Description |
|---|---|
| debug | Detailed diagnostic information |
| default | Standard operational information |
| warning | Something unexpected but non-fatal |
| error | An error occurred |
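Status and level are separate fields, so a span can finish with status ok and still carry a warning level. The sketch below filters spans by minimum level. The severity order and the fallback to default for a missing level are assumptions based on the row order of the table, and at_least is an illustrative helper.

```python
# Assumed severity order, lowest to highest, following the table's row order.
LEVEL_ORDER = ["debug", "default", "warning", "error"]


def at_least(spans, min_level):
    """Return spans whose level is min_level or more severe."""
    threshold = LEVEL_ORDER.index(min_level)
    return [
        s for s in spans
        if LEVEL_ORDER.index(s.get("level", "default")) >= threshold
    ]


spans = [
    {"name": "Render main prompt", "status": "ok", "level": "debug"},
    {"name": "gpt-4o call", "status": "ok", "level": "warning"},
    {"name": "weather_api", "status": "error", "level": "error"},
]
noisy = at_least(spans, "warning")
```

Here noisy keeps the successful gpt-4o call because its level is warning, which a filter on status alone would miss.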
Span Attributes
Each span carries an attributes JSON object with kind-specific metadata:
LLM Span Attributes
{
  "model": "gpt-4o",
  "provider": "openai",
  "temperature": 0.7,
  "max_tokens": 1024,
  "prompt_tokens": 450,
  "completion_tokens": 120,
  "total_tokens": 570,
  "cost": 0.003
}
Guardrail Span Attributes
{
  "scanner_type": "prompt_injection",
  "direction": "input",
  "action": "block",
  "triggered": false
}
Tool Span Attributes
{
  "tool_name": "weather_api",
  "tool_type": "api",
  "parameters": { "location": "New York" }
}
Media Span Attributes (Speech / Image / Video)
{
  "provider": "fal",
  "model": "fal-ai/flux/schnell",
  "media_type": "image_gen",
  "prompt": "A futuristic city skyline at sunset",
  "asset_url": "https://storage.example.com/assets/image.png",
  "content_type": "image/png"
}
Filtering Traces
List and filter traces with multiple criteria:
Python SDK
# client is an already-initialized PromptRails SDK client.
traces = client.traces.list(
    agent_id="your-agent-id",     # Filter by agent
    execution_id="execution-id",  # Filter by execution
    kind="llm",                   # Filter by span kind
    status="ok",                  # Filter by status
    model_name="gpt-4o",          # Filter by model
    session_id="session-id",      # Filter by session
    user_id="user-id",            # Filter by user
    page=1,
    limit=50,
)

JavaScript SDK
const traces = await client.traces.list({
  agentId: 'your-agent-id',
  kind: 'llm',
  status: 'ok',
  page: 1,
  limit: 50,
})

Cost and Token Tracking Per Span
Every LLM span records its own token counts and cost:
| Field | Description |
|---|---|
| prompt_tokens | Number of input tokens sent to the model |
| completion_tokens | Number of output tokens generated |
| total_tokens | Sum of prompt and completion tokens |
| cost | Cost in USD for this specific span |
| model_name | The model used for this call |
This makes cost attribution concrete: in a multi-step execution, you can see which model call was the expensive one instead of only seeing a total.
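As a rough illustration of per-span attribution, the sketch below finds the most expensive model call in one execution and totals cost per model. It assumes spans are available as dictionaries with the fields listed above. cost_report is a hypothetical helper, and the sample numbers are invented.

```python
from collections import defaultdict


def cost_report(spans):
    """Return (most expensive LLM span or None, cost per model) for one execution."""
    llm_spans = [s for s in spans if s["kind"] == "llm"]
    per_model = defaultdict(float)
    for s in llm_spans:
        per_model[s["model_name"]] += s["cost"]
    top = max(llm_spans, key=lambda s: s["cost"], default=None)
    return top, dict(per_model)


# Invented sample data for a two-step execution.
spans = [
    {"kind": "llm", "name": "extract", "model_name": "gpt-4o", "cost": 0.003,
     "prompt_tokens": 450, "completion_tokens": 120},
    {"kind": "llm", "name": "analyze", "model_name": "claude-3.5-sonnet", "cost": 0.012,
     "prompt_tokens": 1800, "completion_tokens": 400},
    {"kind": "prompt", "name": "Render analysis prompt", "cost": 0.0},
]
top, per_model = cost_report(spans)
print(top["name"], per_model)
```

In this sample the analyze call accounts for most of the spend, which points at that step's prompt size or model choice rather than at the execution as a whole.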
Error Information
When a span has an error status, additional fields provide diagnostic information:
| Field | Description |
|---|---|
| error_message | Human-readable error description |
| error_type | Error classification (e.g., rate_limit, timeout, validation) |
| error_stack | Stack trace for debugging |
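When many spans fail, grouping them by error_type is a quick way to see whether the problem is one recurring cause or several. The following sketch assumes span dictionaries with the fields above. summarize_errors is an illustrative helper, and the messages are invented.

```python
from collections import Counter


def summarize_errors(spans):
    """Count failed spans by error_type and keep one sample message per type."""
    failed = [s for s in spans if s.get("status") == "error"]
    counts = Counter(s.get("error_type", "unknown") for s in failed)
    samples = {}
    for s in failed:
        # Keep the first message seen for each error type.
        samples.setdefault(s.get("error_type", "unknown"), s.get("error_message", ""))
    return counts, samples


# Invented sample spans.
spans = [
    {"name": "gpt-4o call", "status": "error", "error_type": "rate_limit",
     "error_message": "429 from provider"},
    {"name": "weather_api", "status": "error", "error_type": "timeout",
     "error_message": "no response in 30s"},
    {"name": "gpt-4o call", "status": "error", "error_type": "rate_limit",
     "error_message": "429 from provider"},
    {"name": "Render main prompt", "status": "ok"},
]
counts, samples = summarize_errors(spans)
```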
Trace Fields
| Field | Type | Description |
|---|---|---|
| id | KSUID | Unique span record ID |
| trace_id | string | Trace group identifier |
| span_id | string | Unique span identifier |
| parent_span_id | string | Parent span (empty for root) |
| name | string | Span name |
| kind | string | Span kind (one of 18 types) |
| status | string | ok or error |
| level | string | debug, default, warning, error |
| input | JSON | Span input data |
| output | JSON | Span output data |
| attributes | JSON | Kind-specific metadata |
| tags | JSON | Custom tags |
| token_usage | JSON | Token consumption |
| cost | float | Cost in USD |
| duration_ms | integer | Duration in milliseconds |
| model_name | string | LLM model name |
| agent_id | KSUID | Associated agent |
| prompt_id | KSUID | Associated prompt |
| execution_id | KSUID | Associated execution |
| session_id | string | Associated chat session |
| started_at | timestamp | Span start time |
| ended_at | timestamp | Span end time |
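When you send your own spans, it can help to sanity-check timing fields before ingesting them. The sketch below assumes that duration_ms corresponds to ended_at minus started_at, which this reference does not state explicitly, and that timestamps use the ISO 8601 form shown in the ingest example. The helper names are illustrative.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; map a trailing Z to +00:00 for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def duration_ms(span: dict) -> int:
    """Milliseconds between started_at and ended_at."""
    delta = parse_ts(span["ended_at"]) - parse_ts(span["started_at"])
    return round(delta.total_seconds() * 1000)


span = {"started_at": "2026-06-01T10:00:00Z", "ended_at": "2026-06-01T10:00:01Z"}
print(duration_ms(span))  # 1000
```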
Related Topics
- Executions -- Execution lifecycle
- Assets and Media -- Speech, image, and video tool outputs
- Cost Tracking -- Aggregated cost analysis
- Evaluations -- Scoring traces and individual spans