Tracing

Inspect the exact steps behind an agent run: prompts, model calls, tools, data, guardrails, errors, cost, and tokens.

Tracing turns an agent run into a timeline you can debug. Instead of seeing only the final answer, you can inspect the prompt render, model call, tool call, data source query, guardrail scan, memory lookup, media generation step, error, token usage, and cost that led to it.

PromptRails traces platform-managed runs automatically. You can also send spans from your own application code when you want PromptRails to act as a standalone LLM observability backend.

Figure: PromptRails trace detail with timeline, span tree, cost, tokens, and span detail. The trace detail view shows the high-level run metrics first, then the timeline, span tree, and selected span payloads for debugging.

What To Inspect First

Start with the trace when an execution looks wrong, slow, expensive, or surprising:

Wrong answer -- Open the prompt render and model call to see exactly what the model saw.
Tool issue -- Inspect the tool or MCP span for arguments, response, latency, and errors.
Data issue -- Open the data source span to check the query and returned context.
Safety issue -- Review guardrail spans to see whether a policy blocked, redacted, or only logged.
Cost issue -- Look at model spans and token counts before changing prompts or providers.

How Tracing Works

Every agent execution creates a trace. A trace is a tree of spans, and each span represents one operation inside the run. A span captures:

What happened (name, kind)
How long it took (duration, start/end timestamps)
What went in and came out (input/output)
Whether it succeeded (status, error details)
How much it cost (token usage, cost)
Context links (agent, prompt, model, execution, session)

Spans form a parent-child hierarchy that reflects the execution flow. In the UI, that hierarchy becomes the trace tree you use to move from the full run into a specific prompt render, model call, tool call, or guardrail scan.

Example span hierarchy

agent (root span)
  ├── guardrail (input scan)
  ├── prompt (template rendering)
  ├── llm (model call)
  │   └── tool (tool call by LLM)
  │       └── mcp_call (MCP server invocation)
  ├── guardrail (output scan)
  └── memory (memory update)

Sending Your Own Traces

Traces are created two ways:

Automatically — every agent, prompt, and data source run on PromptRails is traced for you.
From your own code — send spans to PromptRails from any application, even one that does not manage its prompts or agents on the platform. This lets you use PromptRails as a standalone LLM observability backend for LangChain, the OpenAI / Anthropic / Google GenAI SDKs, OpenTelemetry, or your own pipelines.

External spans only need an API key with the traces:write scope. They are tagged with a source attribute such as sdk or otlp, so you can separate them from PromptRails-managed runs in the trace list.

From an SDK

The Python, JavaScript, and Go SDKs each ship a tracing module that batches spans and sends them in the background.

Python

from promptrails.tracing import Tracer
 
tracer = Tracer(api_key="pr_...")
 
with tracer.span("agent-run", kind="agent") as root:
    root.set_input({"q": "weather?"})
    with tracer.span("llm-call", kind="llm") as llm:
        llm.set_model("gpt-4o").set_usage(prompt_tokens=120, completion_tokens=30)
 
tracer.flush()

JavaScript / TypeScript

import { Tracer } from '@promptrails/sdk/tracing'
 
const tracer = new Tracer({ apiKey: 'pr_...' })
 
await tracer.span('agent-run', { kind: 'agent' }, async (root) => {
  root.setInput({ q: 'weather?' })
  await tracer.span('llm-call', { kind: 'llm' }, async (llm) => {
    llm.setModel('gpt-4o').setUsage(120, 30)
  })
})
 
await tracer.flush()

Go

import (
    "context"

    "github.com/promptrails/go-sdk/tracing"
)

ctx := context.Background()
tracer := tracing.NewTracer("pr_...")
defer tracer.Shutdown()
 
_ = tracer.Span(ctx, "agent-run", tracing.KindAgent, func(ctx context.Context, root *tracing.Span) error {
    root.SetInput(map[string]any{"q": "weather?"})
    return tracer.Span(ctx, "llm-call", tracing.KindLLM, func(ctx context.Context, llm *tracing.Span) error {
        // SetUsage(promptTokens, completionTokens, totalTokens); -1 = auto-compute total
        llm.SetModel("gpt-4o").SetUsage(120, 30, -1)
        return nil
    })
})

Nested spans automatically share a trace and link to their parent, so the tree renders correctly in the trace viewer.
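To make the nesting behaviour concrete, here is a minimal standard-library sketch of how a tracer can hand a trace ID and parent span ID down to nested spans. It is not the PromptRails SDK implementation; the `span` helper, the `finished` list, and the ID format are hypothetical and exist only for illustration.

```python
import contextvars
import uuid
from contextlib import contextmanager

# Holds the span that is currently open in this execution context.
_current = contextvars.ContextVar("current_span", default=None)
finished = []  # spans in the order they end (hypothetical stand-in for a send queue)


@contextmanager
def span(name, kind):
    parent = _current.get()
    record = {
        # A child reuses its parent's trace_id; a root span starts a new trace.
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        # Empty parent_span_id marks a root span.
        "parent_span_id": parent["span_id"] if parent else "",
        "name": name,
        "kind": kind,
    }
    token = _current.set(record)
    try:
        yield record
    finally:
        _current.reset(token)  # restore the parent as the current span
        finished.append(record)


with span("agent-run", "agent") as root:
    with span("llm-call", "llm") as llm:
        pass
```

Because the inner span reads the current span when it opens, `llm` ends up with the same `trace_id` as `root` and with `root`'s `span_id` as its `parent_span_id`.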

Framework auto-instrumentation

The Python and JavaScript SDKs include integrations that turn framework calls into spans with no manual instrumentation:

LangChain — a callback handler that traces chains, LLMs, tools, and retrievers.
OpenAI — wrap an OpenAI-compatible client to auto-trace every chat completion (model, tokens, latency, output).
Anthropic — wrap an Anthropic client to auto-trace every messages.create call.
Google GenAI — wrap a Google GenAI client to auto-trace every generateContent call.

See the Python SDK and JavaScript SDK pages for setup.

OpenTelemetry (OTLP)

If you already instrument with OpenTelemetry, point an OTLP/HTTP exporter at the PromptRails OTLP endpoint — no custom code required. GenAI semantic-convention attributes (gen_ai.*) are mapped onto the span model.

POST <base-url>/api/v1/otel/v1/traces
Header: X-API-Key: pr_...               # key with the traces:write scope
Content-Type: application/x-protobuf    (or application/json)

Configure an OpenTelemetry exporter with the endpoint <base-url>/api/v1/otel and an X-API-Key header; the standard exporter appends /v1/traces.
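One way to apply that configuration without touching exporter code is through the standard OpenTelemetry environment variables. The sketch below only builds those variables; the base URL `https://promptrails.example` and the `otlp_env` helper are placeholders, not real PromptRails values.

```python
import os


def otlp_env(base_url, api_key):
    """Return standard OpenTelemetry exporter variables for an OTLP/HTTP endpoint."""
    return {
        # Base endpoint only: the exporter appends /v1/traces itself.
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"{base_url.rstrip('/')}/api/v1/otel",
        # Headers are comma-separated key=value pairs.
        "OTEL_EXPORTER_OTLP_HEADERS": f"X-API-Key={api_key}",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    }


# Set these before the OpenTelemetry SDK is initialised in the process.
# Both arguments are placeholders; substitute your own base URL and key.
os.environ.update(otlp_env("https://promptrails.example", "pr_..."))
```

The same three variables can be set in a shell or container environment instead of in code.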

Native ingest endpoint

SDKs post to a simple batch endpoint you can also call directly:

POST <base-url>/api/v1/traces/ingest
Header: X-API-Key: pr_...               # key with the traces:write scope

{
  "spans": [
    {
      "trace_id": "a1b2c3...",
      "span_id": "1111...",
      "parent_span_id": "0000...",
      "name": "llm-call",
      "kind": "llm",
      "model_name": "gpt-4o",
      "prompt_tokens": 120,
      "completion_tokens": 30,
      "started_at": "2026-06-01T10:00:00Z",
      "ended_at": "2026-06-01T10:00:01Z"
    }
  ]
}

Set parent_span_id to a parent span's span_id to nest spans; omit it (or leave it empty) for a root span. The workspace is taken from the API key. Each request accepts up to 1000 spans.
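If you call the ingest endpoint directly, the per-request span limit means larger exports have to be split into batches. The following is a rough standard-library sketch; the helper names are hypothetical, and the `Content-Type: application/json` header is an assumption based on the JSON body shown above.

```python
import json
import urllib.request

MAX_SPANS_PER_REQUEST = 1000  # per-request limit stated above


def chunk_spans(spans, size=MAX_SPANS_PER_REQUEST):
    """Split a list of spans into batches no larger than `size`."""
    return [spans[i:i + size] for i in range(0, len(spans), size)]


def build_ingest_request(base_url, api_key, batch):
    """Build (but do not send) a POST request for one batch of spans."""
    body = json.dumps({"spans": batch}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/traces/ingest",
        data=body,
        method="POST",
        # Content-Type is assumed; the key needs the traces:write scope.
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def send_all(base_url, api_key, spans):
    """Send every batch in order; urlopen raises HTTPError on 4xx/5xx responses."""
    for batch in chunk_spans(spans):
        request = build_ingest_request(base_url, api_key, batch)
        with urllib.request.urlopen(request) as response:
            response.read()
```

Keeping request construction separate from sending makes the batching logic easy to check without a network call.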

Span Kinds

PromptRails currently defines these span kinds:

Kind	Identifier	Description
Agent	`agent`	Top-level agent execution span
LLM	`llm`	LLM model call (prompt + completion)
Tool	`tool`	Tool invocation within an agent
Data Source	`datasource`	Database or file query
Prompt	`prompt`	Prompt template rendering
Guardrail	`guardrail`	Input or output guardrail scan
Chain	`chain`	Chain-type agent orchestration
Workflow	`workflow`	Workflow step execution
Agent Step	`agent_step`	Individual step in a multi-agent execution
MCP Call	`mcp_call`	Remote MCP server tool call
Preprocessing	`preprocessing`	Input preprocessing before LLM
Postprocessing	`postprocessing`	Output postprocessing after LLM
Memory	`memory`	Memory retrieval or storage
Embedding	`embedding`	Vector embedding generation
Speech	`speech`	Text-to-speech or speech-to-text operation
Image	`image`	Image generation or editing
Video	`video`	Video generation
Storage	`storage`	Asset storage upload/download

Span Hierarchy

Spans are organized in a tree structure using trace IDs, span IDs, and parent span IDs:

trace_id -- Groups all spans belonging to the same execution
span_id -- Uniquely identifies a single span
parent_span_id -- Links a span to its parent (empty for root spans)
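As an illustration of how these three IDs reconstruct the tree, the sketch below rebuilds and prints a hierarchy from a flat list of span dictionaries. The sample spans and the choice to treat a span with an unknown parent as a root are assumptions made for this example.

```python
from collections import defaultdict


def build_tree(spans):
    """Return (roots, children) where children maps span_id to child spans."""
    children = defaultdict(list)
    roots = []
    known_ids = {s["span_id"] for s in spans}
    for s in spans:
        parent = s.get("parent_span_id") or ""
        if parent and parent in known_ids:
            children[parent].append(s)
        else:
            # Empty or missing parent marks a root; orphans are shown as roots too.
            roots.append(s)
    return roots, children


def render(spans):
    """Return the indented tree as a list of text lines."""
    roots, children = build_tree(spans)
    lines = []

    def walk(s, depth):
        lines.append("  " * depth + f"[{s['kind']}] {s['name']}")
        for child in children[s["span_id"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


spans = [
    {"trace_id": "t1", "span_id": "a", "parent_span_id": "", "kind": "agent", "name": "Customer Support Bot"},
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a", "kind": "prompt", "name": "Render main prompt"},
    {"trace_id": "t1", "span_id": "c", "parent_span_id": "a", "kind": "llm", "name": "gpt-4o call"},
    {"trace_id": "t1", "span_id": "d", "parent_span_id": "c", "kind": "tool", "name": "weather_api"},
]
print("\n".join(render(spans)))
```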

Example Trace for a Simple Agent

[agent] Customer Support Bot (15ms total)
  [guardrail] prompt_injection input scan (2ms)
  [prompt] Render main prompt (1ms)
  [llm] gpt-4o call (10ms, 450 prompt + 120 completion tokens, $0.003)
  [guardrail] pii output scan (2ms)

Example Trace for a Chain Agent

[chain] Data Analysis Pipeline (45ms total)
  [agent_step] Step 1: Data Extraction
    [datasource] Query analytics DB (8ms)
    [prompt] Render extraction prompt (1ms)
    [llm] gpt-4o call (15ms)
  [agent_step] Step 2: Analysis
    [memory] Retrieve analysis templates (2ms)
    [prompt] Render analysis prompt (1ms)
    [llm] claude-3.5-sonnet call (18ms)

Span Status and Levels

Status

Status	Description
`ok`	The span completed successfully
`error`	The span encountered an error

Level

Level	Description
`debug`	Detailed diagnostic information
`default`	Standard operational information
`warning`	Something unexpected but non-fatal
`error`	An error occurred

Span Attributes

Each span carries an attributes JSON object with kind-specific metadata:

LLM Span Attributes

{
  "model": "gpt-4o",
  "provider": "openai",
  "temperature": 0.7,
  "max_tokens": 1024,
  "prompt_tokens": 450,
  "completion_tokens": 120,
  "total_tokens": 570,
  "cost": 0.003
}

Guardrail Span Attributes

{
  "scanner_type": "prompt_injection",
  "direction": "input",
  "action": "block",
  "triggered": false
}
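When reviewing a safety issue, it can help to pull out only the guardrail spans that actually fired. This is a small sketch over span dictionaries shaped like the example above; the `triggered_guardrails` helper and the sample data are hypothetical.

```python
def triggered_guardrails(spans):
    """Return (scanner_type, direction, action) for each guardrail span that fired."""
    hits = []
    for s in spans:
        attrs = s.get("attributes") or {}
        if s.get("kind") == "guardrail" and attrs.get("triggered"):
            hits.append((attrs.get("scanner_type"), attrs.get("direction"), attrs.get("action")))
    return hits


spans = [
    {"kind": "guardrail", "attributes": {"scanner_type": "prompt_injection", "direction": "input", "action": "block", "triggered": False}},
    {"kind": "guardrail", "attributes": {"scanner_type": "pii", "direction": "output", "action": "redact", "triggered": True}},
    {"kind": "llm", "attributes": {"model": "gpt-4o"}},
]
print(triggered_guardrails(spans))
```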

Tool Span Attributes

{
  "tool_name": "weather_api",
  "tool_type": "api",
  "parameters": { "location": "New York" }
}

Media Span Attributes (Speech / Image / Video)

{
  "provider": "fal",
  "model": "fal-ai/flux/schnell",
  "media_type": "image_gen",
  "prompt": "A futuristic city skyline at sunset",
  "asset_url": "https://storage.example.com/assets/image.png",
  "content_type": "image/png"
}

Filtering Traces

List traces and filter them by agent, execution, span kind, status, model, session, or user:

Python SDK

traces = client.traces.list(
    agent_id="your-agent-id",          # Filter by agent
    execution_id="execution-id",        # Filter by execution
    kind="llm",                         # Filter by span kind
    status="ok",                        # Filter by status
    model_name="gpt-4o",               # Filter by model
    session_id="session-id",            # Filter by session
    user_id="user-id",                  # Filter by user
    page=1,
    limit=50
)

JavaScript SDK

const traces = await client.traces.list({
  agentId: 'your-agent-id',
  kind: 'llm',
  status: 'ok',
  page: 1,
  limit: 50,
})
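Because results are paged with page and limit, collecting every match means looping over pages. The helper below is a generic sketch, not part of the SDK: it assumes the list call returns a plain list and that a page shorter than `limit` marks the end, which may not match the real response shape.

```python
def iter_all(fetch_page, limit=50, **filters):
    """Yield every item by calling fetch_page(page=..., limit=..., **filters)."""
    page = 1
    while True:
        items = fetch_page(page=page, limit=limit, **filters)
        yield from items
        if len(items) < limit:
            break  # assumed end-of-results signal: a short (or empty) page
        page += 1


# Hypothetical usage with an SDK client (response shape is an assumption):
# for span in iter_all(client.traces.list, kind="llm", status="ok"):
#     print(span)
```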

Cost and Token Tracking Per Span

Every LLM span records its own token counts and cost:

Field	Description
`prompt_tokens`	Number of input tokens sent to the model
`completion_tokens`	Number of output tokens generated
`total_tokens`	Sum of prompt and completion tokens
`cost`	Cost in USD for this specific span
`model_name`	The model used for this call

This makes cost attribution concrete: in a multi-step execution, you can see which model call was the expensive one instead of only seeing a total.
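As a sketch of that attribution, the functions below total cost per model and pick out the single most expensive LLM span from a list of span dictionaries. The field names follow the table above; the sample values and helper names are made up for illustration.

```python
from collections import defaultdict


def cost_by_model(spans):
    """Sum the cost of LLM spans per model_name."""
    totals = defaultdict(float)
    for s in spans:
        if s.get("kind") == "llm":
            totals[s.get("model_name") or "unknown"] += s.get("cost") or 0.0
    return dict(totals)


def most_expensive(spans):
    """Return the LLM span with the highest cost, or None if there are none."""
    llm_spans = [s for s in spans if s.get("kind") == "llm"]
    return max(llm_spans, key=lambda s: s.get("cost") or 0.0, default=None)


spans = [
    {"kind": "llm", "name": "extract", "model_name": "gpt-4o", "cost": 0.003},
    {"kind": "llm", "name": "analyze", "model_name": "claude-3.5-sonnet", "cost": 0.012},
    {"kind": "llm", "name": "summarize", "model_name": "gpt-4o", "cost": 0.0045},
    {"kind": "tool", "name": "weather_api"},
]
print(cost_by_model(spans))
print(most_expensive(spans)["name"])
```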

Error Information

When a span has an error status, additional fields provide diagnostic information:

Field	Description
`error_message`	Human-readable error description
`error_type`	Error classification (e.g., `rate_limit`, `timeout`, `validation`)
`error_stack`	Stack trace for debugging
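A quick way to see which failures dominate a run is to count error spans by error_type. This is a minimal sketch over span dictionaries; the `error_summary` helper and the sample spans are hypothetical.

```python
from collections import Counter


def error_summary(spans):
    """Count spans with status 'error' by error_type, most frequent first."""
    counts = Counter(
        s.get("error_type") or "unknown"
        for s in spans
        if s.get("status") == "error"
    )
    return counts.most_common()


spans = [
    {"status": "error", "error_type": "rate_limit", "error_message": "429 from provider"},
    {"status": "ok"},
    {"status": "error", "error_type": "timeout", "error_message": "tool call timed out"},
    {"status": "error", "error_type": "rate_limit", "error_message": "429 from provider"},
]
print(error_summary(spans))
```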

Trace Fields

Field	Type	Description
`id`	KSUID	Unique span record ID
`trace_id`	string	Trace group identifier
`span_id`	string	Unique span identifier
`parent_span_id`	string	Parent span (empty for root)
`name`	string	Span name
`kind`	string	Span kind (one of 18 types)
`status`	string	`ok` or `error`
`level`	string	`debug`, `default`, `warning`, `error`
`input`	JSON	Span input data
`output`	JSON	Span output data
`attributes`	JSON	Kind-specific metadata
`tags`	JSON	Custom tags
`token_usage`	JSON	Token consumption
`cost`	float	Cost in USD
`duration_ms`	integer	Duration in milliseconds
`model_name`	string	LLM model name
`agent_id`	KSUID	Associated agent
`prompt_id`	KSUID	Associated prompt
`execution_id`	KSUID	Associated execution
`session_id`	string	Associated chat session
`started_at`	timestamp	Span start time
`ended_at`	timestamp	Span end time
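When you build spans yourself, the timing fields can be cross-checked against each other. The sketch below derives a duration in milliseconds from started_at and ended_at; it assumes RFC 3339 timestamps like those in the ingest example and that duration_ms is simply the difference between the two, which the platform may compute differently.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a timestamp such as 2026-06-01T10:00:00Z into an aware datetime."""
    # Replace the trailing Z so older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def duration_ms(span):
    """Return ended_at minus started_at in whole milliseconds."""
    delta = parse_ts(span["ended_at"]) - parse_ts(span["started_at"])
    return round(delta.total_seconds() * 1000)


span = {"started_at": "2026-06-01T10:00:00Z", "ended_at": "2026-06-01T10:00:01Z"}
print(duration_ms(span))
```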

Related Topics

Executions -- Execution lifecycle
Assets and Media -- Speech, image, and video tool outputs
Cost Tracking -- Aggregated cost analysis
Evaluations -- Scoring traces and individual spans