PromptRails

Tracing

Inspect the exact steps behind an agent run: prompts, model calls, tools, data, guardrails, errors, cost, and tokens.

Tracing turns an agent run into a timeline you can debug. Instead of seeing only the final answer, you can inspect the prompt render, model call, tool call, data source query, guardrail scan, memory lookup, media generation step, error, token usage, and cost that led to it.

PromptRails traces platform-managed runs automatically. You can also send spans from your own application code when you want PromptRails to act as a standalone LLM observability backend.

The trace detail view shows the high-level run metrics first, then the timeline, span tree, and selected span payloads for debugging.

What To Inspect First

Start with the trace when an execution looks wrong, slow, expensive, or surprising:

  • Wrong answer -- Open the prompt render and model call to see exactly what the model saw.
  • Tool issue -- Inspect the tool or MCP span for arguments, response, latency, and errors.
  • Data issue -- Open the data source span to check the query and returned context.
  • Safety issue -- Review guardrail spans to see whether a policy blocked, redacted, or only logged.
  • Cost issue -- Look at model spans and token counts before changing prompts or providers.

How Tracing Works

Every agent execution creates a trace. A trace is a tree of spans, and each span represents one operation inside the run. A span captures:

  • What happened (name, kind)
  • How long it took (duration, start/end timestamps)
  • What went in and came out (input/output)
  • Whether it succeeded (status, error details)
  • How much it cost (token usage, cost)
  • Context links (agent, prompt, model, execution, session)

Spans form a parent-child hierarchy that reflects the execution flow. In the UI, that hierarchy becomes the trace tree you use to move from the full run into a specific prompt render, model call, tool call, or guardrail scan.

Example span hierarchy
agent (root span)
  ├── guardrail (input scan)
  ├── prompt (template rendering)
  ├── llm (model call)
  │   └── tool (tool call by LLM)
  │       └── mcp_call (MCP server invocation)
  ├── guardrail (output scan)
  └── memory (memory update)

Sending Your Own Traces

Traces are created two ways:

  1. Automatically — every agent, prompt, and data source run on PromptRails is traced for you.
  2. From your own code — send spans to PromptRails from any application, even one that does not manage its prompts or agents on the platform. This lets you use PromptRails as a standalone LLM observability backend for LangChain, the OpenAI / Anthropic / Google GenAI SDKs, OpenTelemetry, or your own pipelines.

External spans only need an API key with the traces:write scope. They are tagged with a source attribute such as sdk or otlp, so you can separate them from PromptRails-managed runs in the trace list.

From an SDK

The Python, JavaScript, and Go SDKs each ship a tracing module that batches spans and sends them in the background.

Python

from promptrails.tracing import Tracer
 
tracer = Tracer(api_key="pr_...")
 
with tracer.span("agent-run", kind="agent") as root:
    root.set_input({"q": "weather?"})
    with tracer.span("llm-call", kind="llm") as llm:
        llm.set_model("gpt-4o").set_usage(prompt_tokens=120, completion_tokens=30)
 
tracer.flush()

JavaScript / TypeScript

import { Tracer } from '@promptrails/sdk/tracing'
 
const tracer = new Tracer({ apiKey: 'pr_...' })
 
await tracer.span('agent-run', { kind: 'agent' }, async (root) => {
  root.setInput({ q: 'weather?' })
  await tracer.span('llm-call', { kind: 'llm' }, async (llm) => {
    llm.setModel('gpt-4o').setUsage(120, 30)
  })
})
 
await tracer.flush()

Go

import (
    "context"

    "github.com/promptrails/go-sdk/tracing"
)
 
ctx := context.Background() // or reuse the context your handler already has
tracer := tracing.NewTracer("pr_...")
defer tracer.Shutdown()
 
_ = tracer.Span(ctx, "agent-run", tracing.KindAgent, func(ctx context.Context, root *tracing.Span) error {
    root.SetInput(map[string]any{"q": "weather?"})
    return tracer.Span(ctx, "llm-call", tracing.KindLLM, func(ctx context.Context, llm *tracing.Span) error {
        // SetUsage(promptTokens, completionTokens, totalTokens); -1 = auto-compute total
        llm.SetModel("gpt-4o").SetUsage(120, 30, -1)
        return nil
    })
})

Nested spans automatically share a trace and link to their parent, so the tree renders correctly in the trace viewer.
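
The automatic parent linking can be pictured with a small sketch. This is not the SDK's implementation: MiniTracer and SpanRecord are hypothetical names, and the ID format (random hex strings) is an assumption made only for illustration. The sketch keeps the "current span" in a context variable so that a span opened inside another span inherits its trace ID and records it as parent.

```python
import contextvars
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class SpanRecord:
    name: str
    kind: str
    trace_id: str
    span_id: str
    parent_span_id: str  # empty string for a root span


# Hypothetical illustration only: tracks whichever span is currently open.
_current_span: "contextvars.ContextVar[Optional[SpanRecord]]" = contextvars.ContextVar(
    "current_span", default=None
)


class MiniTracer:
    def __init__(self) -> None:
        self.finished: List[SpanRecord] = []

    @contextmanager
    def span(self, name: str, kind: str) -> Iterator[SpanRecord]:
        parent = _current_span.get()
        record = SpanRecord(
            name=name,
            kind=kind,
            # A child reuses its parent's trace ID; a root span starts a new trace.
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_span_id=parent.span_id if parent else "",
        )
        token = _current_span.set(record)
        try:
            yield record
        finally:
            # Restore the previous span so sibling spans attach to the right parent.
            _current_span.reset(token)
            self.finished.append(record)


tracer = MiniTracer()
with tracer.span("agent-run", kind="agent") as root:
    with tracer.span("llm-call", kind="llm") as llm:
        pass
    with tracer.span("tool-call", kind="tool") as tool:
        pass
```

Both inner spans end up with the root's trace_id and with parent_span_id set to the root's span_id, which is the shape the trace viewer needs to draw the tree.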

Framework auto-instrumentation

The Python and JavaScript SDKs include integrations that turn framework calls into spans with no manual instrumentation:

  • LangChain — a callback handler that traces chains, LLMs, tools, and retrievers.
  • OpenAI — wrap an OpenAI-compatible client to auto-trace every chat completion (model, tokens, latency, output).
  • Anthropic — wrap an Anthropic client to auto-trace every messages.create call.
  • Google GenAI — wrap a Google GenAI client to auto-trace every generateContent call.

See the Python SDK and JavaScript SDK pages for setup.

OpenTelemetry (OTLP)

If you already instrument with OpenTelemetry, point an OTLP/HTTP exporter at the PromptRails OTLP endpoint — no custom code required. GenAI semantic-convention attributes (gen_ai.*) are mapped onto the span model.

POST <base-url>/api/v1/otel/v1/traces
Header: X-API-Key: pr_...               # key with the traces:write scope
Content-Type: application/x-protobuf    (or application/json)

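Configure an OpenTelemetry exporter with the base endpoint <base-url>/api/v1/otel and an X-API-Key header. Exporters that take a base endpoint append /v1/traces themselves; if yours expects a full traces URL, use the complete path shown in the request above.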
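
One way to apply this configuration without touching instrumentation code is through the standard OpenTelemetry exporter environment variables. The sketch below only builds and sets those variables; the helper name otlp_env and the example base URL are placeholders, not part of PromptRails. It assumes your OpenTelemetry SDK reads OTEL_EXPORTER_OTLP_* variables at startup, which the official SDKs generally do.

```python
import os
from typing import Dict


def otlp_env(base_url: str, api_key: str) -> Dict[str, str]:
    """Build standard OpenTelemetry exporter environment variables (hypothetical helper)."""
    return {
        # Base endpoint: the exporter appends /v1/traces for the trace signal.
        "OTEL_EXPORTER_OTLP_ENDPOINT": base_url.rstrip("/") + "/api/v1/otel",
        # Comma-separated key=value pairs sent as HTTP headers on every export.
        "OTEL_EXPORTER_OTLP_HEADERS": "X-API-Key=" + api_key,
        # OTLP over HTTP with protobuf payloads.
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    }


# Placeholder values: substitute your own base URL and a key with traces:write.
env = otlp_env("https://promptrails.example.com", "pr_...")
os.environ.update(env)
```

Set these before the OpenTelemetry SDK creates its exporter, otherwise the defaults are already in effect.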

Native ingest endpoint

SDKs post to a simple batch endpoint you can also call directly:

POST <base-url>/api/v1/traces/ingest
Header: X-API-Key: pr_...               # key with the traces:write scope

{
  "spans": [
    {
      "trace_id": "a1b2c3...",
      "span_id": "1111...",
      "parent_span_id": "0000...",
      "name": "llm-call",
      "kind": "llm",
      "model_name": "gpt-4o",
      "prompt_tokens": 120,
      "completion_tokens": 30,
      "started_at": "2026-06-01T10:00:00Z",
      "ended_at": "2026-06-01T10:00:01Z"
    }
  ]
}

To nest spans, set parent_span_id to the parent span's span_id; omit it (or leave it empty) for a root span. The workspace is taken from the API key. Each request can carry up to 1000 spans.
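
If you call the endpoint directly, the 1000-span limit means large exports have to be split. The following is a minimal sketch using only the Python standard library. The helper names (batches, build_request, send_all) are hypothetical, the Content-Type header is an assumption based on the JSON body shown above, and the response format is not documented here, so the sketch does not parse it.

```python
import json
import urllib.request
from typing import Dict, Iterator, List

MAX_SPANS_PER_REQUEST = 1000  # per-request limit stated in this section


def batches(spans: List[Dict], size: int = MAX_SPANS_PER_REQUEST) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most `size` spans."""
    for start in range(0, len(spans), size):
        yield spans[start:start + size]


def build_request(base_url: str, api_key: str, spans: List[Dict]) -> urllib.request.Request:
    """Build one POST request carrying a single batch of spans."""
    body = json.dumps({"spans": spans}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/traces/ingest",
        data=body,
        method="POST",
        # Content-Type is an assumption; the key needs the traces:write scope.
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def send_all(base_url: str, api_key: str, spans: List[Dict]) -> None:
    """Send every batch in order; urlopen raises on HTTP error statuses."""
    for batch in batches(spans):
        with urllib.request.urlopen(build_request(base_url, api_key, batch)) as resp:
            resp.read()
```

Keeping request construction separate from sending makes the batching logic easy to check without network access.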


Span Kinds

PromptRails currently defines these span kinds:

| Kind | Identifier | Description |
| --- | --- | --- |
| Agent | agent | Top-level agent execution span |
| LLM | llm | LLM model call (prompt + completion) |
| Tool | tool | Tool invocation within an agent |
| Data Source | datasource | Database or file query |
| Prompt | prompt | Prompt template rendering |
| Guardrail | guardrail | Input or output guardrail scan |
| Chain | chain | Chain-type agent orchestration |
| Workflow | workflow | Workflow step execution |
| Agent Step | agent_step | Individual step in a multi-agent execution |
| MCP Call | mcp_call | Remote MCP server tool call |
| Preprocessing | preprocessing | Input preprocessing before LLM |
| Postprocessing | postprocessing | Output postprocessing after LLM |
| Memory | memory | Memory retrieval or storage |
| Embedding | embedding | Vector embedding generation |
| Speech | speech | Text-to-speech or speech-to-text operation |
| Image | image | Image generation or editing |
| Video | video | Video generation |
| Storage | storage | Asset storage upload/download |

Span Hierarchy

Spans are organized in a tree structure using trace IDs, span IDs, and parent span IDs:

  • trace_id -- Groups all spans belonging to the same execution
  • span_id -- Uniquely identifies a single span
  • parent_span_id -- Links a span to its parent (empty for root spans)
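
As an illustration of how these three IDs define the tree, the sketch below rebuilds an indented outline from a flat list of spans belonging to one trace. The function name render_trace and the sample spans are hypothetical; only the field names come from this page.

```python
from collections import defaultdict
from typing import Dict, List


def render_trace(spans: List[Dict]) -> List[str]:
    """Return indented '[kind] name' lines for the spans of a single trace."""
    # Index spans by their parent; root spans have an empty or missing parent_span_id.
    children = defaultdict(list)
    for span in spans:
        children[span.get("parent_span_id") or ""].append(span)

    lines: List[str] = []

    def walk(parent_id: str, depth: int) -> None:
        for span in children[parent_id]:
            lines.append("  " * depth + f"[{span['kind']}] {span['name']}")
            walk(span["span_id"], depth + 1)

    walk("", 0)
    return lines


sample = [
    {"trace_id": "t1", "span_id": "a", "parent_span_id": "", "kind": "agent", "name": "Customer Support Bot"},
    {"trace_id": "t1", "span_id": "b", "parent_span_id": "a", "kind": "llm", "name": "gpt-4o call"},
    {"trace_id": "t1", "span_id": "c", "parent_span_id": "b", "kind": "tool", "name": "weather_api"},
    {"trace_id": "t1", "span_id": "d", "parent_span_id": "a", "kind": "guardrail", "name": "pii output scan"},
]
outline = render_trace(sample)
```

Siblings keep the order in which they appear in the input list, so sort by started_at first if you need chronological output.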

Example Trace for a Simple Agent

[agent] Customer Support Bot (15ms total)
  [guardrail] prompt_injection input scan (2ms)
  [prompt] Render main prompt (1ms)
  [llm] gpt-4o call (10ms, 450 prompt + 120 completion tokens, $0.003)
  [guardrail] pii output scan (2ms)

Example Trace for a Chain Agent

[chain] Data Analysis Pipeline (45ms total)
  [agent_step] Step 1: Data Extraction
    [datasource] Query analytics DB (8ms)
    [prompt] Render extraction prompt (1ms)
    [llm] gpt-4o call (15ms)
  [agent_step] Step 2: Analysis
    [memory] Retrieve analysis templates (2ms)
    [prompt] Render analysis prompt (1ms)
    [llm] claude-3.5-sonnet call (18ms)

Span Status and Levels

Status

| Status | Description |
| --- | --- |
| ok | The span completed successfully |
| error | The span encountered an error |

Level

| Level | Description |
| --- | --- |
| debug | Detailed diagnostic information |
| default | Standard operational information |
| warning | Something unexpected but non-fatal |
| error | An error occurred |

Span Attributes

Each span carries an attributes JSON object with kind-specific metadata:

LLM Span Attributes

{
  "model": "gpt-4o",
  "provider": "openai",
  "temperature": 0.7,
  "max_tokens": 1024,
  "prompt_tokens": 450,
  "completion_tokens": 120,
  "total_tokens": 570,
  "cost": 0.003
}

Guardrail Span Attributes

{
  "scanner_type": "prompt_injection",
  "direction": "input",
  "action": "block",
  "triggered": false
}

Tool Span Attributes

{
  "tool_name": "weather_api",
  "tool_type": "api",
  "parameters": { "location": "New York" }
}

Media Span Attributes (Speech / Image / Video)

{
  "provider": "fal",
  "model": "fal-ai/flux/schnell",
  "media_type": "image_gen",
  "prompt": "A futuristic city skyline at sunset",
  "asset_url": "https://storage.example.com/assets/image.png",
  "content_type": "image/png"
}

Filtering Traces

List and filter traces with multiple criteria:

Python SDK

traces = client.traces.list(
    agent_id="your-agent-id",          # Filter by agent
    execution_id="execution-id",        # Filter by execution
    kind="llm",                         # Filter by span kind
    status="ok",                        # Filter by status
    model_name="gpt-4o",               # Filter by model
    session_id="session-id",            # Filter by session
    user_id="user-id",                  # Filter by user
    page=1,
    limit=50
)

JavaScript SDK

const traces = await client.traces.list({
  agentId: 'your-agent-id',
  kind: 'llm',
  status: 'ok',
  page: 1,
  limit: 50,
})

Cost and Token Tracking Per Span

Every LLM span records its own token counts and cost:

| Field | Description |
| --- | --- |
| prompt_tokens | Number of input tokens sent to the model |
| completion_tokens | Number of output tokens generated |
| total_tokens | Sum of prompt and completion tokens |
| cost | Cost in USD for this specific span |
| model_name | The model used for this call |

This makes cost attribution concrete: in a multi-step execution, you can see which model call was the expensive one instead of only seeing a total.
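
A per-span cost field makes this kind of attribution a short aggregation. The sketch below assumes you already have the spans of one execution as dictionaries with the kind, model_name, and cost fields described on this page. The function names and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def cost_by_model(spans: List[Dict]) -> Dict[str, float]:
    """Sum the cost of LLM spans per model name."""
    totals: Dict[str, float] = defaultdict(float)
    for span in spans:
        if span.get("kind") == "llm":
            totals[span.get("model_name", "unknown")] += span.get("cost", 0.0)
    return dict(totals)


def most_expensive(spans: List[Dict]) -> Optional[Dict]:
    """Return the single costliest LLM span, or None if there are no LLM spans."""
    llm_spans = [s for s in spans if s.get("kind") == "llm"]
    return max(llm_spans, key=lambda s: s.get("cost", 0.0)) if llm_spans else None


# Illustrative values only.
execution_spans = [
    {"kind": "prompt", "name": "Render extraction prompt"},
    {"kind": "llm", "name": "extract", "model_name": "gpt-4o", "cost": 0.003},
    {"kind": "llm", "name": "analyze", "model_name": "claude-3.5-sonnet", "cost": 0.009},
    {"kind": "llm", "name": "summarize", "model_name": "gpt-4o", "cost": 0.001},
]
totals = cost_by_model(execution_spans)
worst = most_expensive(execution_spans)
```

Here worst is the "analyze" span, which is where a prompt or provider change would have the largest effect.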

Error Information

When a span has an error status, additional fields provide diagnostic information:

| Field | Description |
| --- | --- |
| error_message | Human-readable error description |
| error_type | Error classification (e.g., rate_limit, timeout, validation) |
| error_stack | Stack trace for debugging |
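
For a quick triage pass over many spans, these fields can be summarized by error type. This is a sketch over plain dictionaries using the status and error_type fields described above. The function name error_summary and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def error_summary(spans: List[Dict]) -> Counter:
    """Count failed spans per error_type; spans with status 'ok' are ignored."""
    failed = [s for s in spans if s.get("status") == "error"]
    return Counter(s.get("error_type", "unknown") for s in failed)


# Illustrative values only.
recent_spans = [
    {"status": "ok", "kind": "llm"},
    {"status": "error", "kind": "llm", "error_type": "rate_limit"},
    {"status": "error", "kind": "tool", "error_type": "timeout"},
    {"status": "error", "kind": "llm", "error_type": "rate_limit"},
    {"status": "error", "kind": "tool"},
]
summary = error_summary(recent_spans)
```

Calling summary.most_common(1) then points at the dominant failure class before you open individual spans for their error_message and error_stack.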

Trace Fields

| Field | Type | Description |
| --- | --- | --- |
| id | KSUID | Unique span record ID |
| trace_id | string | Trace group identifier |
| span_id | string | Unique span identifier |
| parent_span_id | string | Parent span (empty for root) |
| name | string | Span name |
| kind | string | Span kind (one of 18 types) |
| status | string | ok or error |
| level | string | debug, default, warning, error |
| input | JSON | Span input data |
| output | JSON | Span output data |
| attributes | JSON | Kind-specific metadata |
| tags | JSON | Custom tags |
| token_usage | JSON | Token consumption |
| cost | float | Cost in USD |
| duration_ms | integer | Duration in milliseconds |
| model_name | string | LLM model name |
| agent_id | KSUID | Associated agent |
| prompt_id | KSUID | Associated prompt |
| execution_id | KSUID | Associated execution |
| session_id | string | Associated chat session |
| started_at | timestamp | Span start time |
| ended_at | timestamp | Span end time |