Prompts

Write, test, version, and reuse model instructions without redeploying your application.

Prompts are the instructions and model settings that shape an LLM response. In PromptRails, prompts are first-class resources: you can edit them in Studio, test them directly, attach them to agents, and promote or roll back versions without redeploying your application.

Figure: PromptRails Studio showing a prompt editor with model settings and prompt content. Prompts sit beside agents in Studio. Open a prompt to edit instructions, review model settings, and see which workflow depends on that version.

Prompt Management Overview

A prompt version usually contains:

System prompt -- The standing instructions for role, tone, policy, and behavior.
User prompt -- A template that turns runtime input into the message sent to the model.
Model assignment -- The primary model and optional fallback model.
Parameters -- Temperature, max tokens, and top-p settings.
Input and output schemas -- Optional JSON schemas for validation and structured responses.
Cache timeout -- Optional PromptRails-side response caching for repeated inputs.
Version notes -- A changelog entry for why the version exists.

Prompts can be executed directly while you iterate, then linked to one or more agents when they are ready.

Technical details: Template and schema details

Jinja2 Templating

PromptRails uses Jinja2 templating so the prompt can include values from the execution input. Keep templates readable: most production prompts only need variables, a few conditionals, and simple loops.

Basic Variables

You are a customer support agent for {{ company_name }}.
The customer's name is {{ customer_name }}.
 
Please help them with their inquiry:
{{ message }}

Conditionals

You are a {{ role }} assistant.
 
{% if language == "spanish" %}
Please respond in Spanish.
{% elif language == "french" %}
Please respond in French.
{% else %}
Please respond in English.
{% endif %}
 
User query: {{ message }}

Loops

Here are the relevant documents for context:
 
{% for doc in documents %}
Document {{ loop.index }}: {{ doc.title }}
Content: {{ doc.content }}
---
{% endfor %}
 
Based on the above documents, answer: {{ question }}

Filters

Customer name: {{ name | upper }}
Order date: {{ date | default("Unknown") }}
Summary: {{ long_text | truncate(200) }}
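To check a template before saving it as a prompt version, it can help to render it locally. The sketch below uses the open-source `jinja2` package directly. The `StrictUndefined` setting and the whitespace options are choices made for local testing here, not documented PromptRails behavior, so the platform's own rendering may differ in how it treats missing variables and blank lines.

```python
from jinja2 import Environment, StrictUndefined

# Assumption for local testing only: StrictUndefined raises on a missing
# variable instead of silently rendering an empty string.
env = Environment(undefined=StrictUndefined, trim_blocks=True, lstrip_blocks=True)

template = env.from_string(
    "Customer name: {{ name | upper }}\n"
    "Order date: {{ date | default('Unknown') }}\n"
    "{% for doc in documents %}\n"
    "Document {{ loop.index }}: {{ doc.title }}\n"
    "{% endfor %}\n"
)

# `date` is deliberately omitted so the default filter supplies "Unknown".
rendered = template.render(
    name="Ada",
    documents=[{"title": "Refund policy"}, {"title": "Shipping times"}],
)
print(rendered)
```

Rendering with a representative input like this makes it easier to spot stray whitespace or a misspelled variable name before the prompt reaches a model.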

Input and Output Schemas

Use schemas when the caller or downstream tool expects a stable shape. Input schemas catch bad requests before the model call; output schemas make the response easier to parse and evaluate.

Input Schema

{
  "type": "object",
  "properties": {
    "message": {
      "type": "string",
      "description": "The user's message"
    },
    "language": {
      "type": "string",
      "enum": ["en", "es", "fr", "de"],
      "default": "en"
    },
    "context": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["message"]
}
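To make concrete what an input schema catches, here is a hand-rolled sketch that checks a payload against the schema above. It covers only `required`, `type`, `enum`, and `default`, and it is not PromptRails' validator. The `check_input` helper is hypothetical, and filling in `default` values is an assumption: standard JSON Schema validators treat `default` as an annotation and do not apply it.

```python
# Illustrative only: a tiny subset of JSON Schema, not a full validator.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "message": {"type": "string"},
        "language": {"type": "string", "enum": ["en", "es", "fr", "de"], "default": "en"},
        "context": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["message"],
}

PYTHON_TYPES = {"string": str, "array": list, "object": dict}


def check_input(payload, schema):
    """Return (payload_with_defaults, problems); an empty list means it passed."""
    problems = []
    filled = dict(payload)
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing required field: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name not in filled:
            if "default" in rule:
                filled[name] = rule["default"]  # assumption: defaults are applied
            continue
        value = filled[name]
        expected = PYTHON_TYPES.get(rule.get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"{name}: expected {rule['type']}")
        elif "enum" in rule and value not in rule["enum"]:
            problems.append(f"{name}: must be one of {rule['enum']}")
    return filled, problems


ok_input, ok_problems = check_input({"message": "Where is my order?"}, INPUT_SCHEMA)
bad_input, bad_problems = check_input({"language": "spanish"}, INPUT_SCHEMA)
print(ok_input, ok_problems)
print(bad_problems)
```

The second call fails twice: `message` is missing, and `"spanish"` is not one of the enum codes. Templates that branch on `language` should compare against the same values the schema allows.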

Output Schema

{
  "type": "object",
  "properties": {
    "response": { "type": "string" },
    "sentiment": {
      "type": "string",
      "enum": ["positive", "neutral", "negative"]
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1
    }
  }
}
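On the consuming side, an output schema lets the caller parse a reply and reject anything outside the agreed shape. The following sketch does that with the standard library for the schema above. The `parse_output` helper and the sample reply are hypothetical, and the checks are written by hand rather than taken from PromptRails.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def parse_output(raw_text):
    """Parse a model reply and check it against the output schema above."""
    data = json.loads(raw_text)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if "response" in data and not isinstance(data["response"], str):
        raise ValueError("response must be a string")
    if "sentiment" in data and data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError("sentiment must be positive, neutral, or negative")
    if "confidence" in data:
        value = data["confidence"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            raise ValueError("confidence must be a number between 0 and 1")
    return data


parsed = parse_output(
    '{"response": "Your order shipped.", "sentiment": "positive", "confidence": 0.92}'
)
print(parsed["sentiment"], parsed["confidence"])
```

Because every failure surfaces as a `ValueError`, a caller can wrap `parse_output` in a single `try` block and decide whether to retry, log, or fall back.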

Model Assignment

Each prompt version selects the model used for execution:

Primary model -- The default model for execution
Fallback model -- Used if the primary model fails or is unavailable

Models are configured from workspace credentials and then referenced by PromptRails. See Credentials for provider setup.
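The primary and fallback relationship can be pictured as a single retry on a second model. This is a conceptual sketch only: `ModelCallError`, `run_with_fallback`, and the two stand-in model functions are hypothetical names, and PromptRails' actual failure detection and retry policy are not described in this page.

```python
class ModelCallError(Exception):
    """Hypothetical stand-in for a provider failure or outage."""


def run_with_fallback(prompt_text, primary, fallback=None):
    """Try the primary model; on failure, retry once on the fallback if one is set."""
    try:
        return {"model": "primary", "output": primary(prompt_text)}
    except ModelCallError:
        if fallback is None:
            raise  # no fallback configured, so surface the original error
        return {"model": "fallback", "output": fallback(prompt_text)}


def unavailable_model(prompt_text):
    raise ModelCallError("primary model is unavailable")


def healthy_model(prompt_text):
    return f"echo: {prompt_text}"


result = run_with_fallback("Hello", unavailable_model, healthy_model)
print(result)
```

Recording which model produced the output, as the `model` key does here, keeps later comparisons between runs honest when the fallback behaves differently from the primary.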

Temperature, Max Tokens, and Top P

These standard parameters apply to most prompt runs:

Parameter	Default	Range	Description
`temperature`	0.7	0.0 - 1.0	Controls randomness. Lower values produce more deterministic outputs.
`max_tokens`	Provider default	Varies by model	Maximum number of tokens in the response.
`top_p`	Provider default	0.0 - 1.0	Nucleus sampling. Controls diversity by limiting the token pool.

Use lower temperature for extraction, classification, and policy work. Use higher temperature when the prompt is meant to explore ideas or generate varied drafts. Keep max_tokens explicit when the caller or downstream system expects a bounded response.
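As a small illustration of that guidance, the sketch below validates parameters against the ranges in the table and drops any value left at the provider default. The `build_params` helper and the specific numbers are hypothetical examples, not recommended settings or part of the SDK.

```python
def build_params(temperature=0.7, top_p=None, max_tokens=None):
    """Validate against the documented ranges; None means 'use the provider default'."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    params = {"temperature": temperature, "top_p": top_p, "max_tokens": max_tokens}
    return {name: value for name, value in params.items() if value is not None}


# Low temperature and a bounded response for extraction-style work.
extraction_params = build_params(temperature=0.1, max_tokens=256)
# Higher temperature for varied drafts; max_tokens left at the provider default.
drafting_params = build_params(temperature=0.9, top_p=0.95)
print(extraction_params)
print(drafting_params)
```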

Model Capabilities

Some model controls only appear when the selected model supports them. PromptRails shows these capability-gated settings in the prompt configuration instead of asking teams to memorize which provider supports which feature.

Reasoning

Models with extended reasoning support can expose a Reasoning effort control. Higher effort lets the model spend more tokens on internal reasoning before answering. It is useful for harder analysis tasks, but it can add latency and token usage. Reasoning token counts are reported in traces when the provider returns them.

Web Search

When a model supports provider-native web search, enabling Web search lets the model search during a run and return citations with the response. Citations are captured with the output so the run can be reviewed later.

Structured Output

Structured output constrains the model response to JSON, optionally against a schema you define. Use it when the caller, evaluator, or downstream tool expects a stable shape instead of free-form text.

Structured output and tool calls interact: when a schema is set, the requested response shape takes precedence over free-form tool selection.

Model Deprecation

Models can be marked deprecated when a newer model supersedes them. A deprecated model that is not in use is hidden from new selection; a deprecated model already attached to a prompt stays visible with a warning so existing workflows keep running while the team migrates.
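The visibility rule reduces to one condition: a deprecated model is listed only if the prompt already uses it. The sketch below expresses that rule with hypothetical names (`ModelOption`, `picker_entries`, and the placeholder model names); it does not reflect PromptRails internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    deprecated: bool = False


def picker_entries(models, attached_names):
    """Return (name, warning) pairs for the models a prompt editor would list."""
    entries = []
    for model in models:
        in_use = model.name in attached_names
        if model.deprecated and not in_use:
            continue  # hidden from new selection
        warning = "deprecated: plan a migration" if model.deprecated else None
        entries.append((model.name, warning))
    return entries


catalog = [
    ModelOption("model-v1", deprecated=True),  # deprecated and unused: hidden
    ModelOption("model-v2", deprecated=True),  # deprecated but attached: shown with warning
    ModelOption("model-v3"),
]
entries = picker_entries(catalog, attached_names={"model-v2"})
print(entries)
```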

Technical details: Provider feature and prompt API details

Provider Prompt Caching

Enabling provider-side Prompt caching lets the provider reuse compute for a repeated prompt prefix, such as a long system prompt, document, or few-shot example. This can reduce cost and latency on follow-up calls.

This is distinct from PromptRails-side response caching, which short-circuits the LLM call entirely for identical rendered inputs.

Some providers use explicit cache breakpoints.
Some providers cache repeated prefixes implicitly.
Cached token counts are reported in traces when the provider returns them.
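Because providers cache a repeated prefix, the order of content in a prompt affects how much can be reused. The sketch below compares two layouts by measuring the shared leading characters between consecutive requests. It is a simplification: providers match on tokens rather than characters and apply their own minimum lengths, and `build_prompt` and the sample text are hypothetical.

```python
import os


def build_prompt(static_prefix, question):
    """Put the stable, reusable content first and the per-request part last."""
    return f"{static_prefix}\n\nQuestion: {question}"


def shared_prefix_len(a, b):
    """Length of the common leading substring of two prompts."""
    return len(os.path.commonprefix([a, b]))


STATIC_PREFIX = "You are a support agent.\n" + "Policy text line.\n" * 50

# Stable content first: almost the whole prompt is a repeated prefix.
first = build_prompt(STATIC_PREFIX, "How do refunds work?")
second = build_prompt(STATIC_PREFIX, "Where is my order?")
shared_when_prefix_first = shared_prefix_len(first, second)

# Variable content first: the prompts diverge almost immediately.
shared_when_question_first = shared_prefix_len(
    f"Question: How do refunds work?\n\n{STATIC_PREFIX}",
    f"Question: Where is my order?\n\n{STATIC_PREFIX}",
)
print(shared_when_prefix_first, shared_when_question_first)
```

Keeping long system prompts, documents, and few-shot examples ahead of per-request input gives the provider the longest possible prefix to reuse.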

Response Caching

PromptRails supports response caching at the prompt version level. When cache_timeout (in seconds) is greater than 0, identical rendered prompts can return a cached response without another LLM call.

This is PromptRails-side response caching, distinct from provider prompt-prefix caching. Provider caching reduces the cost of repeated prompt prefixes; PromptRails response caching skips the model call for identical rendered inputs.

# Assumes `client` is an already-initialized PromptRails SDK client.
version = client.prompts.create_version(
    prompt_id="your-prompt-id",
    system_prompt="You are a helpful assistant.",
    user_prompt="Translate '{{ text }}' to {{ target_language }}.",
    temperature=0.3,
    cache_timeout=3600,  # Cache responses for 1 hour (3600 seconds)
    message="Added caching for translation prompt"
)

Caching is keyed on the rendered prompt content (after template variables are substituted), so different inputs produce different cache entries.
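The keying behavior can be sketched with a small in-memory cache. This is an illustration under stated assumptions, not the PromptRails implementation: the `ResponseCache` class is hypothetical, and hashing only the system prompt and rendered user prompt is a guess at the key (the real key may also include the model or parameters).

```python
import hashlib
import time


class ResponseCache:
    """In-memory TTL cache keyed on the rendered prompt text."""

    def __init__(self, cache_timeout, clock=time.monotonic):
        self.cache_timeout = cache_timeout  # seconds; 0 disables caching
        self._clock = clock
        self._entries = {}

    @staticmethod
    def _key(system_prompt, rendered_user_prompt):
        # Assumption: the key covers only the two rendered prompt strings.
        joined = system_prompt + "\x00" + rendered_user_prompt
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def get_or_call(self, system_prompt, rendered_user_prompt, call_model):
        if self.cache_timeout <= 0:
            return call_model()
        key = self._key(system_prompt, rendered_user_prompt)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.cache_timeout:
            return hit[1]  # fresh entry: skip the model call
        response = call_model()
        self._entries[key] = (now, response)
        return response


calls = []


def fake_model():
    """Stand-in for an LLM call that records how often it runs."""
    calls.append(1)
    return f"translation #{len(calls)}"


cache = ResponseCache(cache_timeout=3600)
system = "You are a helpful assistant."
first = cache.get_or_call(system, "Translate 'hello' to French.", fake_model)
repeat = cache.get_or_call(system, "Translate 'hello' to French.", fake_model)
other = cache.get_or_call(system, "Translate 'hello' to German.", fake_model)
print(first, repeat, other, len(calls))
```

The repeated French request is served from the cache, while the German request renders to different text and therefore triggers a second model call.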

Creating Prompts

Python SDK

# Create a prompt
prompt = client.prompts.create(
    name="Product Description Generator",
    description="Generates product descriptions from features"
)
 
# Create the first version
version = client.prompts.create_version(
    prompt_id=prompt["data"]["id"],
    system_prompt="You are an expert copywriter who writes compelling product descriptions.",
    user_prompt="""Write a product description for:
Product: {{ product_name }}
Category: {{ category }}
Features:
{% for feature in features %}
- {{ feature }}
{% endfor %}
 
The description should be {{ tone }} and approximately {{ word_count }} words.""",
    input_schema={
        "type": "object",
        "properties": {
            "product_name": {"type": "string"},
            "category": {"type": "string"},
            "features": {"type": "array", "items": {"type": "string"}},
            "tone": {"type": "string", "default": "professional"},
            "word_count": {"type": "integer", "default": 150}
        },
        "required": ["product_name", "features"]
    },
    temperature=0.8,
    max_tokens=512,
    message="Initial version"
)

JavaScript SDK

const prompt = await client.prompts.create({
  name: 'Product Description Generator',
  description: 'Generates product descriptions from features',
})
 
const version = await client.prompts.createVersion(prompt.data.id, {
  systemPrompt: 'You are an expert copywriter who writes compelling product descriptions.',
  userPrompt: `Write a product description for:
Product: {{ product_name }}
Category: {{ category }}
Features:
{% for feature in features %}
- {{ feature }}
{% endfor %}`,
  temperature: 0.8,
  maxTokens: 512,
  message: 'Initial version',
})

Testing Prompts

Execute a prompt directly to test it without going through an agent:

result = client.prompts.execute(
    prompt_id="your-prompt-id",
    input={
        "product_name": "Wireless Earbuds Pro",
        "category": "Electronics",
        "features": [
            "Active noise cancellation",
            "30-hour battery life",
            "IPX5 water resistance"
        ],
        "tone": "enthusiastic"
    }
)
 
print(result["data"]["output"])

Prompt Status

Status	Description
`active`	The prompt is available for use and execution
`archived`	The prompt is hidden from listings and cannot be executed

Related Topics

Prompt Versioning -- Version management and promotion
Agents -- How agents use prompts
Tracing -- Prompt rendering appears as prompt spans in traces