# Guardrails

> Protect your agents with 14 built-in scanner types for input and output validation, including toxicity detection, PII filtering, and prompt injection prevention.

Guardrails are safety scanners that validate agent inputs and outputs before and after LLM execution. They protect against harmful content, data leakage, prompt injection, and other risks.

## What Are Guardrails?

A guardrail is a scanner attached to an agent that inspects content at a specific point in the execution pipeline:

- **Input guardrails** scan the user's input before it reaches the LLM
- **Output guardrails** scan the LLM's response before it is returned to the user

Each guardrail has a direction (input or output), a scanner type, an action to take when it triggers, and a sort order that determines the evaluation sequence.
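
The shape of a guardrail can be pictured as a small record. This is an illustrative sketch only, not the SDK's actual model class:

```python
from dataclasses import dataclass, field

@dataclass
class Guardrail:
    """Illustrative model of a guardrail attachment (not the SDK's real class)."""
    scanner_type: str          # e.g. "prompt_injection", "pii"
    direction: str             # "input" or "output"
    action: str                # "block", "redact", or "log"
    sort_order: int            # lower numbers evaluate first
    config: dict = field(default_factory=dict)

# Example: an input guardrail that blocks prompt injection attempts
gr = Guardrail("prompt_injection", "input", "block", sort_order=1)
```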

## Direction: Input vs Output

| Direction | When It Runs         | Purpose                                                                   |
| --------- | -------------------- | ------------------------------------------------------------------------- |
| `input`   | Before LLM execution | Validate user input, block prompt injection, detect PII                   |
| `output`  | After LLM execution  | Filter harmful responses, redact sensitive data, enforce content policies |

## Scanner Types

PromptRails includes 14 built-in scanner types:

### Content Safety

| Scanner             | Identifier   | Description                                                            |
| ------------------- | ------------ | ---------------------------------------------------------------------- |
| **Toxicity**        | `toxicity`   | Detects toxic, abusive, or hateful language in text                    |
| **Harmful Content** | `harmful`    | Identifies content that promotes harm, violence, or illegal activities |
| **Bias Detection**  | `bias`       | Detects biased or discriminatory language                              |
| **No Refusal**      | `no_refusal` | Ensures the LLM does not refuse to answer (output only)                |

### Data Protection

| Scanner               | Identifier  | Description                                                                            |
| --------------------- | ----------- | -------------------------------------------------------------------------------------- |
| **PII Detection**     | `pii`       | Detects personally identifiable information (names, emails, phone numbers, SSNs, etc.) |
| **Anonymize**         | `anonymize` | Replaces detected PII with placeholder tokens                                          |
| **Secrets Detection** | `secrets`   | Detects API keys, passwords, tokens, and other secrets in text                         |
| **Sensitive Data**    | `sensitive` | Detects broader categories of sensitive information                                    |

### Security

| Scanner              | Identifier         | Description                                                                  |
| -------------------- | ------------------ | ---------------------------------------------------------------------------- |
| **Prompt Injection** | `prompt_injection` | Detects attempts to override system instructions or inject malicious prompts |
| **Invisible Text**   | `invisible_text`   | Detects hidden Unicode characters or zero-width text used for injection      |
| **Malicious URLs**   | `malicious_urls`   | Detects known malicious, phishing, or suspicious URLs                        |

### Content Filtering

| Scanner                | Identifier       | Description                                                 |
| ---------------------- | ---------------- | ----------------------------------------------------------- |
| **Substring Ban**      | `ban_substrings` | Blocks content containing specified banned words or phrases |
| **Topic Ban**          | `ban_topics`     | Blocks content related to specified banned topics           |
| **Language Detection** | `language`       | Ensures content is in the expected language(s)              |

## Actions

When a guardrail scanner triggers, it takes one of three actions:

| Action   | Behavior                                                                 |
| -------- | ------------------------------------------------------------------------ |
| `block`  | Stops execution and returns an error. The LLM response is not delivered. |
| `redact` | Removes or replaces the offending content and continues execution.       |
| `log`    | Records the detection in the trace but allows execution to continue.     |

## Configuring Guardrails

Guardrails are configured per agent. Each agent can have multiple guardrails with different scanners, directions, and actions.

**Python SDK**

```python
# Add an input guardrail for prompt injection
client.guardrails.create(
    agent_id="your-agent-id",
    type="input",
    scanner_type="prompt_injection",
    action="block",
    sort_order=1,
    config={}
)

# Add an output guardrail for PII
client.guardrails.create(
    agent_id="your-agent-id",
    type="output",
    scanner_type="pii",
    action="redact",
    sort_order=1,
    config={
        "entities": ["email", "phone", "ssn", "credit_card"]
    }
)

# Add a substring ban
client.guardrails.create(
    agent_id="your-agent-id",
    type="input",
    scanner_type="ban_substrings",
    action="block",
    sort_order=2,
    config={
        "substrings": ["ignore previous instructions", "system prompt"],
        "case_sensitive": False
    }
)
```

**JavaScript SDK**

```typescript
await client.guardrails.create({
  agentId: 'your-agent-id',
  type: 'input',
  scannerType: 'prompt_injection',
  action: 'block',
  sortOrder: 1,
  config: {},
})

await client.guardrails.create({
  agentId: 'your-agent-id',
  type: 'output',
  scannerType: 'pii',
  action: 'redact',
  sortOrder: 1,
  config: {
    entities: ['email', 'phone', 'ssn', 'credit_card'],
  },
})
```

## Sort Order

Within each direction, guardrails execute in ascending sort order: lower numbers run first.

A typical input guardrail ordering might be:

1. `invisible_text` (sort_order: 1) -- Detect hidden characters first
2. `prompt_injection` (sort_order: 2) -- Block injection attempts
3. `toxicity` (sort_order: 3) -- Filter toxic content
4. `ban_substrings` (sort_order: 4) -- Apply custom word filters

If a guardrail with `block` action triggers, subsequent guardrails are not evaluated.
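
The ordering and short-circuit behavior can be sketched as a simple loop. This is an illustration of the documented behavior, not the actual engine:

```python
def evaluate(guardrails: list[dict], triggered: set[str]) -> list[str]:
    """Return the scanners evaluated, in order, honoring block short-circuit."""
    evaluated = []
    for gr in sorted(guardrails, key=lambda g: g["sort_order"]):
        evaluated.append(gr["scanner_type"])
        if gr["scanner_type"] in triggered and gr["action"] == "block":
            break  # subsequent guardrails are not evaluated
    return evaluated

pipeline = [
    {"scanner_type": "toxicity", "action": "block", "sort_order": 3},
    {"scanner_type": "invisible_text", "action": "block", "sort_order": 1},
    {"scanner_type": "prompt_injection", "action": "block", "sort_order": 2},
    {"scanner_type": "ban_substrings", "action": "block", "sort_order": 4},
]

# If prompt_injection triggers, toxicity and ban_substrings never run:
evaluate(pipeline, triggered={"prompt_injection"})
# → ["invisible_text", "prompt_injection"]
```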

## Scanner Configuration

Each scanner type accepts a configuration object (`config`) for customization:

### ban_substrings

```json
{
  "substrings": ["forbidden phrase", "blocked word"],
  "case_sensitive": false
}
```
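
The effect of `case_sensitive` can be sketched as follows. This is an illustrative check of the config's documented semantics, not the scanner's implementation:

```python
def matches_ban_substrings(text: str, config: dict) -> bool:
    """Return True if any configured substring appears in the text."""
    subs = config["substrings"]
    if not config.get("case_sensitive", False):
        text = text.lower()
        subs = [s.lower() for s in subs]
    return any(s in text for s in subs)

config = {"substrings": ["forbidden phrase", "blocked word"], "case_sensitive": False}
matches_ban_substrings("This contains a FORBIDDEN PHRASE.", config)  # → True
```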

### ban_topics

```json
{
  "topics": ["politics", "religion", "gambling"]
}
```

### pii

```json
{
  "entities": ["email", "phone", "ssn", "credit_card", "address"]
}
```

### language

```json
{
  "languages": ["en", "es", "fr"],
  "action_on_mismatch": "block"
}
```

### toxicity, harmful, bias, prompt_injection, secrets, invisible_text, malicious_urls, anonymize, no_refusal, sensitive

These scanners typically work with an empty configuration object `{}` and use their built-in detection models.

## Managing Guardrails

```python
# List guardrails for an agent
guardrails = client.guardrails.list(agent_id="your-agent-id")

# Update a guardrail
client.guardrails.update(
    guardrail_id="guardrail-id",
    action="log",  # Change from block to log
    is_active=True
)

# Disable a guardrail (without deleting)
client.guardrails.update(
    guardrail_id="guardrail-id",
    is_active=False
)

# Delete a guardrail
client.guardrails.delete(guardrail_id="guardrail-id")
```

## Guardrail Traces

Every guardrail evaluation produces a `guardrail` span in the execution trace, recording:

- Which scanner was used
- Whether it triggered
- What action was taken
- The duration of the scan
- Any details about detected content

This provides full visibility into why content was blocked or redacted.
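
A guardrail span carrying the fields listed above might look like the following. The field names here are hypothetical; the exact schema in your traces may differ:

```python
# Hypothetical guardrail span, mirroring the recorded fields listed above.
span = {
    "type": "guardrail",
    "scanner_type": "pii",               # which scanner was used
    "triggered": True,                   # whether it triggered
    "action_taken": "redact",            # what action was taken
    "duration_ms": 12,                   # the duration of the scan
    "details": {"entities": ["email"]},  # details about detected content
}
```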

## Best Practices

- **Layer your guardrails** -- Use multiple scanners in combination for defense in depth
- **Start with `log` mode** -- Monitor what would be caught before switching to `block`
- **Prioritize injection prevention** -- Always run `prompt_injection` on inputs
- **Protect PII** -- Use `pii` or `anonymize` on outputs to prevent data leakage
- **Test with adversarial inputs** -- Verify your guardrails catch edge cases
- **Monitor guardrail traces** -- Review blocked content regularly to tune configurations
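
The "start with `log` mode" practice can be made concrete with a simple promotion check: only switch a guardrail from `log` to `block` once the observed false-positive rate over a trial window is acceptably low. The thresholds and helper below are illustrative, not part of the SDK:

```python
def ready_to_block(detections: int, false_positives: int,
                   min_samples: int = 100, max_fp_rate: float = 0.01) -> bool:
    """Decide whether a log-mode guardrail is safe to promote to block."""
    if detections < min_samples:
        return False                      # not enough data yet
    return false_positives / detections <= max_fp_rate

ready_to_block(detections=500, false_positives=2)   # → True  (0.4% FP rate)
ready_to_block(detections=500, false_positives=20)  # → False (4% FP rate)
```

Once the check passes, the switch itself is a single `client.guardrails.update(...)` call changing `action` from `"log"` to `"block"`, as shown in Managing Guardrails.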

## Related Topics

- [Agents](/docs/agents) -- Attaching guardrails to agents
- [Tracing](/docs/tracing) -- Guardrail evaluation spans
- [Security](/docs/security) -- Overall security architecture
