# Data Masking

> Intercept PII in outbound LLM calls and replace it with opaque placeholders before the request leaves your workspace. The agent runtime sees real values; the cloud provider sees only placeholders.


Data Masking is PromptRails' answer to the most common enterprise blocker for cloud LLMs: "we can't let customer data leave our perimeter." With masking on, sensitive values in your prompts, datasource results, and tool arguments are replaced with opaque placeholders before the request goes to OpenAI / Anthropic / Gemini / any other provider. The placeholders are restored on the response path, so your agents and tools keep seeing real values — only the provider sees the masked form.

  Data Masking is available on the **Pro** and **Enterprise** plans. Free and Starter workspaces will see an upgrade prompt on the masking settings page.

## What Gets Masked

Twelve built-in detectors run on every outbound LLM call when masking is enabled:

| Category   | Types                                                                                          |
| ---------- | ---------------------------------------------------------------------------------------------- |
| Identity   | Email, phone, credit card (Luhn-validated), IBAN (mod-97), US SSN, TC Kimlik No (TR national ID) |
| Network    | IPv4, IPv6                                                                                     |
| Secrets    | JWT tokens, AWS access keys, generic API keys (OpenAI / Anthropic / Stripe / GitHub / Slack / Google), PEM private key blocks |

Each detector pairs a pattern with a validator: credit card numbers must pass the Luhn checksum, IBANs must pass the mod-97 check, US SSNs in reserved area ranges are rejected, and TC Kimlik numbers must pass the two-check-digit test. This keeps false positives rare.
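To show what these validators check, here is a minimal Python sketch using the standard published algorithms. It is an illustration, not PromptRails' implementation: the function names are hypothetical, the 13 to 19 digit card length is an assumption, and the SSN check also rejects group `00` and serial `0000`, which goes beyond the area ranges mentioned above.

```python
def luhn_ok(number: str) -> bool:
    """Luhn checksum for card numbers; spaces and dashes are ignored."""
    digits = [int(c) for c in number if c in "0123456789"]
    if not 13 <= len(digits) <= 19:  # typical card-number lengths (assumption)
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_ok(iban: str) -> bool:
    """ISO 7064 MOD 97-10 check as used for IBANs: remainder must be 1."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34 and s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)  # '7' -> 7, 'A' -> 10
    return int(numeric) % 97 == 1


def ssn_ok(ssn: str) -> bool:
    """US SSN shape plus rejection of reserved numbers."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in ("000", "666") or area.startswith("9"):  # reserved area numbers
        return False
    return group != "00" and serial != "0000"


def tc_kimlik_ok(value: str) -> bool:
    """Turkish national ID: 11 digits, no leading zero, two check digits."""
    if len(value) != 11 or not (value.isascii() and value.isdigit()) or value[0] == "0":
        return False
    d = [int(c) for c in value]
    check10 = ((d[0] + d[2] + d[4] + d[6] + d[8]) * 7 - (d[1] + d[3] + d[5] + d[7])) % 10
    check11 = sum(d[:10]) % 10
    return d[9] == check10 and d[10] == check11
```

A string that matches a pattern but fails its validator (for example `4111 1111 1111 1112`, one digit off a Luhn-valid test card) would not be masked under this design, which is how the validators keep false positives down.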

Two additional types — **Name** and **Address** — have no regex (their shapes are too ambiguous) and are masked only when you mark them explicitly on a datasource column.

## How It Works

```
Prompt / tool result (real PII)
    │
    ▼
Detect + replace with [PII_TYPE_xxxxxxxx] placeholders
    │
    ▼
Cloud LLM provider (sees only the placeholders)
    │
    ▼
Response with placeholders preserved
    │
    ▼
Restore originals before the agent / tool / chat client reads them
```

Placeholders are stable within a conversation: the same value always maps to the same placeholder, so the LLM can reason about coreferences ("the email I mentioned earlier"). Tool calls still receive the real value when they execute, because the masking boundary is strictly the cloud provider.
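The round trip in the diagram can be sketched in Python as follows. This is an illustration under stated assumptions, not PromptRails' engine: it detects only emails with a simplified regex, it assumes the eight `x` characters are a hex digest (derived here by hashing the value with a per-conversation salt so repeat mentions get the same placeholder), and `Masker` is a hypothetical name.

```python
import hashlib
import re
from typing import Dict

# Simplified email pattern for the sketch; real detectors are stricter.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class Masker:
    def __init__(self, conversation_salt: str):
        self.salt = conversation_salt
        self.mapping: Dict[str, str] = {}  # placeholder -> original value

    def placeholder(self, pii_type: str, value: str) -> str:
        # Same salt + type + value always yields the same placeholder,
        # which is what lets the LLM track repeat mentions.
        digest = hashlib.sha256(f"{self.salt}:{pii_type}:{value}".encode()).hexdigest()[:8]
        token = f"[PII_{pii_type}_{digest}]"
        self.mapping[token] = value
        return token

    def mask(self, text: str) -> str:
        """Outbound path: replace detected values before the provider sees them."""
        return EMAIL.sub(lambda m: self.placeholder("EMAIL", m.group(0)), text)

    def restore(self, text: str) -> str:
        """Response path: swap placeholders back to the original values."""
        for token, value in self.mapping.items():
            text = text.replace(token, value)
        return text
```

For example, `Masker("conversation-1").mask("Email alice@x.com, then cc alice@x.com again")` replaces both mentions with the same `[PII_EMAIL_…]` token, and `restore` on the provider's reply brings the address back.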

## Workspace Policy

Open **Settings → Data Masking** in your workspace and turn on the master switch. The page has two controls:

- **Enabled** — off by default. When on, every outbound LLM call from this workspace runs through the masking engine.
- **Failure mode** — what to do if the masking engine itself fails (rare; the engine talks to a workspace-local store):
  - **Strict** (recommended) — abort the request rather than risk leaking PII. Best for compliance-sensitive workspaces.
  - **Permissive** — log a warning and let the request through. Useful in dev workspaces where stability beats strict guarantees.

You can also restrict masking to a subset of detector types — for example, "only mask EMAIL and PHONE" — when you want a targeted policy.
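The interaction between the switch, the type filter, and the failure mode can be sketched as below. `WorkspacePolicy`, `MaskingEngineError`, and `prepare_outbound` are hypothetical names; the `mask` callable stands in for the engine, `types=None` stands for "all detector types", and defaulting the failure mode to strict is an assumption.

```python
import logging
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

log = logging.getLogger("masking")


class MaskingEngineError(Exception):
    """Hypothetical error for a failure inside the masking engine itself."""


@dataclass
class WorkspacePolicy:
    enabled: bool = False                    # off by default, as on the settings page
    failure_mode: str = "strict"             # "strict" or "permissive" (default assumed)
    types: Optional[FrozenSet[str]] = None   # None means every detector type


MaskFn = Callable[[str, Optional[FrozenSet[str]]], str]


def prepare_outbound(text: str, policy: WorkspacePolicy, mask: MaskFn) -> str:
    if not policy.enabled:
        return text
    try:
        return mask(text, policy.types)
    except MaskingEngineError:
        if policy.failure_mode == "strict":
            raise  # strict: abort the request rather than send unmasked text
        log.warning("masking engine failed; sending request unmasked (permissive mode)")
        return text
```

Raising in strict mode means the caller never reaches the provider, which matches "abort the request rather than risk leaking PII".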

## Per-Agent and Per-Datasource Overrides

The workspace policy is the default. Individual agents and data sources can override it.

In the **Studio** detail page for an agent or data source, open the **PII Masking** tab. The tab is hidden by default; add it from the **+** menu. It offers a three-way control:

- **Inherit workspace policy** — follow whatever the workspace setting is. Recommended unless this specific resource has a different requirement.
- **Force on** — always mask this resource's outbound calls, even if the workspace policy is off.
- **Force off** — skip masking on this resource even if the workspace policy is on. Use sparingly — typically for a non-PII dev datasource where masking adds noise.

A small chip on the tab label tells you at a glance whether an explicit override is set.
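The three-way control resolves to a single on/off decision per resource, roughly as in this sketch. `MaskingOverride` and `effective_masking` are hypothetical names for illustration.

```python
from enum import Enum


class MaskingOverride(Enum):
    INHERIT = "inherit"
    FORCE_ON = "force_on"
    FORCE_OFF = "force_off"


def effective_masking(workspace_enabled: bool, override: MaskingOverride) -> bool:
    """Decide whether masking applies to one agent or data source."""
    if override is MaskingOverride.FORCE_ON:
        return True   # mask even if the workspace policy is off
    if override is MaskingOverride.FORCE_OFF:
        return False  # skip masking even if the workspace policy is on
    return workspace_enabled  # inherit the workspace default
```

This page does not say how an agent override combines with an override on a data source that agent queries, so the sketch resolves a single resource only.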

## Marking Datasource Columns

Detectors catch values whose shape is recognizable (an email always looks like an email). They don't catch names, internal customer IDs, or domain-specific identifiers — but those are usually the values you most want to mask in queries.

Open a credential's detail page in **Settings → Credentials**. The schema view has a **PII** dropdown for each column:

| Mark a column as  | Behaviour                                                                                   |
| ----------------- | ------------------------------------------------------------------------------------------- |
| `Email`, `Phone`, `Credit card`, `IBAN`, `SSN`, `TC Kimlik`, `IP`, `JWT`, `AWS / API / Private key` | The column's values are masked as the selected type. The matching detector would usually catch these values in raw text too; marking the column makes the type explicit. |
| `Name`, `Address` | High-confidence masking for values no regex would catch. This is the only way to mask these types in datasource results. |
| `none`            | Default. The column flows through unmasked unless a detector matches its content.           |

After you save, every value a datasource tool call returns in a marked column is masked before the prompt that wraps it reaches the LLM. The `N PII masked` chip on the trace (see Trace Visibility) shows how many fields were intercepted.
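Column marking can be sketched as a pass over the tool's result rows before they are wrapped into a prompt. Everything here is hypothetical: the row shape (a list of dicts), the function names, and the hex-digest placeholder format are assumptions, and the regex detectors that still run on unmarked columns are left out.

```python
import hashlib
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def make_placeholder(pii_type: str, value: str, salt: str = "conversation-1") -> str:
    digest = hashlib.sha256(f"{salt}:{pii_type}:{value}".encode()).hexdigest()[:8]
    return f"[PII_{pii_type}_{digest}]"


def mask_marked_columns(
    rows: List[Row], column_types: Dict[str, str]
) -> Tuple[List[Row], Dict[str, str], int]:
    """Mask every value in marked columns; return rows, placeholder map, and count."""
    mapping: Dict[str, str] = {}
    count = 0
    masked_rows: List[Row] = []
    for row in rows:
        out: Row = {}
        for column, value in row.items():
            pii_type = column_types.get(column, "none")
            if pii_type == "none" or value is None:
                out[column] = value  # unmarked: left to the regex detectors (not shown)
                continue
            token = make_placeholder(pii_type, str(value))
            mapping[token] = str(value)
            out[column] = token
            count += 1
        masked_rows.append(out)
    return masked_rows, mapping, count
```

The returned count is the kind of number a trace chip would report; the mapping stays on the workspace side so the response path can restore values.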

## Trace Visibility

Every LLM span in the trace UI shows a small amber **`N PII masked`** chip in its header when the call had PII intercepted, where N is the number of masked values. The trace store never holds the original values themselves: only the count, model name, and the usual timing and token usage. An auditor can therefore verify that masking ran without gaining a path to the raw PII.

The chip appears on the trace list view too, so you can scan a session for which calls had PII flow through them.

## Upstream Hints (API Clients)

When you call the OpenAI-compatible gateway directly (without going through an agent), you can attach structured PII markers with the `X-Masking-Hints` header. The value is a base64-encoded JSON array:

```bash
# GNU base64 wraps output at 76 characters; strip newlines so the header stays on one line.
HINTS=$(printf '%s' '[{"value":"John Doe","type":"NAME"},{"value":"alice@x.com","type":"EMAIL"}]' | base64 | tr -d '\n')

curl https://api.promptrails.ai/v1/chat/completions \
  -H "Authorization: Bearer $PROMPTRAILS_API_KEY" \
  -H "X-Workspace-ID: $WORKSPACE_ID" \
  -H "X-Masking-Hints: $HINTS" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pr/gpt-4o",
    "messages": [{"role":"user","content":"draft an email to John Doe at alice@x.com"}]
  }'
```
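From Python, the same header value can be built with the standard library. This sketch assumes standard (not URL-safe) base64, matching the shell `base64` command above; `masking_hints_header` is a hypothetical helper name, and sending the request is left to your HTTP client.

```python
import base64
import json
import os
from typing import Dict, List


def masking_hints_header(hints: List[Dict[str, str]]) -> str:
    """Encode hint objects as the base64 JSON array X-Masking-Hints expects."""
    payload = json.dumps(hints, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(payload).decode("ascii")  # single line, no wrapping


hints = [
    {"value": "John Doe", "type": "NAME"},
    {"value": "alice@x.com", "type": "EMAIL"},
]
headers = {
    "Authorization": f"Bearer {os.environ.get('PROMPTRAILS_API_KEY', '')}",
    "X-Workspace-ID": os.environ.get("WORKSPACE_ID", ""),
    "X-Masking-Hints": masking_hints_header(hints),
    "Content-Type": "application/json",
}
```

Unlike GNU `base64`, `base64.b64encode` never inserts line breaks, so no extra stripping is needed.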

Hints take precedence over the built-in detectors on overlap — and they're the only way to mask `NAME` and `ADDRESS` values for ad-hoc text outside a datasource.
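One way to get "hints win on overlap" is sketched below: every occurrence of a hint value becomes a span, and any detector span that overlaps a hint span is dropped. The span representation and `merge_hints` are hypothetical, exact-string matching of hint values is an assumption, and overlaps between two hints are not resolved here.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, PII type); end is exclusive


def _overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def merge_hints(text: str, detector_spans: List[Span], hints: List[Dict[str, str]]) -> List[Span]:
    """Combine detector matches with caller hints, letting hints win on overlap."""
    hint_spans: List[Span] = []
    for hint in hints:
        for m in re.finditer(re.escape(hint["value"]), text):
            hint_spans.append((m.start(), m.end(), hint["type"]))
    kept = [d for d in detector_spans if not any(_overlaps(d, h) for h in hint_spans)]
    return sorted(hint_spans + kept)
```

Non-overlapping detector matches are kept, so a hint for a name does not switch off email detection elsewhere in the same message.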

## What Happens On Downgrade

If a workspace was on the Pro plan with masking enabled and then downgrades to Starter, the gateway notices on the next request and stops masking. Your masking settings are preserved, so re-upgrading restores the prior behaviour. If you try to enable masking on a plan without the feature, the API returns `402 Payment Required` and the dashboard shows an upgrade card with a link to billing.

## What Doesn't Change

- The agent runtime sees real values. Tools called with a masked email get the real email when they execute.
- Coreference within a single conversation works — the LLM is given the same placeholder for repeat mentions, so it can still reason about "the customer", "their email", etc.
- Streaming continues to work — placeholders that span chunk boundaries are reassembled before they reach your client.
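Reassembling placeholders across chunk boundaries amounts to holding back a possible partial placeholder until its closing bracket arrives. A sketch, assuming the `[PII_<TYPE>_<8 hex chars>]` shape, a hypothetical `MAX_PLACEHOLDER_LEN` bound, and a placeholder-to-original mapping supplied by the caller; `StreamRestorer` is a hypothetical name.

```python
import re
from typing import Dict

PLACEHOLDER = re.compile(r"\[PII_[A-Z_]+_[0-9a-f]{8}\]")
MAX_PLACEHOLDER_LEN = 64  # hypothetical bound so a stray "[" cannot stall output


class StreamRestorer:
    def __init__(self, mapping: Dict[str, str]):
        self.mapping = mapping  # placeholder -> original value
        self.buffer = ""

    def feed(self, chunk: str) -> str:
        """Return the text that is safe to emit; keep a possible partial placeholder."""
        self.buffer += chunk
        cut = self.buffer.rfind("[")
        tail_is_open = cut != -1 and "]" not in self.buffer[cut:]
        if tail_is_open and len(self.buffer) - cut < MAX_PLACEHOLDER_LEN:
            ready, self.buffer = self.buffer[:cut], self.buffer[cut:]
        else:
            ready, self.buffer = self.buffer, ""
        return self._restore(ready)

    def flush(self) -> str:
        """Emit whatever is left when the stream ends."""
        ready, self.buffer = self.buffer, ""
        return self._restore(ready)

    def _restore(self, text: str) -> str:
        return PLACEHOLDER.sub(lambda m: self.mapping.get(m.group(0), m.group(0)), text)
```

With chunks `"Write to [PII_EM"` and `"AIL_1a2b3c4d] now"`, the first call emits `"Write to "` and the second emits the restored address, so the client never sees a half placeholder.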

## Audit Notes

- Masking state is per-workspace; no data crosses between workspaces.
- The placeholder mapping store is encrypted at rest. Plaintext PII is never written to the masking store.
- Workspace mappings expire one hour after last activity by default.
- The trace store only ever sees masked content and the count attribute. There is no path from the trace UI to the original values.
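The one-hour expiry can be pictured as a sliding idle timeout per workspace, as in this sketch. `ExpiringMappingStore` is a hypothetical name, the injectable clock exists only for testing, and the sketch omits the at-rest encryption described above: it keeps values in memory purely to show the expiry rule.

```python
import time
from typing import Callable, Dict, Optional


class ExpiringMappingStore:
    """Per-workspace placeholder mappings that expire after a period of inactivity."""

    def __init__(self, ttl_seconds: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Dict[str, str]] = {}  # workspace -> {placeholder: value}
        self._last_activity: Dict[str, float] = {}

    def _expire_if_idle(self, workspace_id: str) -> None:
        last = self._last_activity.get(workspace_id)
        if last is not None and self.clock() - last > self.ttl:
            self._entries.pop(workspace_id, None)
            self._last_activity.pop(workspace_id, None)

    def put(self, workspace_id: str, placeholder: str, original: str) -> None:
        self._expire_if_idle(workspace_id)
        self._entries.setdefault(workspace_id, {})[placeholder] = original
        self._last_activity[workspace_id] = self.clock()

    def get(self, workspace_id: str, placeholder: str) -> Optional[str]:
        self._expire_if_idle(workspace_id)
        value = self._entries.get(workspace_id, {}).get(placeholder)
        if value is not None:
            self._last_activity[workspace_id] = self.clock()  # activity slides the window
        return value
```

Keying everything by workspace ID also reflects the isolation note: one workspace's lookups never see another's mappings.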

## Related

- [Guardrails](/docs/guardrails) — content-safety scanners that run around the LLM call. Composes with masking; both are independently configurable.
- [Security](/docs/security) — workspace isolation, encryption, and authentication that masking builds on.
- [Billing & Plans](/docs/billing-and-plans) — feature-flag matrix per plan tier.
