# Data Sources

> Connect databases and static files to your agents with versioned query templates and parameterized queries.

Source: https://0.0.0.0:8080/docs/data-sources

Data sources allow your agents to query external databases and files during execution. PromptRails supports multiple database technologies and provides versioned, parameterized query templates that can be safely used by agents.

## What Are Data Sources?

A data source represents a connection to an external data store. It defines:

- The database technology (PostgreSQL, MySQL, BigQuery, etc.)
- Connection credentials (linked to an encrypted credential)
- A query template with parameters
- Caching configuration
- Version history for safe iteration

When an agent executes, it can query its linked data sources to retrieve contextual information, look up records, or pull analytics data.

## Supported Databases

| Type            | Identifier    | Description                      |
| --------------- | ------------- | -------------------------------- |
| **PostgreSQL**  | `postgresql`  | Standard PostgreSQL databases    |
| **MySQL**       | `mysql`       | MySQL and MariaDB databases      |
| **BigQuery**    | `bigquery`    | Google BigQuery data warehouse   |
| **Snowflake**   | `snowflake`   | Snowflake cloud data platform    |
| **Redshift**    | `redshift`    | Amazon Redshift data warehouse   |
| **MSSQL**       | `mssql`       | Microsoft SQL Server             |
| **ClickHouse**  | `clickhouse`  | ClickHouse analytics database    |
| **Static File** | `static_file` | CSV, JSON, or other static files |

## Query Templates

Data source versions include a query template that uses parameter placeholders. Parameters are substituted at execution time from the agent's input.

### Example: PostgreSQL Query

```sql
SELECT order_id, status, total_amount, created_at
FROM orders
WHERE customer_id = :customer_id
  AND status = :status
ORDER BY created_at DESC
LIMIT :limit
```

### Parameters

Each query template defines its parameters:

```json
[
  {
    "name": "customer_id",
    "type": "string",
    "required": true,
    "description": "The customer's unique identifier"
  },
  {
    "name": "status",
    "type": "string",
    "required": false,
    "default": "active",
    "description": "Order status filter"
  },
  {
    "name": "limit",
    "type": "integer",
    "required": false,
    "default": "10",
    "description": "Maximum number of results"
  }
]
```

## Creating a Data Source

**Python SDK**

```python
# Create the data source
ds = client.data_sources.create(
    name="Customer Orders",
    description="Query customer order history",
    type="postgresql"
)

# Create a version with query template
version = client.data_sources.create_version(
    data_source_id=ds["data"]["id"],
    credential_id="your-postgresql-credential-id",
    query_template="""
        SELECT order_id, status, total_amount, created_at
        FROM orders
        WHERE customer_id = :customer_id
        ORDER BY created_at DESC
        LIMIT :limit
    """,
    parameters=[
        {"name": "customer_id", "type": "string", "required": True},
        {"name": "limit", "type": "integer", "required": False, "default": "10"}
    ],
    cache_timeout=300,
    message="Initial query for customer orders"
)
```

**JavaScript SDK**

```typescript
const ds = await client.dataSources.create({
  name: 'Customer Orders',
  description: 'Query customer order history',
  type: 'postgresql',
})

const version = await client.dataSources.createVersion(ds.data.id, {
  credentialId: 'your-postgresql-credential-id',
  queryTemplate: `
    SELECT order_id, status, total_amount, created_at
    FROM orders
    WHERE customer_id = :customer_id
    ORDER BY created_at DESC
    LIMIT :limit
  `,
  parameters: [
    { name: 'customer_id', type: 'string', required: true },
    { name: 'limit', type: 'integer', required: false, default: '10' },
  ],
  cacheTimeout: 300,
  message: 'Initial query for customer orders',
})
```

## Version Management

Data sources use the same immutable versioning pattern as agents and prompts:

- Each version captures the query template, parameters, credential, and connection config
- Exactly one version per data source is marked as current
- Promote versions to make them active
- Roll back by promoting a previous version

```python
# List versions
versions = client.data_sources.list_versions(data_source_id="your-ds-id")

# Promote a version
client.data_sources.promote_version(
    data_source_id="your-ds-id",
    version_id="version-to-promote"
)
```

## Connection Configuration

The `connection_config` object varies by database type. For databases that connect via the credential, this may be minimal. For BigQuery or Snowflake, it may include project/dataset/warehouse identifiers.

## Cache Timeout

Each version specifies a `cache_timeout` in seconds:

- `0` -- No caching (every query hits the database)
- `3600` -- Cache results for 1 hour (default)
- Any positive integer -- Cache duration in seconds

Caching is keyed on the rendered query (after parameter substitution), so different parameter values produce independent cache entries.

## Output Format

Data source versions support configurable output formats:

- `json` (default) -- Query results as JSON arrays
- `csv` -- Query results as CSV text

## Test Connections

Before using a data source in production, test the connection:

```python
result = client.data_sources.execute(
    data_source_id="your-ds-id",
    parameters={
        "customer_id": "test-customer",
        "limit": 5
    }
)

print(f"Status: {result['data']['status']}")
print(f"Duration: {result['data']['duration_ms']}ms")
print(f"Results: {result['data']['result']}")
```

## Using Data Sources in Agents

Link data sources to agents through the agent version configuration. When the agent executes, it can query linked data sources as part of its pipeline:

1. The agent receives input
2. Parameters from the input are mapped to data source query parameters
3. The query is executed and results are returned
4. Results are injected into the prompt context for the LLM

This enables agents to provide data-grounded responses based on real-time database queries.

## Data Source Status

| Status     | Description                  |
| ---------- | ---------------------------- |
| `active`   | Available for use by agents  |
| `archived` | Hidden and cannot be queried |

## Related Topics

- [Credentials](/docs/credentials) -- Database connection credentials
- [Agents](/docs/agents) -- Using data sources in agent configurations
- [Tracing](/docs/tracing) -- Data source queries appear as `datasource` spans
