# Media Generation

> Generate speech, images, and video using 8 media providers integrated into agents and workflows.

Source: https://0.0.0.0:8080/docs/media-generation

PromptRails supports multi-modal media generation through eight providers across three categories: speech, image, and video. Each provider can be used either as an MCP tool within an agent or as a dedicated node in a workflow.

## Providers

### Speech

| Provider   | Capabilities         | Models                                  |
| ---------- | -------------------- | --------------------------------------- |
| ElevenLabs | Text-to-Speech       | eleven_multilingual_v2, eleven_turbo_v2 |
| Deepgram   | TTS + Speech-to-Text | aura-asteria-en, nova-2                 |

### Image

| Provider     | Capabilities               | Models                   |
| ------------ | -------------------------- | ------------------------ |
| Fal          | Image generation & editing | FLUX, Stable Diffusion   |
| Replicate    | Image generation           | SDXL, open-source models |
| Stability AI | Image generation & editing | Stable Diffusion 3, SDXL |

### Video

| Provider | Capabilities        | Models        |
| -------- | ------------------- | ------------- |
| Runway   | Text/image-to-video | Gen-3 Alpha   |
| Pika     | Text/image-to-video | Pika 1.0      |
| Luma     | Text/image-to-video | Dream Machine |

## Using Media in Agents

Media providers are exposed to agents as MCP tools. Once a media tool is configured on an agent, the LLM can invoke it during execution.

### Available Tool Types

- `tts` -- Text-to-speech generation
- `stt` -- Speech-to-text transcription
- `image_gen` -- Image generation from text prompt
- `image_edit` -- Image editing with prompt
- `video_gen` -- Video generation from text prompt
- `video_from_image` -- Video generation from image + prompt

### Configuration Example

```python
# Create an agent with media tools.
# Assumes `client` is an already-initialized PromptRails SDK client.
agent = client.agents.create(
    name="Content Creator",
    type="simple",
    config={
        "model": "gpt-4o",
        "credential_id": "your-openai-credential-id",
        "tools": [
            {
                "type": "media",
                "media_type": "image_gen",
                "provider": "fal",
                "model": "fal-ai/flux/schnell",
                "credential_id": "your-fal-credential-id"
            },
            {
                "type": "media",
                "media_type": "tts",
                "provider": "elevenlabs",
                "model": "eleven_multilingual_v2",
                "credential_id": "your-elevenlabs-credential-id"
            }
        ]
    }
)
```
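
When an agent carries several media tools, a small builder can keep the entries consistent and catch a mistyped `media_type` before the request is sent. The sketch below is illustrative only: `media_tool` and `MEDIA_TYPES` are hypothetical names, not part of the PromptRails SDK, and the dictionary keys simply mirror the example above.

```python
# Hypothetical helper, not part of the PromptRails SDK.
# The allowed values are the tool types listed under "Available Tool Types".
MEDIA_TYPES = {
    "tts",
    "stt",
    "image_gen",
    "image_edit",
    "video_gen",
    "video_from_image",
}


def media_tool(media_type, provider, model, credential_id):
    """Build one entry for the agent's "tools" list, validating media_type."""
    if media_type not in MEDIA_TYPES:
        raise ValueError(
            f"unknown media_type {media_type!r}; expected one of {sorted(MEDIA_TYPES)}"
        )
    return {
        "type": "media",
        "media_type": media_type,
        "provider": provider,
        "model": model,
        "credential_id": credential_id,
    }


# Same two tools as in the configuration example above.
tools = [
    media_tool("image_gen", "fal", "fal-ai/flux/schnell", "your-fal-credential-id"),
    media_tool("tts", "elevenlabs", "eleven_multilingual_v2", "your-elevenlabs-credential-id"),
]
```

The resulting `tools` list can be passed as the `"tools"` value in the agent `config`.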

## Using Media in Workflows

Workflow agents can include media generation nodes alongside LLM and tool nodes.

```python
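# Create a workflow agent that chains script, image, and video generation.
# Assumes `client` is an already-initialized PromptRails SDK client.
agent = client.agents.create(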
    name="Video Pipeline",
    type="workflow",
    config={
        "nodes": [
            {
                "id": "generate_script",
                "type": "llm",
                "model": "gpt-4o",
                "credential_id": "your-openai-credential-id",
                "prompt_id": "script-writer-prompt"
            },
            {
                "id": "generate_image",
                "type": "media",
                "media_provider": "fal",
                "media_type": "image_gen",
                "model": "fal-ai/flux/schnell",
                "credential_id": "your-fal-credential-id"
            },
            {
                "id": "generate_video",
                "type": "media",
                "media_provider": "runway",
                "media_type": "video_from_image",
                "model": "gen3a_turbo",
                "credential_id": "your-runway-credential-id"
            }
        ],
        "edges": [
            { "from": "generate_script", "to": "generate_image" },
            { "from": "generate_image", "to": "generate_video" }
        ]
    }
)
```
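
Before submitting a workflow, it can help to check locally that every edge points at a defined node and that the graph has no cycles. The following is a minimal sketch using Python's standard `graphlib` module; `execution_order` is a hypothetical helper, and the order it returns is only a valid dependency order, not a statement about how PromptRails schedules nodes internally.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(config):
    """Return node ids in dependency order, or raise ValueError on a bad graph."""
    node_ids = [node["id"] for node in config["nodes"]]
    if len(node_ids) != len(set(node_ids)):
        raise ValueError("duplicate node ids")

    # Map each node to the set of nodes that must run before it.
    predecessors = {node_id: set() for node_id in node_ids}
    for edge in config.get("edges", []):
        for end in ("from", "to"):
            if edge[end] not in predecessors:
                raise ValueError(f"edge references unknown node: {edge[end]!r}")
        predecessors[edge["to"]].add(edge["from"])

    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ValueError("workflow contains a cycle") from exc


# Only the fields the check needs, taken from the workflow example above.
config = {
    "nodes": [
        {"id": "generate_script"},
        {"id": "generate_image"},
        {"id": "generate_video"},
    ],
    "edges": [
        {"from": "generate_script", "to": "generate_image"},
        {"from": "generate_image", "to": "generate_video"},
    ],
}

print(execution_order(config))
# ['generate_script', 'generate_image', 'generate_video']
```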

## Video Polling

Video generation is asynchronous. When a video job is submitted, PromptRails automatically polls the provider for completion using background workers. The polling configuration is:

- **Max attempts**: 60
- **Poll interval**: 10 seconds
- **Total timeout**: ~10 minutes (60 attempts × 10 seconds)

Once the video is ready, the generated asset is downloaded and stored in your workspace's asset storage.
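
The polling behaviour described above can be pictured as a bounded loop. This is a sketch of the general pattern, not PromptRails' actual worker code: `poll_until_done`, the `check_status` callback, and the status strings `"pending"`, `"succeeded"`, and `"failed"` are all assumptions made for illustration. Only the limits (60 attempts, 10 seconds apart) come from the configuration listed above.

```python
import time

MAX_ATTEMPTS = 60            # from the polling configuration above
POLL_INTERVAL_SECONDS = 10   # from the polling configuration above


def poll_until_done(check_status, max_attempts=MAX_ATTEMPTS,
                    interval=POLL_INTERVAL_SECONDS, sleep=time.sleep):
    """Call check_status() until it reports a terminal state or attempts run out.

    check_status is a caller-supplied function returning a status string.
    Returns (status, attempts_used); raises TimeoutError if never terminal.
    """
    for attempt in range(1, max_attempts + 1):
        status = check_status()
        if status in ("succeeded", "failed"):  # hypothetical terminal states
            return status, attempt
        if attempt < max_attempts:
            sleep(interval)  # wait before the next check
    raise TimeoutError(f"job still pending after {max_attempts} attempts")


# Usage with a fake provider that finishes on the third check.
# sleep is stubbed out so the example runs instantly.
states = iter(["pending", "pending", "succeeded"])
result = poll_until_done(lambda: next(states), sleep=lambda _seconds: None)
print(result)  # ('succeeded', 3)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.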

## Asset Storage

All generated media (audio, images, video) is automatically uploaded to S3-compatible storage and tracked as assets. See [Assets](/docs/assets) for details on managing generated media.

## Tracing

Media generation spans appear in execution traces with dedicated span kinds:

- `speech` -- TTS and STT operations
- `image` -- Image generation and editing
- `video` -- Video generation

Each span includes the provider, model, prompt, output URL, and estimated cost. See [Tracing](/docs/tracing) for more details.
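
As an illustration of what can be done with these spans, the sketch below totals estimated cost per media span kind. The span layout is an assumption: the keys `kind`, `provider`, `model`, and `estimated_cost` are hypothetical stand-ins for whatever the trace API actually returns, and the cost figures are made-up sample values.

```python
from collections import defaultdict

# Span kinds listed above for media generation.
MEDIA_SPAN_KINDS = {"speech", "image", "video"}


def media_cost_by_kind(spans):
    """Sum estimated cost per media span kind, ignoring non-media spans."""
    totals = defaultdict(float)
    for span in spans:
        if span.get("kind") in MEDIA_SPAN_KINDS:
            totals[span["kind"]] += span.get("estimated_cost", 0.0)
    return dict(totals)


# Made-up sample spans; field names are hypothetical.
spans = [
    {"kind": "llm", "model": "gpt-4o", "estimated_cost": 0.5},
    {"kind": "image", "provider": "fal", "model": "fal-ai/flux/schnell", "estimated_cost": 0.25},
    {"kind": "image", "provider": "fal", "model": "fal-ai/flux/schnell", "estimated_cost": 0.25},
    {"kind": "video", "provider": "runway", "model": "gen3a_turbo", "estimated_cost": 1.5},
]

print(media_cost_by_kind(spans))  # {'image': 0.5, 'video': 1.5}
```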

## Related Topics

- [Credentials](/docs/credentials) -- Setting up media provider credentials
- [Assets](/docs/assets) -- Managing generated media files
- [Agents](/docs/agents) -- Agent configuration
- [Tracing](/docs/tracing) -- Execution observability