Media Generation
Generate speech, images, and video using 8 media providers integrated into agents and workflows.
PromptRails supports multi-modal media generation through 8 providers across three categories: speech, image, and video. Media providers can be used as MCP tools within agents or as dedicated workflow nodes.
Providers
Speech
| Provider | Capabilities | Models |
|---|---|---|
| ElevenLabs | Text-to-Speech | eleven_multilingual_v2, eleven_turbo_v2 |
| Deepgram | TTS + Speech-to-Text | aura-asteria-en, nova-2 |
Image
| Provider | Capabilities | Models |
|---|---|---|
| Fal | Image generation & editing | FLUX, Stable Diffusion |
| Replicate | Image generation | SDXL, open-source models |
| Stability AI | Image generation & editing | Stable Diffusion 3, SDXL |
Video
| Provider | Capabilities | Models |
|---|---|---|
| Runway | Text/image-to-video | Gen-3 Alpha |
| Pika | Text/image-to-video | Pika 1.0 |
| Luma | Text/image-to-video | Dream Machine |
Using Media in Agents
Media generation is available as MCP tools within agent executions. When a media tool is configured on an agent, the LLM can invoke it during execution.
Available Tool Types
- `tts` -- Text-to-speech generation
- `stt` -- Speech-to-text transcription
- `image_gen` -- Image generation from text prompt
- `image_edit` -- Image editing with prompt
- `video_gen` -- Video generation from text prompt
- `video_from_image` -- Video generation from image + prompt
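The six media types above are the valid values for a tool's `media_type` field. As an illustration, a tool specification can be sanity-checked before it is sent to the API. This is a sketch, not part of the PromptRails SDK; the field names mirror the agent configuration examples in this page.

```python
# Sketch: validate a media tool spec client-side (not a PromptRails SDK helper).
# Field names follow the documented agent configuration format.
MEDIA_TYPES = {"tts", "stt", "image_gen", "image_edit",
               "video_gen", "video_from_image"}
REQUIRED_FIELDS = {"type", "media_type", "provider", "model", "credential_id"}

def validate_media_tool(tool: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec looks well-formed."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - tool.keys())]
    if tool.get("type") != "media":
        problems.append("type must be 'media'")
    if tool.get("media_type") not in MEDIA_TYPES:
        problems.append(f"unknown media_type: {tool.get('media_type')!r}")
    return problems
```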
Configuration Example
```python
# Create an agent with media tools
agent = client.agents.create(
    name="Content Creator",
    type="simple",
    config={
        "model": "gpt-4o",
        "credential_id": "your-openai-credential-id",
        "tools": [
            {
                "type": "media",
                "media_type": "image_gen",
                "provider": "fal",
                "model": "fal-ai/flux/schnell",
                "credential_id": "your-fal-credential-id"
            },
            {
                "type": "media",
                "media_type": "tts",
                "provider": "elevenlabs",
                "model": "eleven_multilingual_v2",
                "credential_id": "your-elevenlabs-credential-id"
            }
        ]
    }
)
```

Using Media in Workflows
Workflow agents can include media generation nodes alongside LLM and tool nodes.
```python
agent = client.agents.create(
    name="Video Pipeline",
    type="workflow",
    config={
        "nodes": [
            {
                "id": "generate_script",
                "type": "llm",
                "model": "gpt-4o",
                "credential_id": "your-openai-credential-id",
                "prompt_id": "script-writer-prompt"
            },
            {
                "id": "generate_image",
                "type": "media",
                "media_provider": "fal",
                "media_type": "image_gen",
                "model": "fal-ai/flux/schnell",
                "credential_id": "your-fal-credential-id"
            },
            {
                "id": "generate_video",
                "type": "media",
                "media_provider": "runway",
                "media_type": "video_from_image",
                "model": "gen3a_turbo",
                "credential_id": "your-runway-credential-id"
            }
        ],
        "edges": [
            {"from": "generate_script", "to": "generate_image"},
            {"from": "generate_image", "to": "generate_video"}
        ]
    }
)
```

Video Polling
Video generation is asynchronous. When a video job is submitted, PromptRails automatically polls the provider for completion using background workers. The polling configuration is:
- Max attempts: 60
- Poll interval: 10 seconds
- Total timeout: ~10 minutes
Once the video is ready, the generated asset is downloaded and stored in your workspace's asset storage.
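The polling schedule above can be sketched as a simple loop. This illustrates the documented behavior (60 attempts at 10-second intervals, roughly 10 minutes total), not PromptRails' internal worker code; `check_job` is a stand-in for the provider's status call.

```python
import time

def poll_for_video(check_job, max_attempts: int = 60, interval: float = 10.0):
    """Poll a provider until a video job completes, mirroring the documented
    schedule: 60 attempts x 10 s interval, roughly a 10-minute ceiling."""
    for attempt in range(max_attempts):
        result = check_job()      # stand-in for the provider's status API
        if result is not None:    # job finished; result holds e.g. the asset URL
            return result
        time.sleep(interval)
    raise TimeoutError(f"video not ready after {max_attempts} attempts")
```

In PromptRails this loop runs in background workers; once `check_job` reports completion, the asset is downloaded and stored as described above.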
Asset Storage
All generated media (audio, images, video) is automatically uploaded to S3-compatible storage and tracked as assets. See Assets for details on managing generated media.
Tracing
Media generation spans appear in execution traces with dedicated span kinds:
- `speech` -- TTS and STT operations
- `image` -- Image generation and editing
- `video` -- Video generation
Each span includes the provider, model, prompt, output URL, and estimated cost. See Tracing for more details.
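Because each media span carries an estimated cost, traces can be aggregated per span kind. A minimal sketch, assuming spans are exported as plain dicts with the fields listed above (the exact trace schema may differ):

```python
from collections import defaultdict

# Media span kinds, per the list above; other kinds (e.g. LLM spans) are skipped.
MEDIA_SPAN_KINDS = {"speech", "image", "video"}

def cost_by_span_kind(spans: list[dict]) -> dict[str, float]:
    """Sum estimated cost per media span kind across an execution trace."""
    totals: dict[str, float] = defaultdict(float)
    for span in spans:
        if span.get("kind") in MEDIA_SPAN_KINDS:
            totals[span["kind"]] += span.get("estimated_cost", 0.0)
    return dict(totals)
```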
Related Topics
- Credentials -- Setting up media provider credentials
- Assets -- Managing generated media files
- Agents -- Agent configuration
- Tracing -- Execution observability