PromptRails

Media Generation

Generate speech, images, and video using 8 media providers integrated into agents and workflows.


PromptRails supports multi-modal media generation through 8 providers across three categories: speech, image, and video. Media providers can be used as MCP tools within agents or as dedicated workflow nodes.

Providers

Speech

Provider | Capabilities | Models
ElevenLabs | Text-to-speech | eleven_multilingual_v2, eleven_turbo_v2
Deepgram | Text-to-speech, speech-to-text | aura-asteria-en, nova-2

Image

Provider | Capabilities | Models
Fal | Image generation and editing | FLUX, Stable Diffusion
Replicate | Image generation | SDXL and other open-source models
Stability AI | Image generation and editing | Stable Diffusion 3, SDXL

Video

Provider | Capabilities | Models
Runway | Text-to-video, image-to-video | Gen-3 Alpha
Pika | Text-to-video, image-to-video | Pika 1.0
Luma | Text-to-video, image-to-video | Dream Machine

Using Media in Agents

Media generation is exposed to agents as MCP tools. When a media tool is configured on an agent, the LLM can decide to invoke it during execution.

Available Tool Types

  • tts -- Text-to-speech generation
  • stt -- Speech-to-text transcription
  • image_gen -- Image generation from a text prompt
  • image_edit -- Image editing guided by a text prompt
  • video_gen -- Video generation from a text prompt
  • video_from_image -- Video generation from an image plus a text prompt
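
Each media tool pairs a media_type with a provider that supports it. As a minimal sketch, assuming you want to catch mismatches before calling client.agents.create, the hypothetical helper below checks a tool config against the capabilities listed in the provider tables. The provider identifiers deepgram, replicate, stability, pika, and luma are assumptions (only fal, elevenlabs, and runway appear in this page's examples), and PromptRails itself may accept other combinations.

```python
# Hypothetical client-side check derived from the provider tables on this page.
# Provider IDs other than "fal", "elevenlabs", and "runway" are assumptions.
SUPPORTED_MEDIA_TYPES = {
    "elevenlabs": {"tts"},
    "deepgram": {"tts", "stt"},
    "fal": {"image_gen", "image_edit"},
    "replicate": {"image_gen"},
    "stability": {"image_gen", "image_edit"},
    "runway": {"video_gen", "video_from_image"},
    "pika": {"video_gen", "video_from_image"},
    "luma": {"video_gen", "video_from_image"},
}

# Keys used by the media tool entries in the agent configuration example.
REQUIRED_KEYS = ("type", "media_type", "provider", "model", "credential_id")


def validate_media_tool(tool: dict) -> list[str]:
    """Return a list of problems found in a media tool config (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in tool]
    if "type" in tool and tool["type"] != "media":
        problems.append("type must be 'media'")
    provider = tool.get("provider")
    media_type = tool.get("media_type")
    if provider is not None:
        supported = SUPPORTED_MEDIA_TYPES.get(provider)
        if supported is None:
            problems.append(f"unknown provider: {provider!r}")
        elif media_type is not None and media_type not in supported:
            problems.append(f"{provider} does not support {media_type!r}")
    return problems
```

For example, a tool entry that asks ElevenLabs for stt would be reported as unsupported, because the Speech table lists speech-to-text only for Deepgram.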

Configuration Example

# Create an agent with media tools.
# Assumes `client` is an already-initialized PromptRails SDK client.
agent = client.agents.create(
    name="Content Creator",
    type="simple",
    config={
        "model": "gpt-4o",
        "credential_id": "your-openai-credential-id",
        "tools": [
            # Image generation with FLUX (schnell) via Fal
            {
                "type": "media",
                "media_type": "image_gen",
                "provider": "fal",
                "model": "fal-ai/flux/schnell",
                "credential_id": "your-fal-credential-id"
            },
            # Text-to-speech via ElevenLabs
            {
                "type": "media",
                "media_type": "tts",
                "provider": "elevenlabs",
                "model": "eleven_multilingual_v2",
                "credential_id": "your-elevenlabs-credential-id"
            }
        ]
    }
)

Using Media in Workflows

Workflow agents can include media generation nodes alongside LLM and tool nodes.

agent = client.agents.create(
    name="Video Pipeline",
    type="workflow",
    config={
        "nodes": [
            # 1. LLM node: writes a script from the referenced prompt
            {
                "id": "generate_script",
                "type": "llm",
                "model": "gpt-4o",
                "credential_id": "your-openai-credential-id",
                "prompt_id": "script-writer-prompt"
            },
            # 2. Media node: generates an image with FLUX via Fal
            {
                "id": "generate_image",
                "type": "media",
                "media_provider": "fal",
                "media_type": "image_gen",
                "model": "fal-ai/flux/schnell",
                "credential_id": "your-fal-credential-id"
            },
            # 3. Media node: turns the image into a video with Runway Gen-3 Alpha Turbo
            {
                "id": "generate_video",
                "type": "media",
                "media_provider": "runway",
                "media_type": "video_from_image",
                "model": "gen3a_turbo",
                "credential_id": "your-runway-credential-id"
            }
        ],
        # Edges chain the nodes: script -> image -> video
        "edges": [
            { "from": "generate_script", "to": "generate_image" },
            { "from": "generate_image", "to": "generate_video" }
        ]
    }
)
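
Before creating a workflow agent, it can help to confirm that every edge points at a declared node and that the graph has no cycles. The sketch below does this client-side with Kahn's topological sort and returns one valid execution order. execution_order is a hypothetical helper, and it assumes (this page does not confirm it) that PromptRails runs nodes in dependency order along the edges list.

```python
from collections import deque


def execution_order(config: dict) -> list[str]:
    """Return node IDs in a dependency-respecting order (Kahn's algorithm).

    Raises ValueError on duplicate IDs, edges to unknown nodes, or cycles.
    """
    node_ids = [node["id"] for node in config["nodes"]]
    known = set(node_ids)
    if len(known) != len(node_ids):
        raise ValueError("duplicate node id")

    # Count incoming edges and record successors for each node.
    indegree = {node_id: 0 for node_id in node_ids}
    successors = {node_id: [] for node_id in node_ids}
    for edge in config.get("edges", []):
        src, dst = edge["from"], edge["to"]
        if src not in known or dst not in known:
            raise ValueError(f"edge references unknown node: {src} -> {dst}")
        successors[src].append(dst)
        indegree[dst] += 1

    # Start from nodes with no incoming edges, in declaration order.
    queue = deque(node_id for node_id in node_ids if indegree[node_id] == 0)
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Any node never reached with indegree 0 sits on a cycle.
    if len(order) != len(node_ids):
        raise ValueError("workflow graph contains a cycle")
    return order
```

Applied to the Video Pipeline config above, it returns ["generate_script", "generate_image", "generate_video"].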

Video Polling

Video generation is asynchronous. When a video job is submitted, PromptRails automatically polls the provider for completion using background workers. The polling configuration is:

  • Max attempts: 60
  • Poll interval: 10 seconds
  • Total timeout: about 10 minutes (60 attempts × 10 seconds)
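
PromptRails runs this polling for you, but the documented limits are easy to picture as a loop. The following is a minimal sketch, not PromptRails's worker code: check_status is a hypothetical callable that returns the finished asset URL, or None while the provider job is still running, and sleep is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional

MAX_ATTEMPTS = 60     # documented maximum number of status checks
POLL_INTERVAL_S = 10  # documented seconds between checks


def poll_until_done(
    check_status: Callable[[], Optional[str]],
    max_attempts: int = MAX_ATTEMPTS,
    interval_s: float = POLL_INTERVAL_S,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call check_status until it returns a result or attempts run out.

    check_status (hypothetical) returns the asset URL when the job is done,
    or None while it is still running.
    """
    for attempt in range(1, max_attempts + 1):
        result = check_status()
        if result is not None:
            return result
        # Wait only between checks, not after the final one.
        if attempt < max_attempts:
            sleep(interval_s)
    raise TimeoutError(f"video not ready after {max_attempts} attempts")
```

Because the loop sleeps only between checks, the worst case is 60 checks over 59 intervals, roughly 10 minutes. A production poller would also stop early when the provider reports a failed job.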

Once the video is ready, the generated asset is downloaded and stored in your workspace's asset storage.

Asset Storage

All generated media (audio, images, video) is automatically uploaded to S3-compatible storage and tracked as assets. See Assets for details on managing generated media.

Tracing

Media generation spans appear in execution traces with dedicated span kinds:

  • speech -- TTS and STT operations
  • image -- Image generation and editing
  • video -- Video generation

Each span includes the provider, model, prompt, output URL, and estimated cost. See Tracing for more details.
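
Since each media tool type maps onto one of these span kinds, traces can be summarized by kind. The sketch below assumes spans have been exported as dictionaries with hypothetical kind and estimated_cost keys (the actual trace schema may differ) and totals the estimated cost per media span kind.

```python
from collections import defaultdict

# Media tool type -> span kind, following the list above.
SPAN_KIND_BY_MEDIA_TYPE = {
    "tts": "speech",
    "stt": "speech",
    "image_gen": "image",
    "image_edit": "image",
    "video_gen": "video",
    "video_from_image": "video",
}


def cost_by_span_kind(spans: list[dict]) -> dict[str, float]:
    """Sum estimated cost per media span kind, ignoring non-media spans.

    Each span is assumed to be a dict with hypothetical "kind" and
    "estimated_cost" keys.
    """
    media_kinds = set(SPAN_KIND_BY_MEDIA_TYPE.values())
    totals: dict[str, float] = defaultdict(float)
    for span in spans:
        kind = span.get("kind")
        if kind in media_kinds:
            totals[kind] += span.get("estimated_cost", 0.0)
    return dict(totals)
```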

Related

  • Credentials -- Setting up media provider credentials
  • Assets -- Managing generated media files
  • Agents -- Agent configuration
  • Tracing -- Execution observability