Media Generation
Generate speech, images, and video using 8 media providers integrated into agents and workflows.
PromptRails supports multi-modal media generation through 8 providers across three categories: speech, image, and video. Media providers can be used as MCP tools within agents or as dedicated workflow nodes.
Providers
Speech
| Provider | Capabilities | Models |
|---|---|---|
| ElevenLabs | Text-to-Speech | eleven_multilingual_v2, eleven_turbo_v2 |
| Deepgram | TTS + Speech-to-Text | aura-asteria-en, nova-2 |
Image
| Provider | Capabilities | Models |
|---|---|---|
| Fal | Image generation & editing | FLUX, Stable Diffusion |
| Replicate | Image generation | SDXL, open-source models |
| Stability AI | Image generation & editing | Stable Diffusion 3, SDXL |
Video
| Provider | Capabilities | Models |
|---|---|---|
| Runway | Text/image-to-video | Gen-3 Alpha |
| Pika | Text/image-to-video | Pika 1.0 |
| Luma | Text/image-to-video | Dream Machine |
Using Media in Agents
Media generation is available as MCP tools within agent executions. When a media tool is configured on an agent, the LLM can invoke it during execution.
Available Tool Types
- `tts` -- Text-to-speech generation
- `stt` -- Speech-to-text transcription
- `image_gen` -- Image generation from text prompt
- `image_edit` -- Image editing with prompt
- `video_gen` -- Video generation from text prompt
- `video_from_image` -- Video generation from image + prompt
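The six media types above are the valid values for a tool's `media_type` field. As an illustration, a tool specification can be sanity-checked before it is sent to the API. This is a sketch, not part of the PromptRails SDK; the field names mirror the agent configuration examples in this page.

```python
# Sketch: validate a media tool spec client-side (not a PromptRails SDK helper).
# Field names follow the documented agent configuration format.
MEDIA_TYPES = {"tts", "stt", "image_gen", "image_edit",
               "video_gen", "video_from_image"}
REQUIRED_FIELDS = {"type", "media_type", "provider", "model", "credential_id"}

def validate_media_tool(tool: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec looks well-formed."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - tool.keys())]
    if tool.get("type") != "media":
        problems.append("type must be 'media'")
    if tool.get("media_type") not in MEDIA_TYPES:
        problems.append(f"unknown media_type: {tool.get('media_type')!r}")
    return problems
```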
Configuration Example
```python
# Create an agent with media tools
agent = client.agents.create(
    name="Content Creator",
    type="simple",
    config={
        "model": "gpt-4o",
        "credential_id": "your-openai-credential-id",
        "tools": [
            {
                "type": "media",
                "media_type": "image_gen",
                "provider": "fal",
                "model": "fal-ai/flux/schnell",
                "credential_id": "your-fal-credential-id"
            },
            {
                "type": "media",
                "media_type": "tts",
                "provider": "elevenlabs",
                "model": "eleven_multilingual_v2",
                "credential_id": "your-elevenlabs-credential-id"
            }
        ]
    }
)
```

Using Media in Workflows
Workflow agents can include media generation nodes alongside LLM and tool nodes.
```python
agent = client.agents.create(
    name="Video Pipeline",
    type="workflow",
    config={
        "nodes": [
            {
                "id": "generate_script",
                "type": "llm",
                "model": "gpt-4o",
                "credential_id": "your-openai-credential-id",
                "prompt_id": "script-writer-prompt"
            },
            {
                "id": "generate_image",
                "type": "media",
                "media_provider": "fal",
                "media_type": "image_gen",
                "model": "fal-ai/flux/schnell",
                "credential_id": "your-fal-credential-id"
            },
            {
                "id": "generate_video",
                "type": "media",
                "media_provider": "runway",
                "media_type": "video_from_image",
                "model": "gen3a_turbo",
                "credential_id": "your-runway-credential-id"
            }
        ],
        "edges": [
            {"from": "generate_script", "to": "generate_image"},
            {"from": "generate_image", "to": "generate_video"}
        ]
    }
)
```

Video Polling
Video generation is asynchronous. When a video job is submitted, PromptRails automatically polls the provider for completion using background workers. The polling configuration is:
- Max attempts: 60
- Poll interval: 10 seconds
- Total timeout: ~10 minutes
Once the video is ready, the generated asset is downloaded and stored in your workspace's asset storage.
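The polling schedule above can be sketched as a simple loop. This illustrates the documented behavior (60 attempts at 10-second intervals, roughly 10 minutes total), not PromptRails' internal worker code; `check_job` is a stand-in for the provider's status call.

```python
import time

def poll_for_video(check_job, max_attempts: int = 60, interval: float = 10.0):
    """Poll a provider until a video job completes, mirroring the documented
    schedule: 60 attempts x 10 s interval, roughly a 10-minute ceiling."""
    for attempt in range(max_attempts):
        result = check_job()      # stand-in for the provider's status API
        if result is not None:    # job finished; result holds e.g. the asset URL
            return result
        time.sleep(interval)
    raise TimeoutError(f"video not ready after {max_attempts} attempts")
```

In PromptRails this loop runs in background workers; once `check_job` reports completion, the asset is downloaded and stored as described above.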
Asset Storage
All generated media (audio, images, video) is automatically uploaded to S3-compatible storage and tracked as assets. See Assets for details on managing generated media.
Tracing
Media generation spans appear in execution traces with dedicated span kinds:
- `speech` -- TTS and STT operations
- `image` -- Image generation and editing
- `video` -- Video generation
Each span includes the provider, model, prompt, output URL, and estimated cost. See Tracing for more details.
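Because each media span carries an estimated cost, traces can be aggregated per span kind. A minimal sketch, assuming spans are exported as plain dicts with the fields listed above (the exact trace schema may differ):

```python
from collections import defaultdict

# Media span kinds, per the list above; other kinds (e.g. LLM spans) are skipped.
MEDIA_SPAN_KINDS = {"speech", "image", "video"}

def cost_by_span_kind(spans: list[dict]) -> dict[str, float]:
    """Sum estimated cost per media span kind across an execution trace."""
    totals: dict[str, float] = defaultdict(float)
    for span in spans:
        if span.get("kind") in MEDIA_SPAN_KINDS:
            totals[span["kind"]] += span.get("estimated_cost", 0.0)
    return dict(totals)
```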
Related Topics
- Credentials -- Setting up media provider credentials
- Assets -- Managing generated media files
- Agents -- Agent configuration
- Tracing -- Execution observability