OpenAI Sora 2

openai/sora-2-pro/text-to-video

OpenAI Sora 2 Pro is a state-of-the-art text-to-video model with realistic physics, synchronized audio, and strong steerability. It supports multiple resolutions up to 1080p and durations up to 20 seconds, and is served through a ready-to-use REST inference API with no cold starts and per-second pricing.


README

OpenAI Sora 2 Pro Text-to-Video

OpenAI Sora 2 Pro is a state-of-the-art text-to-video model that generates high-quality videos with realistic physics, synchronized audio, and strong steerability. Create cinematic videos from text prompts with multiple resolution and duration options.

Model Highlights

  • Physics-aware motion with realistic contact, inertia, and momentum
  • Temporal consistency with stable identities and clean frame transitions
  • Synchronized audio with lip-sync alignment and ambient sounds
  • High-frequency detail preserving fine textures
  • Complex scene reasoning with multiple subjects and depth handling
  • Cinematic camera movements without warping artifacts
  • Wide stylistic range from photoreal to anime and 3D
  • Strong steerability responding to prompt edits and control settings

Parameters

  • prompt (required): Text description of the scene, style, camera, and audio cues
  • size (optional): Output resolution
    • 720*1280 or 1280*720 (720p, default)
    • 1024*1792 or 1792*1024 (1024p)
    • 1080*1920 or 1920*1080 (1080p)
  • duration (optional): Video length in seconds (4, 8, 12, 16, or 20 seconds, default: 4)
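
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the helper name `build_payload` is hypothetical, and it assumes the API accepts sizes as `width*height` strings, as they are written on this page.

```python
# Allowed values copied from the parameter list above.
ALLOWED_SIZES = {
    "720*1280", "1280*720",    # 720p
    "1024*1792", "1792*1024",  # 1024p
    "1080*1920", "1920*1080",  # 1080p
}
ALLOWED_DURATIONS = {4, 8, 12, 16, 20}


def build_payload(prompt, size="1280*720", duration=4):
    """Return a request body dict, applying the documented defaults.

    Hypothetical helper: the "width*height" size format is an assumption
    based on how sizes are written on this page.
    """
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"unsupported duration: {duration!r}")
    return {"prompt": prompt, "size": size, "duration": duration}
```

Calling `build_payload("A red kite over a beach")` with no other arguments yields the defaults (1280*720, 4 seconds), while an unsupported duration such as 5 raises `ValueError`.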

Use Cases

  • Cinematic video production from text descriptions
  • Marketing and promotional video content
  • Social media video creation
  • Concept visualization and storyboarding
  • Creative storytelling with synchronized audio
  • Product demonstrations and explainer videos

Pricing

Pricing is per second based on output resolution:

| Size | Output Resolution | Price per Second |
|-------|-------------------|------------------|
| 720p | Portrait: 720x1280, Landscape: 1280x720 | $0.30 |
| 1024p | Portrait: 1024x1792, Landscape: 1792x1024 | $0.50 |
| 1080p | Portrait: 1080x1920, Landscape: 1920x1080 | $0.70 |

Examples

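| Resolution | Duration | Total Cost |
|------------|----------|------------|
| 720p | 4s | $1.20 |
| 720p | 8s | $2.40 |
| 720p | 20s | $6.00 |
| 1024p | 4s | $2.00 |
| 1024p | 20s | $10.00 |
| 1080p | 4s | $2.80 |
| 1080p | 20s | $14.00 |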

Billing Rules

  • Pricing scales linearly with duration
  • Duration options: 4, 8, 12, 16, or 20 seconds
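
Because cost is simply the per-second price times the duration, it can be estimated before submitting a job. The sketch below reproduces the pricing table above; the function name `estimate_cost` and the `width*height` size strings are illustrative assumptions, and `Decimal` is used only to avoid floating-point rounding in dollar amounts.

```python
from decimal import Decimal

# Per-second prices from the pricing table, keyed by resolution tier.
PRICE_PER_SECOND = {
    "720p": Decimal("0.30"),
    "1024p": Decimal("0.50"),
    "1080p": Decimal("0.70"),
}

# Map each size string (assumed "width*height" format) to its tier.
SIZE_TO_TIER = {
    "720*1280": "720p", "1280*720": "720p",
    "1024*1792": "1024p", "1792*1024": "1024p",
    "1080*1920": "1080p", "1920*1080": "1080p",
}


def estimate_cost(size="1280*720", duration=4):
    """Return the estimated cost in USD as a Decimal (price per second x seconds)."""
    if size not in SIZE_TO_TIER:
        raise ValueError(f"unsupported size: {size!r}")
    if duration not in (4, 8, 12, 16, 20):
        raise ValueError(f"unsupported duration: {duration!r}")
    return PRICE_PER_SECOND[SIZE_TO_TIER[size]] * duration


print(estimate_cost("1280*720", 4))    # 1.20
print(estimate_cost("1920*1080", 20))  # 14.00
```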

How to Use

  1. Write your prompt describing scene, style, camera, and audio cues
  2. Select output resolution (720p, 1024p, or 1080p)
  3. Choose duration (4, 8, 12, 16, or 20 seconds)
  4. Submit via REST API endpoint
  5. Preview and download your generated video

API Integration

The model is exposed through a simple REST API. It takes your text prompt and generates high-quality video with synchronized audio, realistic physics, and cinematic visuals.
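
As a rough illustration of what a submission could look like, the sketch below builds (but does not send) a JSON POST request using only Python's standard library. This page does not state the endpoint URL, authentication scheme, or response format, so `API_URL`, the `Authorization: Bearer` header, and the helper name `build_request` are all placeholders and assumptions; consult the provider's API documentation for the real values.

```python
import json
import urllib.request

# Placeholder endpoint: the real URL is not given on this page.
API_URL = "https://api.example.com/openai/sora-2-pro/text-to-video"


def build_request(api_key, prompt, size="1280*720", duration=4):
    """Build a POST request carrying the three documented parameters.

    Assumptions: JSON body, Bearer-token auth, and "width*height" sizes.
    """
    body = json.dumps(
        {"prompt": prompt, "size": size, "duration": duration}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
```

The request object can then be passed to `urllib.request.urlopen` once a real endpoint and key are substituted. How the finished video is retrieved (for example, by polling a result URL) is not described here, so that step is left out.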

Notes

  • Prompt is the only required field
  • Default resolution is 1280*720 (720p landscape)
  • Default duration is 4 seconds
  • Higher resolutions cost more per second
  • Cost scales linearly with duration
  • Follow content guidelines for appropriate use