OpenAI Sora 2 Pro Text-to-Video
OpenAI Sora 2 Pro is a state-of-the-art text-to-video model that generates high-quality videos with realistic physics, synchronized audio, and strong steerability. Create cinematic videos from text prompts with multiple resolution and duration options.
Model Highlights
- Physics-aware motion with realistic contact, inertia, and momentum
- Temporal consistency with stable identities and clean frame transitions
- Synchronized audio with lip-sync alignment and ambient sounds
- High-frequency detail preserving fine textures
- Complex scene reasoning with multiple subjects and depth handling
- Cinematic camera movements without warping artifacts
- Wide stylistic range from photoreal to anime and 3D
- Strong steerability responding to prompt edits and control settings
Parameters
- prompt (required): Text description of the scene, style, camera, and audio cues
- size (optional): Output resolution
  - 720*1280 or 1280*720 (720p), default
  - 1024*1792 or 1792*1024 (1024p)
  - 1080*1920 or 1920*1080 (1080p)
- duration (optional): Video length in seconds (4, 8, 12, 16, or 20 seconds, default: 4)
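The parameter rules above can be checked on the client side before a request is sent. The following is a minimal sketch, not the service's own validation: the `build_payload` helper is hypothetical, and it assumes that size values are passed as `width*height` strings and that the default size is 1280*720.

```python
# Allowed values taken from the parameter list above.
# Assumption: sizes are sent as "width*height" strings.
ALLOWED_SIZES = {
    "720*1280", "1280*720",    # 720p
    "1024*1792", "1792*1024",  # 1024p
    "1080*1920", "1920*1080",  # 1080p
}
ALLOWED_DURATIONS = {4, 8, 12, 16, 20}


def build_payload(prompt, size="1280*720", duration=4):
    """Return a request body dict (hypothetical helper), validating each field."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be non-empty text")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"unsupported duration: {duration!r}")
    return {"prompt": prompt, "size": size, "duration": duration}


payload = build_payload(
    "A red fox trotting through fresh snow, slow dolly shot, soft wind audio",
    size="1920*1080",
    duration=8,
)
```

Omitting `size` and `duration` falls back to the documented defaults (720p, 4 seconds).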
Use Cases
- Cinematic video production from text descriptions
- Marketing and promotional video content
- Social media video creation
- Concept visualization and storyboarding
- Creative storytelling with synchronized audio
- Product demonstrations and explainer videos
Pricing
Pricing is per second based on output resolution:
| Size | Output Resolution | Price per Second |
|---|---|---|
| 720p | Portrait: 720x1280, Landscape: 1280x720 | $0.30 |
| 1024p | Portrait: 1024x1792, Landscape: 1792x1024 | $0.50 |
| 1080p | Portrait: 1080x1920, Landscape: 1920x1080 | $0.70 |
Examples
| Resolution | Duration | Total Cost |
|---|---|---|
| 720p | 4s | $1.20 |
| 720p | 8s | $2.40 |
| 720p | 20s | $6.00 |
| 1024p | 4s | $2.00 |
| 1024p | 20s | $10.00 |
| 1080p | 4s | $2.80 |
| 1080p | 20s | $14.00 |
Billing Rules
- Pricing scales linearly with duration
- Duration options: 4, 8, 12, 16, or 20 seconds
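Because billing is linear, the total cost is simply the per-second price multiplied by the duration. The sketch below reproduces the example table from the listed prices; the `estimate_cost` function name is illustrative, and `Decimal` is used only to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

# Per-second prices in USD, copied from the pricing table above.
PRICE_PER_SECOND = {
    "720p": Decimal("0.30"),
    "1024p": Decimal("0.50"),
    "1080p": Decimal("0.70"),
}
ALLOWED_DURATIONS = (4, 8, 12, 16, 20)


def estimate_cost(resolution, duration):
    """Return the total cost in USD for one video (illustrative helper)."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"unsupported duration: {duration!r}")
    return PRICE_PER_SECOND[resolution] * duration


for resolution, seconds in [("720p", 4), ("1024p", 20), ("1080p", 20)]:
    print(f"{resolution} {seconds}s: ${estimate_cost(resolution, seconds)}")
```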
How to Use
- Write your prompt describing scene, style, camera, and audio cues
- Select output resolution (720p, 1024p, or 1080p)
- Choose duration (4, 8, 12, 16, or 20 seconds)
- Submit via REST API endpoint
- Preview and download your generated video
API Integration
The model is available through a simple REST API for text-to-video generation. It takes your text prompt and returns video with synchronized audio, realistic physics, and cinematic visuals.
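This page does not give the endpoint URL, authentication scheme, or response format, so the following is only a sketch of how a request might be assembled. The URL `https://api.example.com/v1/sora-2-pro/text-to-video`, the Bearer-token header, the JSON body shape, and the `build_request` helper are all assumptions; check the provider's API reference for the real values.

```python
import json
import urllib.request

# Hypothetical endpoint: a placeholder, not the real service URL.
API_URL = "https://api.example.com/v1/sora-2-pro/text-to-video"


def build_request(api_key, payload, url=API_URL):
    """Assemble a JSON POST request without sending it (illustrative helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the service uses Bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_request(
    "YOUR_API_KEY",
    {"prompt": "A lighthouse at dusk, waves crashing", "size": "1280*720", "duration": 4},
)
# To actually submit it, once the real endpoint and key are known:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

The request is built but not sent, so the sketch can be inspected safely before real credentials are filled in.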
Notes
- Prompt is the only required field
- Default resolution is 1280*720 (720p landscape)
- Default duration is 4 seconds
- Higher resolutions cost more per second
- Cost scales linearly with duration
- Follow content guidelines for appropriate use