Sora 2 Text-to-Video
Sora 2 Text-to-Video is OpenAI's text-to-video model, purpose-built for scenes that feature multiple distinct characters at once. Describe the scene in natural language, reference your pre-defined character IDs, and the model renders a cohesive, temporally consistent video in which each character keeps its intended look and movement, with no manual compositing required.
Why Choose This?
- True multi-character consistency: Reference two or more character IDs in a single generation. Each character retains its unique appearance, proportions, and style throughout every frame.
- Natural-language scene control: Describe interactions, environments, and actions in plain text. The model understands spatial relationships and character dynamics to produce believable compositions.
- Flexible aspect ratio support: Choose between portrait (720×1280) and landscape (1280×720) orientations to match your target platform.
- Scalable duration: Generate clips from 4 seconds up to 20 seconds in fixed 4-second steps, giving you full control over pacing and output cost.
- Production-ready output: Delivers smooth, artifact-free motion suitable for marketing content, storytelling, game cinematics, and social media video.
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text description of the scene, characters, actions, and environment. |
| size | No | Output resolution: 720×1280 (portrait) or 1280×720 (landscape). |
| duration | No | Clip length in seconds. Options: 4, 8, 12, 16, 20. |
| characters | No | List of character IDs to include. Add one or more char_... identifiers. |
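The parameter table can be read as a small validation contract. The Python sketch below builds a request payload under that contract. The function name `build_request`, the dictionary shape of the payload, and the ASCII `x` in the size strings are assumptions made for illustration, since this page does not specify how a request is encoded. Optional fields are omitted when not supplied because no default values are documented here.

```python
# Allowed values taken from the parameter table.
# Assumption: sizes are written with an ASCII "x"; the real encoding may differ.
ALLOWED_SIZES = {"720x1280", "1280x720"}  # portrait, landscape
ALLOWED_DURATIONS = {4, 8, 12, 16, 20}    # seconds


def build_request(prompt, size=None, duration=None, characters=None):
    """Return a payload dict containing only the fields that were supplied.

    Hypothetical helper: it validates inputs locally and sends nothing.
    """
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be non-empty text")
    payload = {"prompt": prompt.strip()}

    if size is not None:
        if size not in ALLOWED_SIZES:
            raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
        payload["size"] = size

    if duration is not None:
        if duration not in ALLOWED_DURATIONS:
            raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
        payload["duration"] = duration

    if characters:
        # The table describes identifiers of the form char_...
        bad = [c for c in characters if not str(c).startswith("char_")]
        if bad:
            raise ValueError(f"character IDs must start with 'char_': {bad}")
        payload["characters"] = list(characters)

    return payload
```

A prefix check like this only catches malformed identifiers; whether an ID actually exists in your account can only be confirmed by the service itself.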
How to Use
- Write your prompt — describe what the characters are doing and where the scene takes place.
- Select size — portrait (720×1280) for mobile/social, landscape (1280×720) for widescreen.
- Set duration — choose 4, 8, 12, 16, or 20 seconds based on your scene length.
- Add character IDs — click Add Item under the characters section to include each character by their unique identifier.
- Submit — generate, preview, and download your video.
Pricing
| Duration | Cost per Generation |
|---|---|
| 4s | $0.40 |
| 8s | $0.80 |
| 12s | $1.20 |
| 16s | $1.60 |
| 20s | $2.00 |
Billing Rules
- Rate: $0.10 per second
- Duration options: 4, 8, 12, 16, or 20 seconds
- Billing is based on the selected duration, not actual playback length
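The billing rules reduce to simple arithmetic: cost equals the selected duration multiplied by $0.10. A minimal Python sketch of that calculation follows; `generation_cost` and `batch_cost` are hypothetical helper names for budgeting, not part of any documented API. `Decimal` is used so that dollar amounts stay exact.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.10")        # dollars per second, from the billing rules
ALLOWED_DURATIONS = (4, 8, 12, 16, 20)   # selectable durations in seconds


def generation_cost(selected_duration):
    """Cost in dollars of one generation, based on the selected duration."""
    if selected_duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    return RATE_PER_SECOND * selected_duration


def batch_cost(durations):
    """Total cost in dollars of several generations."""
    return sum((generation_cost(d) for d in durations), Decimal("0"))


print(generation_cost(12))        # 1.20
print(batch_cost([4, 4, 20]))     # 2.80
```

The printed values match the pricing table: a 12-second clip costs $1.20, and two 4-second clips plus one 20-second clip total $2.80.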
Best Use Cases
- Brand & Marketing Videos — Feature multiple characters or spokespeople in a single scene without manual compositing.
- Social Media Content — Produce portrait-format multi-character clips optimized for Reels, TikTok, and Shorts.
- Game & IP Storytelling — Render in-world scenes with established characters maintaining consistent visual identity.
- Educational & Explainer Content — Animate two or more characters interacting to illustrate concepts or narratives.
- Advertising & Campaigns — Generate diverse cast scenarios rapidly for A/B testing creative variations.
Pro Tips
- Be specific about character positions and actions in your prompt for better spatial composition.
- Use portrait mode (720×1280) for mobile-first platforms and landscape (1280×720) for cinematic or desktop use.
- Start with a 4-second generation to validate composition and character rendering before committing to a longer duration.
- Ensure all referenced character IDs are valid and accessible in your account before submitting.
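The preview-first tip above can be sketched as a small planning step: submit a 4-second version of the same request, review it, then submit the full-length version. The names `plan_preview_then_final` and `total_cents`, and the payload's dictionary shape, are assumptions for illustration only. The sketch also assumes the preview is billed at the same $0.10 per second rate as any other generation.

```python
RATE_CENTS_PER_SECOND = 10  # $0.10 per second, from the Billing Rules
PREVIEW_SECONDS = 4         # shortest duration option


def plan_preview_then_final(payload, final_seconds):
    """Return the payloads to submit in order, preview first.

    Hypothetical helper: it copies the payload and changes only the duration.
    """
    final = {**payload, "duration": final_seconds}
    if final_seconds == PREVIEW_SECONDS:
        return [final]  # the final clip is already the shortest option
    preview = {**payload, "duration": PREVIEW_SECONDS}
    return [preview, final]


def total_cents(plan):
    """Combined cost of every generation in the plan, in cents."""
    return sum(RATE_CENTS_PER_SECOND * p["duration"] for p in plan)


plan = plan_preview_then_final({"prompt": "Two characters shake hands"}, 20)
print([p["duration"] for p in plan])  # [4, 20]
print(total_cents(plan))              # 240
```

Under these assumptions, validating a 20-second scene with a preview costs $2.40 in total: $0.40 for the preview plus $2.00 for the final render.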
Notes
- Character IDs must be created and saved in advance — this model references existing character profiles and does not create new definitions.
- Only prompt is a required field; size, duration, and characters are optional.
- Complex multi-character scenes benefit from concise, clearly structured prompts.
Related Models
- Sora 2 Characters — Create and save reusable character IDs for use in this model.