Sora 2 Text to Video

OpenAI Sora 2 is a state-of-the-art text-to-video model with realistic visuals, accurate physics, synchronized audio, and strong steerability. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


Examples

Peter and Joe are playing on the grass in a park.

In a 90s documentary-style interview, an old Swedish man sits in a study and says, "I still remember when I was young."

Format & Look Duration 4s; 180° shutter; digital capture emulating 65 mm photochemical contrast; fine grain; subtle halation on speculars; no gate weave. Lenses & Filtration 32 mm / 50 mm spherical primes; Black Pro-Mist 1/4; slight CPL rotation to manage glass reflections on train windows. Grade / Palette Highlights: clean morning sunlight with amber lift. Mids: balanced neutrals with slight teal cast in shadows. Blacks: soft, neutral with mild lift for haze retention. Lighting & Atmosphere Natural sunlight from camera left, low angle (07:30 AM). Bounce: 4×4 ultrabounce silver from trackside. Negative fill from opposite wall. Practical: sodium platform lights on dim fade. Atmos: gentle mist; train exhaust drift through light beam. Location & Framing Urban commuter platform, dawn. Foreground: yellow safety line, coffee cup on bench. Midground: waiting passengers silhouetted in haze. Background: arriving train braking to a stop. Avoid signage or corporate branding. Wardrobe / Props / Extras Main subject: mid-30s traveler, navy coat, backpack slung on one shoulder, holding phone loosely at side. Extras: commuters in muted tones; one cyclist pushing bike. Props: paper coffee cup, rolling luggage, LED departure board (generic destinations). Sound Diegetic only: faint rail screech, train brakes hiss, distant announcement muffled (-20 LUFS), low ambient hum. Footsteps and paper rustle; no score or added foley. Optimized Shot List (2 shots / 4 s total) 0.00–2.40 — “Arrival Drift” (32 mm, shoulder-mounted slow dolly left) Camera slides past platform signage edge; shallow focus reveals traveler mid-frame looking down tracks. Morning light blooms across lens; train headlights flare softly through mist. Purpose: establish setting and tone, hint anticipation. 2.40–4.00 — “Turn and Pause” (50 mm, slow arc in) Cut to tighter over-shoulder arc as train halts; traveler turns slightly toward camera, catching sunlight rim across cheek and phone screen refle

Convenience store entrance after rain; street reflections; meteors streak above. Characters: Night clerk (blue vest) + lone traveler. Action: Clerk hands over hot cocoa; both glance up to watch a meteor; traveler bows in thanks. Camera: Warm interior push-out → meteor reflected in puddle → shoulders-together upshot → rack focus back to cup steam. Look & Lighting: Anime-real blend; clean mirror-wet pavement with cool/warm contrast. Physics & Motion: Stable handoff; believable steam and drips. Audio: Distant city ambience + light electronic pad; soft “Thanks—the road feels closer now.”

Morning above cloud sea; toast-shaped balloons drift. Characters: two travel vloggers in basket. Action: orbiting drone shot; a seagull swoops; balloon makes a tiny “bounce.” Camera: orbit + gentle dolly; autofocus through clouds. Look: bright/clean sky blue; toasted surface texture. Motion: volumetric clouds and consistent lighting. Audio: burner whoosh + soft whistle; line: “Morning from the sky!”

Modern office afternoon, sunlight on desk plants. Character: quiet “ninja” intern; hoodie with a smiley sticker mask. Action: tiptoes to refill coffee; folds tiny paper shuriken reading “Keep going!” for each desk; “shh” to camera. Camera: over-shoulder follow → paper close-up → co-workers’ reactions. Look: realistic with light comedy tone; controlled reflections. Motion: stable interactions, real paper bending. Audio: light percussion + paper rustle; whispered “Shh…”.

README

Sora 2 Text-to-Video

Notice — Service Stability

The Sora 2 family is currently unstable. Generations may fall back to alternative models without notice and the service can be temporarily unavailable. OpenAI is also expected to discontinue this model in the future.

If you need an equally capable, stable alternative, we recommend Seedance 2: bytedance/seedance-2.0/text-to-video.

Sora 2 Text-to-Video is OpenAI's text-to-video model purpose-built for scenes featuring multiple distinct characters simultaneously. Describe the scene in natural language, reference your pre-defined character IDs, and the model renders a cohesive, temporally consistent video where every character looks and moves exactly as intended — no manual compositing required.

Why Choose This?

  • True multi-character consistency: Reference two or more character IDs in a single generation. Each character retains its unique appearance, proportions, and style throughout every frame.

  • Natural-language scene control: Describe interactions, environments, and actions in plain text. The model understands spatial relationships and character dynamics to produce believable compositions.

  • Flexible aspect ratio support: Choose between portrait (720×1280) and landscape (1280×720) orientations to match your target platform.

  • Scalable duration: Generate clips from 4 seconds up to 20 seconds in fixed steps, giving you full control over pacing and output cost.

  • Production-ready output: Delivers smooth, artifact-free motion suitable for marketing content, storytelling, game cinematics, and social media video.

Parameters

| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text description of the scene, characters, actions, and environment. |
| size | No | Output resolution: 720×1280 (portrait) or 1280×720 (landscape). |
| duration | No | Clip length in seconds. Options: 4, 8, 12, 16, 20. |
| characters | No | List of character IDs to include. Add one or more char_... identifiers. |
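The parameters above can be checked locally before a request is sent. The following is a minimal sketch, not part of any WaveSpeedAI SDK: `build_payload` is a hypothetical helper, the `"720*1280"` string form is taken from the quick-start examples on this page, and `"1280*720"` is assumed by analogy for landscape.

```python
# Hypothetical helper; field names come from the parameter table.
# Assumption: sizes are sent as "W*H" strings, as in the quick-start examples.
ALLOWED_SIZES = {"720*1280", "1280*720"}
ALLOWED_DURATIONS = {4, 8, 12, 16, 20}


def build_payload(prompt, size=None, duration=None, characters=None):
    """Return a request body dict, raising ValueError for undocumented values."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")
    payload = {"prompt": prompt}
    if size is not None:
        if size not in ALLOWED_SIZES:
            raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
        payload["size"] = size
    if duration is not None:
        if duration not in ALLOWED_DURATIONS:
            raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
        payload["duration"] = duration
    if characters:
        # Assumption: the API accepts a plain list of character ID strings.
        payload["characters"] = list(characters)
    return payload
```

Optional fields are omitted from the body when not set, so the service's own defaults apply rather than values guessed on the client side.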

How to Use

  1. Write your prompt — describe what the characters are doing and where the scene takes place.
  2. Select size — portrait (720×1280) for mobile/social, landscape (1280×720) for widescreen.
  3. Set duration — choose 4, 8, 12, 16, or 20 seconds based on your scene length.
  4. Add character IDs — click Add Item under the characters section to include each character by their unique identifier.
  5. Submit — generate, preview, and download your video.

Pricing

| Duration | Cost per Generation |
|---|---|
| 4s | $0.40 |
| 8s | $0.80 |
| 12s | $1.20 |
| 16s | $1.60 |
| 20s | $2.00 |

Billing Rules

  • Rate: $0.10 per second
  • Duration options: 4, 8, 12, 16, or 20 seconds
  • Billing is based on the selected duration, not actual playback length
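The billing rules reduce to one multiplication. As an illustrative sketch (the function names are hypothetical), the cost can be computed in integer cents to avoid floating-point rounding:

```python
RATE_CENTS_PER_SECOND = 10  # $0.10 per second, from the billing rules
ALLOWED_DURATIONS = (4, 8, 12, 16, 20)


def cost_cents(duration):
    """Cost of one generation in cents, based on the selected duration."""
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    return RATE_CENTS_PER_SECOND * duration


def format_usd(cents):
    """Render integer cents as a dollar string, e.g. 120 -> '$1.20'."""
    return f"${cents // 100}.{cents % 100:02d}"


for seconds in ALLOWED_DURATIONS:
    print(f"{seconds}s -> {format_usd(cost_cents(seconds))}")
```

The loop reproduces the pricing table row by row, from $0.40 for 4 seconds to $2.00 for 20 seconds.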

Best Use Cases

  • Brand & Marketing Videos — Feature multiple characters or spokespeople in a single scene without manual compositing.
  • Social Media Content — Produce portrait-format multi-character clips optimized for Reels, TikTok, and Shorts.
  • Game & IP Storytelling — Render in-world scenes with established characters maintaining consistent visual identity.
  • Educational & Explainer Content — Animate two or more characters interacting to illustrate concepts or narratives.
  • Advertising & Campaigns — Generate diverse cast scenarios rapidly for A/B testing creative variations.

Pro Tips

  • Be specific about character positions and actions in your prompt for better spatial composition.
  • Use portrait mode (720×1280) for mobile-first platforms and landscape (1280×720) for cinematic or desktop use.
  • Start with a 4-second generation to validate composition and character rendering before committing to a longer duration.
  • Ensure all referenced character IDs are valid and accessible in your account before submitting.
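The draft-first tip and the character ID check can be sketched as two small helpers. Both function names are hypothetical, and `char_abc123` is a made-up ID. The prefix check only catches typos; it cannot confirm that an ID exists or is accessible in your account.

```python
def draft_of(payload):
    """Return a copy of the payload with the shortest documented duration (4 s)."""
    draft = dict(payload)
    draft["duration"] = 4
    return draft


def malformed_character_ids(payload):
    """List entries that do not look like char_... identifiers."""
    return [
        cid
        for cid in payload.get("characters", [])
        if not (isinstance(cid, str) and cid.startswith("char_"))
    ]


final = {
    "prompt": "Two friends wave from opposite sides of a bridge.",
    "size": "1280*720",
    "duration": 16,
    "characters": ["char_abc123"],  # hypothetical ID
}
draft = draft_of(final)  # same scene at 4 s for a cheap composition check
```

Because `draft_of` copies the payload, the original 16-second request stays unchanged and can be submitted once the draft looks right.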

Notes

  • Character IDs must be created and saved in advance — this model references existing character profiles and does not create new definitions.
  • Only prompt is a required field; size, duration, and characters are optional.
  • Complex multi-character scenes benefit from concise, clearly structured prompts.

Related Models

  • Sora 2 Characters — Create and save reusable character IDs for use in this model.

Sora 2 Text To Video API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/openai/sora-2/text-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Sora 2 Text To Video below.

HTTP example

```bash
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/openai/sora-2/text-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "size": "720*1280",
    "duration": 4
  }'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
```

Node.js example

```javascript
// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules have no top-level await, so the call runs inside an async function.
async function main() {
  const result = await client.run("openai/sora-2/text-to-video", {
    prompt: "A cinematic shot of a city at sunset, soft golden light",
    size: "720*1280",
    duration: 4,
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);
```

Python example

```python
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "openai/sora-2/text-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "size": "720*1280",
        "duration": 4,
    },
)

print(output["outputs"][0])  # → URL of the generated output
```
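
For readers who prefer not to depend on an SDK, the submit-and-poll cycle from the quick start can be sketched with the Python standard library. This is an illustration under assumptions, not the official client: the response fields `data.id` and `data.status` and the `"failed"` status name are guesses, while the two URLs, the `"completed"` status, and `data.outputs[0]` come from this page. The polling interval and timeout are arbitrary defaults.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "openai/sora-2/text-to-video"


def http_json(method, url, api_key, body=None):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def generate(payload, api_key, request=http_json,
             interval=5.0, timeout=600.0, sleep=time.sleep):
    """Submit a prediction, poll until it completes, and return the output URL."""
    submitted = request("POST", f"{API_BASE}/{MODEL_PATH}", api_key, payload)
    request_id = submitted["data"]["id"]  # assumption: the id is at data.id
    deadline = time.monotonic() + timeout
    while True:
        result = request(
            "GET", f"{API_BASE}/predictions/{request_id}/result", api_key
        )
        status = result["data"]["status"]  # assumption: status is at data.status
        if status == "completed":
            return result["data"]["outputs"][0]
        if status == "failed":  # assumption: name of the failure status
            raise RuntimeError(f"prediction {request_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} not done after {timeout}s")
        sleep(interval)
```

A call such as `generate({"prompt": "...", "duration": 4}, api_key)` would then return the video URL. The `request` and `sleep` parameters exist so the loop can be exercised with a stub instead of the live service.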

Sora 2 Text To Video API — Frequently asked questions

What is the Sora 2 Text To Video API?

Sora 2 Text To Video is an OpenAI model for video generation, exposed as a REST API on WaveSpeedAI. OpenAI Sora 2 is a state-of-the-art text-to-video model with realistic visuals, accurate physics, synchronized audio, and strong steerability. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing. You can call it programmatically or try it from the playground above.

How do I call the Sora 2 Text To Video API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/openai/openai-sora-2-text-to-video.

How much does Sora 2 Text To Video cost per run?

Sora 2 Text To Video starts at $0.40 per run, which is the price of a 4-second clip. Per the pricing table above, the charge scales with the selected duration at $0.10 per second, up to $2.00 for a 20-second clip. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Sora 2 Text To Video accept?

Key inputs: `prompt`, `duration`, `size`, `characters`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/openai/openai-sora-2-text-to-video.

How long does Sora 2 Text To Video take to generate?

Average end-to-end generation time on WaveSpeedAI is around 141 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.

Can I use Sora 2 Text To Video outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (OpenAI). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.