OpenAI Sora 2 is a state-of-the-art text-to-video model with realistic visuals, accurate physics, synchronized audio, and strong steerability. WaveSpeedAI serves it as a ready-to-use REST inference API with no cold starts and pay-per-run pricing.
$0.40 per run · ~25 runs per $10
Peter and Joe are playing on the grass in a park.
In a 90s documentary-style interview, an old Swedish man sits in a study and says, "I still remember when I was young."
Format & Look: Duration 4s; 180° shutter; digital capture emulating 65 mm photochemical contrast; fine grain; subtle halation on speculars; no gate weave.
Lenses & Filtration: 32 mm / 50 mm spherical primes; Black Pro-Mist 1/4; slight CPL rotation to manage glass reflections on train windows.
Grade / Palette: Highlights: clean morning sunlight with amber lift. Mids: balanced neutrals with slight teal cast in shadows. Blacks: soft, neutral with mild lift for haze retention.
Lighting & Atmosphere: Natural sunlight from camera left, low angle (07:30 AM). Bounce: 4×4 ultrabounce silver from trackside. Negative fill from opposite wall. Practical: sodium platform lights on dim fade. Atmos: gentle mist; train exhaust drift through light beam.
Location & Framing: Urban commuter platform, dawn. Foreground: yellow safety line, coffee cup on bench. Midground: waiting passengers silhouetted in haze. Background: arriving train braking to a stop. Avoid signage or corporate branding.
Wardrobe / Props / Extras: Main subject: mid-30s traveler, navy coat, backpack slung on one shoulder, holding phone loosely at side. Extras: commuters in muted tones; one cyclist pushing bike. Props: paper coffee cup, rolling luggage, LED departure board (generic destinations).
Sound: Diegetic only: faint rail screech, train brakes hiss, distant announcement muffled (-20 LUFS), low ambient hum. Footsteps and paper rustle; no score or added foley.
Optimized Shot List (2 shots / 4 s total):
0.00–2.40: “Arrival Drift” (32 mm, shoulder-mounted slow dolly left). Camera slides past platform signage edge; shallow focus reveals traveler mid-frame looking down tracks. Morning light blooms across lens; train headlights flare softly through mist. Purpose: establish setting and tone, hint anticipation.
2.40–4.00: “Turn and Pause” (50 mm, slow arc in). Cut to tighter over-shoulder arc as train halts; traveler turns slightly toward camera, catching sunlight rim across cheek and phone screen refle
Convenience store entrance after rain; street reflections; meteors streak above. Characters: Night clerk (blue vest) + lone traveler. Action: Clerk hands over hot cocoa; both glance up to watch a meteor; traveler bows in thanks. Camera: Warm interior push-out → meteor reflected in puddle → shoulders-together upshot → rack focus back to cup steam. Look & Lighting: Anime-real blend; clean mirror-wet pavement with cool/warm contrast. Physics & Motion: Stable handoff; believable steam and drips. Audio: Distant city ambience + light electronic pad; soft “Thanks—the road feels closer now.”
Morning above cloud sea; toast-shaped balloons drift. Characters: two travel vloggers in basket. Action: orbiting drone shot; a seagull swoops; balloon makes a tiny “bounce.” Camera: orbit + gentle dolly; autofocus through clouds. Look: bright/clean sky blue; toasted surface texture. Motion: volumetric clouds and consistent lighting. Audio: burner whoosh + soft whistle; line: “Morning from the sky!”
Modern office afternoon, sunlight on desk plants. Character: quiet “ninja” intern; hoodie with a smiley sticker mask. Action: tiptoes to refill coffee; folds tiny paper shuriken reading “Keep going!” for each desk; “shh” to camera. Camera: over-shoulder follow → paper close-up → co-workers’ reactions. Look: realistic with light comedy tone; controlled reflections. Motion: stable interactions, real paper bending. Audio: light percussion + paper rustle; whispered “Shh…”.
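The last few example prompts share a loose labeled template: a scene sentence followed by fields such as Characters, Action, Camera, Look, Motion, and Audio. The sketch below assembles such fields into a single `prompt` string. The helper `compose_prompt` and its field labels are hypothetical; the API itself only receives the final plain text.

```python
from collections.abc import Mapping


def compose_prompt(scene: str, fields: Mapping[str, str]) -> str:
    """Join a scene sentence and labeled fields into one prompt string.

    Hypothetical helper: the labels are a writing convention, not API
    parameters. Empty fields are skipped and each part ends with one period.
    """
    sentences = [scene.strip().rstrip(".")]
    for label, value in fields.items():
        value = value.strip().rstrip(".")
        if value:  # omit blank fields to keep the prompt compact
            sentences.append(f"{label}: {value}")
    return ". ".join(sentences) + "."


# Usage: rebuild part of the hot-air-balloon example from its fields.
prompt = compose_prompt(
    "Morning above cloud sea; toast-shaped balloons drift",
    {
        "Characters": "two travel vloggers in basket",
        "Camera": "orbit + gentle dolly; autofocus through clouds",
        "Audio": "burner whoosh + soft whistle",
    },
)
print(prompt)
```

Keeping fields in a mapping makes it easy to vary one aspect (for example, the camera move) while holding the rest of the prompt fixed across generations.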
Notice — Service Stability
The Sora 2 family is currently unstable. Generations may fall back to alternative models without notice and the service can be temporarily unavailable. OpenAI is also expected to discontinue this model in the future.
If you need an equally capable, stable alternative, we recommend Seedance 2: bytedance/seedance-2.0/text-to-video.
Sora 2 Text-to-Video is OpenAI's text-to-video model, built to handle scenes featuring multiple distinct characters at once. Describe the scene in natural language and reference your predefined character IDs; the model renders a cohesive, temporally consistent video in which each character keeps its defined look and movement, without manual compositing.
True multi-character consistency: Reference two or more character IDs in a single generation. Each character keeps its own appearance, proportions, and style across the whole clip.
Natural-language scene control: Describe interactions, environments, and actions in plain text. The model interprets spatial relationships and character dynamics to produce believable compositions.
Flexible aspect ratios: Choose portrait (720×1280) or landscape (1280×720) output to match your target platform.
Scalable duration: Generate clips of 4, 8, 12, 16, or 20 seconds, trading pacing against output cost.
Production-ready output: Smooth motion suited to marketing content, storytelling, game cinematics, and social media video.
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text description of the scene, characters, actions, and environment. |
| size | No | Output resolution as a `width*height` string: `720*1280` (portrait) or `1280*720` (landscape). |
| duration | No | Clip length in seconds: `4`, `8`, `12`, `16`, or `20`. |
| characters | No | List of character IDs to include, given as one or more `char_...` identifiers. |
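Because `size` and `duration` accept only a few fixed values, checking them on the client avoids paying for a round trip that the server would reject. The following is a minimal sketch based on the parameter table; the helper name `build_payload` is hypothetical, the `width*height` string format is taken from the request examples, and optional fields are simply omitted so the server applies its own defaults. The server remains the authority on accepted values.

```python
import json
from typing import Optional, Sequence

ALLOWED_SIZES = {"720*1280", "1280*720"}  # portrait, landscape
ALLOWED_DURATIONS = {4, 8, 12, 16, 20}    # seconds


def build_payload(prompt: str,
                  size: Optional[str] = None,
                  duration: Optional[int] = None,
                  characters: Optional[Sequence[str]] = None) -> str:
    """Return a JSON request body, rejecting values outside the documented options."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    body: dict = {"prompt": prompt}
    if size is not None:
        if size not in ALLOWED_SIZES:
            raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
        body["size"] = size
    if duration is not None:
        if duration not in ALLOWED_DURATIONS:
            raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
        body["duration"] = duration
    if characters:
        # Assumption: character IDs are sent as a JSON array of strings.
        body["characters"] = list(characters)
    return json.dumps(body)
```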
| Duration | Cost per Generation |
|---|---|
| 4s | $0.40 |
| 8s | $0.80 |
| 12s | $1.20 |
| 16s | $1.60 |
| 20s | $2.00 |
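The pricing table is linear: each 4-second step adds $0.40, which works out to $0.10 per second of video. The sketch below estimates the cost of a batch under the assumption that this per-second rate holds; the live price shown before you submit is authoritative. It uses `Decimal` so currency amounts do not pick up floating-point rounding error.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.10")  # derived from the table: $0.40 per 4 s
ALLOWED_DURATIONS = (4, 8, 12, 16, 20)


def estimate_cost(durations):
    """Total USD for a batch of generations, one duration (seconds) per clip."""
    total = Decimal("0")
    for d in durations:
        if d not in ALLOWED_DURATIONS:
            raise ValueError(f"unsupported duration: {d}")
        total += PRICE_PER_SECOND * d
    return total


print(estimate_cost([4, 8, 20]))  # 0.40 + 0.80 + 2.00 → 3.20
```

The same arithmetic gives the "~25 runs per $10" figure for 4-second clips: `Decimal("10") // estimate_cost([4])` is 25.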
Get a WaveSpeedAI API key, then send a `POST` request to `https://api.wavespeed.ai/api/v3/openai/sora-2/text-to-video` with your input as a JSON body. The endpoint returns a prediction ID; poll the prediction result endpoint until `status` is `completed`, then read the output URL from `data.outputs[0]`. The examples below show the same request in cURL, JavaScript, and Python.
```bash
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/openai/sora-2/text-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "size": "720*1280",
    "duration": 4
  }'

# The response includes a prediction id. Substitute it for {request_id} and poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output URL from data.outputs[0].
```

```javascript
// npm install wavespeed
const WaveSpeed = require('wavespeed');

// CommonJS modules do not allow top-level await, so run inside an async function.
async function main() {
  const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env
  const result = await client.run("openai/sora-2/text-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "size": "720*1280",
    "duration": 4
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);
```

```python
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "openai/sora-2/text-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "size": "720*1280",
        "duration": 4,
    },
)
print(output["outputs"][0])  # → URL of the generated output
```

Sora 2 Text To Video is an OpenAI video generation model exposed as a REST API on WaveSpeedAI. You can call it programmatically, as shown above, or try it from the WaveSpeedAI playground.
POST your input parameters to the model's REST endpoint, `https://api.wavespeed.ai/api/v3/openai/sora-2/text-to-video` (also shown in the playground's API tab), with your WaveSpeedAI API key in the `Authorization: Bearer` header. Submission returns a prediction ID; poll `https://api.wavespeed.ai/api/v3/predictions/{request_id}/result` until `status` becomes `completed`, then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for the inputs you have set. The full request/response shape is documented at https://wavespeed.ai/docs/docs-api/openai/openai-sora-2-text-to-video.
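The submit-then-poll flow can be sketched with only the Python standard library. The endpoint URLs come from the description above; the response field paths `data.id`, `data.status`, and `data.outputs`, the `failed` status name, and the fixed polling interval are assumptions, not confirmed API details. The HTTP call is passed in as a function so the polling logic can be exercised without network access. The default 600-second timeout leaves headroom over the ~141-second average generation time.

```python
import json
import os
import time
import urllib.request

BASE = "https://api.wavespeed.ai/api/v3"
MODEL_URL = f"{BASE}/openai/sora-2/text-to-video"


def http_json(method, url, body=None):
    """Send a request with the Bearer key and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def generate(payload, fetch=http_json, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Submit a prediction, poll until completed, and return data.outputs[0]."""
    request_id = fetch("POST", MODEL_URL, payload)["data"]["id"]  # assumed field path
    result_url = f"{BASE}/predictions/{request_id}/result"
    waited = 0.0
    while waited <= timeout:
        data = fetch("GET", result_url)["data"]
        if data["status"] == "completed":
            return data["outputs"][0]
        if data["status"] == "failed":  # assumed terminal error status
            raise RuntimeError(data.get("error") or "prediction failed")
        sleep(interval)
        waited += interval
    raise TimeoutError(f"prediction {request_id} not completed after {timeout} s")
```

With a real key in `WAVESPEED_API_KEY`, `generate({"prompt": "...", "size": "720*1280", "duration": 4})` would block until the clip is ready and return its URL.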
Sora 2 Text To Video starts at $0.40 per run, the price of a 4-second clip. Per the pricing table, the charge scales with `duration` in $0.40 steps per 4 seconds, up to $2.00 for a 20-second clip. The exact cost for your current input is shown next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `duration`, `size`, `characters`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/openai/openai-sora-2-text-to-video.
Average end-to-end generation time on WaveSpeedAI is around 141 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (OpenAI). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.