FramePack Image-to-Video | Autoregressive Video Generation

FramePack — wavespeed-ai/framepack

FramePack is an image-to-video model designed for smooth, cinematic animation from a single input image. Upload a reference image to anchor composition and subject identity, then use a director-style prompt to control motion, pacing, and camera language (push-in, pull-back, reveals, etc.). The num_frames parameter sets clip length in frames, so you can generate clips of different durations while keeping output stable and consistent.

Key capabilities

Image-to-video generation anchored to a reference image
Strong at cinematic camera moves (push-in, pull-back, reveal, orbit, tilt)
Frame-level length control via num_frames for flexible clip duration
Supports negative_prompt to reduce jitter, blur, distortion, and artifacts
Resolution and aspect_ratio controls for common output formats

Use cases

“Living poster” animations: bring key art to life with subtle motion
Cinematic reveals: close-up → pull-back to establish scene context
Mood shots and b-roll from a single still (rain, neon, dust motes, fog, wind)
Trailer-style beats for marketing and social content
Rapid iteration by keeping the same image and varying prompt/seed/frames

Pricing

Pricing scales linearly with the number of frames generated. The tiers below all work out to $0.0011 per frame.

Frames	Price per run
60	$0.066
120	$0.132
180	$0.198
240	$0.264
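
The tiers above can be turned into a small cost estimator. This is a minimal sketch: the table values are copied from this page, but the function names are hypothetical, and billing frame counts that fall between the listed tiers at the same per-frame rate is an assumption the table does not confirm.

```python
# Price tiers copied from the table above (frames -> USD per run).
PRICE_TABLE = {60: 0.066, 120: 0.132, 180: 0.198, 240: 0.264}


def per_frame_rate(table=PRICE_TABLE):
    """Return the per-frame rate implied by the table, checking it is uniform."""
    rates = {round(price / frames, 6) for frames, price in table.items()}
    if len(rates) != 1:
        raise ValueError(f"tiers imply different per-frame rates: {sorted(rates)}")
    return rates.pop()


def estimate_price(num_frames, table=PRICE_TABLE):
    """Estimate the cost of one run in USD.

    Listed tiers are returned as-is. Other frame counts are extrapolated
    linearly, which is an ASSUMPTION rather than documented billing behavior.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if num_frames in table:
        return table[num_frames]
    return round(num_frames * per_frame_rate(table), 4)


if __name__ == "__main__":
    for frames in (60, 120, 180, 240):
        print(frames, estimate_price(frames))
```

For frame counts not in the table, treat the result as an estimate and confirm the actual charge before relying on it.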

Inputs

image (required): reference image (subject/composition anchor)
prompt (required): motion + camera direction
negative_prompt (optional): what to avoid (blur, jitter, distortion, etc.)

Parameters

image: input image (upload or URL)
prompt: director-style motion description
negative_prompt: optional “avoid list”
aspect_ratio: output aspect ratio (e.g., 16:9)
resolution: output resolution (e.g., 720p)
num_inference_steps: sampling steps
num_frames: total frames to generate (controls clip length)
guidance_scale: prompt adherence strength
seed: random seed (set for reproducible results)
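
The parameters above map naturally onto a JSON request body. The sketch below builds such a body and prepares an HTTP request without sending it. The endpoint URL, the Bearer authentication scheme, the example image URL, and the helper names are all assumptions for illustration; only the parameter names come from this page, and no default values are assumed for optional parameters.

```python
import json
import urllib.request

# ASSUMPTION: endpoint path and auth scheme are illustrative, not taken from this page.
API_URL = "https://api.wavespeed.ai/api/v3/wavespeed-ai/framepack"

# Optional parameter names as listed in the Parameters section.
OPTIONAL_PARAMS = {
    "negative_prompt", "aspect_ratio", "resolution",
    "num_inference_steps", "num_frames", "guidance_scale", "seed",
}


def build_payload(image, prompt, **optional):
    """Build the request body; optional parameters are included only if given."""
    if not image or not prompt:
        raise ValueError("image and prompt are required")
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"image": image, "prompt": prompt}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def build_request(payload, api_key):
    """Prepare (but do not send) an HTTP POST request carrying the payload."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = build_payload(
    image="https://example.com/poster.png",  # hypothetical image URL
    prompt="Slow push-in on the subject, soft fog rolls through the scene.",
    negative_prompt="blur, jitter, distortion",
    aspect_ratio="16:9",
    resolution="720p",
    num_frames=120,
    seed=42,  # fixed seed for reproducible results
)
request = build_request(payload, api_key="YOUR_API_KEY")
# Send with urllib.request.urlopen(request) once the endpoint and key are confirmed.
```

Keeping the image and seed fixed while changing only the prompt or num_frames makes it easier to compare runs during iteration.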

Prompting guide (I2V)

Write prompts like a shot list:

Start framing: close-up / medium / wide
Camera move: push-in / pull-back / pan / orbit
Motion: hair, cloth, rain, particles, light flicker, subtle facial change
Mood/lighting: neon, rim light, fog, bokeh, cinematic contrast
Constraints: keep the subject identity and composition consistent
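
The shot-list structure above can be assembled programmatically when generating many prompt variants. This is a minimal sketch with a hypothetical helper, build_prompt; it simply joins the five components in the order listed and is not part of the FramePack API.

```python
def build_prompt(framing, camera_move, motion, mood, constraints=None):
    """Join shot-list components into one comma-separated prompt string.

    Empty or missing components are skipped, and trailing punctuation on
    each component is trimmed so the result reads as a single sentence.
    """
    parts = [framing, camera_move, motion, mood, constraints]
    cleaned = [p.strip().rstrip(".,") for p in parts if p and p.strip()]
    if not cleaned:
        raise ValueError("at least one prompt component is required")
    return ", ".join(cleaned) + "."


prompt = build_prompt(
    framing="Tight close-up on the eyes",
    camera_move="slow pull-back to reveal the full figure",
    motion="subtle wind and drifting rain",
    mood="neon city lights, cinematic lighting",
    constraints="keep the subject identity and composition consistent",
)
print(prompt)
```

Building prompts from named parts makes it straightforward to vary one element, such as the camera move, while holding the rest constant.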

Example prompts

Animate with a deliberate, unfolding sense of drama. Start with a tight close-up on the eyes, then slowly pull back to reveal the full figure on a rain-slick balcony, neon city lights shimmering in the background, subtle wind and drifting rain, cinematic lighting, smooth camera motion.
Slow push-in on the subject, soft fog rolls through the scene, gentle light flicker, filmic contrast, no jitter, stable face and hands.

FramePack is an efficient autoregressive image-to-video model that generates smooth, temporally consistent videos from a single image. It is served through a ready-to-use REST inference API with no cold starts and affordable pricing.
