Introducing Alibaba WAN 2.7 Image-to-Video on WaveSpeedAI
Wan 2.7 converts images into 720p or 1080p videos with optional audio and supports first and last frame control. Available through a ready-to-use REST inference API.
Wan 2.7 Image-to-Video: Animate Any Photo Into Cinematic Video With First and Last Frame Control
Static images can tell a story, but motion sells it. Wan 2.7 Image-to-Video, Alibaba’s latest image-to-video generation model now available on WaveSpeedAI, transforms a single reference photo into a cinematic 720p or 1080p clip — with optional audio synchronization, negative prompt control, and the rare ability to lock both the starting and ending frames. For creators, marketers, and developers who need precise visual continuity rather than a “best guess” animation, this release closes one of the biggest gaps in the AI video generation API landscape.
Try it now on the Wan 2.7 Image-to-Video model page.
How Wan 2.7 Image-to-Video Works
Wan 2.7 Image-to-Video is a reference-grounded video diffusion model. You provide a start frame, write a natural-language prompt describing the motion and atmosphere, and the model generates a smooth animated clip that respects the appearance, lighting, and composition of the source image. Unlike pure text-to-video models that hallucinate subjects from scratch, Wan 2.7 anchors the output to the visual identity of your photo — meaning the same character, product, or environment carries from frame one to the final beat.
What makes Wan 2.7 stand out among image-to-video models:
- Dual-frame guidance: Supply both an `image` (start frame) and a `last_image` (end frame). The model interpolates a coherent motion path between them, giving you scripted transitions instead of guesswork.
- Native audio conditioning: Pass an `audio` track and the generated video will synchronize pacing, rhythm, and mood, which is useful for music-driven content and lip-aligned scenes.
- Resolution flexibility: Choose between 720p for fast standard output or 1080p for premium delivery, all from the same REST endpoint.
- Duration control: Generate 5s, 10s, or 15s clips with a single `duration` parameter, no chunking required.
The technical specs developers care about: required inputs are image and prompt; optional inputs include last_image, audio, negative_prompt, resolution, duration, enable_prompt_expansion, and seed for reproducible results.
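The input list above can be turned into a small payload builder. This is a minimal sketch, not part of the SDK: the `build_payload` helper is hypothetical, the allowed values mirror the resolutions and durations named in this article, and the defaults (720p, 5 seconds) are assumptions chosen for illustration.

```python
from typing import Any, Optional

# Values mirror the article; treat them as assumptions about the live API.
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_DURATIONS = {5, 10, 15}


def build_payload(
    image: str,
    prompt: str,
    last_image: Optional[str] = None,
    audio: Optional[str] = None,
    negative_prompt: Optional[str] = None,
    resolution: str = "720p",  # assumed default, for illustration only
    duration: int = 5,         # assumed default, for illustration only
    enable_prompt_expansion: Optional[bool] = None,
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a request payload, dropping optional fields left unset."""
    if not image or not prompt:
        raise ValueError("image and prompt are required")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"unsupported duration: {duration}")

    payload: dict[str, Any] = {
        "image": image,
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
    }
    optional = {
        "last_image": last_image,
        "audio": audio,
        "negative_prompt": negative_prompt,
        "enable_prompt_expansion": enable_prompt_expansion,
        "seed": seed,
    }
    # Only send optional fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Validating locally before submitting catches a mistyped resolution or duration without spending a paid generation on it.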
Key Features of Wan 2.7 Image-to-Video
- Image-grounded generation for visual consistency: Subject identity, clothing, lighting, and background composition are preserved from your reference photo, so brand assets and characters stay on-model.
- First and last frame control for narrative precision: Define exactly where a shot begins and ends. This is the feature most missing from competing image-to-video APIs and the reason Wan 2.7 is a strong fit for storyboarded work.
- Audio input for music-synced video: Upload a soundtrack or voiceover and the model paces motion to match. No more manually re-editing AI clips to fit a beat.
- Negative prompt support for cleaner output: Strip artifacts like blurry faces, distorted hands, or unwanted background motion by listing them in the `negative_prompt` field.
- Prompt expansion for short prompts: Toggle `enable_prompt_expansion` and the model auto-enriches sparse prompts before generation, ideal for batch pipelines where prompt engineering doesn’t scale.
- Up to 1080p output at predictable per-second pricing: Pay only for what you generate, with no minimums and no cold starts on WaveSpeedAI.
Best Use Cases for Wan 2.7 Image-to-Video
Cinematic Photo Animation From a Single Reference
Photographers and creators can take a single still — a portrait, a landscape, a product shot — and produce a 5- to 15-second motion piece without staging a video shoot. Wan 2.7’s reference grounding means the subject in your photo stays recognizably the same, so a wedding portrait becomes a moving keepsake, not a stranger’s face.
Scripted Scene Transitions With Start and End Frames
Storyboard artists, advertisers, and short-film makers can supply a beginning frame and an ending frame and let Wan 2.7 fill in the motion. This turns the model into a controllable “tween” engine for visual narrative — useful for camera moves, character transformations, or before/after product reveals where you need the final frame to land exactly where you specified.
Social Media Content at Scale
Reels, TikTok, and Shorts reward motion. A brand sitting on a catalog of static product images can convert that library into thumb-stopping vertical video. Combine enable_prompt_expansion with batch API calls and a small social team can publish dozens of animated variants per week without a video editor in the loop.
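A batch run over a product catalog might look like the sketch below. The `animate_catalog` helper is hypothetical, and the `run` argument is injected so the loop can be exercised without network access; in practice you would pass `wavespeed.run` from the SDK. The 720p and 5-second settings are illustrative choices, not requirements.

```python
from typing import Any, Callable, Iterable

MODEL_ID = "alibaba/wan-2.7/image-to-video"  # model path used in this article


def animate_catalog(
    image_urls: Iterable[str],
    prompt: str,
    run: Callable[[str, dict[str, Any]], Any],
) -> dict[str, Any]:
    """Submit one job per image and collect results (or errors) keyed by URL."""
    results: dict[str, Any] = {}
    for url in image_urls:
        payload = {
            "image": url,
            "prompt": prompt,
            "enable_prompt_expansion": True,  # let the model enrich the short prompt
            "resolution": "720p",
            "duration": 5,
        }
        try:
            results[url] = run(MODEL_ID, payload)
        except Exception as exc:  # keep the batch going if one job fails
            results[url] = exc
    return results
```

Calling `animate_catalog(urls, "gentle turntable rotation", wavespeed.run)` would process the images one at a time; storing exceptions per URL makes it easy to retry only the failed items afterwards.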
Music Videos and Audio-Visual Storytelling
The optional audio parameter makes Wan 2.7 a natural fit for indie musicians, podcast clip designers, and lyric-video creators. Drop in a 10-second audio clip alongside a hero image and prompt, and the generated motion follows the rhythm — tightening the production loop from hours to minutes.
Marketing, E-commerce, and Campaign Animation
Promotional emails, paid social ads, and landing-page hero videos all convert better with motion. Wan 2.7 lets a marketer animate an existing campaign asset — a packshot, a model photo, a lifestyle scene — without re-shooting or paying for stock video. Pair it with an end-frame image of your CTA card for a clean, on-brand outro.
Real Estate and Architectural Walkthroughs
Listing photos can be animated into pseudo-walkthrough clips: subtle dolly motion, light shifts, atmospheric movement. With last_image you can guide the camera to settle on a key feature like a fireplace or a view.
Fashion and Beauty Lookbooks
Stills shot for editorial use can be brought to life with hair, fabric, and ambient motion. The negative prompt control is particularly valuable here for excluding the “morphing face” artifact that plagues lower-tier image-to-video models.
Wan 2.7 Image-to-Video Pricing and API Access
Wan 2.7 Image-to-Video on WaveSpeedAI is billed by output duration and resolution:
| Duration | 720p | 1080p |
|---|---|---|
| 5s | $0.50 | $0.75 |
| 10s | $1.00 | $1.50 |
| 15s | $1.50 | $2.25 |
Billing is a flat per-second rate: $0.10/s at 720p and $0.15/s at 1080p (a 1.5× premium for the higher resolution). There are no subscription tiers or minimum spend.
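The per-second rates make cost estimation simple arithmetic. The sketch below is a hypothetical helper (not part of any SDK) that uses the rates from the table above and `Decimal` to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

# Per-second rates as listed in the pricing table above.
RATE_PER_SECOND = {"720p": Decimal("0.10"), "1080p": Decimal("0.15")}


def estimate_cost(duration_s: int, resolution: str) -> Decimal:
    """Return the estimated price in USD for a single clip."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    return RATE_PER_SECOND[resolution] * duration_s


def estimate_batch(clips: list[tuple[int, str]]) -> Decimal:
    """Sum the estimated price of several (duration, resolution) clips."""
    return sum((estimate_cost(d, r) for d, r in clips), Decimal("0"))


print(estimate_cost(5, "720p"))            # 0.50
print(estimate_cost(15, "1080p"))          # 2.25
print(estimate_batch([(5, "720p")] * 20))  # 10.00
```

The first two results match the 5s/720p and 15s/1080p cells of the table, and the last line shows that twenty 5-second 720p clips come to $10.00.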
Calling the model is straightforward via the WaveSpeed Python SDK:
```python
import wavespeed

output = wavespeed.run(
    "alibaba/wan-2.7/image-to-video",
    {
        "image": "https://example.com/start-frame.jpg",  # start frame (required)
        "prompt": "Slow cinematic dolly-in, golden-hour light, gentle wind in the trees",
        "last_image": "https://example.com/end-frame.jpg",  # optional end frame
        "resolution": "1080p",
        "duration": 5,  # seconds
    },
)
print(output["outputs"][0])  # first generated output
```
The same request can be made against the REST inference API from any language. WaveSpeedAI runs Wan 2.7 with no cold starts, meaning your first request and your thousandth request hit the same warm capacity, which matters for production workloads with bursty traffic.
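For readers not using the SDK, a REST request could be prepared roughly as follows. This is a sketch under stated assumptions: the base URL and bearer-token header are not given in this article and should be confirmed against the WaveSpeedAI API reference, and `build_request` is a hypothetical helper. The code only constructs the request and does not send it.

```python
import json
import urllib.request
from typing import Any

# ASSUMPTION: the base URL and bearer-token auth are illustrative and not taken
# from this article; confirm both against the WaveSpeedAI API reference.
API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_ID = "alibaba/wan-2.7/image-to-video"  # model path from the SDK example


def build_request(api_key: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST carrying the SDK-style payload."""
    return urllib.request.Request(
        url=f"{API_BASE}/{MODEL_ID}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` and handling the response are left out on purpose, since the response format is not described here.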
If you need text-only generation without a reference image, see the companion Wan 2.7 Text-to-Video model on WaveSpeedAI.
Tips for Best Results With Wan 2.7 Image-to-Video
- Start with a high-resolution, well-lit reference image with a clearly visible subject. Low-light or noisy inputs lead to muddier motion.
- Always supply a `last_image` when narrative matters. Even a roughly art-directed end frame dramatically improves motion direction and final-frame composition.
- Use `negative_prompt` aggressively for human subjects. Phrases like “blurry face, extra fingers, warping, text artifacts” routinely improve perceived quality.
- Enable prompt expansion for sparse prompts. If your prompt is under ~15 words, turn on `enable_prompt_expansion` rather than hand-engineering a longer one.
- Lock the seed once you find a winning composition and iterate on resolution or duration without losing the look.
- Match audio length to duration. A 10-second clip should pair with a 10-second audio file for tightest synchronization.
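The seed-locking tip can be sketched as a small generator of payload variants. The `seeded_variants` helper is hypothetical; it only builds request payloads that share one seed while varying resolution and duration, and makes no claim about how closely the resulting clips will match.

```python
from itertools import product
from typing import Any, Sequence


def seeded_variants(
    base: dict[str, Any],
    seed: int,
    resolutions: Sequence[str] = ("720p", "1080p"),
    durations: Sequence[int] = (5, 10, 15),
) -> list[dict[str, Any]]:
    """Copy the base payload once per (resolution, duration) pair, same seed."""
    return [
        {**base, "seed": seed, "resolution": res, "duration": dur}
        for res, dur in product(resolutions, durations)
    ]


base = {
    "image": "https://example.com/start-frame.jpg",
    "prompt": "Slow cinematic dolly-in, golden-hour light",
    "negative_prompt": "blurry face, extra fingers, warping, text artifacts",
}
variants = seeded_variants(base, seed=42)
print(len(variants))  # 6
```

A cost-conscious workflow is to preview at 720p and 5 seconds, then re-run only the chosen composition at 1080p with the same seed.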
Wan 2.7 Image-to-Video FAQ
What is Wan 2.7 Image-to-Video?
Wan 2.7 Image-to-Video is Alibaba’s reference-grounded video generation model that turns a still image into a 720p or 1080p cinematic clip, with optional audio, negative prompts, and first/last frame control.
How much does Wan 2.7 Image-to-Video cost?
Pricing is $0.10 per second at 720p and $0.15 per second at 1080p. For example, a 5-second 720p clip costs $0.50 and a 15-second 1080p clip costs $2.25 on WaveSpeedAI.
Can I use Wan 2.7 Image-to-Video via API?
Yes. Wan 2.7 is available through the WaveSpeedAI REST inference API and the official Python SDK with no cold starts and pay-per-use billing.
Does Wan 2.7 support audio-synced video generation?
Yes — pass an audio URL or file and the generated video will pace its motion to match the rhythm and mood of the soundtrack.
How does first and last frame control work?
Provide a start frame in the image parameter and an end frame in the optional last_image parameter, and the model interpolates a coherent motion path between them — ideal for storyboarded transitions and scripted shots.
Start Generating With Wan 2.7 Image-to-Video Today
Animate a single photo into a cinematic clip with first/last frame control, audio sync, and 1080p output — without managing GPUs or worrying about cold starts. Try Wan 2.7 Image-to-Video on WaveSpeedAI and ship motion content at API speed.



