Introducing Alibaba Wan 2.7 Image-to-Video on WaveSpeedAI

Wan 2.7 Image-to-Video: Animate Any Photo Into Cinematic Video With First and Last Frame Control

Static images can tell a story, but motion sells it. Wan 2.7 Image-to-Video, Alibaba’s latest image-to-video generation model, is now available on WaveSpeedAI. It transforms a single reference photo into a cinematic 720p or 1080p clip, with optional audio synchronization, negative prompt control, and the ability to lock both the starting and ending frames. For creators, marketers, and developers who need precise visual continuity rather than a “best guess” animation, start-and-end-frame control addresses a common gap in AI video generation APIs.

Try it now on the Wan 2.7 Image-to-Video model page.

How Wan 2.7 Image-to-Video Works

Wan 2.7 Image-to-Video is a reference-grounded video diffusion model. You provide a start frame, write a natural-language prompt describing the motion and atmosphere, and the model generates a smooth animated clip that respects the appearance, lighting, and composition of the source image. Unlike pure text-to-video models that hallucinate subjects from scratch, Wan 2.7 anchors the output to the visual identity of your photo — meaning the same character, product, or environment carries from frame one to the final beat.

What makes Wan 2.7 stand out among image-to-video models:

Dual-frame guidance: Supply both an image (start frame) and a last_image (end frame). The model interpolates a coherent motion path between them, giving you scripted transitions instead of guesswork.
Native audio conditioning: Pass an audio track and the generated video will synchronize pacing, rhythm, and mood — useful for music-driven content and lip-aligned scenes.
Resolution flexibility: Choose between 720p for fast standard output or 1080p for premium delivery, all from the same REST endpoint.
Duration control: Generate 5s, 10s, or 15s clips with a single duration parameter, no chunking required.

The technical specs developers care about: required inputs are image and prompt; optional inputs include last_image, audio, negative_prompt, resolution, duration, enable_prompt_expansion, and seed for reproducible results.
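As a rough illustration of how those inputs fit together, here is a minimal sketch of a request-body builder. The helper name build_payload is hypothetical, the defaults simply mirror the sample request later in this post rather than documented API defaults, and the allowed values are the ones this article lists (720p/1080p, 5/10/15 seconds). It assumes media inputs are passed as URLs.

```python
from typing import Any, Dict, Optional

# Values taken from this article; the live API may accept others.
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_DURATIONS = {5, 10, 15}


def build_payload(
    image: str,
    prompt: str,
    *,
    last_image: Optional[str] = None,
    audio: Optional[str] = None,
    negative_prompt: Optional[str] = None,
    resolution: str = "720p",
    duration: int = 5,
    enable_prompt_expansion: bool = False,
    seed: int = -1,  # assumption: copied from the sample request in this post
) -> Dict[str, Any]:
    """Assemble a request body (hypothetical helper, not part of any SDK)."""
    if not image or not prompt:
        raise ValueError("image and prompt are required")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")

    payload: Dict[str, Any] = {
        "image": image,
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
        "enable_prompt_expansion": enable_prompt_expansion,
        "seed": seed,
    }
    # Include optional fields only when the caller supplied them.
    optional = {
        "last_image": last_image,
        "audio": audio,
        "negative_prompt": negative_prompt,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


# Example: a start frame plus an end frame (placeholder URLs).
example = build_payload(
    "https://example.com/start.jpg",
    "Slow push-in as morning light fills the room",
    last_image="https://example.com/end.jpg",
    negative_prompt="blurry face, warping",
)
```

Validating locally like this catches an unsupported duration or resolution before a billable request is submitted.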

Key Features of Wan 2.7 Image-to-Video

Image-grounded generation for visual consistency — Subject identity, clothing, lighting, and background composition are preserved from your reference photo, so brand assets and characters stay on-model.
First and last frame control for narrative precision — Define exactly where a shot begins and ends. Many competing image-to-video APIs accept only a start frame, which is why Wan 2.7 is a strong fit for storyboarded work.
Audio input for music-synced video — Upload a soundtrack or voiceover and the model paces motion to match. No more manually re-editing AI clips to fit a beat.
Negative prompt support for cleaner output — Strip artifacts like blurry faces, distorted hands, or unwanted background motion by listing them in the negative_prompt field.
Prompt expansion for short prompts — Toggle enable_prompt_expansion and the model auto-enriches sparse prompts before generation, ideal for batch pipelines where prompt engineering doesn’t scale.
Up to 1080p output at predictable per-second pricing — Pay only for what you generate, with no minimums and no cold starts on WaveSpeedAI.

Best Use Cases for Wan 2.7 Image-to-Video

Cinematic Photo Animation From a Single Reference

Photographers and creators can take a single still — a portrait, a landscape, a product shot — and produce a 5- to 15-second motion piece without staging a video shoot. Wan 2.7’s reference grounding means the subject in your photo stays recognizably the same, so a wedding portrait becomes a moving keepsake, not a stranger’s face.

Scripted Scene Transitions With Start and End Frames

Storyboard artists, advertisers, and short-film makers can supply a beginning frame and an ending frame and let Wan 2.7 fill in the motion. This turns the model into a controllable “tween” engine for visual narrative — useful for camera moves, character transformations, or before/after product reveals where you need the final frame to land exactly where you specified.

Short-Form Social Video at Scale

Reels, TikTok, and Shorts reward motion. A brand sitting on a catalog of static product images can convert that library into thumb-stopping vertical video. Combine enable_prompt_expansion with batch API calls and a small social team can publish dozens of animated variants per week without a video editor in the loop.
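One way to picture that batch workflow is a small script that pairs every catalog image with a handful of short prompts and turns prompt expansion on for each. This is only a sketch: batch_payloads is a hypothetical helper, the URLs are placeholders, and it builds request bodies without submitting them.

```python
from itertools import product
from typing import Any, Dict, Iterable, List


def batch_payloads(
    image_urls: Iterable[str],
    prompts: Iterable[str],
    resolution: str = "720p",
    duration: int = 5,
) -> List[Dict[str, Any]]:
    """Build one request body per (image, prompt) pair (hypothetical helper)."""
    return [
        {
            "image": image,
            "prompt": prompt,
            "resolution": resolution,
            "duration": duration,
            # Short prompts, so let the model enrich them before generation.
            "enable_prompt_expansion": True,
        }
        for image, prompt in product(image_urls, prompts)
    ]


catalog = [
    "https://example.com/catalog/sneaker.jpg",  # placeholder URLs
    "https://example.com/catalog/backpack.jpg",
]
short_prompts = ["slow orbit", "soft light sweep", "gentle zoom in"]

jobs = batch_payloads(catalog, short_prompts)
print(len(jobs))  # 6 request bodies: 2 images x 3 prompts
```

Each body in jobs can then be submitted with the same POST request shown in the API example further down.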

Music Videos and Audio-Visual Storytelling

The optional audio parameter makes Wan 2.7 a natural fit for indie musicians, podcast clip designers, and lyric-video creators. Drop in a 10-second audio clip alongside a hero image and prompt, and the generated motion follows the rhythm — tightening the production loop from hours to minutes.

Marketing, E-commerce, and Campaign Animation

Promotional emails, paid social ads, and landing-page hero videos can all benefit from motion. Wan 2.7 lets a marketer animate an existing campaign asset, such as a packshot, a model photo, or a lifestyle scene, without re-shooting or paying for stock video. Pair it with an end-frame image of your CTA card for a clean, on-brand outro.

Real Estate and Architectural Walkthroughs

Listing photos can be animated into pseudo-walkthrough clips: subtle dolly motion, light shifts, atmospheric movement. With last_image you can guide the camera to settle on a key feature like a fireplace or a view.

Fashion and Beauty Lookbooks

Stills shot for editorial use can be brought to life with hair, fabric, and ambient motion. The negative prompt control is particularly valuable here for excluding the “morphing face” artifact that plagues lower-tier image-to-video models.

Wan 2.7 Image-to-Video Pricing and API Access

Wan 2.7 Image-to-Video on WaveSpeedAI is billed by output duration and resolution:

Duration	720p	1080p
5s	$0.50	$0.75
10s	$1.00	$1.50
15s	$1.50	$2.25

Billing is a flat per-second rate: $0.10/s at 720p and $0.15/s at 1080p (a 1.5× premium for the higher resolution). There are no subscription tiers or minimum spend.
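For budgeting, the per-second rates above reduce to a one-line calculation. The sketch below uses only the rates and durations quoted in this post; clip_cost is a hypothetical helper, and Decimal is used so the cents come out exact.

```python
from decimal import Decimal

# Per-second rates quoted in this post (USD).
RATE_PER_SECOND = {"720p": Decimal("0.10"), "1080p": Decimal("0.15")}
ALLOWED_DURATIONS = (5, 10, 15)


def clip_cost(duration_s: int, resolution: str) -> Decimal:
    """Return the cost in USD of one clip (hypothetical helper)."""
    if duration_s not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"resolution must be one of {sorted(RATE_PER_SECOND)}")
    return RATE_PER_SECOND[resolution] * duration_s


print(f"${clip_cost(5, '720p')}")    # $0.50
print(f"${clip_cost(15, '1080p')}")  # $2.25

# A batch of twelve 10-second 1080p clips.
batch_total = sum((clip_cost(10, "1080p") for _ in range(12)), Decimal("0"))
print(f"${batch_total}")             # $18.00
```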

Calling the model is straightforward over the REST API. The example below uses only the Python standard library, so no SDK install is needed:

import json
import os
import time
from urllib.request import Request, urlopen

api_key = os.environ["WAVESPEED_API_KEY"]
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
payload = {
    "prompt": "A cinematic ocean wave at sunrise, highly detailed",
    "image": "https://interactive-examples.mdn.mozilla.net/media/cc0-images/painted-hand-298-332.jpg",
    "resolution": "720p",
    "duration": 5,
    "enable_prompt_expansion": False,
    "seed": -1
}

def request_json(url, data=None):
    request = Request(url, data=data, headers=headers, method="POST" if data else "GET")
    with urlopen(request) as response:
        return json.load(response)

# 1. Submit the prediction.
submit_body = request_json("https://api.wavespeed.ai/api/v3/alibaba/wan-2.7/image-to-video", json.dumps(payload).encode())
task = submit_body.get("data", submit_body)
prediction_id = task.get("id")
if not prediction_id:
    raise RuntimeError("Submission response did not contain a prediction id")
result_url = task.get("urls", {}).get("get") or f"https://api.wavespeed.ai/api/v3/predictions/{prediction_id}/result"

# 2. Poll until the prediction finishes.
while True:
    body = request_json(result_url)
    result = body.get("data", body)
    status = result.get("status")
    if status == "completed":
        print(result.get("outputs", []))
        break
    if status in {"failed", "cancelled", "timeout"}:
        raise RuntimeError(result)
    if status not in {"created", "processing"}:
        raise RuntimeError(f"Unexpected status: {status}")
    time.sleep(2)

Because this is a plain REST call, the same request works from any language with an HTTP client. WaveSpeedAI runs Wan 2.7 with no cold starts, so your first request and your thousandth request hit the same warm capacity, which matters for production workloads with bursty traffic.
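The polling loop in the example above runs until a terminal status arrives. For production use it may be worth bounding the wait and backing off between polls. Here is one possible sketch: wait_for_result is a hypothetical helper, the timeout and delay values are arbitrary, and the status names are the ones used in the example above.

```python
import time
from typing import Any, Callable, Dict

PENDING = {"created", "processing"}
FAILED = {"failed", "cancelled", "timeout"}


def wait_for_result(
    fetch: Callable[[], Dict[str, Any]],
    *,
    timeout_s: float = 600.0,      # arbitrary upper bound on total wait
    initial_delay_s: float = 2.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll fetch() until the prediction completes, fails, or times out."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        body = fetch()
        result = body.get("data", body)
        status = result.get("status")
        if status == "completed":
            return result
        if status in FAILED:
            raise RuntimeError(f"Prediction ended with status {status!r}")
        if status not in PENDING:
            raise RuntimeError(f"Unexpected status: {status!r}")
        # Give up if the next sleep would carry us past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"No result after {timeout_s} seconds")
        sleep(delay)
        # Grow the delay by 1.5x each round, capped at max_delay_s.
        delay = min(delay * 1.5, max_delay_s)
```

With the earlier example in scope, the loop could be replaced by result = wait_for_result(lambda: request_json(result_url)). Passing sleep and clock as parameters keeps the helper easy to test without real waiting.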

If you need text-only generation without a reference image, see the companion Wan 2.7 Text-to-Video model on WaveSpeedAI.

Tips for Best Results With Wan 2.7 Image-to-Video

Start with a high-resolution, well-lit reference image with a clearly visible subject. Low-light or noisy inputs lead to muddier motion.
Always supply a last_image when narrative matters. Even a roughly art-directed end frame dramatically improves motion direction and final-frame composition.
Use negative_prompt aggressively for human subjects. Phrases like “blurry face, extra fingers, warping, text artifacts” routinely improve perceived quality.
Enable prompt expansion for sparse prompts. If your prompt is under ~15 words, turn on enable_prompt_expansion rather than hand-engineering a longer one.
Lock the seed once you find a winning composition and iterate on resolution or duration without losing the look.
Match audio length to duration. A 10-second clip should pair with a 10-second audio file for the tightest synchronization.
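A few of these tips can be applied mechanically before a request goes out. The following sketch is illustrative only: apply_tips and locked_seed_variants are hypothetical helpers, the 15-word threshold and the negative prompt text come straight from the tips above, and whether a fixed seed carries the same look across resolutions is this post's claim rather than something the code can verify.

```python
from typing import Any, Dict, List

SHORT_PROMPT_WORDS = 15  # the "~15 words" threshold from the tips above
HUMAN_NEGATIVE_PROMPT = "blurry face, extra fingers, warping, text artifacts"


def apply_tips(payload: Dict[str, Any], *, has_human_subject: bool = False) -> Dict[str, Any]:
    """Return a copy of payload adjusted per the tips (hypothetical helper)."""
    tuned = dict(payload)
    # Sparse prompt: let the model expand it instead of hand-writing more.
    if len(tuned["prompt"].split()) < SHORT_PROMPT_WORDS:
        tuned["enable_prompt_expansion"] = True
    # Human subject with no negative prompt yet: add the suggested phrases.
    if has_human_subject and not tuned.get("negative_prompt"):
        tuned["negative_prompt"] = HUMAN_NEGATIVE_PROMPT
    return tuned


def locked_seed_variants(payload: Dict[str, Any], seed: int) -> List[Dict[str, Any]]:
    """Reuse one seed across every resolution and duration this post lists."""
    return [
        dict(payload, seed=seed, resolution=resolution, duration=duration)
        for resolution in ("720p", "1080p")
        for duration in (5, 10, 15)
    ]


base = {"image": "https://example.com/portrait.jpg", "prompt": "hair moves in a light breeze"}
tuned = apply_tips(base, has_human_subject=True)
variants = locked_seed_variants(tuned, seed=1234)  # 1234 is an arbitrary example seed
```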

Wan 2.7 Image-to-Video FAQ

What is Wan 2.7 Image-to-Video? Wan 2.7 Image-to-Video is Alibaba’s reference-grounded video generation model that turns a still image into a 720p or 1080p cinematic clip, with optional audio, negative prompts, and first/last frame control.

How much does Wan 2.7 Image-to-Video cost? Pricing is $0.10 per second at 720p and $0.15 per second at 1080p — for example, $0.50 for a 5-second 720p clip or $2.25 for a 15-second 1080p clip on WaveSpeedAI.

Can I use Wan 2.7 Image-to-Video via API? Yes. Wan 2.7 is available through the WaveSpeedAI REST inference API and the official Python SDK with no cold starts and pay-per-use billing.

Does Wan 2.7 support audio-synced video generation? Yes — pass an audio URL or file and the generated video will pace its motion to match the rhythm and mood of the soundtrack.

How does first and last frame control work? Provide a start frame in the image parameter and an end frame in the optional last_image parameter, and the model interpolates a coherent motion path between them — ideal for storyboarded transitions and scripted shots.

Start Generating With Wan 2.7 Image-to-Video Today

Animate a single photo into a cinematic clip with first/last frame control, audio sync, and 1080p output — without managing GPUs or worrying about cold starts. Try Wan 2.7 Image-to-Video on WaveSpeedAI and ship motion content at API speed.