
First & Last Frame Video — Controlled AI Transitions API
Define the beginning and the end — let AI create the journey. Upload start and end images and let AI generate smooth, controlled transitions using Kling and Luma models.
Controlled AI Video Transitions
First and Last Frame video generation gives you goal-oriented control — define where the video starts and ends, and AI fills the gap with natural motion.
Dual-Image Conditioning
Upload both a start frame and an end frame. The AI analyzes both inputs and generates intermediate frames that create a seamless, physically plausible transition between the two keyframes.

Goal-Oriented Video Control
Unlike standard image-to-video, which controls only the starting frame, First and Last Frame ensures the video ends exactly where you want. This is critical for storytelling, editing, and precise visual narratives.

Multi-Model Keyframe Support
WaveSpeed aggregates the best models that offer precise start and end frame conditioning — including Kling and Luma. Choose the right model for your specific transition needs.

First & Last Frame on WaveSpeed vs. Standard Image-to-Video
See why creators choose keyframe-controlled video on WaveSpeed over standard methods.
Performance at a Glance
First and Last Frame video on WaveSpeed delivers controlled, reliable transitions at scale.
Examples

Young woman turning to smile at camera, breeze catching her scarf, soft bokeh background.

Dancer performing a graceful pirouette, flowing dress creating motion trails, spotlight.

Butterfly emerging from chrysalis in close-up, wings slowly unfurling, soft natural light.

Detective walking through foggy city streets, trench coat collar up, film noir atmosphere.
Integrate in Minutes
Production-ready SDKs for Python and JavaScript. REST API with full OpenAPI spec. Webhook support for async jobs.
- Dual-image input for start and end frame
- Multiple keyframe-capable models available
- Python & JavaScript SDKs + REST API
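As a minimal sketch of what a REST submission could look like in Python: the endpoint URL, model path, and field names (`start_image`, `end_image`, `prompt`, `duration`) are illustrative assumptions, not the documented schema, so check them against the official WaveSpeed API reference before use.

```python
import json
import os
import urllib.request

# NOTE: the endpoint and the field names below are illustrative
# assumptions for this sketch, not the documented WaveSpeed schema.
API_URL = "https://api.wavespeed.ai/api/v3/kling/first-last-frame"  # hypothetical

def build_payload(start_image_url: str, end_image_url: str,
                  prompt: str, duration_s: int = 5) -> dict:
    """Assemble the dual-image request body: one start frame, one
    end frame, plus a text prompt describing the transition."""
    return {
        "start_image": start_image_url,
        "end_image": end_image_url,
        "prompt": prompt,
        "duration": duration_s,
    }

def submit_job(payload: dict, api_key: str) -> dict:
    """POST the job and return the parsed JSON response. Generation is
    async, so poll the returned job id or register a webhook."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    payload = build_payload(
        "https://example.com/standing.png",
        "https://example.com/sitting.png",
        "The man sits down slowly",
    )
    print(submit_job(payload, os.environ["WAVESPEED_API_KEY"]))
```

The same two-image payload shape applies whichever keyframe-capable model you select; only the model path changes.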
Get Any Tool You Want
1000+ models across image, video, audio, and 3D — all through one API.
FAQ
What is First and Last Frame video generation?
It is a video generation technique where you provide both the starting image and the ending image. The AI's job is to generate the intermediate frames (interpolation) to create a video that starts exactly at image A and ends exactly at image B.

How is it different from standard image-to-video?
Standard image-to-video only lets you control the start; the ending is unpredictable. First and Last Frame gives you goal-oriented control, ensuring the video ends exactly where you want it to, which is crucial for storytelling and editing.

Can I use two completely unrelated images?
Yes, but the result will be a "morph" or a surreal transition. For realistic video, the two images should be logically connected (e.g., the same character in different poses, or the same room with different lighting).
How long can a generated clip be?
Most models support 5 to 10 seconds of generation between frames. For longer sequences, generate multiple "bridges" (A to B, then B to C) and stitch them together.
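The bridging approach can be sketched as a small planning helper that turns an ordered list of keyframes into one generation job per consecutive pair; the function and file names here are illustrative, not part of any SDK.

```python
def plan_bridges(keyframes: list[str]) -> list[tuple[str, str]]:
    """Turn an ordered keyframe list [A, B, C, ...] into consecutive
    (start, end) pairs -- one first/last-frame job per pair."""
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes to bridge")
    return list(zip(keyframes, keyframes[1:]))

# Three keyframes -> two clips: A->B and B->C.
bridges = plan_bridges(["scene_a.png", "scene_b.png", "scene_c.png"])
# -> [("scene_a.png", "scene_b.png"), ("scene_b.png", "scene_c.png")]
```

Because each clip ends on the exact frame the next one starts from, the generated clips can be concatenated cleanly afterwards, for example with ffmpeg's concat demuxer.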
Does the text prompt still matter?
Yes. The text prompt tells the AI the context of the change. If your start frame shows a man standing and your end frame shows him sitting, the prompt "The man sits down slowly" helps the AI generate the correct motion.

