Depth Anything Video
Depth Anything Video estimates depth maps from video input with temporal consistency. It supports multiple model sizes and colormaps, and is available as a ready-to-use REST inference API with no cold starts and affordable pricing.
Features
Wavespeed Depth Anything Video
Wavespeed Depth Anything Video (VDA) is a specialized model designed to estimate dense, pixel-wise depth from monocular video. By transforming standard 2D footage into a grayscale depth map, it provides essential spatial data for 3D reconstruction, augmented reality, and professional visual effects.
Why Choose This?
- Temporal Consistency: Engineered to maintain depth stability across frames, preventing the “flickering” effect common in frame-by-frame processing.
- Scale Flexibility: Offers three model sizes that trade real-time processing speed against high-fidelity depth precision.
- Fine-Grained Detail: Captures thin structures and complex silhouettes, such as foliage or distant architectural elements.
- Zero-Shot Generalization: Performs reliably across diverse environments, from indoor studios to vast outdoor landscapes, without scene-specific tuning.
Parameters
| Parameter | Required | Description |
|---|---|---|
| video | Yes | The input video to process (upload a file or provide a direct media link). |
| model | No | Model size: VDA-Small, VDA-Base, or VDA-Large (default). |
How to Use
- Upload your video — Drag and drop your source file into the upload box or provide a direct media link.
- Select the model:
  - VDA-Small: Fastest inference, best for mobile or quick previews.
  - VDA-Base: Standard balance of speed and accuracy.
  - VDA-Large: Maximum precision for professional VFX and 3D mapping.
- Run — Submit the task to generate and download your depth-encoded video.
Model Comparison
| Version | Use Case | Performance |
|---|---|---|
| VDA-Small | Real-time applications and low-latency feedback. | Optimized Speed |
| VDA-Base | General creative projects and social media content. | Balanced |
| VDA-Large | High-end cinematography and 3D environment scanning. | Best Quality |
Best Use Cases
- Cinematography & VFX — Create realistic depth-of-field, fog, and volumetric lighting effects in post-production.
- 3D Scene Reconstruction — Extract spatial data to build point clouds or 3D meshes from 2D video.
- AR Occlusion — Enable virtual objects to realistically pass behind physical objects in a video scene.
- Motion Graphics — Use depth data as a displacement map for unique visual transitions.
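As a rough illustration of the fog effect listed above, the sketch below blends a fog color into a frame using the depth map as the blend weight. It assumes an 8-bit grayscale depth map in which 255 is nearest and 0 is farthest, and it represents frames as nested lists of RGB tuples to stay dependency-free. `apply_fog`, `fog_color`, and `max_fog` are hypothetical names chosen for this example, not part of any WaveSpeedAI SDK.

```python
def apply_fog(frame, depth, fog_color=(200, 200, 200), max_fog=0.8):
    """Blend fog into an RGB frame; farther pixels (darker depth) get more fog.

    frame: list of rows, each row a list of (r, g, b) tuples with values 0-255.
    depth: list of rows of 8-bit grayscale values, same shape as frame.
    Assumption: 255 = nearest to the lens, 0 = farthest.
    """
    out = []
    for frame_row, depth_row in zip(frame, depth):
        out_row = []
        for (r, g, b), d in zip(frame_row, depth_row):
            # Farness in [0, 1]: 0 for the nearest pixel, 1 for the farthest.
            farness = 1.0 - d / 255.0
            fog = max_fog * farness
            # Linear blend between the original color and the fog color.
            out_row.append(tuple(
                round((1.0 - fog) * c + fog * f)
                for c, f in zip((r, g, b), fog_color)
            ))
        out.append(out_row)
    return out
```

A production pipeline would more likely do the same blend on whole arrays with a library such as NumPy; the per-pixel loop is only meant to make the arithmetic visible.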
Pro Tips
- Check the Histogram: In the grayscale output, pure white marks the objects closest to the lens and pure black marks the farthest.
- VDA-Large for Detail: Use the VDA-Large model if your video contains intricate foreground elements like hair or thin wires.
- Consistency: Ensure your video has steady lighting for the most accurate depth estimation results.
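One way to check the histogram of a depth frame is sketched below. It assumes the frame has already been decoded into rows of 8-bit grayscale integers (0 to 255) and follows the white-is-near convention from the tip above. The function name `depth_histogram` and the `bins` parameter are hypothetical.

```python
def depth_histogram(depth, bins=8):
    """Count 8-bit depth pixels per bin; bin 0 is farthest, the last bin nearest."""
    if not 1 <= bins <= 256:
        raise ValueError("bins must be between 1 and 256")
    counts = [0] * bins
    for row in depth:
        for value in row:
            # Map the 0-255 range onto bin indices 0..bins-1.
            counts[value * bins // 256] += 1
    return counts
```

If nearly all pixels land in one or two bins, the scene has little depth variation, or the footage may be worth re-checking against the lighting tip above.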
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task (replace the placeholder video URL with your own)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/depth-anything-video" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"video": "https://example.com/input.mp4",
"model": "VDA-Large"
}'
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
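For readers calling the API from Python, the two curl commands above can be sketched with the standard library alone. The endpoint paths and headers are copied from the curl commands. The helper names (`build_submit_request`, `build_result_request`, `send`), the `API_BASE` constant, and the 30-second timeout are assumptions made for this example.

```python
import json
import os
import urllib.request

# Base URL taken from the curl commands above.
API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(video_url, model="VDA-Large", api_key=None):
    """Build (without sending) the POST request that submits a depth task."""
    api_key = api_key or os.environ.get("WAVESPEED_API_KEY", "")
    body = json.dumps({"video": video_url, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/wavespeed-ai/depth-anything-video",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def build_result_request(request_id, api_key=None):
    """Build (without sending) the GET request that fetches a task result."""
    api_key = api_key or os.environ.get("WAVESPEED_API_KEY", "")
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_submit_request("https://example.com/input.mp4"))` would submit a task, where the video URL is a placeholder. Building and sending are kept separate so the request can be inspected or logged before any network call is made.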
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| video | string | Yes | - | - | The URL of the input video to estimate depth for. |
| model | string | No | VDA-Large | VDA-Small, VDA-Base, VDA-Large | Depth estimation model size. VDA-Large for best quality, VDA-Small for fastest speed. |
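The request table above can be enforced on the client side before a task is submitted. The following is a minimal sketch: the allowed model names and the default come from the table, while `build_payload` and `ALLOWED_MODELS` are hypothetical names. The API's own server-side validation may differ.

```python
# Allowed values and default taken from the request parameter table.
ALLOWED_MODELS = ("VDA-Small", "VDA-Base", "VDA-Large")


def build_payload(video_url, model="VDA-Large"):
    """Return the JSON-serializable request body, or raise ValueError."""
    if not isinstance(video_url, str) or not video_url.strip():
        raise ValueError("video is required and must be a non-empty URL string")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}, got {model!r}")
    return {"video": video_url, "model": model}
```

Failing fast on a misspelled model name avoids spending a round trip on a request that is likely to be rejected.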
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
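Given a submission response decoded into a dictionary, the two fields needed for the follow-up query are `data.id` and `data.urls.get`. The sketch below pulls them out. It assumes that a `code` other than 200 indicates failure, which the table implies but does not state outright, and `parse_submission` is a hypothetical helper name.

```python
def parse_submission(response):
    """Return (task_id, result_url) from a decoded submission response."""
    # Assumption: any code other than 200 means the submission was rejected.
    if response.get("code") != 200:
        raise RuntimeError(f"submission failed: {response.get('message')}")
    data = response.get("data") or {}
    task_id = data["id"]
    # urls.get may be absent; the caller can fall back to building the URL from the ID.
    result_url = (data.get("urls") or {}).get("get")
    return task_id, result_url
```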
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (matches the requested task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed). |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
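Because `data.status` moves through created, processing, and then completed or failed, a client typically polls the result endpoint until it reaches a final state. The loop below is a minimal sketch of that pattern. `fetch_result` is any zero-argument callable that returns the decoded result response, and the 2-second interval and 600-second timeout are arbitrary assumptions, not documented limits.

```python
import time


def wait_for_outputs(fetch_result, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the task is completed or failed, then return its output URLs.

    fetch_result: zero-argument callable returning the decoded result response.
    sleep: injectable so the loop can be exercised without real delays.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_result().get("data") or {}
        status = data.get("status")
        if status == "completed":
            return data.get("outputs") or []
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} seconds")
        sleep(interval)
```

Passing the fetch step in as a callable keeps the loop independent of any particular HTTP client and makes it straightforward to test with canned responses.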