
Depth Anything Video

Depth Anything Video estimates depth maps from video input with temporal consistency. It supports multiple model sizes and colormaps, and is served through a ready-to-use REST inference API with no cold starts.

Features

Wavespeed Depth Anything Video

Wavespeed Depth Anything Video (VDA) is a specialized model designed to estimate dense, pixel-wise depth from monocular video. By transforming standard 2D footage into a grayscale depth map, it provides essential spatial data for 3D reconstruction, augmented reality, and professional visual effects.


Why Choose This?

  • Temporal Consistency: Engineered to maintain depth stability across frames, preventing the “flickering” effect common in frame-by-frame processing.
  • Scale Flexibility: Offers three model sizes to trade real-time processing speed against high-fidelity depth precision.
  • Fine-Grained Detail: Captures thin structures and complex silhouettes, such as foliage or distant architectural elements.
  • Zero-Shot Generalization: Performs reliably across diverse environments, from indoor studios to vast outdoor landscapes, without scene-specific tuning.

Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| video | Yes | The input video file to process (drag and drop a file or click to upload). |
| model | No | Model scale: VDA-Small, VDA-Base, or VDA-Large (default). |

How to Use

  1. Upload your video: Drag and drop your source file into the upload box or provide a direct media link.
  2. Select the model:
       • VDA-Small: Fastest inference, best for mobile or quick previews.
       • VDA-Base: Standard balance of speed and accuracy.
       • VDA-Large: Maximum precision for professional VFX and 3D mapping.
  3. Run: Submit the task to generate and download your depth-encoded video.

Model Comparison

| Version | Use Case | Performance |
|---------|----------|-------------|
| VDA-Small | Real-time applications and low-latency feedback. | Optimized Speed |
| VDA-Base | General creative projects and social media content. | Balanced |
| VDA-Large | High-end cinematography and 3D environment scanning. | Best Quality |

Best Use Cases

  • Cinematography & VFX — Create realistic depth-of-field, fog, and volumetric lighting effects in post-production.
  • 3D Scene Reconstruction — Extract spatial data to build point clouds or 3D meshes from 2D video.
  • AR Occlusion — Enable virtual objects to realistically pass behind physical objects in a video scene.
  • Motion Graphics — Use depth data as a displacement map for unique visual transitions.
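
As an illustration of the AR occlusion use case, here is a minimal per-pixel sketch in plain Python. It assumes the depth frame is available as a 2D list of 8-bit gray levels in which brighter means nearer, and that the virtual object sits at one constant gray level. The function name `composite_with_occlusion` and the list-based frame representation are hypothetical; decoding real video frames would need an image or video library that this page does not cover.

```python
def composite_with_occlusion(scene, depth, overlay, overlay_level):
    """Draw overlay pixels only where the overlay is nearer than the scene.

    scene, depth, and overlay are same-sized 2D lists. depth holds 8-bit gray
    levels (brighter = nearer, an assumption about the output encoding).
    overlay uses None for transparent pixels; overlay_level is the gray level
    at which the virtual object is placed.
    """
    out = []
    for scene_row, depth_row, overlay_row in zip(scene, depth, overlay):
        row = []
        for scene_px, gray, overlay_px in zip(scene_row, depth_row, overlay_row):
            # The virtual pixel wins only if it exists and is nearer than the scene.
            visible = overlay_px is not None and overlay_level > gray
            row.append(overlay_px if visible else scene_px)
        out.append(row)
    return out


# Toy one-row frame: "s" is a scene pixel, "v" is a virtual-object pixel.
scene = [["s", "s", "s"]]
depth = [[50, 200, 100]]
overlay = [["v", "v", None]]
result = composite_with_occlusion(scene, depth, overlay, overlay_level=120)
```

In this toy frame the virtual object appears only in the first pixel: the second scene pixel is nearer than the object, and the third overlay pixel is transparent.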

Pro Tips

  • Check the Histogram: In the output, pure white represents the closest objects to the lens, while black represents the furthest distance.
  • VDA-Large for Detail: Use the VDA-Large model if your video contains intricate foreground elements like hair or thin wires.
  • Consistency: Ensure your video has steady lighting for the most accurate depth estimation results.
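
The white-is-near convention from the first tip can be sketched in a few lines of Python. This assumes grayscale output with 8-bit gray levels (0 to 255) and treats the values as relative nearness, not metric distance. The helper names `nearness` and `foreground_mask` and the 0.5 threshold are hypothetical choices for illustration.

```python
def nearness(gray_value):
    """Map an 8-bit gray level to relative nearness in [0, 1]; 1.0 is closest."""
    if not 0 <= gray_value <= 255:
        raise ValueError("expected an 8-bit value in 0..255")
    return gray_value / 255.0


def foreground_mask(frame, threshold=0.5):
    """Mark pixels whose relative nearness is at or above the threshold."""
    return [[nearness(v) >= threshold for v in row] for row in frame]


# Toy 2x3 depth frame: 255 is the closest pixel, 0 the furthest.
frame = [
    [0, 64, 128],
    [192, 255, 32],
]
mask = foreground_mask(frame, threshold=0.5)
```

A mask like this is one simple way to separate foreground from background before applying depth-dependent effects. A suitable threshold depends on the footage.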

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task (the "video" value is a placeholder; replace it with the URL of your input video)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/depth-anything-video" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "video": "https://example.com/input.mp4",
    "model": "VDA-Large"
}'

# Get the result (requestId is the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
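
The two curl calls above can be combined into a submit-then-poll loop. The following is a minimal Python sketch using only the standard library. It assumes the response fields documented in the parameter tables on this page (`data.id`, `data.status`, `data.outputs`, `data.error`). The function names, the 2-second polling interval, and the poll limit are illustrative assumptions, not values recommended by the API.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"  # taken from the curl commands above


def http_json(method, url, api_key, body=None):
    """Send one request with urllib and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def run_depth_task(video_url, api_key, model="VDA-Large", request=http_json,
                   interval=2.0, max_polls=300, sleep=time.sleep):
    """Submit a task, then poll the result endpoint until it finishes.

    interval and max_polls are assumed values; request and sleep are
    injectable so the loop can be exercised without network access.
    """
    submitted = request("POST", f"{API_BASE}/wavespeed-ai/depth-anything-video",
                        api_key, {"video": video_url, "model": model})
    task_id = submitted["data"]["id"]
    result_url = f"{API_BASE}/predictions/{task_id}/result"
    for _ in range(max_polls):
        data = request("GET", result_url, api_key)["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        sleep(interval)  # status is still "created" or "processing"
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

A call such as `run_depth_task("https://example.com/input.mp4", os.environ["WAVESPEED_API_KEY"])` (after `import os`) would return the list of output URLs once the task completes. The example.com address is a placeholder.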

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
|-----------|------|----------|---------|-------|-------------|
| video | string | Yes | - | - | The URL of the input video to estimate depth for. |
| model | string | No | VDA-Large | VDA-Small, VDA-Base, VDA-Large | Depth estimation model size. VDA-Large for best quality, VDA-Small for fastest speed. |
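
To catch mistakes before a request is sent, the request body can be validated locally. The sketch below builds the JSON body from the two documented parameters. The helper name `build_payload` is hypothetical, and the http(s) prefix check is an assumption about what counts as a valid video URL, not a documented rule.

```python
import json

# Allowed values come from the Range column of the request parameters.
ALLOWED_MODELS = ("VDA-Small", "VDA-Base", "VDA-Large")


def build_payload(video_url, model="VDA-Large"):
    """Return the request body as a dict after basic local validation."""
    # Assumption: the API expects an http(s) URL for the video parameter.
    if not isinstance(video_url, str) or not video_url.startswith(("http://", "https://")):
        raise ValueError("video must be an http(s) URL string")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}, got {model!r}")
    return {"video": video_url, "model": model}


# example.com is a placeholder, not a real input video.
body = json.dumps(build_payload("https://example.com/input.mp4", "VDA-Small"))
```

Omitting `model` falls back to VDA-Large, matching the documented default.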

Response Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |

Result Request Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier of the requested prediction |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
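
A result response can be reduced to the handful of fields a client usually acts on. The sketch below assumes the response is already decoded into a Python dict shaped like the fields listed for the result response. The helper name `summarize` is hypothetical, and every value in `sample` is made up for illustration.

```python
TERMINAL_STATUSES = {"completed", "failed"}  # from the documented status values


def summarize(response):
    """Extract the fields a client typically needs from a result response."""
    data = response.get("data") or {}
    status = data.get("status")
    return {
        "id": data.get("id"),
        "status": status,
        "done": status in TERMINAL_STATUSES,
        # outputs is documented as empty until the task is completed
        "outputs": list(data.get("outputs") or []) if status == "completed" else [],
        "error": data.get("error") or None,
        "inference_ms": (data.get("timings") or {}).get("inference"),
    }


# Made-up response for illustration; real IDs, URLs, and timings will differ.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "task-123",
        "outputs": ["https://example.com/depth.mp4"],
        "status": "completed",
        "error": "",
        "timings": {"inference": 4200},
    },
}
summary = summarize(sample)
```

Treating an empty `data.error` string as `None` keeps the check for failures a simple truthiness test.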