Depth Anything Video

Depth Anything Video estimates depth maps from video input while keeping them temporally consistent across frames. It supports multiple model sizes and colormaps, and is served through a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

Wavespeed Depth Anything Video (VDA) is a specialized model designed to estimate dense, pixel-wise depth from monocular video. By transforming standard 2D footage into a grayscale depth map, it provides essential spatial data for 3D reconstruction, augmented reality, and professional visual effects.

Why Choose This?

Temporal Consistency: Engineered to maintain depth stability across frames, preventing the “flickering” effect common in frame-by-frame processing.
Scale Flexibility: Offers three distinct model sizes to balance real-time processing speed against high-fidelity depth precision.
Fine-Grained Detail: Excellent at capturing thin structures and complex silhouettes, such as foliage or distant architectural elements.
Zero-Shot Generalization: Performs reliably across diverse environments, from indoor studios to vast outdoor landscapes, without needing scene-specific tuning.

Parameters

Parameter	Required	Description
video	Yes	The input video to process (an uploaded file or a direct media URL).
model	No	Selection of model scale: `VDA-Small`, `VDA-Base`, or `VDA-Large` (Default).

How to Use

1. Upload your video: Drag and drop your source file into the upload box, or provide a direct media link.
2. Select the model:
   VDA-Small: Fastest inference, best for mobile or quick previews.
   VDA-Base: Standard balance of speed and accuracy.
   VDA-Large: Maximum precision for professional VFX and 3D mapping.
3. Run: Submit the task, then download your depth-encoded video.

Model Comparison

Version	Use Case	Performance
VDA-Small	Real-time applications and low-latency feedback.	Optimized Speed
VDA-Base	General creative projects and social media content.	Balanced
VDA-Large	High-end cinematography and 3D environment scanning.	Best Quality
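
The table above can be folded into a small lookup when a script needs to choose a model programmatically. The sketch below is illustrative only: the function name `model_for` and the priority labels `speed`, `balanced`, and `quality` are assumptions derived from the Performance column, not values the API accepts.

```shell
# Hypothetical helper: map a priority label to a model name from the table.
# The labels (speed, balanced, quality) are illustrative, not API values.
model_for() {
  case "${1:-quality}" in
    speed)    printf '%s\n' "VDA-Small" ;;
    balanced) printf '%s\n' "VDA-Base" ;;
    quality)  printf '%s\n' "VDA-Large" ;;
    *) printf 'Unknown priority: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Example: pick the fastest model for a quick preview pass.
MODEL=$(model_for speed)
printf 'Selected model: %s\n' "${MODEL}"
```

Defaulting to `quality` mirrors the documented default of `VDA-Large` when no model is specified.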

Best Use Cases

Cinematography & VFX — Create realistic depth-of-field, fog, and volumetric lighting effects in post-production.
3D Scene Reconstruction — Extract spatial data to build point clouds or 3D meshes from 2D video.
AR Occlusion — Enable virtual objects to realistically pass behind physical objects in a video scene.
Motion Graphics — Use depth data as a displacement map for unique visual transitions.

Pro Tips

Check the Histogram: In the output, pure white represents the objects closest to the lens and black represents the farthest, so a histogram of the depth video shows how the scene is distributed between near and far.
VDA-Large for Detail: Use the VDA-Large model if your video contains intricate foreground elements like hair or thin wires.
Consistency: Ensure your video has steady lighting for the most accurate depth estimation results.
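
To make the white-is-near convention concrete, the sketch below converts an 8-bit gray value sampled from an output frame into a normalized nearness score, where 1.000 is closest and 0.000 is farthest. It assumes 8-bit grayscale output (0 to 255); the function name `nearness` is hypothetical. This page does not state whether pixel values correspond to metric distance, so the score should be read as a relative ordering only.

```shell
# Hypothetical helper: map an 8-bit gray value (0-255) to a nearness score.
# Assumes white (255) = closest and black (0) = farthest, as described above.
nearness() {
  case "$1" in
    ''|*[!0-9]*) printf 'Expected an integer 0-255, got: %s\n' "$1" >&2; return 1 ;;
  esac
  if [ "$1" -gt 255 ]; then
    printf 'Value out of range: %s\n' "$1" >&2
    return 1
  fi
  # LC_ALL=C keeps the decimal separator a dot regardless of locale.
  LC_ALL=C awk -v v="$1" 'BEGIN { printf "%.3f\n", v / 255 }'
}

nearness 255   # a pure white pixel
nearness 0     # a pure black pixel
```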

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result

set -euo pipefail

export WAVESPEED_API_KEY="your-api-key"

REQUEST_BODY=$(cat <<'JSON'
{
  "video": "https://interactive-examples.mdn.mozilla.net/media/cc0-videos/flower.mp4",
  "model": "VDA-Large"
}
JSON
)

# 1. Submit the prediction.
SUBMIT_RESPONSE=$(curl --silent --show-error --fail-with-body \
  -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/depth-anything/video" \
  -H "Authorization: Bearer ${WAVESPEED_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "${REQUEST_BODY}")

TASK=$(printf '%s' "${SUBMIT_RESPONSE}" | jq 'if type == "object" and has("data") then .data else . end')
PREDICTION_ID=$(printf '%s' "${TASK}" | jq -r '.id // empty')
if [ -z "${PREDICTION_ID}" ]; then
  printf 'Submission response did not contain a prediction id\n' >&2
  exit 1
fi
RESULT_URL=$(printf '%s' "${TASK}" | jq -r '.urls.get // empty')
if [ -z "${RESULT_URL}" ]; then RESULT_URL="https://api.wavespeed.ai/api/v3/predictions/${PREDICTION_ID}/result"; fi

# 2. Poll until the prediction finishes.
while true; do
  RESPONSE=$(curl --silent --show-error --fail-with-body \
    "${RESULT_URL}" \
    -H "Authorization: Bearer ${WAVESPEED_API_KEY}")
  RESULT=$(printf '%s' "${RESPONSE}" | jq 'if type == "object" and has("data") then .data else . end')
  STATUS=$(printf '%s' "${RESULT}" | jq -r '.status // empty')

  case "${STATUS}" in
    completed) printf '%s\n' "${RESULT}" | jq '.outputs'; break ;;
    failed|cancelled|timeout) printf '%s\n' "${RESULT}" | jq . >&2; exit 1 ;;
    created|processing) sleep 2 ;;
    *) printf 'Unexpected status: %s\n' "${STATUS}" >&2; exit 1 ;;
  esac
done
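
The loop above polls every 2 seconds with no upper bound on total wait time. One possible refinement, sketched here, grows the delay between polls and gives up after a deadline. The names `next_delay` and `wait_for_result`, the 30-second cap, and the 600-second default deadline are all assumptions for illustration; the headers and status values are taken from the script above. Like that script, it requires `curl` and `jq`.

```shell
# Double the current delay, capped at a maximum (default 30 s; illustrative).
next_delay() {
  current=$1
  max=${2:-30}
  doubled=$((current * 2))
  if [ "${doubled}" -gt "${max}" ]; then doubled=${max}; fi
  printf '%s\n' "${doubled}"
}

# Poll a result URL until the task completes, fails, or the deadline passes.
# Usage: wait_for_result "<result-url>" [deadline-seconds]
# Only sleep time is counted, so the deadline is approximate.
wait_for_result() {
  url=$1
  deadline=${2:-600}
  delay=2
  waited=0
  while [ "${waited}" -lt "${deadline}" ]; do
    response=$(curl --silent --show-error --fail-with-body \
      "${url}" -H "Authorization: Bearer ${WAVESPEED_API_KEY}") || return 1
    # Unwrap the "data" envelope when present, as the script above does.
    result=$(printf '%s' "${response}" |
      jq 'if type == "object" and has("data") then .data else . end')
    status=$(printf '%s' "${result}" | jq -r '.status // empty')
    case "${status}" in
      completed) printf '%s\n' "${result}" | jq '.outputs'; return 0 ;;
      created|processing) sleep "${delay}" ;;
      *) printf 'Task ended with status: %s\n' "${status:-unknown}" >&2; return 1 ;;
    esac
    waited=$((waited + delay))
    delay=$(next_delay "${delay}")
  done
  printf 'Timed out after about %s seconds\n' "${deadline}" >&2
  return 2
}
```

With these defaults the delays run 2, 4, 8, 16, 30, 30, and so on, which reduces request volume for long videos while keeping short tasks responsive.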

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
video	string	Yes	-	-	The URL of the input video to estimate depth for.
model	string	No	VDA-Large	VDA-Small, VDA-Base, VDA-Large	Depth estimation model size. VDA-Large for best quality, VDA-Small for fastest speed.
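
Because `model` accepts only three values and `video` must be a URL, a client can reject bad input before spending a request. The following is a minimal sketch under those two constraints. `build_request_body` is a hypothetical name, the `http://` or `https://` prefix check is an assumption about what counts as a usable URL, and the JSON is assembled with `printf`, which is why URLs containing double quotes or backslashes are refused rather than escaped.

```shell
# Hypothetical helper: validate inputs and print a JSON request body.
# Usage: build_request_body "<video-url>" [model]
build_request_body() {
  video=$1
  model=${2:-VDA-Large}   # documented default
  case "${video}" in
    *'"'*|*'\'*)
      printf 'URL contains characters this sketch cannot escape\n' >&2; return 1 ;;
    http://*|https://*) ;;
    *) printf 'video must be an http(s) URL: %s\n' "${video}" >&2; return 1 ;;
  esac
  case "${model}" in
    VDA-Small|VDA-Base|VDA-Large) ;;
    *) printf 'Unsupported model: %s\n' "${model}" >&2; return 1 ;;
  esac
  printf '{"video": "%s", "model": "%s"}\n' "${video}" "${model}"
}

# Example with a placeholder URL; the output can be passed to curl via -d.
build_request_body "https://example.com/clip.mp4" "VDA-Base"
```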

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Output values, usually URL strings; some models return text strings or structured result objects (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID
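
The submission script falls back to a fixed URL pattern when `urls.get` is absent, so a stored task ID is enough to query a result later. A small sketch of that lookup follows. `result_url` and `fetch_result` are hypothetical names, and the character check on the ID is a defensive assumption rather than a documented format.

```shell
# Hypothetical helper: build the result URL for a task ID, using the
# fallback pattern from the submission script.
result_url() {
  case "$1" in
    ''|*[!A-Za-z0-9_-]*)
      printf 'Invalid task id: %s\n' "$1" >&2; return 1 ;;
  esac
  printf 'https://api.wavespeed.ai/api/v3/predictions/%s/result\n' "$1"
}

# Fetch the raw result JSON for a task ID (requires curl and an API key).
fetch_result() {
  url=$(result_url "$1") || return 1
  curl --silent --show-error --fail-with-body \
    "${url}" -H "Authorization: Bearer ${WAVESPEED_API_KEY}"
}

# Example with a placeholder ID; prints the URL without contacting the API.
result_url "your-task-id"
```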

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction
data.model	string	Model ID used for the prediction
data.outputs	array<string | object>	Array of generated outputs (empty when status is not `completed`). Items are usually URL strings, but may be text strings or structured result objects, depending on the model.
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to poll for the prediction result
data.status	string	Status: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
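
Once `data.status` is `completed`, each entry in `data.outputs` is usually a URL that can be downloaded. The sketch below derives a local filename from such a URL and saves each file. `output_filename` and `download_outputs` are hypothetical names, the fallback name `output.mp4` is an assumption, and the code presumes every output is a URL string rather than a structured object.

```shell
# Hypothetical helper: derive a local filename from an output URL.
output_filename() {
  name=${1%%\?*}      # drop any query string
  name=${name%%\#*}   # drop any fragment
  name=${name##*/}    # keep the last path segment
  printf '%s\n' "${name:-output.mp4}"
}

# Read one output URL per line on stdin and download each with curl.
download_outputs() {
  while IFS= read -r url; do
    [ -n "${url}" ] || continue
    curl --silent --show-error --fail --location \
      --output "$(output_filename "${url}")" "${url}" || return 1
  done
}

# Possible usage with the RESULT variable from the polling script:
# printf '%s' "${RESULT}" | jq -r '.outputs[] | strings' | download_outputs
```

Filtering with `strings` in the usage line skips any non-string items, which keeps the download loop from receiving structured objects.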
