Create Serverless Endpoint

Learn how to create and configure a serverless endpoint for your AI workloads.

What is an Endpoint?

An endpoint is a deployment unit that:

  • Runs your worker container image
  • Processes tasks from a dedicated queue
  • Auto-scales based on demand
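
The worker contract itself is documented separately; purely to make the queue-processing model above concrete, here is a hedged, generic sketch in Python. The `queue` object and its methods are illustrative assumptions, not the WaveSpeedAI SDK:

```python
# Purely illustrative: the real task-delivery mechanism is provided by
# the platform, not by this loop. "queue" and its methods are hypothetical.
import time

def handler(task: dict) -> dict:
    """Your model code: turn one task's input into a result."""
    return {"echo": task.get("input")}

def run_worker(queue) -> None:
    # The endpoint keeps pulling tasks from its dedicated queue; the
    # autoscaler adds or removes replicas of this container as load changes.
    while True:
        task = queue.get_next_task()  # hypothetical method
        if task is None:
            time.sleep(1)             # idle; may eventually scale to zero
            continue
        result = handler(task)
        queue.report_result(task["id"], result)  # hypothetical method
```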

Create via Console

Step 1: Navigate to Endpoints

  1. Go to wavespeed.ai/serverless
  2. Click Endpoints in the left sidebar
  3. Click Create Endpoint

Step 2: Configure Basic Settings

| Field | Required | Description |
|---|---|---|
| Name | Yes | Unique identifier (lowercase, no spaces) |
| Image | Yes | Docker image URL |
| GPU Spec | Yes | GPU type (e.g., L4-24GB, A100-80GB) |
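
The Name rule (lowercase, no spaces) is easy to check before submitting the form. A minimal sketch, assuming names may also contain digits and hyphens; that extra allowance is based on examples like flux-inference, not a documented rule:

```python
import re

# Documented rule: lowercase, no spaces. Allowing digits and hyphens
# (as in names like "flux-inference") is an assumption for this sketch.
NAME_PATTERN = re.compile(r"^[a-z0-9-]+$")

def is_valid_endpoint_name(name: str) -> bool:
    return bool(NAME_PATTERN.match(name))

assert is_valid_endpoint_name("flux-inference")
assert not is_valid_endpoint_name("My Model")  # uppercase and a space
```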

Step 3: Configure Scaling

| Field | Default | Description |
|---|---|---|
| Min Replicas | 0 | Minimum workers (0 = scale to zero) |
| Max Replicas | 5 | Maximum workers |
| Scale Up Threshold | 1 | Pending tasks to trigger scale up |
| Scale Down Idle Time | 60s | Idle time before scale down |
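
To make these fields concrete, the sketch below models how they interact. This is a conceptual illustration of the table, not the platform's actual autoscaler:

```python
def desired_replicas(pending_tasks: int, current: int, idle_seconds: float,
                     min_replicas: int = 0, max_replicas: int = 5,
                     scale_up_threshold: int = 1,
                     scale_down_idle_time: float = 60.0) -> int:
    """Conceptual model of the scaling settings above, not the real scheduler."""
    if pending_tasks >= scale_up_threshold and current < max_replicas:
        return current + 1            # queue is backing up: add a worker
    if pending_tasks == 0 and idle_seconds >= scale_down_idle_time \
            and current > min_replicas:
        return current - 1            # idle long enough: remove a worker
    return current                    # otherwise keep the current count

# With Min Replicas = 0 the endpoint can scale all the way to zero workers.
print(desired_replicas(pending_tasks=0, current=1, idle_seconds=120))  # 0
```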

Step 4: Create

Click Create Endpoint and wait for deployment.

Using Private Images

If your image is in a private registry, you need to add credentials first.

Add Registry Credentials

  1. Go to Settings in the left sidebar
  2. Under Registry Credentials, click Add Credential
  3. Enter:

     | Field | Description |
     |---|---|
     | Name | Friendly name for this credential |
     | Registry | Registry URL (e.g., docker.io, gcr.io) |
     | Username | Registry username |
     | Password | Registry password or token |

  4. Click Save

Use Credentials in Endpoint

When creating an endpoint with a private image, select the credential from the dropdown.

Endpoint Status

| Status | Description |
|---|---|
| Active | Endpoint is running and accepting tasks |
| Scaling | Workers are scaling up or down |
| Stopped | No workers running (scaled to zero) |
| Error | Deployment error (check logs) |
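
If you automate deployments, you will usually poll for the Active status before sending traffic. The sketch below assumes a hypothetical status URL and JSON response shape; the real endpoint details live in the API reference:

```python
import time
import requests  # pip install requests

# Both the URL and the response shape are assumptions for this sketch;
# consult the WaveSpeedAI API reference for the real endpoint details.
STATUS_URL = "https://api.example.com/v1/endpoints/{endpoint_id}"

def wait_until_active(endpoint_id: str, api_key: str, timeout: float = 600.0) -> None:
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(STATUS_URL.format(endpoint_id=endpoint_id),
                            headers={"Authorization": f"Bearer {api_key}"},
                            timeout=10)
        resp.raise_for_status()
        status = resp.json().get("status")
        if status == "Active":
            return
        if status == "Error":
            raise RuntimeError("Deployment error - check the endpoint logs")
        time.sleep(5)  # Scaling or Stopped: keep waiting
    raise TimeoutError("Endpoint did not become Active in time")
```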

Best Practices

Naming Conventions

  • Use descriptive names: flux-inference, llm-chat-prod
  • Include environment: my-model-dev, my-model-prod

Scaling Configuration

  • Development: Min 0, Max 1 (cost-effective)
  • Production: Min 1, Max 10 (always available)
  • Batch Processing: Min 0, Max 20 (handle bursts)
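
If you drive endpoint configuration from a script, these recommendations map naturally onto a small lookup table (the values simply restate the list above):

```python
# Recommended scaling presets from the list above, keyed by workload type.
SCALING_PRESETS = {
    "development":      {"min_replicas": 0, "max_replicas": 1},
    "production":       {"min_replicas": 1, "max_replicas": 10},
    "batch_processing": {"min_replicas": 0, "max_replicas": 20},
}

print(SCALING_PRESETS["production"])  # {'min_replicas': 1, 'max_replicas': 10}
```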

Resource Allocation

  • Start with the smallest GPU that fits your model
  • Monitor GPU memory usage and adjust if needed
  • Use multi-GPU only if single GPU is insufficient
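
One simple way to check GPU memory usage from inside your worker, assuming PyTorch is already in your image (tools like nvidia-smi work just as well):

```python
import torch

def log_gpu_memory() -> None:
    """Print used/total GPU memory so you can tell whether a smaller GPU fits."""
    if not torch.cuda.is_available():
        print("No CUDA device visible")
        return
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    used_gib = (total_bytes - free_bytes) / 1024**3
    total_gib = total_bytes / 1024**3
    print(f"GPU memory: {used_gib:.1f} GiB used of {total_gib:.1f} GiB")

log_gpu_memory()
```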

Managing Endpoints

View Endpoint Details

Click an endpoint name to see:

  • Current status and replica count
  • Task statistics
  • Configuration details

Update Endpoint

  1. Click the endpoint name
  2. Click Edit
  3. Modify settings
  4. Click Save

Note: Some changes require redeployment.

Delete Endpoint

  1. Click the endpoint name
  2. Click Delete
  3. Confirm deletion

Deleting an endpoint:

  • Stops all workers
  • Cancels pending tasks
  • Removes the endpoint configuration