Create Serverless Endpoint
Learn how to create and configure a serverless endpoint for your AI workloads.
What is an Endpoint?
An endpoint is a deployment unit that:
- Runs your worker container image
- Processes tasks from a dedicated queue
- Auto-scales based on demand
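Conceptually, each worker in the endpoint pulls one task payload from the queue, runs your code on it, and returns a result. A minimal sketch of that idea, assuming a handler that receives a JSON-like dict (the real handler contract depends on the worker SDK your image is built with — see Build Worker):

```python
# Hypothetical worker handler: the actual signature is defined by the
# worker SDK inside your container image, not by this document.
def handler(task: dict) -> dict:
    """Process one task pulled from the endpoint's queue."""
    prompt = task.get("input", {}).get("prompt", "")
    # Replace this placeholder with your actual model inference call.
    result = prompt.upper()
    return {"output": result}
```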
Create via Console
Step 1: Navigate to Endpoints
- Go to wavespeed.ai/serverless
- Click Endpoints in the left sidebar
- Click Create Endpoint
Step 2: Configure Basic Settings
| Field | Required | Description |
|---|---|---|
| Name | Yes | Unique identifier (lowercase, no spaces) |
| Image | Yes | Docker image URL |
| GPU Spec | Yes | GPU type (e.g., L4-24GB, A100-80GB) |
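The constraints in the table can be checked before you submit the form. This is an illustrative pre-flight check with hypothetical field names — the console performs its own validation:

```python
import re

def validate_endpoint_config(cfg: dict) -> list[str]:
    """Mirror the basic-settings table: all three fields are required,
    and the name must be lowercase with no spaces."""
    errors = []
    for field in ("name", "image", "gpu_spec"):
        if not cfg.get(field):
            errors.append(f"missing required field: {field}")
    name = cfg.get("name", "")
    # Lowercase letters, digits, and hyphens only (an assumed pattern
    # consistent with "lowercase, no spaces").
    if name and not re.fullmatch(r"[a-z0-9-]+", name):
        errors.append("name must be lowercase with no spaces")
    return errors
```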
Step 3: Configure Scaling
| Field | Default | Description |
|---|---|---|
| Min Replicas | 0 | Minimum workers (0 = scale to zero) |
| Max Replicas | 5 | Maximum workers |
| Scale Up Threshold | 1 | Number of pending tasks that triggers a scale-up |
| Scale Down Idle Time | 60s | How long a worker must sit idle before it is scaled down |
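The four settings interact roughly as follows. This is a sketch of one autoscaler decision tick using the defaults from the table; the platform's actual autoscaler internals are not documented here:

```python
def scaling_decision(pending_tasks: int, replicas: int, idle_seconds: float,
                     min_replicas: int = 0, max_replicas: int = 5,
                     scale_up_threshold: int = 1,
                     scale_down_idle: float = 60.0) -> str:
    """Return "up", "down", or "hold" for one tick (illustrative only)."""
    # Enough pending tasks and headroom left: add a worker.
    if pending_tasks >= scale_up_threshold and replicas < max_replicas:
        return "up"
    # No work and idle long enough: remove a worker. With min_replicas=0
    # this is what "scale to zero" means.
    if pending_tasks == 0 and idle_seconds >= scale_down_idle and replicas > min_replicas:
        return "down"
    return "hold"
```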
Step 4: Create
Click Create Endpoint and wait for deployment.
Using Private Images
If your image is in a private registry, you need to add credentials first.
Add Registry Credentials
- Go to Settings in the left sidebar
- Under Registry Credentials, click Add Credential
- Enter:
| Field | Description |
|---|---|
| Name | Friendly name for this credential |
| Registry | Registry URL (e.g., docker.io, gcr.io) |
| Username | Registry username |
| Password | Registry password or token |
- Click Save
Use Credentials in Endpoint
When creating an endpoint with a private image, select the credential from the dropdown.
Endpoint Status
| Status | Description |
|---|---|
| Active | Endpoint is running and accepting tasks |
| Scaling | Workers are scaling up or down |
| Stopped | No workers running (scaled to zero) |
| Error | Deployment error (check logs) |
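After creation you typically wait for the endpoint to leave Scaling and become Active. A sketch of that polling loop — `get_status` is a hypothetical stand-in for however you read the status (console, CLI, or API):

```python
import time

def wait_until_active(get_status, timeout: float = 300.0, interval: float = 1.0) -> str:
    """Poll a status-returning callable until the endpoint settles.

    get_status is an assumed callable returning one of the statuses in
    the table above ("Active", "Scaling", "Stopped", "Error").
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "Active":
            return status
        if status == "Error":
            raise RuntimeError("deployment error - check the endpoint logs")
        time.sleep(interval)  # Scaling or Stopped: keep waiting
    raise TimeoutError("endpoint did not become Active in time")
```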
Best Practices
Naming Conventions
- Use descriptive names: flux-inference, llm-chat-prod
- Include the environment: my-model-dev, my-model-prod
Scaling Configuration
- Development: Min 0, Max 1 (cost-effective)
- Production: Min 1, Max 10 (always available)
- Batch Processing: Min 0, Max 20 (handle bursts)
Resource Allocation
- Start with the smallest GPU that fits your model
- Monitor GPU memory usage and adjust if needed
- Use multi-GPU only if single GPU is insufficient
Managing Endpoints
View Endpoint Details
Click an endpoint name to see:
- Current status and replica count
- Task statistics
- Configuration details
Update Endpoint
- Click the endpoint name
- Click Edit
- Modify settings
- Click Save
Note: Some changes require redeployment.
Delete Endpoint
- Click the endpoint name
- Click Delete
- Confirm deletion
Deleting an endpoint:
- Stops all workers
- Cancels pending tasks
- Removes the endpoint configuration
Related Pages
- Build Worker — Write your handler code
- GPU Pricing — Choose the right GPU
- API Reference — Submit tasks via API