Create Serverless Endpoint
Learn how to create and configure a serverless endpoint for your AI workloads.
What is an Endpoint?
An endpoint is a deployment unit that:
- Runs your worker container image
- Processes tasks from a dedicated queue
- Auto-scales based on demand
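Conceptually, each worker in the endpoint pulls one task payload from the queue, runs your code on it, and returns a result. A minimal sketch of that idea, assuming a handler that receives a JSON-like dict (the real handler contract depends on the worker SDK your image is built with — see Build Worker):

```python
# Hypothetical worker handler: the actual signature is defined by the
# worker SDK inside your container image, not by this document.
def handler(task: dict) -> dict:
    """Process one task pulled from the endpoint's queue."""
    prompt = task.get("input", {}).get("prompt", "")
    # Replace this placeholder with your actual model inference call.
    result = prompt.upper()
    return {"output": result}
```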
Create via Console
Step 1: Navigate to Endpoints
- Go to wavespeed.ai/serverless
- Click Endpoints in the left sidebar
- Click Create Endpoint
Step 2: Configure Basic Settings
| Field | Required | Description |
|---|---|---|
| Name | Yes | Unique identifier (lowercase, no spaces) |
| Image | Yes | Docker image URL |
| GPU Spec | Yes | GPU type (e.g., L4-24GB, A100-80GB) |
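The constraints in the table can be checked before you submit the form. This is an illustrative pre-flight check with hypothetical field names — the console performs its own validation:

```python
import re

def validate_endpoint_config(cfg: dict) -> list[str]:
    """Mirror the basic-settings table: all three fields are required,
    and the name must be lowercase with no spaces."""
    errors = []
    for field in ("name", "image", "gpu_spec"):
        if not cfg.get(field):
            errors.append(f"missing required field: {field}")
    name = cfg.get("name", "")
    # Lowercase letters, digits, and hyphens only (an assumed pattern
    # consistent with "lowercase, no spaces").
    if name and not re.fullmatch(r"[a-z0-9-]+", name):
        errors.append("name must be lowercase with no spaces")
    return errors
```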
Step 3: Configure Scaling
| Field | Default | Description |
|---|---|---|
| Min Replicas | 0 | Minimum workers (0 = scale to zero) |
| Max Replicas | 5 | Maximum workers |
| Scale Up Threshold | 1 | Number of pending tasks that triggers a scale-up |
| Scale Down Idle Time | 60s | How long a worker must sit idle before it is scaled down |
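The four settings interact roughly as follows. This is a sketch of one autoscaler decision tick using the defaults from the table; the platform's actual autoscaler internals are not documented here:

```python
def scaling_decision(pending_tasks: int, replicas: int, idle_seconds: float,
                     min_replicas: int = 0, max_replicas: int = 5,
                     scale_up_threshold: int = 1,
                     scale_down_idle: float = 60.0) -> str:
    """Return "up", "down", or "hold" for one tick (illustrative only)."""
    # Enough pending tasks and headroom left: add a worker.
    if pending_tasks >= scale_up_threshold and replicas < max_replicas:
        return "up"
    # No work and idle long enough: remove a worker. With min_replicas=0
    # this is what "scale to zero" means.
    if pending_tasks == 0 and idle_seconds >= scale_down_idle and replicas > min_replicas:
        return "down"
    return "hold"
```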
Step 4: Create
Click Create Endpoint and wait for deployment.
Using Private Images
If your image is in a private registry, you need to add credentials first.
Add Registry Credentials
- Go to Settings in the left sidebar
- Under Registry Credentials, click Add Credential
- Enter:
| Field | Description |
|---|---|
| Name | Friendly name for this credential |
| Registry | Registry URL (e.g., docker.io, gcr.io) |
| Username | Registry username |
| Password | Registry password or token |
- Click Save
Use Credentials in Endpoint
When creating an endpoint with a private image, select the credential from the dropdown.
Endpoint Status
| Status | Description |
|---|---|
| Active | Endpoint is running and accepting tasks |
| Scaling | Workers are scaling up or down |
| Stopped | No workers running (scaled to zero) |
| Error | Deployment error (check logs) |
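After creation you typically wait for the endpoint to leave Scaling and become Active. A sketch of that polling loop — `get_status` is a hypothetical stand-in for however you read the status (console, CLI, or API):

```python
import time

def wait_until_active(get_status, timeout: float = 300.0, interval: float = 1.0) -> str:
    """Poll a status-returning callable until the endpoint settles.

    get_status is an assumed callable returning one of the statuses in
    the table above ("Active", "Scaling", "Stopped", "Error").
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "Active":
            return status
        if status == "Error":
            raise RuntimeError("deployment error - check the endpoint logs")
        time.sleep(interval)  # Scaling or Stopped: keep waiting
    raise TimeoutError("endpoint did not become Active in time")
```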
Best Practices
Naming Conventions
- Use descriptive names: flux-inference, llm-chat-prod
- Include the environment: my-model-dev, my-model-prod
Scaling Configuration
- Development: Min 0, Max 1 (cost-effective)
- Production: Min 1, Max 10 (always available)
- Batch Processing: Min 0, Max 20 (handle bursts)
Resource Allocation
- Start with the smallest GPU that fits your model
- Monitor GPU memory usage and adjust if needed
- Use multi-GPU only if single GPU is insufficient
Managing Endpoints
View Endpoint Details
Click an endpoint name to see:
- Current status and replica count
- Task statistics
- Configuration details
Update Endpoint
- Click the endpoint name
- Click Edit
- Modify settings
- Click Save
Note: Some changes require redeployment.
Delete Endpoint
- Click the endpoint name
- Click Delete
- Confirm deletion
Deleting an endpoint:
- Stops all workers
- Cancels pending tasks
- Removes the endpoint configuration
Related Pages
- Build Worker — Write your handler code
- GPU Pricing — Choose the right GPU
- API Reference — Submit tasks via API