# Serverless Overview
Deploy and run your AI workloads on high-performance GPUs with Waverless, WaveSpeedAI’s serverless GPU platform.
## What is Waverless?
Waverless is a serverless GPU task-orchestration system designed for AI inference and training workloads. It provides on-demand access to powerful GPUs with no infrastructure to manage.
## Key Features
| Feature | Description |
|---|---|
| RunPod Compatible | Zero-code migration from RunPod through a compatible API (see the handler sketch after this table) |
| Auto Scaling | Automatically adjusts worker count based on task queue depth |
| Multi-Endpoint | Isolate different applications through separate endpoints |
| Graceful Shutdown | Zero task loss during rolling updates and scale-downs |
| High Availability | Multi-replica deployment with no single point of failure |
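Because the API is RunPod-compatible, an existing RunPod-style worker should run unchanged. Below is a minimal handler sketch using the standard `runpod` Python SDK pattern; treating it as valid against a Waverless endpoint is an assumption based on the compatibility claim, and the echo logic is a placeholder.

```python
# Minimal RunPod-style worker handler (sketch).
# Assumes Waverless accepts the standard `runpod` SDK handler pattern.
import runpod


def handler(job):
    # `job["input"]` carries the JSON payload submitted with the task.
    prompt = job["input"].get("prompt", "")
    # Placeholder logic; replace with real model inference.
    return {"output": f"echo: {prompt}"}


# Start the worker loop: it pulls jobs, runs the handler, and reports results.
runpod.serverless.start({"handler": handler})
```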
## How It Works
1. Create Endpoint → Define your worker image and GPU spec
2. Deploy Workers → Workers auto-scale based on demand
3. Submit Tasks → Send tasks via API
4. Get Results → Receive results via polling or a webhook (see the client sketch below)
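To make steps 3 and 4 concrete, here is a client-side sketch. The base URL, endpoint ID, and the `/run` and `/status` routes are assumed RunPod-style paths, not confirmed Waverless API details; substitute the values from your own endpoint.

```python
# Submit a task, then poll for its result (sketch).
# The base URL, endpoint ID, and /run + /status routes are assumed
# RunPod-style paths, not confirmed Waverless API details.
import time
import requests

BASE = "https://api.example.com/v2/my-endpoint-id"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Step 3: submit the task.
job = requests.post(
    f"{BASE}/run", json={"input": {"prompt": "hello"}}, headers=HEADERS
).json()

# Step 4: poll until the task finishes. Registering a webhook instead
# would push the result to you and avoid this loop.
while True:
    status = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(2)

print(status.get("output"))
```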
## Use Cases

- Custom Model Deployment — Run your own AI models on dedicated GPUs
- Batch Processing — Process large volumes of data in parallel
- Training Workloads — Fine-tune models with on-demand compute
- High-Throughput Inference — Scale inference pipelines automatically
## Architecture

Waverless uses a pull-based architecture in which workers actively pull tasks from a queue (a schematic worker loop follows the list below):
- Task Queue — Tasks are queued and distributed to available workers
- Worker Pool — Workers pull tasks, execute them, and return results
- Auto Scaler — Monitors queue depth and adjusts worker count
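The sketch below illustrates the pull model and a toy scaling rule. The queue client, its method names, and the scaling heuristic are all hypothetical stand-ins for the Waverless internals, shown only to convey the shape of the design.

```python
# Schematic pull-based worker and auto-scaler rule (illustration only;
# the queue client and all names here are hypothetical, not Waverless code).
import math


def worker_loop(queue, run_task):
    """Pull tasks, execute them, and return results."""
    while True:
        task = queue.pull(timeout=30)  # block until a task is available
        if task is None:
            continue  # timed out on an empty queue; keep waiting
        result = run_task(task.payload)
        queue.ack(task.id, result)  # return the result and mark the task done


def desired_workers(queue_depth, tasks_per_worker=4, min_w=1, max_w=16):
    """Toy auto-scaler: size the pool to the backlog, within bounds."""
    return max(min_w, min(max_w, math.ceil(queue_depth / tasks_per_worker)))
```

A pull model pairs naturally with graceful shutdown: a worker that stops pulling simply finishes its in-flight task before exiting, which is how zero task loss during scale-downs is typically achieved.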
## Getting Started
- View GPU Pricing — See available GPUs and costs
- Quick Start — Get up and running in minutes
- Create Endpoint — Deploy your first endpoint
- Build Worker — Write your handler code
## Enterprise Access
Waverless is currently available for enterprise customers. To request access:
1. Go to wavespeed.ai/serverless
2. Fill out the request form
3. Our team will contact you to discuss your use case