Serverless Overview
WaveSpeedAI Serverless is a planned direction for running custom AI workloads on managed GPU infrastructure. This page is kept as a high-level overview for users researching serverless GPU inference, AI worker deployment, and future WaveSpeedAI infrastructure options.
Serverless is not part of the standard public workflow right now. For current production usage, use the model APIs, web tools, SDKs, and integrations documented elsewhere in WaveSpeedAI Docs.
What Serverless May Support
The goal of a serverless GPU platform is to let teams run custom model workers without managing GPU machines, queues, scaling logic, or deployment infrastructure directly.
If this capability becomes available, it may focus on workflows such as:
| Area | Possible use |
|---|---|
| Custom AI workers | Run project-specific inference code behind an API |
| GPU task orchestration | Queue jobs and route them to available GPU workers |
| Autoscaling | Adjust worker capacity based on demand |
| Batch workloads | Process large numbers of media or AI tasks |
| Private deployments | Isolate custom workloads for enterprise use cases |
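As an illustration of the autoscaling row above, here is a minimal sketch of one common sizing rule: derive a worker count from queue depth and clamp it to a configured range. The function name, parameters, and default limits are all assumptions made for this example. They do not describe any WaveSpeedAI API or its actual scaling behavior, which is not finalized.

```python
import math


def desired_workers(queued_tasks, tasks_per_worker, min_workers=0, max_workers=8):
    """Return a worker count for the current queue depth.

    Hypothetical helper for illustration only; the names and defaults are
    assumptions, not part of any published WaveSpeedAI interface.
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")

    # Enough workers so that no worker holds more than tasks_per_worker tasks.
    needed = math.ceil(queued_tasks / tasks_per_worker)

    # Clamp to the configured floor and ceiling.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    for depth in (0, 9, 100):
        print(depth, "queued ->", desired_workers(depth, tasks_per_worker=4), "workers")
```

A floor of zero lets capacity scale down completely when the queue is empty, while a nonzero floor keeps warm workers available at the cost of idle GPU time.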
Possible Architecture
A future serverless GPU workflow may look like this:
Your app
-> Serverless endpoint
-> Task queue
-> GPU worker
-> Result, webhook, or polling response
This model is useful when a team needs custom code or private model logic that does not fit a standard hosted model API.
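The flow above can be sketched as a small in-process simulation using only the Python standard library. Everything here is a stand-in: `submit`, `poll`, `worker`, and `run_inference` are hypothetical names invented for this example, the "GPU worker" is an ordinary thread, and no WaveSpeedAI endpoint or API format is implied.

```python
import queue
import threading
import uuid

# Hypothetical in-process stand-ins; none of these names are WaveSpeedAI APIs.
task_queue = queue.Queue()
results = {}
results_lock = threading.Lock()


def submit(payload):
    """Stand-in for the serverless endpoint: enqueue a job and return its ID."""
    task_id = str(uuid.uuid4())
    with results_lock:
        results[task_id] = {"status": "queued", "output": None}
    task_queue.put((task_id, payload))
    return task_id


def run_inference(payload):
    """Placeholder for custom model code; here it only uppercases the prompt."""
    return payload["prompt"].upper()


def worker():
    """Stand-in for a GPU worker: take tasks from the queue until told to stop."""
    while True:
        item = task_queue.get()
        if item is None:  # sentinel value: shut the worker down
            task_queue.task_done()
            break
        task_id, payload = item
        try:
            record = {"status": "completed", "output": run_inference(payload)}
        except Exception as exc:
            record = {"status": "failed", "output": str(exc)}
        with results_lock:
            results[task_id] = record
        task_queue.task_done()


def poll(task_id):
    """Stand-in for a polling response: return the current record for a task."""
    with results_lock:
        return dict(results[task_id])


def demo(payload):
    """Run one task through endpoint -> queue -> worker -> polling."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    task_id = submit(payload)
    task_queue.join()      # wait until the worker has processed the task
    task_queue.put(None)   # ask the worker to exit
    thread.join()
    return poll(task_id)


if __name__ == "__main__":
    print(demo({"prompt": "hello"}))
```

In a real deployment the queue and the result store would be external services, and a webhook would push the result to the caller instead of waiting for a poll. The sketch only shows how the stages hand work to one another.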
Current Recommended Alternatives
Most users should start with the currently available WaveSpeedAI workflows:
| Need | Recommended page |
|---|---|
| Run hosted image, video, audio, or 3D models | REST API |
| Build with Python | Python SDK |
| Build with JavaScript or TypeScript | JavaScript SDK |
| Use LLMs through an API | LLM Service Overview |
| Test models without code | Web Interface |
Availability
Serverless GPU infrastructure may be offered in the future for selected use cases. Details such as pricing, endpoint creation, worker runtime, supported GPUs, API format, and public availability are not finalized in this documentation.
If your team needs custom AI worker deployment, contact WaveSpeedAI support with your use case, expected workload, model type, latency requirements, and preferred deployment environment.