Serverless GPU Pricing
Choose from a range of high-performance GPUs for your serverless workloads.
Available GPUs
| GPU | VRAM | Price | CPUs per GPU | RAM per GPU |
|---|---|---|---|---|
| L4 | 24 GB | $0.55/hr/GPU | 20 | 64 GB |
| A100 | 40 GB | $1.19/hr/GPU | 10 | 128 GB |
| A100 | 80 GB | $1.49/hr/GPU | 10 | 128 GB |
| H100 | 80 GB | $3.07/hr/GPU | 20 | 224 GB |
| H200 | 141 GB | $3.59/hr/GPU | 20 | 224 GB |
| B200 | 180 GB | $5.19/hr/GPU | 20 | 224 GB |
Every GPU type is available in 1x-8x configurations per endpoint.
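To estimate spend from the table above, multiply the per-GPU hourly rate by the GPU count and the hours used. A minimal Python sketch (prices copied from the table; `hourly_cost` is an illustrative helper, not part of any SDK):

```python
# Per-GPU hourly rates (USD), copied from the pricing table above.
PRICES = {
    "L4-24GB": 0.55,
    "A100-40GB": 1.19,
    "A100-80GB": 1.49,
    "H100-80GB": 3.07,
    "H200-141GB": 3.59,
    "B200-180GB": 5.19,
}

def hourly_cost(gpu: str, gpu_count: int) -> float:
    """Hourly cost of a worker running `gpu_count` GPUs of one type."""
    if not 1 <= gpu_count <= 8:
        raise ValueError("endpoints support 1-8 GPUs")
    return PRICES[gpu] * gpu_count

# Example: a 2x A100 80 GB worker active 10 hours/day for a 30-day month.
print(round(hourly_cost("A100-80GB", 2) * 10 * 30, 2))  # 894.0 USD
```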
GPU Selection Guide
L4 (24 GB) — Best for Cost-Effective Inference
- Ideal for smaller models and light inference workloads
- Good balance of performance and cost
- Recommended for: Stable Diffusion, small LLMs, image processing
A100 (40/80 GB) — Best for General AI Workloads
- Industry standard for AI training and inference
- High memory bandwidth for large models
- Recommended for: Medium LLMs, video generation, fine-tuning
H100 (80 GB) — Best for High-Performance AI
- Hopper-generation data center GPU
- Excellent for transformer models
- Recommended for: Large LLMs, high-throughput inference
H200 (141 GB) — Best for Large Models
- Extended memory for very large models
- Ideal when a model doesn’t fit in 80 GB
- Recommended for: 70B+ parameter models, long context
B200 (180 GB) — Best for Maximum Performance
- Highest VRAM in this lineup (180 GB)
- Ultimate performance for demanding workloads
- Recommended for: Largest models, research workloads
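One way to apply this guide is to estimate a model's weight footprint and pick the smallest GPU that fits. A rough sketch, assuming fp16/bf16 weights dominate memory and adding 20% headroom for activations and KV cache (the heuristic and names are illustrative, not a sizing guarantee):

```python
# (name, VRAM in GB), cheapest tier first, from the pricing table above.
GPU_VRAM_GB = [
    ("L4", 24), ("A100-40GB", 40), ("A100-80GB", 80),
    ("H100", 80), ("H200", 141), ("B200", 180),
]

def weight_footprint_gb(params_billions: float, bytes_per_param: float = 2) -> float:
    """Approximate VRAM for weights alone; fp16/bf16 = 2 bytes per parameter."""
    return params_billions * bytes_per_param  # 1e9 params * N bytes ~ N GB

def smallest_gpu(params_billions: float, headroom: float = 1.2) -> str:
    """Return the first GPU tier whose VRAM covers weights plus headroom."""
    need = weight_footprint_gb(params_billions) * headroom
    for name, vram in GPU_VRAM_GB:
        if vram >= need:
            return name
    return "needs multi-GPU (2-8x) or quantization"

print(smallest_gpu(7))   # ~16.8 GB needed -> L4
print(smallest_gpu(70))  # ~168 GB needed -> B200
```

Quantized weights shrink `bytes_per_param` (e.g., 0.5 for 4-bit), which is often the difference between one GPU tier and the next.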
Billing
- Per-GPU-Hour — Usage is metered per GPU at the hourly rates in the table above
- No Idle Charges — Pay only when workers are running tasks
- Auto Scale Down — Workers automatically scale to zero when idle
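In practice, scale-to-zero is just an endpoint whose worker floor is zero. A minimal sketch using a hypothetical configuration (the field names are illustrative, not a real API; check your endpoint settings for the actual keys):

```python
# Hypothetical endpoint scaling config; exact field names vary by platform.
endpoint_config = {
    "gpu": "A100-80GB",
    "gpu_count": 1,        # 1-8 GPUs per worker
    "min_workers": 0,      # scale to zero when idle
    "max_workers": 3,      # hard cap on replicas (see the tips below)
    "idle_timeout_s": 60,  # how long a worker stays warm after its last task
}
```

With `min_workers` at 0 you run nothing between bursts; raising `idle_timeout_s` keeps workers warm longer, which reduces cold-start latency.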
Cost Optimization Tips
- Start Small — Begin with the L4 for development and scale up for production
- Right-Size — Choose the smallest GPU that fits your model
- Batch Requests — Group similar tasks to maximize GPU utilization
- Set Scale Limits — Configure max replicas to control costs
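The last tip is easy to quantify: with a replica cap in place, worst-case spend is bounded by max workers × GPUs per worker × hourly rate. A quick illustrative sketch:

```python
def max_monthly_spend(hourly_rate: float, gpu_count: int, max_workers: int,
                      hours_per_month: float = 730) -> float:
    """Upper bound on monthly cost if every allowed replica ran nonstop."""
    return hourly_rate * gpu_count * max_workers * hours_per_month

# Example: a cap of 3 workers, each a 1x H100 at $3.07/hr.
print(round(max_monthly_spend(3.07, 1, 3), 2))  # 6723.3 USD worst case
```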