Serverless GPU Pricing
Choose from a range of high-performance GPUs for your serverless workloads.
Available GPUs
| GPU | VRAM | Price | CPUs per GPU | RAM per GPU |
|---|---|---|---|---|
| L4 | 24 GB | $0.55/hr/GPU | 20 | 64 GB |
| A100 | 40 GB | $1.19/hr/GPU | 10 | 128 GB |
| A100 | 80 GB | $1.49/hr/GPU | 10 | 128 GB |
| H100 | 80 GB | $3.07/hr/GPU | 20 | 224 GB |
| H200 | 141 GB | $3.59/hr/GPU | 20 | 224 GB |
| B200 | 180 GB | $5.19/hr/GPU | 20 | 224 GB |
Every GPU type is available in 1x-8x configurations per endpoint.
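To estimate spend from the table above, multiply the per-GPU hourly rate by the GPU count and the hours used. A minimal Python sketch (prices copied from the table; `hourly_cost` is an illustrative helper, not part of any SDK):

```python
# Per-GPU hourly rates (USD), copied from the pricing table above.
PRICES = {
    "L4-24GB": 0.55,
    "A100-40GB": 1.19,
    "A100-80GB": 1.49,
    "H100-80GB": 3.07,
    "H200-141GB": 3.59,
    "B200-180GB": 5.19,
}

def hourly_cost(gpu: str, gpu_count: int) -> float:
    """Hourly cost of a worker running `gpu_count` GPUs of one type."""
    if not 1 <= gpu_count <= 8:
        raise ValueError("endpoints support 1-8 GPUs")
    return PRICES[gpu] * gpu_count

# Example: a 2x A100 80 GB worker active 10 hours/day for a 30-day month.
print(round(hourly_cost("A100-80GB", 2) * 10 * 30, 2))  # 894.0 USD
```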
GPU Selection Guide
L4 (24 GB) — Best for Cost-Effective Inference
- Ideal for smaller models and light inference workloads
- Good balance of performance and cost
- Recommended for: Stable Diffusion, small LLMs, image processing
A100 (40/80 GB) — Best for General AI Workloads
- Industry standard for AI training and inference
- High memory bandwidth for large models
- Recommended for: Medium LLMs, video generation, fine-tuning
H100 (80 GB) — Best for High-Performance AI
- Hopper-generation data center GPU
- Excellent for transformer models
- Recommended for: Large LLMs, high-throughput inference
H200 (141 GB) — Best for Large Models
- Extended memory for very large models
- Ideal when a model doesn’t fit in 80 GB
- Recommended for: 70B+ parameter models, long context
B200 (180 GB) — Best for Maximum Performance
- Highest VRAM in this lineup (180 GB)
- Ultimate performance for demanding workloads
- Recommended for: Largest models, research workloads
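One way to apply this guide is to estimate a model's weight footprint and pick the smallest GPU that fits. A rough sketch, assuming fp16/bf16 weights dominate memory and adding 20% headroom for activations and KV cache (the heuristic and names are illustrative, not a sizing guarantee):

```python
# (name, VRAM in GB), cheapest tier first, from the pricing table above.
GPU_VRAM_GB = [
    ("L4", 24), ("A100-40GB", 40), ("A100-80GB", 80),
    ("H100", 80), ("H200", 141), ("B200", 180),
]

def weight_footprint_gb(params_billions: float, bytes_per_param: float = 2) -> float:
    """Approximate VRAM for weights alone; fp16/bf16 = 2 bytes per parameter."""
    return params_billions * bytes_per_param  # 1e9 params * N bytes ~ N GB

def smallest_gpu(params_billions: float, headroom: float = 1.2) -> str:
    """Return the first GPU tier whose VRAM covers weights plus headroom."""
    need = weight_footprint_gb(params_billions) * headroom
    for name, vram in GPU_VRAM_GB:
        if vram >= need:
            return name
    return "needs multi-GPU (2-8x) or quantization"

print(smallest_gpu(7))   # ~16.8 GB needed -> L4
print(smallest_gpu(70))  # ~168 GB needed -> B200
```

Quantized weights shrink `bytes_per_param` (e.g., 0.5 for 4-bit), which is often the difference between one GPU tier and the next.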
Billing
- Per-GPU-Hour — Usage is metered per GPU at the hourly rates in the table above
- No Idle Charges — Pay only when workers are running tasks
- Auto Scale Down — Workers automatically scale to zero when idle
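In practice, scale-to-zero is just an endpoint whose worker floor is zero. A minimal sketch using a hypothetical configuration (the field names are illustrative, not a real API; check your endpoint settings for the actual keys):

```python
# Hypothetical endpoint scaling config; exact field names vary by platform.
endpoint_config = {
    "gpu": "A100-80GB",
    "gpu_count": 1,        # 1-8 GPUs per worker
    "min_workers": 0,      # scale to zero when idle
    "max_workers": 3,      # hard cap on replicas (see the tips below)
    "idle_timeout_s": 60,  # how long a worker stays warm after its last task
}
```

With `min_workers` at 0 you run nothing between bursts; raising `idle_timeout_s` keeps workers warm longer, which reduces cold-start latency.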
Cost Optimization Tips
- Start Small — Begin with the L4 for development and scale up for production
- Right-Size — Choose the smallest GPU that fits your model
- Batch Requests — Group similar tasks to maximize GPU utilization
- Set Scale Limits — Configure max replicas to control costs
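The last tip is easy to quantify: with a replica cap in place, worst-case spend is bounded by max workers × GPUs per worker × hourly rate. A quick illustrative sketch:

```python
def max_monthly_spend(hourly_rate: float, gpu_count: int, max_workers: int,
                      hours_per_month: float = 730) -> float:
    """Upper bound on monthly cost if every allowed replica ran nonstop."""
    return hourly_rate * gpu_count * max_workers * hours_per_month

# Example: a cap of 3 workers, each a 1x H100 at $3.07/hr.
print(round(max_monthly_spend(3.07, 1, 3), 2))  # 6723.3 USD worst case
```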