GPU Pricing

Serverless GPU Pricing

Choose from a range of high-performance GPUs for your serverless workloads.

Available GPUs

GPU    VRAM    Price          CPUs  RAM per GPU
L4     24 GB   $0.55/hr/GPU   20    64 GB
A100   40 GB   $1.19/hr/GPU   10    128 GB
A100   80 GB   $1.49/hr/GPU   10    128 GB
H100   80 GB   $3.07/hr/GPU   20    224 GB
H200   141 GB  $3.59/hr/GPU   20    224 GB
B200   180 GB  $5.19/hr/GPU   20    224 GB

Every GPU type supports configurations of 1 to 8 GPUs per endpoint.
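
As a rough illustration of what a multi-GPU endpoint configuration looks like, here is a minimal sketch. The field names are hypothetical placeholders, not the actual WaveSpeedAI API:

```python
# Illustrative sketch only: the field names below are hypothetical
# placeholders, not the actual WaveSpeedAI endpoint schema.
import json

endpoint_config = {
    "gpu_type": "A100-80GB",  # any GPU type from the table above
    "gpu_count": 4,           # each endpoint supports 1 to 8 GPUs
}

# In practice you would submit this when creating the endpoint; here we
# just render it to show the shape of the request.
print(json.dumps(endpoint_config, indent=2))
```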

GPU Selection Guide

L4 (24 GB) — Best for Cost-Effective Inference

  • Ideal for smaller models and light inference workloads
  • Good balance of performance and cost
  • Recommended for: Stable Diffusion, small LLMs, image processing

A100 (40/80 GB) — Best for General AI Workloads

  • Industry standard for AI training and inference
  • High memory bandwidth for large models
  • Recommended for: Medium LLMs, video generation, fine-tuning

H100 (80 GB) — Best for High-Performance AI

  • Hopper-generation data center GPU
  • Excellent for transformer models
  • Recommended for: Large LLMs, high-throughput inference

H200 (141 GB) — Best for Large Models

  • Extended memory for very large models
  • Ideal when model doesn’t fit in 80 GB
  • Recommended for: 70B+ parameter models, long context

B200 (180 GB) — Best for Maximum Performance

  • Highest VRAM available
  • Ultimate performance for demanding workloads
  • Recommended for: Largest models, research workloads
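
To apply this guide programmatically, a common first pass is to estimate weight memory as parameters times bytes per parameter, add headroom for activations and KV cache, and pick the cheapest listed GPU that fits. A minimal sketch in Python; the 20% headroom factor is an illustrative assumption, not a platform rule:

```python
# Minimal sketch: pick the cheapest listed GPU whose VRAM fits the model.
# The 1.2x headroom factor is an illustrative assumption for activations
# and KV cache, not a WaveSpeedAI rule.

GPUS = [  # (name, vram_gb, usd_per_gpu_hour), from the pricing table above
    ("L4", 24, 0.55),
    ("A100-40GB", 40, 1.19),
    ("A100-80GB", 80, 1.49),
    ("H100", 80, 3.07),
    ("H200", 141, 3.59),
    ("B200", 180, 5.19),
]

def pick_gpu(params_billions: float, bytes_per_param: int = 2) -> str:
    """Return the cheapest GPU whose VRAM covers weights plus headroom."""
    needed_gb = params_billions * bytes_per_param * 1.2  # 20% headroom
    for name, vram, _price in sorted(GPUS, key=lambda g: g[2]):
        if vram >= needed_gb:
            return name
    raise ValueError("Model needs multi-GPU sharding (>180 GB with headroom)")

print(pick_gpu(7))                      # 7B fp16  -> ~16.8 GB -> L4
print(pick_gpu(70, bytes_per_param=1))  # 70B int8 -> ~84 GB   -> H200
```

Note that the 70B example assumes 8-bit weights; in fp16 a 70B model needs roughly 140 GB for weights alone, which is exactly the "doesn't fit in 80 GB" case the H200 entry describes.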

Billing

  • Per-GPU-Hour — Billed by the hour per GPU
  • No Idle Charges — Pay only when workers are running tasks
  • Auto Scale Down — Workers automatically scale to zero when idle
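
Because billing is per GPU-hour and idle workers cost nothing, a job's cost reduces to GPU count times busy hours times the hourly rate. A quick sanity check against the rates in the table above:

```python
# Per-GPU-hour billing: cost = gpu_count * busy_hours * hourly_rate.
# Idle time is free because workers scale to zero.

def job_cost(gpu_count: int, busy_hours: float, usd_per_gpu_hour: float) -> float:
    return gpu_count * busy_hours * usd_per_gpu_hour

# Example: a 3-hour run on 2x H100 at $3.07/hr/GPU.
print(f"${job_cost(2, 3, 3.07):.2f}")  # -> $18.42
```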

Cost Optimization Tips

  1. Start Small — Begin with L4 for development, scale up for production
  2. Right-Size — Choose the smallest GPU that fits your model
  3. Batch Requests — Group similar tasks to maximize GPU utilization
  4. Set Scale Limits — Configure max replicas to control costs
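
Tips 2 and 4 together usually translate into an autoscaling block on the endpoint: a small GPU, scale-to-zero, and a hard replica cap. A hypothetical sketch; the field names are illustrative, not the actual WaveSpeedAI configuration schema:

```python
# Hypothetical autoscaling settings; field names are illustrative only.
autoscaling = {
    "min_replicas": 0,  # scale to zero when idle, so no idle charges
    "max_replicas": 4,  # hard cap on concurrent workers bounds spend
    "gpu_type": "L4",   # start small; move up only when the model demands it
}

# Worst-case spend under this cap: 4 replicas * $0.55/hr/GPU = $2.20/hr.
max_hourly = autoscaling["max_replicas"] * 0.55
print(f"${max_hourly:.2f}/hr")  # -> $2.20/hr
```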