Supported LLM Models
Use this supported LLM models guide to choose a model family before you build with WaveSpeedAI LLM. For the latest available models, context windows, and token prices, check the LLM Playground before implementation.
How to Choose an LLM Model
| Need | Start with |
|---|---|
| Lower latency | Fast or flash models |
| Strong coding ability | Claude, GPT, Qwen Coder, DeepSeek |
| Long documents | Models with a large context window |
| Creative writing | Claude or GPT-style models |
| Cost-sensitive usage | Compare input and output token prices in the Playground |
| Production reliability | Test two candidate models with your real prompts |
LLM Pricing and Model Details in the Playground
When you select a model in the Playground, check:
| Property | Why it matters |
|---|---|
| Context window | Maximum amount of prompt and conversation history |
| Input price | Cost for prompts, history, and tool context |
| Output price | Cost for generated responses |
| Capabilities | Whether the model supports reasoning, coding, vision, or other modes |
LLM Model ID Format
WaveSpeedAI model IDs use a provider prefix:
provider/model-nameExamples:
anthropic/claude-opus-4.7
openai/gpt-5.5
bytedance-seed/seed-1.6-flash
qwen/qwen3-coder
deepseek/deepseek-chatUse the full model ID in API requests and agent configuration.
Recommended Evaluation Flow
- Pick two or three candidate models from the Playground.
- Test them with prompts that match your actual workflow.
- Compare quality, latency, context size, and token cost.
- Use a cheaper model for simple tasks and a stronger model for complex reasoning or coding.
Cost Notes
LLM billing is token-based:
| Token type | Meaning |
|---|---|
| Input tokens | Your system prompt, user messages, history, and tool context |
| Output tokens | Text generated by the model |
Long conversations can become expensive because previous messages are usually sent again as context. Trim history, summarize old turns, and choose smaller models when quality requirements allow it.
When your app or coding tool sends the same long instructions, examples, tool definitions, or repository context repeatedly, use Prompt Caching to improve cache hit rate and inspect cached-token usage.