LLM Service Overview

Access large language models through WaveSpeedAI’s unified API.

What is LLM Service?

WaveSpeedAI provides access to multiple large language models (LLMs) from different providers through a single API. Instead of managing multiple API keys and integrations, use one platform to access models from OpenAI, Google, ByteDance, Mistral, and more.
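The single-API idea can be sketched as a request body whose shape stays the same no matter which provider backs the model. This is a minimal illustration, assuming an OpenAI-style chat-completions payload; the model IDs are hypothetical placeholders, not documented values — use View Code in the Playground for the exact request.

```python
# Sketch: one request-body shape reused across providers.
# Model IDs below are hypothetical placeholders.

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a provider-agnostic chat request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# The same helper works regardless of which provider serves the model:
for model in ["openai/gpt-example", "google/gemini-example"]:
    body = build_chat_request(model, "Hello!")
```

Switching providers then means changing only the `model` string, not the integration code.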

Features

| Feature | Description |
| --- | --- |
| Multiple providers | Access OpenAI, Google, ByteDance, Mistral, and NVIDIA models |
| Web Playground | Test models directly at wavespeed.ai/llm |
| Streaming | Real-time response streaming |
| Enable Thinking | Some models support reasoning/thinking mode |
| View Code | Generate API code directly from the Playground |
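Streamed responses are typically delivered incrementally as server-sent events. The sketch below shows one way to consume such a stream; the `data:` framing and the delta field names are assumptions borrowed from the common OpenAI-style wire format, so check the View Code output for the actual format.

```python
import json

def iter_stream_text(lines):
    """Yield text fragments from SSE-style 'data: {...}' lines.

    Assumes an OpenAI-style delta payload; the field names are an
    assumption, not documented WaveSpeedAI behavior.
    """
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content", "")
        if delta:
            yield delta

# Example with two content chunks followed by the end sentinel:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # → Hello
```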

Web Playground

Try LLMs directly in your browser:

  1. Go to wavespeed.ai/llm
  2. Select a model from the dropdown
  3. Configure parameters (temperature, max_tokens, etc.)
  4. Type your message and start chatting
  5. Click View Code to get the API code for your configuration

Available Parameters

| Parameter | Description | Range |
| --- | --- | --- |
| max_tokens | Maximum response length | Up to 16,384 |
| temperature | Creativity/randomness | 0.0 - 2.0 |
| top_p | Nucleus sampling | 0.0 - 1.0 |
| top_k | Top-k sampling | 1 - 100 |
| presence_penalty | Penalize repeated topics | -2.0 - 2.0 |
| frequency_penalty | Penalize repeated words | -2.0 - 2.0 |
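The ranges above can be checked client-side before sending a request. This is a minimal sketch (the `validate_params` helper is illustrative, not part of any SDK):

```python
# Documented parameter ranges from the table above.
RANGES = {
    "max_tokens": (1, 16_384),
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "top_k": (1, 100),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}

def validate_params(params: dict) -> dict:
    """Raise ValueError for any parameter outside its documented range."""
    for name, value in params.items():
        lo, hi = RANGES[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")
    return params

validate_params({"temperature": 0.7, "top_p": 0.9})  # passes silently
```

Validating locally gives a clear error message instead of a rejected API call.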

Use Cases

| Use Case | Description |
| --- | --- |
| Chatbots | Build conversational AI assistants |
| Content Generation | Write articles, marketing copy, stories |
| Code Generation | Generate, explain, and debug code |
| Analysis | Summarize documents, extract information |
| Translation | Translate between languages |
| Reasoning | Complex problem solving with thinking mode |

Pricing

LLM pricing is based on tokens:

| Token Type | Description |
| --- | --- |
| Input tokens | Your messages and prompts |
| Output tokens | The model's response |

Example pricing (varies by model):

  • Input: $0.0750 / 1M tokens
  • Output: $0.3000 / 1M tokens

Check the Playground for each model’s specific pricing.
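Putting the token-based pricing together, a request's cost is input tokens times the input rate plus output tokens times the output rate. A small sketch using the example rates above (real rates vary by model):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 0.075, output_rate: float = 0.30) -> float:
    """Estimate request cost in USD; rates are per 1M tokens.

    Defaults are the example rates from this page; check the
    Playground for each model's actual pricing.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token reply:
print(f"${estimate_cost(2_000, 500):.6f}")  # → $0.000300
```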

© 2025 WaveSpeedAI. All rights reserved.