qwen/qwen3.6-max-preview
262,144 context · $1.30/M input tokens · $7.80/M output tokens
Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse Mixture-of-Experts architecture with approximately 1T total parameters. It is optimized for high-end reasoning, agentic coding, tool use, instruction following, and complex text generation workflows. The model supports a 262K-token context window, up to 64K output tokens, thinking mode, function calling, and structured outputs, making it suitable for demanding production tasks that prioritize reasoning capability over raw throughput.
Pay-per-use
No upfront costs, pay only for what you use
Use the following code examples to integrate with our API:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.wavespeed.ai/v1"
)

response = client.chat.completions.create(
    model="qwen/qwen3.6-max-preview",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
| Specification | Value |
|---|---|
| Provider | alibaba |
| Model Type | Chat Completions model |
| Architecture | Sparse Mixture-of-Experts |
| Parameters | approximately 1T |
| Modalities | text->text |
| Context Window | 262,144 tokens |
| Max Input | 196,608 tokens |
| Max Output | 65,536 tokens |
| Thinking Budget | 128K tokens |
| Input | Text |
| Output | Text |
| Vision | Not listed |
| Function Calling | Supported |
| Structured Outputs | Supported |
| Thinking Mode | Supported |
| Release | April 2026 |
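
The table lists function calling as supported. The following is a minimal sketch of a tool-enabled request, assuming the WaveSpeedAI endpoint accepts the standard OpenAI `tools` parameter unchanged (this page does not confirm that). The `get_weather` tool, its schema, and the helper names are hypothetical and exist only for illustration.

```python
import json

MODEL_ID = "qwen/qwen3.6-max-preview"
BASE_URL = "https://llm.wavespeed.ai/v1"

# Hypothetical tool definition in the standard OpenAI "tools" format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(user_message):
    """Return keyword arguments for client.chat.completions.create()."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [WEATHER_TOOL],
    }


def parse_tool_calls(message):
    """Extract (name, arguments) pairs from an assistant message, if any."""
    calls = []
    for call in message.tool_calls or []:
        calls.append((call.function.name, json.loads(call.function.arguments)))
    return calls


def run(api_key):
    # Imported lazily so the helpers above work without the SDK installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key, base_url=BASE_URL)
    response = client.chat.completions.create(
        **build_tool_request("What is the weather in Paris?")
    )
    return parse_tool_calls(response.choices[0].message)
```

Calling `run("YOUR_API_KEY")` would return the tool calls the model requested, if any. Executing the tool and sending its result back in a follow-up message is left out of this sketch.
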
| Token Type | Cost |
|---|---|
| Input | $1.04 per million tokens |
| Output | $6.24 per million tokens |
| Cache Write | $1.30 per million tokens |
Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: qwen/qwen3.6-max-preview
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.wavespeed.ai/v1"
)

response = client.chat.completions.create(
    model="qwen/qwen3.6-max-preview",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

curl https://llm.wavespeed.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen/qwen3.6-max-preview",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
| Specification | Value |
|---|---|
| Input | $1.3 /M |
| Output | $7.8 /M |
| Context | 262K |
| Max Output | 66K |
| Tool Use | Supported |
Access Qwen3.6 Max Preview through our unified API — OpenAI-compatible, no cold starts, transparent pricing.
Pricing on WaveSpeedAI: $1.30 per million input tokens and $7.80 per million output tokens. Prompt caching and batch processing are billed separately and reduce effective cost on long, repetitive workloads.
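
As a rough illustration of the per-token arithmetic, the sketch below estimates the cost of a single request from the two rates quoted in the sentence above. It ignores prompt caching and batch pricing, and the function name is hypothetical.

```python
# Rates quoted above, in USD per million tokens.
INPUT_USD_PER_M = 1.30
OUTPUT_USD_PER_M = 7.80


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request, excluding caching and batch discounts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# 100,000 input tokens and 5,000 output tokens:
# (100,000 * 1.30 + 5,000 * 7.80) / 1,000,000 = 0.169
print(f"${estimate_cost_usd(100_000, 5_000):.4f}")  # prints $0.1690
```
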
Qwen3.6 Max Preview supports a 262,144-token context window, with up to 196,608 input tokens and up to 65,536 output tokens per request.
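
A small pre-flight check along these lines can catch oversized requests before they are sent. This is a sketch using the limits from the specification table. How the endpoint actually enforces them is not documented here, and token counts must come from your own tokenizer or estimate.

```python
# Limits copied from the specification table.
CONTEXT_WINDOW = 262_144
MAX_INPUT = 196_608
MAX_OUTPUT = 65_536

# The two per-direction limits add up to the full context window, so
# checking them separately also keeps the total within the window.
assert MAX_INPUT + MAX_OUTPUT == CONTEXT_WINDOW


def budget_problems(prompt_tokens, max_output_tokens):
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    if prompt_tokens > MAX_INPUT:
        problems.append(
            f"prompt exceeds max input by {prompt_tokens - MAX_INPUT} tokens"
        )
    if max_output_tokens > MAX_OUTPUT:
        problems.append(
            f"requested output exceeds max output by {max_output_tokens - MAX_OUTPUT} tokens"
        )
    return problems


print(budget_problems(150_000, 32_000))  # prints []
print(budget_problems(200_000, 70_000))  # lists both violations
```
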
Qwen3.6 Max Preview works with the OpenAI SDK: WaveSpeedAI exposes it through an OpenAI-compatible endpoint at https://llm.wavespeed.ai/v1. Point the official OpenAI SDK at this base URL with your WaveSpeedAI API key; no other code changes are required.
To get started, sign in to WaveSpeedAI, create an API key in Access Keys, then send a request to https://llm.wavespeed.ai/v1/chat/completions with the model ID set to qwen/qwen3.6-max-preview. New accounts receive free credits to evaluate Qwen3.6 Max Preview before paying per token.
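
For a first request without the SDK, the steps above can be sketched with the Python standard library alone. This mirrors the curl call and assumes the response follows the usual OpenAI chat completions shape. `WAVESPEED_API_KEY` is a hypothetical environment variable name chosen for this example, not one the platform defines.

```python
import json
import os
import urllib.request

API_URL = "https://llm.wavespeed.ai/v1/chat/completions"
MODEL_ID = "qwen/qwen3.6-max-preview"


def build_request(api_key, prompt):
    """Build the POST request without sending it."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def ask(prompt):
    """Send one prompt and return the assistant's reply text."""
    # Hypothetical variable name; keeps the key out of source control.
    api_key = os.environ["WAVESPEED_API_KEY"]
    with urllib.request.urlopen(build_request(api_key, prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the key exported in your shell, `print(ask("Hello!"))` would send the same request as the curl example.
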