z-ai/glm-5.2
Publication date: 2026-06-17
1,048,576 context · $1.40/M input tokens · $4.40/M output tokens
GLM 5.2 is Z.ai’s most advanced reasoning model, built for long-context, agentic, and engineering-intensive workloads. With support for a 1M-token context window and configurable High/XHigh reasoning modes, it delivers state-of-the-art performance in coding, tool use, and complex task execution. From requirements gathering and architecture design to implementation, testing, and multi-platform deployment, GLM 5.2 can maintain project-level context and consistently follow engineering best practices throughout the entire software development lifecycle.
Pay-per-use
No upfront costs, pay only for what you use
Use the following code examples to integrate with our API:
```python
from openai import OpenAI

# Point the official OpenAI SDK at the WaveSpeed AI endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.wavespeed.ai/v1",
)

response = client.chat.completions.create(
    model="z-ai/glm-5.2",
    messages=[
        {"role": "user", "content": "Hello!"},
    ],
)

# Print the text of the first completion choice.
print(response.choices[0].message.content)
```

GLM 5.2 is Z.ai’s latest large-scale reasoning model, designed for long-context understanding, advanced coding, and complex agent workflows. With support for a 1M-token context window and configurable reasoning levels, it can maintain project-scale context across extended interactions, making it well-suited for software engineering, research, automation, and multi-step problem solving.
The model supports both High and XHigh reasoning modes, with XHigh enabling its maximum reasoning capability. GLM 5.2 excels at code generation, tool use, structured outputs, and long-horizon task execution, allowing developers to build sophisticated AI agents and automation systems that operate reliably over large amounts of context.
This model is available through the WaveSpeed AI OpenAI-compatible API and can be integrated into existing applications with minimal changes.
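Since the API is described as OpenAI-compatible, tool use presumably follows the OpenAI Chat Completions `tools` format. The sketch below builds such a request and dispatches a returned tool call locally, without contacting the API. The `get_weather` tool, its schema, and the helper names are hypothetical, and whether GLM 5.2 on WaveSpeed AI accepts every field shown here is an assumption to verify against the provider's documentation.

```python
import json

MODEL_ID = "z-ai/glm-5.2"

# Hypothetical tool definition, for illustration only.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(user_message: str) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [GET_WEATHER_TOOL],
    }


def get_weather(city: str) -> str:
    """Stand-in implementation; a real tool would query a weather service."""
    return f"No live data available for {city}."


def run_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch one tool call; the arguments arrive as a JSON string."""
    if name != "get_weather":
        raise ValueError(f"unknown tool: {name}")
    return get_weather(**json.loads(arguments_json))
```

With the OpenAI SDK, the request would be sent as `client.chat.completions.create(**build_tool_request("..."))`. Any tool calls then appear in `response.choices[0].message.tool_calls`, each carrying `function.name` and `function.arguments` that can be passed to `run_tool_call`.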
| Specification | Value |
|---|---|
| Provider | chatglm |
| Model Type | Chat Completions |
| Architecture | Text → Text |
| Context Window | 1,048,576 tokens |
| Max Input | 786,432 tokens |
| Max Output | 262,144 tokens |
| Input | Text |
| Output | Text |
| Function Calling | Supported |
| Structured Outputs | Supported |
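The limits in the table add up: 786,432 input tokens plus 262,144 output tokens equals the 1,048,576-token context window. A minimal pre-flight check against those figures might look like the sketch below. The function name is hypothetical, token counts must come from a tokenizer that is not shown here, and how the service actually enforces these limits is an assumption.

```python
CONTEXT_WINDOW = 1_048_576  # tokens, from the specification table
MAX_INPUT = 786_432         # tokens
MAX_OUTPUT = 262_144        # tokens

# The two per-direction limits together fill the context window exactly.
assert MAX_INPUT + MAX_OUTPUT == CONTEXT_WINDOW


def check_budget(input_tokens: int, max_output_tokens: int) -> list[str]:
    """Return limit violations; an empty list means the request fits."""
    problems = []
    if input_tokens > MAX_INPUT:
        problems.append(
            f"input is {input_tokens - MAX_INPUT} tokens over the {MAX_INPUT} limit"
        )
    if max_output_tokens > MAX_OUTPUT:
        problems.append(
            f"output is {max_output_tokens - MAX_OUTPUT} tokens over the {MAX_OUTPUT} limit"
        )
    return problems
```

Because the per-direction limits sum to the window, a request that passes both checks cannot exceed the total context.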
Base URL
https://llm.wavespeed.ai/v1
Endpoint
POST /chat/completions
Model ID
z-ai/glm-5.2
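For clients that do not use the OpenAI SDK, the base URL, endpoint, and model ID above can be combined into a plain HTTP request. The sketch below only constructs the request with Python's standard library and does not send it. The `Authorization: Bearer` header is assumed from OpenAI compatibility, and `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://llm.wavespeed.ai/v1"
MODEL_ID = "z-ai/glm-5.2"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumed auth scheme, matching what the OpenAI SDK sends.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send the request; the response body is JSON and can be parsed with `json.load`.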
| Specification | Value |
|---|---|
| Input | $1.40 / M tokens |
| Output | $4.40 / M tokens |
| Context | 1049K |
| Max Output | 262K |
| Tool Use | Supported |
Access GLM 5.2 through our unified API — OpenAI-compatible, no cold starts, transparent pricing.
Pricing on WaveSpeedAI: $1.40 per million input tokens and $4.40 per million output tokens. Prompt caching and batch processing are billed separately and reduce effective cost on long, repetitive workloads.
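As a rough illustration of the per-token pricing, the sketch below estimates the cost of a single request from the listed rates. It deliberately ignores prompt caching and batch processing, whose rates are not given here, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

INPUT_PRICE_PER_M = Decimal("1.40")   # USD per million input tokens
OUTPUT_PRICE_PER_M = Decimal("4.40")  # USD per million output tokens
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the listed rates."""
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M
```

For example, a 200,000-token prompt with a 10,000-token reply comes to $0.28 + $0.044 = $0.324 under these rates.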
GLM 5.2 supports up to 1049K tokens of context with up to 262K tokens of output per request.
Yes. WaveSpeedAI exposes GLM 5.2 through an OpenAI-compatible endpoint at https://llm.wavespeed.ai/v1. Point the official OpenAI SDK at this base URL with your WaveSpeedAI API key — no other code changes required.
Sign in to WaveSpeedAI, create an API key in Access Keys, then send a request to https://llm.wavespeed.ai/v1/chat/completions with model id set to the value shown above. New accounts receive free credits to evaluate GLM 5.2 before paying per token.