deepseek/deepseek-v4-flash
1,048,576 context · $0.17/M input tokens · $0.34/M output tokens
DeepSeek V4 Flash is DeepSeek's efficiency-first open-source model released in April 2026, built on a 284B-parameter Mixture-of-Experts architecture with just 13B parameters active per token — the smallest activation footprint among current Tier-1 models. It shares the same 1M-token context window and hybrid attention design as V4 Pro, delivering near-equivalent reasoning capability (LiveCodeBench 91.6, Codeforces 3052, SWE-bench Verified 79.0) while running significantly faster and at dramatically lower cost. Pre-trained on 32T tokens, V4 Flash is purpose-built for high-throughput, latency-sensitive scenarios such as coding assistants, conversational agents, and batch processing pipelines. It supports thinking and non-thinking modes, function calling, JSON output, and FIM completion.
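JSON output is listed as supported, but the request shape is not documented on this page. The sketch below assumes the endpoint accepts the OpenAI-style `response_format={"type": "json_object"}` field. The helper names `build_json_request` and `parse_json_reply` are hypothetical. The code builds the request body and decodes the reply text without sending anything over the network.

```python
import json

MODEL_ID = "deepseek/deepseek-v4-flash"


def build_json_request(prompt: str) -> dict:
    """Build a chat/completions body that asks for JSON output.

    Assumption: the endpoint honors the OpenAI-style response_format field.
    OpenAI's JSON mode also expects the word "JSON" to appear in the messages.
    """
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(response: dict) -> dict:
    """Extract and decode the JSON object from a chat/completions response dict."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content)


if __name__ == "__main__":
    print(json.dumps(build_json_request("List three primes under 10."), indent=2))
    # Hand-written response in the OpenAI chat/completions shape (no network call)
    fake = {"choices": [{"message": {"content": '{"primes": [2, 3, 5]}'}}]}
    print(parse_json_reply(fake))
```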
Pay as you go: no upfront cost, you pay only for what you use.

Use the code examples below to integrate with our API:

```python
from openai import OpenAI

# Point the official OpenAI SDK at the WaveSpeedAI endpoint
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.wavespeed.ai/v1",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(response.choices[0].message.content)
```

DeepSeek-V4-Flash is DeepSeek's cost-efficient open-source model, released on April 24, 2026. It is a 284B-parameter Mixture-of-Experts (MoE) language model with only 13B active parameters, pre-trained on 32T tokens and supporting a context length of one million tokens. V4-Flash delivers reasoning performance approaching V4-Pro while being significantly faster and cheaper, which makes it well suited to high-volume, latency-sensitive workloads.

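The 13B-of-284B figure means only a small fraction of the weights take part in any single token's forward pass. The following is a quick worked check of that ratio. It is simple arithmetic on the figures above and says nothing about how DeepSeek routes tokens to experts.

$$
\frac{\text{active parameters}}{\text{total parameters}} = \frac{13\,\text{B}}{284\,\text{B}} \approx 0.046 \approx 4.6\%
$$

Per-token compute therefore scales roughly with 13B parameters rather than 284B. Serving the model still generally requires storing all 284B parameters.
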
| Benchmark | V4-Flash | V4-Pro | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|---|
| SWE-bench Verified | 79.0 | 80.6 | 80.8 | — |
| LiveCodeBench | 91.6 | 93.5 | 88.8 | 91.7 |
| Codeforces Rating | 3052 | 3206 | — | 3168 |
| MMLU-Pro | 86.2 | 87.5 | 89.1 | 87.5 |
| Terminal Bench 2.0 | 56.9 | 67.9 | 65.4 | 75.1 |

| Specification | Value |
|---|---|
| Provider | DeepSeek |
| Model Type | Large Language Model (LLM) |
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 284B (13B active) |
| Context Window | 1,048,576 tokens (1M) |
| Max Output | 384,000 tokens |
| Input | Text |
| Output | Text |
| Vision | Not Supported |
| Function Calling | Supported |
| Thinking Mode | Supported (high / max) |
| Release Date | April 24, 2026 |
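
The table lists function calling as supported. The sketch below assumes the endpoint follows the OpenAI chat/completions `tools` / `tool_calls` schema. The `get_weather` tool and the helper names are hypothetical. The code builds a request that declares one tool, then runs any tool calls found in a response dict and produces the `tool` messages to send back.

```python
import json
from typing import Callable

MODEL_ID = "deepseek/deepseek-v4-flash"

# Hypothetical tool declared in the OpenAI "tools" format
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> str:
    # Hypothetical local implementation; a real one would call a weather service
    return f"22°C in {city}"


def build_tool_request(prompt: str) -> dict:
    """Build a chat/completions body that offers the model one tool."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
    }


def run_tool_calls(response: dict, registry: dict[str, Callable[..., str]]) -> list[dict]:
    """Execute each tool call in the response and return "tool" messages for the next turn."""
    message = response["choices"][0]["message"]
    results = []
    for call in message.get("tool_calls") or []:
        fn = registry[call["function"]["name"]]
        # In the OpenAI schema, arguments arrive as a JSON-encoded string
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return results
```

The returned messages are appended to the conversation, after the assistant message that contained `tool_calls`, and sent in a follow-up request so the model can write its final answer.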

- Base URL: https://llm.wavespeed.ai/v1
- API endpoint: chat/completions
- Model ID: deepseek/deepseek-v4-flash


```bash
curl https://llm.wavespeed.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
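
In environments without the OpenAI SDK, the same request can be sent with Python's standard library. The sketch below mirrors the curl call above. It reads the key from a `WAVESPEED_API_KEY` environment variable, a name chosen here for illustration only.

```python
import json
import os
import urllib.request

BASE_URL = "https://llm.wavespeed.ai/v1"
MODEL_ID = "deepseek/deepseek-v4-flash"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example (not yet sent)."""
    body = json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    # WAVESPEED_API_KEY is a hypothetical variable name for this sketch
    api_key = os.environ["WAVESPEED_API_KEY"]
    req = build_chat_request(api_key, prompt)
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Calling `chat("Hello!")` performs the network request. Keeping `build_chat_request` separate makes the request easy to inspect or test offline.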

- Input: $0.17/M tokens
- Output: $0.34/M tokens
- Context: 1049K tokens
- Max output: 384K tokens
- Tool use: Supported

Access DeepSeek V4 Flash through our unified API: OpenAI-compatible, no cold starts, transparent pricing.

Pricing on WaveSpeedAI: $0.17 per million input tokens and $0.34 per million output tokens. Prompt caching and batch processing are billed separately and reduce the effective cost of long, repetitive workloads.
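
The per-token prices convert to a per-request cost with simple arithmetic. The sketch below uses the list prices above ($0.17 and $0.34 per million tokens). It ignores prompt-caching and batch discounts, because their rates are not given here.

```python
INPUT_USD_PER_M = 0.17   # $ per 1M input tokens (list price above)
OUTPUT_USD_PER_M = 0.34  # $ per 1M output tokens (list price above)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of one request, ignoring caching/batch discounts."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
```

For example, a request with 200,000 input tokens and 4,000 output tokens comes to about $0.035 at list price.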

DeepSeek V4 Flash supports up to 1049K tokens of context and up to 384K output tokens per request.
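
Before sending a long prompt, you can check it against these limits. The sketch below makes two assumptions: that the 1,048,576-token window covers prompt and output together, as is common for chat models, and that the 384K output limit means 384,000 tokens. Token counts must come from a tokenizer, which is not shown.

```python
CONTEXT_WINDOW = 1_048_576  # total tokens per request (from this listing)
MAX_OUTPUT = 384_000        # maximum output tokens per request


def fits_in_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Return True if the prompt plus requested output stays within both limits.

    Assumption: the context window is shared by input and output tokens.
    """
    if max_output_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW
```

Under these assumptions, a 700,000-token prompt leaves room for at most 348,576 output tokens.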

Yes. WaveSpeedAI exposes DeepSeek V4 Flash through an OpenAI-compatible endpoint at https://llm.wavespeed.ai/v1. Point the official OpenAI SDK at this base URL with your WaveSpeedAI API key; no other code changes are needed.

Sign in to WaveSpeedAI, create an API key under Access Keys, then send a request to https://llm.wavespeed.ai/v1/chat/completions with the model ID shown above. New accounts receive free credits to evaluate DeepSeek V4 Flash.