deepseek/deepseek-v4-flash
1,048,576 context · $0.17/M input tokens · $0.34/M output tokens
DeepSeek V4 Flash is DeepSeek's efficiency-first open-source model released in April 2026, built on a 284B-parameter Mixture-of-Experts architecture with just 13B parameters active per token — the smallest activation footprint among current Tier-1 models. It shares the same 1M-token context window and hybrid attention design as V4 Pro, delivering near-equivalent reasoning capability (LiveCodeBench 91.6, Codeforces 3052, SWE-bench Verified 79.0) while running significantly faster and at dramatically lower cost. Pre-trained on 32T tokens, V4 Flash is purpose-built for high-throughput, latency-sensitive scenarios such as coding assistants, conversational agents, and batch processing pipelines. It supports thinking and non-thinking modes, function calling, JSON output, and FIM completion.
Pay per use
No upfront costs; pay only for what you use
Use the following code examples to integrate with our API:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.wavespeed.ai/v1"
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

DeepSeek-V4-Flash is DeepSeek's cost-efficient open-source model, released on April 24, 2026. It is a 284B-parameter Mixture-of-Experts (MoE) language model with only 13B active parameters, pre-trained on 32T tokens, and it supports a context length of one million tokens. V4-Flash delivers reasoning performance approaching V4-Pro while being significantly faster and cheaper, which makes it well suited to high-volume, latency-sensitive workloads.
| Benchmark | V4-Flash | V4-Pro | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|---|
| SWE-bench Verified | 79.0 | 80.6 | 80.8 | — |
| LiveCodeBench | 91.6 | 93.5 | 88.8 | 91.7 |
| Codeforces Rating | 3052 | 3206 | — | 3168 |
| MMLU-Pro | 86.2 | 87.5 | 89.1 | 87.5 |
| Terminal Bench 2.0 | 56.9 | 67.9 | 65.4 | 75.1 |
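To put a number on "approaching V4-Pro", the short sketch below expresses each V4-Flash score as a percentage of the V4-Pro score. The figures are copied from the table above; the helper name `flash_to_pro_ratio` is illustrative, and a ratio of Codeforces ratings is only a rough indicator, since ratings are not a linear scale.

```python
# Scores copied from the benchmark table above: (V4-Flash, V4-Pro).
SCORES = {
    "SWE-bench Verified": (79.0, 80.6),
    "LiveCodeBench": (91.6, 93.5),
    "Codeforces Rating": (3052, 3206),
    "MMLU-Pro": (86.2, 87.5),
    "Terminal Bench 2.0": (56.9, 67.9),
}


def flash_to_pro_ratio(scores):
    """Return V4-Flash's score as a percentage of V4-Pro's, per benchmark."""
    return {
        name: round(100 * flash / pro, 1)
        for name, (flash, pro) in scores.items()
    }


RATIOS = flash_to_pro_ratio(SCORES)

for name, pct in RATIOS.items():
    print(f"{name}: {pct}% of V4-Pro")
```

On these figures V4-Flash lands within roughly 2 to 5 percent of V4-Pro on four of the five benchmarks; the widest gap is on Terminal Bench 2.0.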
| Specification | Value |
|---|---|
| Provider | DeepSeek |
| Model Type | Large Language Model (LLM) |
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 284B (13B active) |
| Context Window | 1,000,000 tokens |
| Max Output | 384,000 tokens |
| Input | Text |
| Output | Text |
| Vision | Not Supported |
| Function Calling | Supported |
| Thinking Mode | Supported (high / max) |
| Release Date | April 24, 2026 |
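The table lists function calling as supported. Below is a minimal sketch of what a tool-enabled request body could look like, assuming the endpoint accepts the OpenAI-style `tools` field and returns OpenAI-style tool calls. The `get_weather` tool and both helper functions are hypothetical and exist only for illustration.

```python
import json

MODEL_ID = "deepseek/deepseek-v4-flash"

# Hypothetical tool definition, for illustration only.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(user_message, tools):
    """Build a chat/completions body that offers tools to the model."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
    }


def parse_tool_call(tool_call):
    """Extract (name, arguments) from one OpenAI-style tool call dict.

    The arguments field is assumed to arrive as a JSON-encoded string.
    """
    fn = tool_call["function"]
    return fn["name"], json.loads(fn["arguments"])


body = build_tool_request("What's the weather in Madrid?", [GET_WEATHER_TOOL])
print(json.dumps(body, indent=2))
```

With the OpenAI SDK, the same body can be passed as `client.chat.completions.create(**body)`. How thinking mode is selected is not shown here because the request parameter for it is not documented on this page.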
Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: deepseek/deepseek-v4-flash
curl https://llm.wavespeed.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
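For environments where installing the OpenAI SDK is not an option, the curl call above can be mirrored with Python's standard library. This is a sketch, not an official client: the helper names are illustrative, and the response shape (`choices[0].message.content`) is assumed to match the SDK example.

```python
import json
import urllib.request

BASE_URL = "https://llm.wavespeed.ai/v1"
MODEL_ID = "deepseek/deepseek-v4-flash"


def build_chat_request(api_key, user_message):
    """Build (but do not send) the POST request for chat/completions."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumed OpenAI-compatible response shape.
    return payload["choices"][0]["message"]["content"]


request = build_chat_request("YOUR_API_KEY", "Hello!")
# send(request) performs the network call; it is left uncalled here.
```

Separating request construction from sending keeps the payload easy to inspect and log before any tokens are billed.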
Input: $0.17/M tokens
Output: $0.34/M tokens
Context: 1049K
Max output: 384K
Tool use: Supported
Access DeepSeek V4 Flash through our unified API: OpenAI-compatible, no cold starts, transparent pricing.
Pricing on WaveSpeedAI: $0.17 per million input tokens and $0.34 per million output tokens. Prompt caching and batch processing are billed separately and reduce the effective cost of long, repetitive workloads.
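As a quick worked example of the per-token rates, the sketch below estimates the list-price cost of a request. It ignores prompt caching and batch pricing, whose rates are not given here, and the function name is illustrative.

```python
INPUT_USD_PER_M = 0.17   # listed price per million input tokens
OUTPUT_USD_PER_M = 0.34  # listed price per million output tokens


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate list-price cost in USD, ignoring caching and batch pricing."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: 200,000 input tokens and 50,000 output tokens.
print(f"${estimate_cost_usd(200_000, 50_000):.4f}")  # prints $0.0510
```

At these rates, output tokens cost twice as much as input tokens, so long generations dominate the bill sooner than long prompts do.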
DeepSeek V4 Flash supports up to 1049K tokens of context and up to 384K output tokens per request.
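A small helper can check how much output room a given prompt leaves. This sketch assumes that prompt and output tokens share the same context window, which is common for chat APIs but is not stated on this page; the limits are taken from the listing header (1,048,576) and the specification table (384,000).

```python
CONTEXT_WINDOW = 1_048_576  # tokens, from the listing header
MAX_OUTPUT = 384_000        # tokens, from the specification table


def max_output_budget(prompt_tokens):
    """Largest output-token limit that fits alongside the prompt.

    Assumes prompt and output share the context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT, remaining))


print(max_output_budget(100_000))    # short prompt: capped by MAX_OUTPUT
print(max_output_budget(1_000_000))  # long prompt: capped by remaining context
```

Under this assumption, the output cap only starts to shrink once the prompt exceeds about 664K tokens.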
WaveSpeedAI exposes DeepSeek V4 Flash through an OpenAI-compatible endpoint at https://llm.wavespeed.ai/v1. Point the official OpenAI SDK at this base URL with your WaveSpeedAI API key; no other code changes are needed.
Sign in to WaveSpeedAI, create an API key under Access Keys, and send a request to https://llm.wavespeed.ai/v1/chat/completions with the model ID shown above. New accounts receive free credits to evaluate DeepSeek V4 Flash before paying per token.