deepseek/deepseek-v4-flash

1,048,576 context · $0.17/M input tokens · $0.34/M output tokens

DeepSeek V4 Flash is DeepSeek's efficiency-first open-source model released in April 2026, built on a 284B-parameter Mixture-of-Experts architecture with just 13B parameters active per token — the smallest activation footprint among current Tier-1 models. It shares the same 1M-token context window and hybrid attention design as V4 Pro, delivering near-equivalent reasoning capability (LiveCodeBench 91.6, Codeforces 3052, SWE-bench Verified 79.0) while running significantly faster and at dramatically lower cost. Pre-trained on 32T tokens, V4 Flash is purpose-built for high-throughput, latency-sensitive scenarios such as coding assistants, conversational agents, and batch processing pipelines. It supports thinking and non-thinking modes, function calling, JSON output, and FIM completion.

Pricing

Pay as you go

No upfront cost; pay only for what you use.

Input: $0.17 / M tokens
Output: $0.34 / M tokens
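
As a rough illustration of the rates above, the sketch below estimates the cost of a single request. The helper name `estimate_cost_usd` is hypothetical, and the calculation ignores any prompt-caching or batch pricing, which this page does not quantify.

```python
from decimal import Decimal

# Listed rates in USD per one million tokens (taken from the pricing above).
INPUT_USD_PER_M = Decimal("0.17")
OUTPUT_USD_PER_M = Decimal("0.34")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the listed rates (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Example: 200,000 input tokens and 10,000 output tokens.
    print(estimate_cost_usd(200_000, 10_000))
```

Decimal arithmetic is used here so that small per-request amounts do not pick up binary floating-point noise when summed over many requests.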

API Usage

Use the following code examples to integrate our API:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.wavespeed.ai/v1"
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Model Introduction

DeepSeek deepseek-v4-flash

DeepSeek-V4-Flash is DeepSeek's cost-efficient open-source model, released on April 24, 2026. It is a 284B parameter Mixture-of-Experts (MoE) language model with only 13B active parameters, pre-trained on 32T tokens, supporting a context length of one million tokens. V4-Flash delivers reasoning performance approaching V4-Pro while being significantly faster and cheaper — making it ideal for high-volume, latency-sensitive workloads.


Why It Stands Out

  • Mixture-of-Experts architecture with 284B total parameters and only 13B active — the smallest activation among Tier-1 models
  • 1,000,000-token context window powered by Compressed Sparse Attention (CSA) and DeepSeek Sparse Attention (DSA)
  • Near V4-Pro reasoning performance at a fraction of the cost

Key Features

  • Context Window: 1,000,000 tokens
  • Max Output: 384,000 tokens
  • Vision: Not Supported
  • Function Calling: Supported
  • Thinking Mode: Supported (non-thinking / high / max)
  • JSON Output: Supported
  • FIM Completion: Supported (non-thinking mode only)
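
The function calling and JSON output features listed above can be sketched as request payloads. This is a minimal sketch that assumes the endpoint accepts the OpenAI-style `tools` and `response_format` fields; the page states only that the features are supported, not the exact parameters. The `get_weather` tool and both helper names are hypothetical.

```python
import json

MODEL_ID = "deepseek/deepseek-v4-flash"

# Hypothetical tool definition written in the OpenAI-style "tools" schema.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(user_message: str) -> dict:
    """Keyword arguments for a chat completion call with one tool attached."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [GET_WEATHER_TOOL],
    }


def build_json_request(user_message: str) -> dict:
    """Keyword arguments asking for a JSON object reply (assumed response_format)."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": user_message},
        ],
        "response_format": {"type": "json_object"},
    }


if __name__ == "__main__":
    print(json.dumps(build_tool_request("Weather in Paris?"), indent=2))
```

Either dictionary could then be passed to the SDK as `client.chat.completions.create(**build_tool_request("..."))`.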

Benchmarks

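Benchmark             V4-Flash   V4-Pro   Claude Opus 4.6   GPT-5.4
SWE-bench Verified    79.0       80.6     80.8 [a]
LiveCodeBench         91.6       93.5     88.8              91.7
Codeforces Rating     3052       3206     3168 [a]
MMLU-Pro              86.2       87.5     89.1              87.5
Terminal Bench 2.0    56.9       67.9     65.4              75.1

[a] This row lists only three values; the third belongs to either Claude Opus 4.6 or GPT-5.4, and the remaining cell is empty.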
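
To put the "near V4-Pro" claim in numbers, the sketch below expresses each V4-Flash score as a percentage of the V4-Pro score, using only the two columns listed for every benchmark. The helper name `flash_to_pro_ratio` is hypothetical, and a ratio of raw scores is just one simple way to compare them.

```python
# (V4-Flash, V4-Pro) scores as listed in the benchmark table above.
SCORES = {
    "SWE-bench Verified": (79.0, 80.6),
    "LiveCodeBench": (91.6, 93.5),
    "Codeforces Rating": (3052, 3206),
    "MMLU-Pro": (86.2, 87.5),
    "Terminal Bench 2.0": (56.9, 67.9),
}


def flash_to_pro_ratio(flash: float, pro: float) -> float:
    """Return the V4-Flash score as a percentage of V4-Pro, rounded to one decimal."""
    return round(100.0 * flash / pro, 1)


if __name__ == "__main__":
    for name, (flash, pro) in SCORES.items():
        print(f"{name}: {flash_to_pro_ratio(flash, pro)}% of V4-Pro")
```

On these figures the gap is widest on Terminal Bench 2.0, which is consistent with the note recommending V4-Pro for complex agentic workflows.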

Specifications

Specification       Value
Provider            DeepSeek
Model Type          Large Language Model (LLM)
Architecture        Mixture-of-Experts (MoE)
Total Parameters    284B (13B active)
Context Window      1,000,000 tokens
Max Output          384,000 tokens
Input               Text
Output              Text
Vision              Not Supported
Function Calling    Supported
Thinking Mode       Supported (high / max)
Release Date        April 24, 2026

How to Use

  1. Write your prompt — describe the task, provide context, and specify desired output format.
  2. Submit — the model processes your request and returns the response.

API Integration

Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: deepseek/deepseek-v4-flash


API Usage

Python SDK

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.wavespeed.ai/v1"
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

cURL

curl https://llm.wavespeed.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
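
For environments where the OpenAI SDK is not installed, the cURL call above can be mirrored with the Python standard library. This is a minimal sketch: the helper names are hypothetical, and the response parsing assumes the OpenAI-style shape (`choices[0].message.content`) that the SDK example reads.

```python
import json
import urllib.request

BASE_URL = "https://llm.wavespeed.ai/v1"
ENDPOINT = "chat/completions"
MODEL_ID = "deepseek/deepseek-v4-flash"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the cURL example."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{ENDPOINT}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    # Assumed OpenAI-style response body.
    return payload["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.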

Notes

  • Model: deepseek/deepseek-v4-flash
  • Provider: DeepSeek
  • Open-source weights available on Hugging Face and ModelScope
  • Supports both OpenAI and Anthropic API formats
  • For simple agent tasks, V4-Flash performs on par with V4-Pro; for complex agentic workflows, consider V4-Pro

Info

Provider: deepseek
Type: llm

Supported Features

Input: Text
Output: Text
Context: 1,048,576
Max output: 384,000
Vision: Not supported
Function Calling: ✓ Supported

API Access Guide

Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: deepseek/deepseek-v4-flash

DeepSeek V4 Flash API


Input: $0.17 / M tokens
Output: $0.34 / M tokens
Context: 1049K tokens
Max output: 384K tokens
Tool use: Supported

Try DeepSeek V4 Flash on WaveSpeedAI

Access DeepSeek V4 Flash through our unified API: OpenAI-compatible, no cold starts, transparent pricing.

Frequently Asked Questions about DeepSeek V4 Flash

How much does the DeepSeek V4 Flash API cost?

Pricing on WaveSpeedAI: $0.17 per million input tokens and $0.34 per million output tokens. Prompt caching and batch processing are billed separately and reduce the effective cost on long, repetitive workloads.

What is the context window of DeepSeek V4 Flash?

DeepSeek V4 Flash supports up to 1049K tokens of context and up to 384K output tokens per request.
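
A request can be checked against the limits quoted in the answer above before it is sent. The sketch below uses the exact figures listed elsewhere on this page (1,048,576 context tokens, 384,000 output tokens). It assumes that prompt and output tokens share the context window, which the page does not state explicitly, and the helper name `check_request_budget` is hypothetical.

```python
CONTEXT_LIMIT_TOKENS = 1_048_576  # context size listed on this page
MAX_OUTPUT_TOKENS = 384_000       # maximum output listed on this page


def check_request_budget(prompt_tokens: int, max_output_tokens: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits the limits.

    Assumption: prompt and output tokens count against the same context
    window, so this check is deliberately conservative.
    """
    problems = []
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append("requested output exceeds the 384,000-token maximum")
    if prompt_tokens + max_output_tokens > CONTEXT_LIMIT_TOKENS:
        problems.append("prompt plus output exceeds the 1,048,576-token context")
    return problems


if __name__ == "__main__":
    print(check_request_budget(prompt_tokens=100_000, max_output_tokens=8_000))
```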

Is DeepSeek V4 Flash compatible with OpenAI?

Yes. WaveSpeedAI exposes DeepSeek V4 Flash through an OpenAI-compatible endpoint at https://llm.wavespeed.ai/v1. Point the official OpenAI SDK at this base URL with your WaveSpeedAI API key; no other code changes are required.
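
Because the endpoint follows the OpenAI interface, code written against the OpenAI SDK should carry over once the client is configured with the base URL above. The sketch below streams a reply through any such client. Streaming support on this endpoint is an assumption (the page does not mention it), and `stream_reply` and `make_fake_client` are hypothetical helpers; the fake client exists only so the function can be exercised offline.

```python
from types import SimpleNamespace


def stream_reply(client, prompt: str) -> str:
    """Stream a chat completion and return the concatenated text.

    `client` is expected to behave like openai.OpenAI configured with the
    WaveSpeedAI base URL and API key.
    """
    stream = client.chat.completions.create(
        model="deepseek/deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or no text (for example the last one).
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


def make_fake_client(pieces):
    """Offline stand-in that mimics only the attributes stream_reply uses."""
    def create(**kwargs):
        return iter(
            SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=p))])
            for p in pieces
        )
    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))
```

With the real SDK, the call would look like `stream_reply(OpenAI(api_key="YOUR_API_KEY", base_url="https://llm.wavespeed.ai/v1"), "Hello!")`.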

How do I get started with DeepSeek V4 Flash?

Log in to WaveSpeedAI, create an API key under Access Keys, then send a request to https://llm.wavespeed.ai/v1/chat/completions with the model ID shown above. New accounts receive free credits to evaluate DeepSeek V4 Flash.
