google/gemini-3.1-flash-lite
1,048,576-token context · $0.25/M input tokens · $1.50/M output tokens
Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, with a 1M-token context window and up to 64K output tokens. The model is designed for lightweight agentic workflows, simple data extraction, classification, summarization, document understanding, and responsive applications where API cost and speed are primary constraints. It supports thinking levels from minimal to high for fine-grained cost/performance control and is priced at half the cost of Gemini 3 Flash.
Pay per use
No upfront cost; pay only for what you use
Use the code examples below to integrate with our API:

    from openai import OpenAI

    # Point the official OpenAI SDK at the WaveSpeedAI endpoint.
    client = OpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://llm.wavespeed.ai/v1",
    )

    # Send a single-turn chat request to Gemini 3.1 Flash Lite.
    response = client.chat.completions.create(
        model="google/gemini-3.1-flash-lite",
        messages=[
            {"role": "user", "content": "Hello!"},
        ],
    )

    print(response.choices[0].message.content)

| Specification | Value |
|---|---|
| Provider | Google |
| Model Type | Chat Completions model |
| Architecture | text+image+file+audio+video->text |
| Context Window | 1,048,576 tokens |
| Max Input | 983,040 tokens |
| Max Output | 65,536 tokens |
| Input | Text, Image, Video, Audio, PDF |
| Output | Text |
| Vision | Supported |
| Function Calling | Supported |
| Structured Outputs | Supported |
| Audio Input | Supported |
| Thinking Levels | minimal, low, medium, high |

| Token Type | Cost |
|---|---|
| Input | $0.25 per million tokens |
| Output | $1.50 per million tokens |
| Cached Input | $0.025 per million tokens |
| Cache Write | $0.083333 per million tokens |
| Reasoning Output | $1.50 per million tokens |
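The per-million rates in the pricing table translate directly into a per-request estimate. The sketch below is a rough calculator, not a billing reference: the function name `estimate_cost` and the category keys are hypothetical, and it assumes each token category is billed independently and simply added up (for example, that cached input tokens are not also charged at the regular input rate).

```python
# Per-million-token rates copied from the pricing table (USD).
RATES_PER_MILLION = {
    "input": 0.25,
    "output": 1.50,
    "cached_input": 0.025,
    "cache_write": 0.083333,
    "reasoning_output": 1.50,
}


def estimate_cost(**token_counts: int) -> float:
    """Return an estimated USD cost for the given token counts per category.

    Assumption: categories are billed independently and summed.
    """
    total = 0.0
    for category, tokens in token_counts.items():
        if category not in RATES_PER_MILLION:
            raise ValueError(f"unknown token category: {category}")
        total += tokens / 1_000_000 * RATES_PER_MILLION[category]
    return total


if __name__ == "__main__":
    # 200,000 input tokens and 4,000 output tokens.
    cost = estimate_cost(input=200_000, output=4_000)
    print(f"${cost:.4f}")  # prints $0.0560
```

Actual invoices may differ if the provider rounds per request or bills cached and uncached input differently from what is assumed here.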
Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: google/gemini-3.1-flash-lite
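For clients that do not use the OpenAI SDK, the three values above can be combined into a plain HTTP request. The following is a minimal sketch using only the Python standard library; the helper name `build_chat_request` is hypothetical, and the `Authorization: Bearer` header is an assumption based on the endpoint being OpenAI-compatible, not something this page states.

```python
import json
import urllib.request

BASE_URL = "https://llm.wavespeed.ai/v1"
ENDPOINT = "chat/completions"
MODEL_ID = "google/gemini-3.1-flash-lite"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a single-turn chat completions request."""
    url = f"{BASE_URL}/{ENDPOINT}"
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Bearer-token auth, as on other OpenAI-compatible APIs.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request("YOUR_API_KEY", "Hello!")
    print(request.get_method(), request.full_url)
    # Sending the request needs a valid key and network access:
    # with urllib.request.urlopen(request) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
```

Building the request separately from sending it keeps the URL and payload easy to inspect before any credits are spent.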
Input: $0.25 /M
Output: $1.50 /M
Context: 1049K
Max output: 66K
Vision: Supported
Tool use: Supported
Access Gemini 3.1 Flash Lite through our unified API: OpenAI-compatible, no cold starts, transparent pricing.
Pricing on WaveSpeedAI: $0.25 per million input tokens and $1.50 per million output tokens. Prompt caching and batch processing are billed separately and reduce the effective cost of long, repetitive workloads.
Gemini 3.1 Flash Lite supports up to 1,048,576 tokens of context (about 1049K) and up to 65,536 output tokens (about 66K) per request.
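A request can be checked against these limits before it is sent. The sketch below uses the Max Input and Max Output values from the specification table; the helper `check_token_budget` is hypothetical, and it assumes the caller already has a token count for the prompt (no tokenizer is shown here). How the API itself reports an over-limit request is not covered by this page.

```python
# Limits copied from the specification table.
MAX_INPUT_TOKENS = 983_040
MAX_OUTPUT_TOKENS = 65_536


def check_token_budget(input_tokens: int, max_output_tokens: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means the request fits."""
    problems = []
    if input_tokens > MAX_INPUT_TOKENS:
        problems.append(
            f"input is {input_tokens - MAX_INPUT_TOKENS} tokens over the "
            f"{MAX_INPUT_TOKENS}-token input limit"
        )
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"requested output is {max_output_tokens - MAX_OUTPUT_TOKENS} tokens over the "
            f"{MAX_OUTPUT_TOKENS}-token output limit"
        )
    return problems


if __name__ == "__main__":
    for problem in check_token_budget(input_tokens=1_000_000, max_output_tokens=8_192):
        print(problem)
```

The two limits add up to exactly the 1,048,576-token context window, so a request that passes both checks also fits the window as a whole.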
Yes. WaveSpeedAI exposes Gemini 3.1 Flash Lite through an OpenAI-compatible endpoint at https://llm.wavespeed.ai/v1. Point the official OpenAI SDK at this base URL with your WaveSpeedAI API key; no other code changes are needed.
Sign in to WaveSpeedAI, create an API key under Access Keys, then send a request to https://llm.wavespeed.ai/v1/chat/completions with the model ID shown above. New accounts receive free credits to evaluate Gemini 3.1 Flash Lite.