google/gemini-3.1-flash-lite
1,048,576 context · $0.25/M input tokens · $1.50/M output tokens
Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, with a 1M-token context window and up to 64K output tokens. The model is designed for lightweight agentic workflows, simple data extraction, classification, summarization, document understanding, and responsive applications where API cost and speed are primary constraints. It supports thinking levels from minimal to high for fine-grained cost/performance control and is priced at half the cost of Gemini 3 Flash.
Pay as you go
No upfront payment; pay only for what you use.
Use the code examples below to integrate with our API:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.wavespeed.ai/v1"
)

response = client.chat.completions.create(
    model="google/gemini-3.1-flash-lite",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic workflows, simple data extraction, classification, summarization, and applications where responsiveness and API cost are the primary constraints.
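Since the model accepts image input, a message can carry an image alongside text. The sketch below builds such a message in the content-parts format used by the OpenAI Chat Completions API (a `text` part plus an `image_url` part holding a base64 data URL). That this endpoint accepts the same format is an assumption based on its OpenAI compatibility, and `image_message` is a hypothetical helper name, not part of any SDK.

```python
import base64


def image_message(prompt, image_bytes, mime_type="image/png"):
    """Build one user message with a text part and an inline image part.

    Assumption: the endpoint accepts OpenAI-style content parts, with the
    image passed as a base64 data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary can be placed in the `messages` list of a chat completions call in place of the plain-text message.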
| Specification | Value |
|---|---|
| Provider | Google |
| Model Type | Chat Completions model |
| Architecture | text+image+file+audio+video->text |
| Context Window | 1,048,576 tokens |
| Max Input | 983,040 tokens |
| Max Output | 65,536 tokens |
| Input | Text, Image, Video, Audio, PDF |
| Output | Text |
| Vision | Supported |
| Function Calling | Supported |
| Structured Outputs | Supported |
| Audio Input | Supported |
| Thinking Levels | minimal, low, medium, high |

| Token Type | Cost |
|---|---|
| Input | $0.25 per million tokens |
| Output | $1.50 per million tokens |
| Cached Input | $0.025 per million tokens |
| Cache Write | $0.083333 per million tokens |
| Reasoning Output | $1.50 per million tokens |
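To make the rates above concrete, here is a minimal cost-estimation sketch. The prices are copied from the pricing table; the function name `estimate_cost` and the way cached tokens are handled (a subset of input tokens billed at the cached rate) are assumptions for illustration, not a description of the actual billing logic. Cache-write charges are left out, and reasoning tokens are assumed to be counted in `output_tokens` since both are listed at the same rate.

```python
from decimal import Decimal

# Prices in USD per million tokens, copied from the pricing table.
PRICE_PER_MILLION = {
    "input": Decimal("0.25"),
    "cached_input": Decimal("0.025"),
    "output": Decimal("1.50"),
}


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Return the estimated USD cost of one request as a Decimal.

    Assumption: cached_input_tokens is the part of input_tokens that is
    served from cache and billed at the cached rate instead of the
    regular input rate.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_input_tokens
    total = (
        fresh_tokens * PRICE_PER_MILLION["input"]
        + cached_input_tokens * PRICE_PER_MILLION["cached_input"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / Decimal(1_000_000)
```

Under these assumptions, a request with 100,000 input tokens and 2,000 output tokens comes to $0.028; if 80,000 of those input tokens are cached, the estimate drops to $0.01.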
Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: google/gemini-3.1-flash-lite
Input: $0.25 /M
Output: $1.50 /M
Context: 1049K
Max Output: 66K
Vision: Supported
Tool Use: Supported
Access Gemini 3.1 Flash Lite through our unified API: OpenAI-compatible, no cold starts, transparent pricing.
WaveSpeedAI pricing: $0.25 per million input tokens and $1.50 per million output tokens. Prompt caching and batch processing are billed separately and lower the effective cost for long, repetitive workloads.
Gemini 3.1 Flash Lite supports up to 1,048,576 context tokens (about 1049K) and 65,536 output tokens (about 66K) per request.
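The limits in the specification table fit together: the maximum input (983,040 tokens) plus the maximum output (65,536 tokens) equals the 1,048,576-token context window. The sketch below checks a planned request against those limits before it is sent. `check_budget` is a hypothetical helper, and the token counts are assumed to come from the caller, for example from a tokenizer or from usage figures reported on earlier responses.

```python
# Limits copied from the specification table.
CONTEXT_WINDOW = 1_048_576
MAX_INPUT = 983_040
MAX_OUTPUT = 65_536


def check_budget(prompt_tokens, max_output_tokens):
    """Return a list of limit violations; an empty list means the request fits.

    Because MAX_INPUT + MAX_OUTPUT == CONTEXT_WINDOW, a request that passes
    both checks below also fits inside the context window.
    """
    problems = []
    if prompt_tokens > MAX_INPUT:
        problems.append(
            f"prompt is {prompt_tokens - MAX_INPUT} tokens over the input limit"
        )
    if max_output_tokens > MAX_OUTPUT:
        problems.append(
            f"requested output is {max_output_tokens - MAX_OUTPUT} tokens over the output limit"
        )
    return problems
```

A caller might truncate or chunk the prompt whenever the returned list is non-empty, rather than letting the API reject the request.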
Yes. WaveSpeedAI serves Gemini 3.1 Flash Lite through the OpenAI-compatible endpoint at https://llm.wavespeed.ai/v1. Point the official OpenAI SDK at this base URL with your WaveSpeedAI API key; no other code changes are required.
Sign in to WaveSpeedAI, create an API key under Access Keys, then send a request to https://llm.wavespeed.ai/v1/chat/completions with the model ID shown above. New accounts receive free credits to evaluate Gemini 3.1 Flash Lite.
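For callers who prefer not to use the OpenAI SDK, the request can be sketched with the Python standard library alone. The base URL, endpoint path, and model ID come from this page; the `Authorization: Bearer` header follows the OpenAI API convention and is an assumption here, and `build_chat_request` is a hypothetical helper name. The function only builds the request, so it can be inspected without a network call.

```python
import json
import urllib.request

BASE_URL = "https://llm.wavespeed.ai/v1"
MODEL_ID = "google/gemini-3.1-flash-lite"


def build_chat_request(api_key, user_text):
    """Build (but do not send) a POST request for the chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer-token auth, as in the OpenAI API convention.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` and decoding the body with `json.load` should then yield a response whose `choices[0]["message"]["content"]` field holds the reply, assuming the endpoint mirrors the OpenAI response shape.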