google/gemini-3.1-flash-lite

1,048,576 context · $0.25/M input tokens · $1.50/M output tokens

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, with a 1M-token context window and up to 64K output tokens. The model is designed for lightweight agentic workflows, simple data extraction, classification, summarization, document understanding, and responsive applications where API cost and speed are primary constraints. It supports thinking levels from minimal to high for fine-grained cost/performance control and is priced at half the cost of Gemini 3 Flash.

Pricing

Pay-per-use

No upfront cost; pay only for what you use

Input: $0.25 / M tokens
Output: $1.50 / M tokens
Cache Read: $0.03 / M tokens
Cache Write: $0.08 / M tokens

API Usage

Use the following code example to integrate our API:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.wavespeed.ai/v1"
)

response = client.chat.completions.create(
    model="google/gemini-3.1-flash-lite",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Model Introduction

Google: Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic workflows, simple data extraction, classification, summarization, and applications where responsiveness and API cost are the primary constraints.


Why It Looks Great

  • High-efficiency multimodal model for text, image, video, audio, and PDF understanding
  • Optimized for low-latency, high-volume production workloads
  • Supports a 1M-token context window for long prompts, document analysis, and multi-turn workflows
  • Up to 64K output tokens for extended responses and structured generation
  • Thinking levels from minimal to high for cost, latency, and quality trade-offs
  • Priced at half the cost of Gemini 3 Flash
  • Strong fit for lightweight agents, simple extraction tasks, summarization, classification, and responsive app experiences
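
As a rough illustration of the multimodal input described above, the sketch below builds a single user message that pairs a text prompt with an inline image. It assumes the endpoint accepts the OpenAI-style `image_url` content part with a base64 data URL; this page does not confirm that format, and the helper name `image_message` is hypothetical.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message combining a text prompt and an inline image.

    ASSUMPTION: the endpoint accepts OpenAI-style content parts
    ("text" and "image_url" with a base64 data URL).
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary can be placed in the `messages` list of a chat completions request in place of a plain text message.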

Key Features

  • Context Window: 1,048,576 tokens
  • Max Input: 983,040 tokens
  • Max Output: 65,536 tokens
  • Input: Text, Image, Video, Audio, PDF
  • Output: Text
  • Vision: Supported
  • Audio Input: Supported
  • Function Calling: Supported
  • Structured Outputs: Supported
  • Thinking Levels: minimal, low, medium, high
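
The function calling and thinking level features listed above could be exercised with a request like the following sketch. Two things here are assumptions rather than documented behavior: that thinking levels map onto the OpenAI SDK's `reasoning_effort` parameter, and that tool definitions follow the OpenAI `tools` schema. The `get_weather` tool is purely hypothetical.

```python
ALLOWED_LEVELS = ("minimal", "low", "medium", "high")


def build_request(question: str, thinking_level: str = "low") -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    if thinking_level not in ALLOWED_LEVELS:
        raise ValueError(f"thinking_level must be one of {ALLOWED_LEVELS}")
    return {
        "model": "google/gemini-3.1-flash-lite",
        "messages": [{"role": "user", "content": question}],
        # Hypothetical tool, declared in the OpenAI function-calling schema.
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        # ASSUMPTION: the thinking level is passed as reasoning_effort.
        "reasoning_effort": thinking_level,
    }
```

With a configured client, the request would be sent as `client.chat.completions.create(**build_request("Weather in Rome?"))`. If the model decides to call the tool, the call would appear in `response.choices[0].message.tool_calls`.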

Specifications

Provider: google
Model Type: Chat Completions model
Architecture: text+image+file+audio+video->text
Context Window: 1,048,576 tokens
Max Input: 983,040 tokens
Max Output: 65,536 tokens
Input: Text, Image, Video, Audio, PDF
Output: Text
Vision: Supported
Function Calling: Supported
Structured Outputs: Supported
Audio Input: Supported
Thinking Levels: minimal, low, medium, high
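
The listed limits are arithmetically consistent: 983,040 input tokens plus 65,536 output tokens equals the 1,048,576-token context window. A minimal sketch of a pre-flight check built on those three numbers follows. The function name `fits_limits` is hypothetical, and counting the tokens themselves is left to the caller.

```python
CONTEXT_WINDOW = 1_048_576
MAX_INPUT = 983_040
MAX_OUTPUT = 65_536


def fits_limits(input_tokens: int, max_output_tokens: int = MAX_OUTPUT) -> bool:
    """Check a planned request against the limits listed in the specifications."""
    return (
        0 <= input_tokens <= MAX_INPUT
        and 0 < max_output_tokens <= MAX_OUTPUT
        and input_tokens + max_output_tokens <= CONTEXT_WINDOW
    )
```

A check like this can run before a large document is sent, so an oversized prompt is split or truncated locally rather than rejected by the API.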

Pricing

Input: $0.25 per million tokens
Output: $1.50 per million tokens
Cached Input: $0.025 per million tokens
Cache Write: $0.083333 per million tokens
Reasoning Output: $1.50 per million tokens
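
To make the per-token rates concrete, here is a small cost estimator built from the figures above. It assumes each token category is billed independently at its listed rate and that `input_tokens` counts only uncached input. The actual billing rules are not spelled out on this page, so treat the result as an estimate.

```python
# USD per million tokens, taken from the pricing list above.
RATES = {
    "input": 0.25,
    "output": 1.50,
    "cached_input": 0.025,
    "cache_write": 0.083333,
    "reasoning_output": 1.50,
}


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    cache_write_tokens: int = 0,
    reasoning_tokens: int = 0,
) -> float:
    """Estimate the USD cost of one request (assumes independent billing per category)."""
    usage = {
        "input": input_tokens,
        "output": output_tokens,
        "cached_input": cached_input_tokens,
        "cache_write": cache_write_tokens,
        "reasoning_output": reasoning_tokens,
    }
    return sum(tokens * RATES[kind] / 1_000_000 for kind, tokens in usage.items())
```

For example, one million input tokens plus one million output tokens comes to $0.25 + $1.50 = $1.75 under these assumptions.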

How to Use

API Integration

Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: google/gemini-3.1-flash-lite
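
For environments without the OpenAI SDK, the three values above can be combined into a plain HTTP request. The sketch below uses only the Python standard library and builds the request without sending it. Bearer-token authentication is an assumption based on the OpenAI-compatible convention, and the helper name `build_chat_request` is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://llm.wavespeed.ai/v1"
ENDPOINT = "chat/completions"
MODEL_ID = "google/gemini-3.1-flash-lite"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the chat completions endpoint."""
    body = json.dumps(
        {"model": MODEL_ID, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{ENDPOINT}",
        data=body,
        headers={
            # ASSUMPTION: the API key is sent as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and parse the response body with `json.load`. In an OpenAI-compatible response, the reply text would be found at `["choices"][0]["message"]["content"]`.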

Info

Provider: google
Type: llm

Supported Features

Input: Text, Image, Audio
Output: Text
Context: 1,048,576
Max Output: 65,536
Vision: Supported
Function Calling: Supported

Gemini 3.1 Flash Lite API

Input: $0.25 / M tokens
Output: $1.50 / M tokens
Context: 1,048,576 tokens
Max Output: 65,536 tokens
Vision: Supported
Tool Use: Supported

Try Gemini 3.1 Flash Lite on WaveSpeedAI

Access Gemini 3.1 Flash Lite through our unified API: OpenAI-compatible, no cold starts, transparent pricing.

Frequently Asked Questions about Gemini 3.1 Flash Lite

How much does Gemini 3.1 Flash Lite cost via API?

Pricing on WaveSpeedAI: $0.25 per million input tokens and $1.50 per million output tokens. Prompt caching and batch processing are billed separately and reduce the effective cost on long, repetitive workloads.

What is the context window of Gemini 3.1 Flash Lite?

Gemini 3.1 Flash Lite supports up to 1,048,576 tokens of context and up to 65,536 output tokens per request.

Is Gemini 3.1 Flash Lite compatible with OpenAI?

Yes. WaveSpeedAI exposes Gemini 3.1 Flash Lite through an OpenAI-compatible endpoint at https://llm.wavespeed.ai/v1. Point the official OpenAI SDK at this base URL with your WaveSpeedAI API key; no other code changes are needed.

How do I get started with Gemini 3.1 Flash Lite?

Sign in to WaveSpeedAI, create an API key under Access Keys, then send a request to https://llm.wavespeed.ai/v1/chat/completions with the model ID shown above. New accounts receive free credits to test Gemini 3.1 Flash Lite.

Related LLM APIs