Gemini 3.1 Flash Lite | Google Efficient LLM API

google/gemini-3.1-flash-lite

Release date: 2026-05-07

1,048,576 context · $0.25/M input tokens · $1.50/M output tokens

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, with a 1M-token context window and up to 64K output tokens. The model is designed for lightweight agentic workflows, simple data extraction, classification, summarization, document understanding, and responsive applications where API cost and speed are primary constraints. It supports thinking levels from minimal to high for fine-grained cost/performance control and is priced at half the cost of Gemini 3 Flash.

Pricing

Pay-per-use

No upfront cost; pay only for what you use.

Input: $0.25 / M Tokens
Output: $1.50 / M Tokens
Cache Read: $0.03 / M Tokens
Cache Write: $0.08 / M Tokens

API Usage

Use the following code to integrate our API:

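import OpenAI from 'openai';

// Fail fast if the API key is missing from the environment.
if (!process.env.WAVESPEED_API_KEY) throw new Error('Set WAVESPEED_API_KEY');

// Point the OpenAI SDK at the WaveSpeedAI OpenAI-compatible endpoint.
const client = new OpenAI({
  apiKey: process.env.WAVESPEED_API_KEY,
  baseURL: 'https://llm.wavespeed.ai/v1',
  timeout: 120_000, // per-request timeout in milliseconds
  maxRetries: 2, // SDK-level automatic retries
});

try {
  const response = await client.chat.completions.create({
    model: 'google/gemini-3.1-flash-lite',
    messages: [{ role: 'user', content: 'Hello!' }],
  });
  // Print the first choice's text, or an empty string if none was returned.
  console.log(response.choices[0]?.message?.content ?? '');
} catch (error) {
  console.error('LLM request failed:', error);
  process.exitCode = 1; // report failure without forcing an immediate exit
}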

Model Introduction

Google: Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic workflows, simple data extraction, classification, summarization, and applications where responsiveness and API cost are the primary constraints.

Why It Looks Great

High-efficiency multimodal model for text, image, video, audio, and PDF understanding
Optimized for low-latency, high-volume production workloads
Supports a 1M-token context window for long prompts, document analysis, and multi-turn workflows
Up to 64K output tokens for extended responses and structured generation
Thinking levels from minimal to high for cost, latency, and quality trade-offs
Priced at half the cost of Gemini 3 Flash
Strong fit for lightweight agents, simple extraction tasks, summarization, classification, and responsive app experiences
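
The multimodal input listed above could be exercised with a message like the one sketched here. This is a minimal sketch, not a confirmed request format: it assumes the endpoint accepts OpenAI-style content parts (`text` and `image_url`), which this page does not state explicitly. `buildImageMessage` and the example URL are hypothetical names used for illustration.

```javascript
// Build a user message that pairs a text prompt with one image.
// ASSUMPTION: the endpoint accepts OpenAI-style content parts
// ({ type: 'text' } and { type: 'image_url' }); verify before relying on it.
function buildImageMessage(prompt, imageUrl) {
  if (typeof prompt !== 'string' || prompt.length === 0) {
    throw new TypeError('prompt must be a non-empty string');
  }
  // Accept only https URLs or base64 data URLs for the image.
  if (typeof imageUrl !== 'string' || !/^(https:\/\/|data:image\/)/.test(imageUrl)) {
    throw new TypeError('imageUrl must be an https URL or a data:image/ URL');
  }
  return {
    role: 'user',
    content: [
      { type: 'text', text: prompt },
      { type: 'image_url', image_url: { url: imageUrl } },
    ],
  };
}

// Hypothetical example input; the URL is a placeholder.
const imageMessage = buildImageMessage(
  'Describe this chart.',
  'https://example.com/chart.png',
);
console.log(JSON.stringify(imageMessage, null, 2));
```

The returned object can be placed in the `messages` array of a chat completion request in place of a plain-text message.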

Key Features

Context Window: 1,048,576 tokens
Max Input: 983,040 tokens
Max Output: 65,536 tokens
Input: Text, Image, Video, Audio, PDF
Output: Text
Vision: Supported
Audio Input: Supported
Function Calling: Supported
Structured Outputs: Supported
Thinking Levels: minimal, low, medium, high
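
The four thinking levels above suggest a per-request knob. The sketch below validates a level and attaches it to a request body. The field name `reasoning_effort` is an assumption borrowed from the OpenAI Chat Completions API; this page does not say which field WaveSpeedAI expects, so check the provider documentation before using it. `buildThinkingRequest` is a hypothetical helper.

```javascript
// Thinking levels as listed in the Key Features above.
const THINKING_LEVELS = ['minimal', 'low', 'medium', 'high'];

// Build a chat completion request body with a validated thinking level.
function buildThinkingRequest(prompt, thinkingLevel = 'minimal') {
  if (!THINKING_LEVELS.includes(thinkingLevel)) {
    throw new RangeError(
      `thinkingLevel must be one of: ${THINKING_LEVELS.join(', ')}`,
    );
  }
  return {
    model: 'google/gemini-3.1-flash-lite',
    messages: [{ role: 'user', content: prompt }],
    // ASSUMPTION: OpenAI-style field name; not confirmed for this endpoint.
    reasoning_effort: thinkingLevel,
  };
}

console.log(buildThinkingRequest('Classify this ticket: "App crashes on login"', 'low'));
```

Lower levels would typically favor latency and cost, while higher levels trade both for answer quality, which matches the cost, latency, and quality trade-off described above.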

Specifications

Specification	Value
Provider	google
Model Type	Chat Completions model
Architecture	text+image+file+audio+video->text
Context Window	1,048,576 tokens
Max Input	983,040 tokens
Max Output	65,536 tokens
Input	Text, Image, Video, Audio, PDF
Output	Text
Vision	Supported
Function Calling	Supported
Structured Outputs	Supported
Audio Input	Supported
Thinking Levels	minimal, low, medium, high
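
The limits in the table can be checked on the client before sending a request. The sketch below uses the three values from the table; note that Max Input plus Max Output equals the Context Window (983,040 + 65,536 = 1,048,576). `checkTokenBudget` is a hypothetical helper, and the token counts passed to it are assumed to come from whatever tokenizer or estimate the caller already has.

```javascript
// Limits copied from the Specifications table.
const LIMITS = {
  contextWindow: 1_048_576,
  maxInput: 983_040,
  maxOutput: 65_536,
};

// Return { ok, problems } for a planned request.
function checkTokenBudget(inputTokens, maxOutputTokens) {
  const problems = [];
  if (!Number.isInteger(inputTokens) || inputTokens < 0) {
    problems.push('inputTokens must be a non-negative integer');
  } else if (inputTokens > LIMITS.maxInput) {
    problems.push(`input ${inputTokens} exceeds max input ${LIMITS.maxInput}`);
  }
  if (!Number.isInteger(maxOutputTokens) || maxOutputTokens < 1) {
    problems.push('maxOutputTokens must be a positive integer');
  } else if (maxOutputTokens > LIMITS.maxOutput) {
    problems.push(`output ${maxOutputTokens} exceeds max output ${LIMITS.maxOutput}`);
  }
  return { ok: problems.length === 0, problems };
}

console.log(checkTokenBudget(900_000, 8_192));
console.log(checkTokenBudget(1_000_000, 8_192));
```

Because the two per-direction limits sum to the context window, a request that passes both checks also fits within the total window.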

Pricing

Token Type	Cost
Input	$0.25 per million tokens
Output	$1.50 per million tokens
Cached Input	$0.025 per million tokens
Cache Write	$0.083333 per million tokens
Reasoning Output	$1.50 per million tokens
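
The rates in the table translate into a per-request cost as tokens / 1,000,000 × rate. The sketch below applies that arithmetic. It assumes each token is counted in exactly one category (for example, cached input tokens are not also counted as regular input) and that reasoning tokens are included in the output count, since both are listed at $1.50 per million. `estimateCostUSD` is a hypothetical helper, not part of any SDK.

```javascript
// USD per million tokens, copied from the Pricing table.
const RATES_USD_PER_MILLION = {
  input: 0.25,
  output: 1.5, // reasoning output is listed at the same rate
  cachedInput: 0.025,
  cacheWrite: 0.083333,
};

// Estimate the cost of one request from its token counts.
// ASSUMPTION: each token is counted in exactly one category.
function estimateCostUSD({ input = 0, output = 0, cachedInput = 0, cacheWrite = 0 } = {}) {
  const counts = { input, output, cachedInput, cacheWrite };
  let total = 0;
  for (const [kind, tokens] of Object.entries(counts)) {
    if (!Number.isFinite(tokens) || tokens < 0) {
      throw new RangeError(`${kind} must be a non-negative number`);
    }
    total += (tokens / 1_000_000) * RATES_USD_PER_MILLION[kind];
  }
  return total;
}

// Example: a 200,000-token document summarized into 2,000 tokens.
const cost = estimateCostUSD({ input: 200_000, output: 2_000 });
console.log(`Estimated cost: $${cost.toFixed(4)}`);
```

At these rates, one million input tokens plus one million output tokens comes to $1.75.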

How to Use

API Integration

Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: google/gemini-3.1-flash-lite
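
For callers who prefer plain HTTP over an SDK, the three values above combine into a single POST request. The sketch below only builds the request and does not send it. It assumes bearer-token authentication in the `Authorization` header, which is what OpenAI-compatible clients send; `buildHttpRequest` is a hypothetical helper.

```javascript
// Values from the API Integration section above.
const BASE_URL = 'https://llm.wavespeed.ai/v1';
const ENDPOINT = 'chat/completions';
const MODEL_ID = 'google/gemini-3.1-flash-lite';

// Build the URL and fetch() options for a single-turn chat request.
function buildHttpRequest(apiKey, prompt) {
  if (!apiKey) throw new Error('apiKey is required');
  return {
    url: `${BASE_URL}/${ENDPOINT}`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // ASSUMPTION: bearer-token auth, as used by OpenAI-compatible APIs.
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: MODEL_ID,
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Placeholder key for illustration; read a real key from the environment.
const request = buildHttpRequest('YOUR_API_KEY', 'Hello!');
console.log(request.url);

// To send it (Node 18+ provides a global fetch):
// const res = await fetch(request.url, request.init);
// const data = await res.json();
// console.log(data.choices?.[0]?.message?.content ?? '');
```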

Info

Provider: google
Type: llm

Supported capabilities

Input: Text, Image, Audio
Output: Text
Context: 1,048,576
Max output: 65,536
Vision: Supported
Function Calling: Supported

API Access Guide

Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: google/gemini-3.1-flash-lite

Frequently asked questions about Gemini 3.1 Flash Lite

How much does Gemini 3.1 Flash Lite cost via the API?

Pricing on WaveSpeedAI: $0.25 per million input tokens and $1.50 per million output tokens. Prompt caching and batch processing are billed separately and reduce the effective cost on long, repetitive workloads.

What is the context window of Gemini 3.1 Flash Lite?

Gemini 3.1 Flash Lite supports up to 1,048,576 tokens of context (about 1M) and up to 65,536 output tokens (64K) per request.

Is Gemini 3.1 Flash Lite compatible with OpenAI?

WaveSpeedAI exposes Gemini 3.1 Flash Lite at https://llm.wavespeed.ai/v1 through the OpenAI-compatible Chat Completions interface. For most OpenAI SDK clients, changing the base URL and API key is enough; optional fields depend on the model.

How do I get started with Gemini 3.1 Flash Lite?

Sign in to WaveSpeedAI, create an API key under Access Keys, and send a request to https://llm.wavespeed.ai/v1/chat/completions with the model ID shown above. Check the current catalog for availability, capabilities, and pricing.
