Gemini 3.1 Flash Lite | Google Efficient LLM API

google/gemini-3.1-flash-lite

Release date: 2026-05-07

1,048,576 context · $0.25/M input tokens · $1.50/M output tokens

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, with a 1M-token context window and up to 64K output tokens. The model is designed for lightweight agentic workflows, simple data extraction, classification, summarization, document understanding, and responsive applications where API cost and speed are primary constraints. It supports thinking levels from minimal to high for fine-grained cost/performance control and is priced at half the cost of Gemini 3 Flash.

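Pricing

Pay-as-you-go

No upfront payment; pay only for what you use.

Input: $0.25 / M tokens

Output: $1.50 / M tokens

Cache Read: $0.03 / M tokens (rounded; the pricing table lists $0.025)

Cache Write: $0.08 / M tokens (rounded; the pricing table lists $0.083333)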

API Usage

Use the following code example to integrate with our API:

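// Requires the `openai` npm package and an ES module context (top-level await).
import OpenAI from 'openai';

// Fail fast if the API key is not configured.
if (!process.env.WAVESPEED_API_KEY) throw new Error('Set WAVESPEED_API_KEY');

// Point the OpenAI SDK at the OpenAI-compatible WaveSpeedAI endpoint.
const client = new OpenAI({
  apiKey: process.env.WAVESPEED_API_KEY,
  baseURL: 'https://llm.wavespeed.ai/v1',
  timeout: 120_000, // per-request timeout in milliseconds
  maxRetries: 2, // automatic retries performed by the SDK
});

try {
  // Send a single-turn chat request to the model.
  const response = await client.chat.completions.create({
    model: 'google/gemini-3.1-flash-lite',
    messages: [{ role: 'user', content: 'Hello!' }],
  });
  // Print the first choice, or an empty string if no content came back.
  console.log(response.choices[0]?.message?.content ?? '');
} catch (error) {
  // Log the failure and signal a non-zero exit code without aborting abruptly.
  console.error('LLM request failed:', error);
  process.exitCode = 1;
}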

Model Overview

Google: Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic workflows, simple data extraction, classification, summarization, and applications where responsiveness and API cost are the primary constraints.

Why It Stands Out

High-efficiency multimodal model for text, image, video, audio, and PDF understanding
Optimized for low-latency, high-volume production workloads
Supports a 1M-token context window for long prompts, document analysis, and multi-turn workflows
Up to 64K output tokens for extended responses and structured generation
Thinking levels from minimal to high for cost, latency, and quality trade-offs
Priced at half the cost of Gemini 3 Flash
Strong fit for lightweight agents, simple extraction tasks, summarization, classification, and responsive app experiences

Key Features

Context Window: 1,048,576 tokens
Max Input: 983,040 tokens
Max Output: 65,536 tokens
Input: Text, Image, Video, Audio, PDF
Output: Text
Vision: Supported
Audio Input: Supported
Function Calling: Supported
Structured Outputs: Supported
Thinking Levels: minimal, low, medium, high
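
The limits listed above fit together: the maximum input (983,040 tokens) plus the maximum output (65,536 tokens) equals the context window (1,048,576 tokens). The following is a minimal sketch of a pre-flight check built on those numbers. The helper name `checkTokenBudget` is hypothetical, and the sketch assumes the caller has already counted tokens by some other means; this page does not describe a token-counting API.

```typescript
// Limits copied from the Key Features list.
const CONTEXT_WINDOW = 1_048_576;
const MAX_INPUT_TOKENS = 983_040;
const MAX_OUTPUT_TOKENS = 65_536;

interface BudgetCheck {
  ok: boolean;
  reasons: string[];
}

// Hypothetical helper: validates token counts the caller has already measured.
function checkTokenBudget(inputTokens: number, requestedOutputTokens: number): BudgetCheck {
  const reasons: string[] = [];
  if (inputTokens > MAX_INPUT_TOKENS) {
    reasons.push(`input ${inputTokens} exceeds max input ${MAX_INPUT_TOKENS}`);
  }
  if (requestedOutputTokens > MAX_OUTPUT_TOKENS) {
    reasons.push(`output ${requestedOutputTokens} exceeds max output ${MAX_OUTPUT_TOKENS}`);
  }
  // No separate context-window check is needed: if both limits hold,
  // the sum cannot exceed MAX_INPUT_TOKENS + MAX_OUTPUT_TOKENS = CONTEXT_WINDOW.
  return { ok: reasons.length === 0, reasons };
}

// The two per-direction limits add up exactly to the context window.
console.log(MAX_INPUT_TOKENS + MAX_OUTPUT_TOKENS === CONTEXT_WINDOW);
console.log(checkTokenBudget(900_000, 8_192));
console.log(checkTokenBudget(1_000_000, 8_192));
```

Running a check like this before sending a long document can avoid paying for a request that the endpoint would reject anyway.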

Specifications

Specification	Value
Provider	google
Model Type	Chat Completions model
Architecture	text+image+file+audio+video->text
Context Window	1,048,576 tokens
Max Input	983,040 tokens
Max Output	65,536 tokens
Input	Text, Image, Video, Audio, PDF
Output	Text
Vision	Supported
Function Calling	Supported
Structured Outputs	Supported
Audio Input	Supported
Thinking Levels	minimal, low, medium, high
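
Since the specifications list image input alongside text, a request can in principle carry both in one message. The sketch below packages a question and an image reference using the content-part shape from OpenAI's Chat Completions format. Whether this endpoint accepts that exact shape is an assumption, not something this page confirms; the helper name `buildImageQuestion` and the example URL are hypothetical. Audio, video, and PDF inputs are left out because the page does not specify their request format.

```typescript
// Content parts in the OpenAI Chat Completions style (assumed to be accepted here).
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

interface UserMessage {
  role: 'user';
  content: ContentPart[];
}

// Hypothetical helper: one user message containing a question and one image.
function buildImageQuestion(question: string, imageUrl: string): UserMessage {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      { type: 'image_url', image_url: { url: imageUrl } },
    ],
  };
}

// Placeholder URL for illustration only.
const imageMessage = buildImageQuestion(
  'What is the invoice total?',
  'https://example.com/invoice.png',
);
console.log(JSON.stringify(imageMessage, null, 2));
```

The resulting object would go in the `messages` array of a chat completions request in place of a plain-text message.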

Pricing

Token Type	Cost
Input	$0.25 per million tokens
Output	$1.50 per million tokens
Cached Input	$0.025 per million tokens
Cache Write	$0.083333 per million tokens
Reasoning Output	$1.50 per million tokens
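
As a worked example of the rates above, a request with 200,000 uncached input tokens and 4,000 output tokens would cost about 200,000 × $0.25 / 1,000,000 + 4,000 × $1.50 / 1,000,000 = $0.05 + $0.006 = $0.056. The sketch below generalizes that arithmetic. The function name `estimateCostUSD` and the `Usage` shape are hypothetical, and it assumes each token is counted in exactly one category; how the provider actually reports and bills usage may differ.

```typescript
// Per-million-token rates in USD, copied from the pricing table.
const RATES_USD_PER_MILLION = {
  input: 0.25,
  output: 1.5,
  cachedInput: 0.025,
  cacheWrite: 0.083333,
  reasoningOutput: 1.5,
};

// Hypothetical usage shape; each token is assumed to belong to one category only.
interface Usage {
  inputTokens: number; // uncached prompt tokens
  outputTokens: number; // visible completion tokens
  cachedInputTokens?: number; // prompt tokens served from cache
  cacheWriteTokens?: number; // prompt tokens written to cache
  reasoningTokens?: number; // thinking tokens, billed at the reasoning rate
}

function estimateCostUSD(usage: Usage): number {
  const r = RATES_USD_PER_MILLION;
  const weighted =
    usage.inputTokens * r.input +
    usage.outputTokens * r.output +
    (usage.cachedInputTokens ?? 0) * r.cachedInput +
    (usage.cacheWriteTokens ?? 0) * r.cacheWrite +
    (usage.reasoningTokens ?? 0) * r.reasoningOutput;
  return weighted / 1_000_000;
}

const cost = estimateCostUSD({ inputTokens: 200_000, outputTokens: 4_000 });
console.log(`Estimated cost: $${cost.toFixed(4)}`);
```

Because reasoning output is billed at the same rate as regular output, higher thinking levels raise cost in proportion to the extra tokens they generate.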

How to Use

API Integration

Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: google/gemini-3.1-flash-lite
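
For clients that do not use the OpenAI SDK, the three values above combine into a plain HTTP request. The sketch below only builds the request and does not send it. The helper name `buildChatRequest` is hypothetical, and the `Authorization: Bearer` header is an assumption based on how OpenAI-compatible clients normally authenticate; this page does not state the header format.

```typescript
// Values copied from the API Integration section.
const BASE_URL = 'https://llm.wavespeed.ai/v1';
const ENDPOINT = 'chat/completions';
const MODEL_ID = 'google/gemini-3.1-flash-lite';

interface ChatRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles the URL, headers, and JSON body for one user turn.
function buildChatRequest(apiKey: string, userText: string): ChatRequest {
  return {
    url: `${BASE_URL}/${ENDPOINT}`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Assumption: Bearer token auth, as sent by OpenAI-compatible SDK clients.
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: MODEL_ID,
        messages: [{ role: 'user', content: userText }],
      }),
    },
  };
}

const exampleRequest = buildChatRequest('YOUR_API_KEY', 'Hello!');
console.log(exampleRequest.url);
console.log(exampleRequest.init.body);
```

In a runtime with a global `fetch` (for example Node 18 or later), the request could then be sent with `await fetch(exampleRequest.url, exampleRequest.init)`.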

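Information

Provider: google

Type: llm

Supported Features

Input: Text, Image, Audio

Output: Text

Context: 1,048,576

Max Output: 65,536

Vision: ✓ Supported

Function Calling: ✓ Supported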

API Access Guide

Base URL: https://llm.wavespeed.ai/v1

API Endpoint: chat/completions

Model ID: google/gemini-3.1-flash-lite

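Frequently Asked Questions About Gemini 3.1 Flash Lite

How much does the Gemini 3.1 Flash Lite API cost?

WaveSpeedAI pricing: $0.25 per million input tokens and $1.50 per million output tokens. Prompt caching and batch processing are billed separately and can significantly reduce the effective cost of long-context, highly repetitive workloads.

How large is the Gemini 3.1 Flash Lite context window?

Gemini 3.1 Flash Lite supports up to 1,048,576 context tokens per request (about 1,049K) and up to 65,536 output tokens (about 66K).

Is Gemini 3.1 Flash Lite OpenAI-compatible?

WaveSpeedAI serves Gemini 3.1 Flash Lite through an OpenAI-compatible Chat Completions API at https://llm.wavespeed.ai/v1. Most OpenAI SDK clients only need to change the base URL and API key; optional fields depend on the specific model.

How do I get started with Gemini 3.1 Flash Lite?

Sign in to WaveSpeedAI, create an API key under Access Keys, then send requests to https://llm.wavespeed.ai/v1/chat/completions using the model ID shown above. For model availability, capabilities, and pricing, refer to the current model catalog.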
