Gemini 3.1 Flash Lite | Google Efficient LLM API

google/gemini-3.1-flash-lite

Release date: 2026-05-07

1,048,576 context · $0.25/M input tokens · $1.50/M output tokens

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, with a 1M-token context window and up to 64K output tokens. The model is designed for lightweight agentic workflows, simple data extraction, classification, summarization, document understanding, and responsive applications where API cost and speed are primary constraints. It supports thinking levels from minimal to high for fine-grained cost/performance control and is priced at half the cost of Gemini 3 Flash.

Pricing

Pay as you go

No upfront cost; pay only for what you use.

Input: $0.25 / M tokens

Output: $1.50 / M tokens

Cache Read: about $0.03 / M tokens

Cache Write: about $0.08 / M tokens

API Usage

Use the following code example to integrate with the API:

import OpenAI from 'openai';

// Fail fast if the API key is missing from the environment.
if (!process.env.WAVESPEED_API_KEY) throw new Error('Set WAVESPEED_API_KEY');

// Point the OpenAI SDK at WaveSpeedAI's OpenAI-compatible endpoint.
const client = new OpenAI({
  apiKey: process.env.WAVESPEED_API_KEY,
  baseURL: 'https://llm.wavespeed.ai/v1',
  timeout: 120_000, // milliseconds
  maxRetries: 2,
});

try {
  const response = await client.chat.completions.create({
    model: 'google/gemini-3.1-flash-lite',
    messages: [{ role: 'user', content: 'Hello!' }],
  });
  // Print the first choice's text, or an empty string if there is none.
  console.log(response.choices[0]?.message?.content ?? '');
} catch (error) {
  console.error('LLM request failed:', error);
  process.exitCode = 1; // report failure to the calling shell
}

Model Overview

Google: Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic workflows, simple data extraction, classification, summarization, and applications where responsiveness and API cost are the primary constraints.

Why It Looks Great

High-efficiency multimodal model for text, image, video, audio, and PDF understanding
Optimized for low-latency, high-volume production workloads
Supports a 1M-token context window for long prompts, document analysis, and multi-turn workflows
Up to 64K output tokens for extended responses and structured generation
Thinking levels from minimal to high for cost, latency, and quality trade-offs
Priced at half the cost of Gemini 3 Flash
Strong fit for lightweight agents, simple extraction tasks, summarization, classification, and responsive app experiences

Key Features

Context Window: 1,048,576 tokens
Max Input: 983,040 tokens
Max Output: 65,536 tokens
Input: Text, Image, Video, Audio, PDF
Output: Text
Vision: Supported
Audio Input: Supported
Function Calling: Supported
Structured Outputs: Supported
Thinking Levels: minimal, low, medium, high
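
The thinking levels listed above can be validated before a request is sent. The sketch below is illustrative only: `buildBody` is a hypothetical helper, and it assumes the gateway reads the level from the OpenAI-style `reasoning_effort` field. This page does not name the request parameter, so check the model catalog before relying on it.

```javascript
// Thinking levels as listed in Key Features.
const THINKING_LEVELS = ['minimal', 'low', 'medium', 'high'];

// Hypothetical helper: builds a Chat Completions request body.
// ASSUMPTION: the thinking level travels in `reasoning_effort`;
// the source page does not confirm the parameter name.
function buildBody(prompt, thinkingLevel = 'minimal') {
  if (!THINKING_LEVELS.includes(thinkingLevel)) {
    throw new RangeError(`Unknown thinking level: ${thinkingLevel}`);
  }
  return {
    model: 'google/gemini-3.1-flash-lite',
    messages: [{ role: 'user', content: prompt }],
    reasoning_effort: thinkingLevel,
  };
}

// Cheap classification call at the lowest level.
console.log(buildBody('Label this ticket: "refund not received"').reasoning_effort);
```

Defaulting to the lowest level fits the cost-sensitive workloads this model targets; a caller would raise it only for tasks that need more deliberate output.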

Specifications

Specification	Value
Provider	google
Model Type	Chat Completions model
Architecture	text+image+file+audio+video->text
Context Window	1,048,576 tokens
Max Input	983,040 tokens
Max Output	65,536 tokens
Input	Text, Image, Video, Audio, PDF
Output	Text
Vision	Supported
Function Calling	Supported
Structured Outputs	Supported
Audio Input	Supported
Thinking Levels	minimal, low, medium, high
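
The listed maximum input (983,040) and maximum output (65,536) add up exactly to the 1,048,576-token context window, so a request can be checked against both limits before it is sent. The following is a minimal sketch; `checkTokenBudget` is a hypothetical helper, and it assumes the caller already has token counts from a tokenizer or a previous response's usage data.

```javascript
// Limits copied from the specifications above.
const LIMITS = { contextWindow: 1_048_576, maxInput: 983_040, maxOutput: 65_536 };

// Hypothetical helper: returns a list of problems; an empty list means the
// request fits. Token counts are supplied by the caller (assumption).
function checkTokenBudget(inputTokens, requestedOutputTokens) {
  const problems = [];
  if (inputTokens > LIMITS.maxInput) {
    problems.push(`input exceeds ${LIMITS.maxInput} tokens`);
  }
  if (requestedOutputTokens > LIMITS.maxOutput) {
    problems.push(`requested output exceeds ${LIMITS.maxOutput} tokens`);
  }
  return problems;
}

console.log(checkTokenBudget(900_000, 8_192)); // []
```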

Pricing

Token Type	Cost
Input	$0.25 per million tokens
Output	$1.50 per million tokens
Cached Input	$0.025 per million tokens
Cache Write	$0.083333 per million tokens
Reasoning Output	$1.50 per million tokens
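
As a worked example of the table above, the sketch below estimates the cost of a single request from its token counts. `estimateCostUsd` is a hypothetical helper, not part of any SDK, and it assumes each token is billed under exactly one category (for example, cached input tokens are not also counted as regular input).

```javascript
// Per-million-token rates in USD, copied from the pricing table above.
const RATES_USD_PER_MILLION = {
  input: 0.25,
  output: 1.5,
  cachedInput: 0.025,
  cacheWrite: 0.083333,
  reasoningOutput: 1.5,
};

// Hypothetical helper: sums (tokens / 1M) * rate over every category.
// Categories missing from `usage` count as zero tokens.
function estimateCostUsd(usage) {
  let total = 0;
  for (const [kind, rate] of Object.entries(RATES_USD_PER_MILLION)) {
    const tokens = usage[kind] ?? 0;
    total += (tokens / 1_000_000) * rate;
  }
  return total;
}

// 200K input tokens and 10K output tokens: 0.05 + 0.015 USD.
const cost = estimateCostUsd({ input: 200_000, output: 10_000 });
console.log(cost.toFixed(4)); // "0.0650"
```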

How to Use

API Integration

Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: google/gemini-3.1-flash-lite
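
For clients that do not use an SDK, the three values above combine into a plain HTTP request. This is a sketch under stated assumptions: `buildChatRequest` is a hypothetical helper, and the `Authorization: Bearer` header follows the OpenAI convention rather than anything this page states explicitly.

```javascript
// Values copied from the API Integration section above.
const BASE_URL = 'https://llm.wavespeed.ai/v1';
const ENDPOINT = 'chat/completions';
const MODEL_ID = 'google/gemini-3.1-flash-lite';

// Hypothetical helper: returns the URL and fetch() options for one chat turn.
// ASSUMPTION: the key is sent as an OpenAI-style Bearer token.
function buildChatRequest(apiKey, prompt) {
  return {
    url: `${BASE_URL}/${ENDPOINT}`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: MODEL_ID,
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Usage (makes a network call, so it is left commented out):
// const { url, init } = buildChatRequest(process.env.WAVESPEED_API_KEY, 'Hello!');
// const res = await fetch(url, init);
// console.log((await res.json()).choices?.[0]?.message?.content ?? '');
```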

Info

Provider: google
Type: llm

Supported Features

Input: Text, Image, Audio
Output: Text
Context: 1,048,576
Max Output: 65,536
Vision: Supported
Function Calling: Supported

API Access Guide

Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: google/gemini-3.1-flash-lite

Frequently Asked Questions about Gemini 3.1 Flash Lite

How much does the Gemini 3.1 Flash Lite API cost?

WaveSpeedAI pricing is $0.25 per million input tokens and $1.50 per million output tokens. Prompt caching and batch processing are priced separately and can lower the effective cost of long, repetitive workloads.

How large is the Gemini 3.1 Flash Lite context window?

Gemini 3.1 Flash Lite supports up to 1,048,576 context tokens and up to 65,536 output tokens per request.

Is Gemini 3.1 Flash Lite OpenAI-compatible?

WaveSpeedAI serves Gemini 3.1 Flash Lite through an OpenAI-compatible Chat Completions interface at https://llm.wavespeed.ai/v1. Most OpenAI SDK clients work after changing only the base URL and API key, but optional parameters vary by model.

How do I get started with Gemini 3.1 Flash Lite?

Sign in to WaveSpeedAI, create an API key under Access Keys, and send requests to https://llm.wavespeed.ai/v1/chat/completions with the model ID shown above. Check the latest model catalog for current availability, features, and pricing.
