google/gemini-3.1-flash-lite

1,048,576 context · $0.25/M input tokens · $1.50/M output tokens

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, with a 1M-token context window and up to 64K output tokens. The model is designed for lightweight agentic workflows, simple data extraction, classification, summarization, document understanding, and responsive applications where API cost and speed are primary constraints. It supports thinking levels from minimal to high for fine-grained cost/performance control and is priced at half the cost of Gemini 3 Flash.

Pricing

Pay as you go

No upfront fees; pay only for what you actually use.

Input: $0.25 / M tokens
Output: $1.50 / M tokens
Cache Read: $0.03 / M tokens
Cache Write: $0.08 / M tokens

API Usage

Use the following code sample to integrate with our API:

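from openai import OpenAI

# Point the official OpenAI SDK at the WaveSpeedAI endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",  # replace with your WaveSpeedAI API key
    base_url="https://llm.wavespeed.ai/v1"
)

# Send a single-turn chat request to Gemini 3.1 Flash Lite.
response = client.chat.completions.create(
    model="google/gemini-3.1-flash-lite",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

# Print the text of the first returned choice.
print(response.choices[0].message.content)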

Model Introduction

Google: Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic workflows, simple data extraction, classification, summarization, and applications where responsiveness and API cost are the primary constraints.


Why It Stands Out

  • High-efficiency multimodal model for text, image, video, audio, and PDF understanding
  • Optimized for low-latency, high-volume production workloads
  • Supports a 1M-token context window for long prompts, document analysis, and multi-turn workflows
  • Up to 64K output tokens for extended responses and structured generation
  • Thinking levels from minimal to high for cost, latency, and quality trade-offs
  • Priced at half the cost of Gemini 3 Flash
  • Strong fit for lightweight agents, simple extraction tasks, summarization, classification, and responsive app experiences
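To illustrate the multimodal input mentioned above, here is a minimal sketch of a user message that combines text with an inline image. It follows the content-part layout of the OpenAI Chat Completions format; that this endpoint accepts the same layout is an assumption, and `build_image_message` is a hypothetical helper, not part of any SDK.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message carrying a text part and an inline image part.

    ASSUMPTION: the endpoint accepts OpenAI-style content parts with a
    base64 data URL. Check the provider documentation before relying on it.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# Placeholder bytes stand in for a real image file read from disk.
message = build_image_message("Describe this image.", b"placeholder image bytes")
print(message["content"][0]["text"])
```

The resulting dictionary can be placed in the `messages` list of a chat completions request.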

Key Features

  • Context Window: 1,048,576 tokens
  • Max Input: 983,040 tokens
  • Max Output: 65,536 tokens
  • Input: Text, Image, Video, Audio, PDF
  • Output: Text
  • Vision: Supported
  • Audio Input: Supported
  • Function Calling: Supported
  • Structured Outputs: Supported
  • Thinking Levels: minimal, low, medium, high
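As a rough sketch of how function calling and thinking levels might be combined, the snippet below builds a request body with one tool definition. The `tools` layout is the standard OpenAI Chat Completions schema. The tool name `get_order_status` is hypothetical, and passing the thinking level through a `reasoning_effort` field is an assumption that this page does not confirm.

```python
import json

# Level names taken from the feature list above.
THINKING_LEVELS = ("minimal", "low", "medium", "high")


def build_tool_request(question: str, thinking_level: str = "low") -> dict:
    """Build a chat completions body with one callable tool."""
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {thinking_level!r}")
    return {
        "model": "google/gemini-3.1-flash-lite",
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_order_status",  # hypothetical tool
                    "description": "Look up the status of an order by its ID.",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                },
            }
        ],
        # ASSUMPTION: the gateway maps this field to the model's thinking level.
        "reasoning_effort": thinking_level,
    }


payload = build_tool_request("Where is order A-1001?", thinking_level="minimal")
print(json.dumps(payload, indent=2))
```

With the OpenAI SDK, the same keys could be passed as keyword arguments to `client.chat.completions.create`.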

Specifications

| Specification | Value |
| --- | --- |
| Provider | google |
| Model Type | Chat Completions model |
| Architecture | text+image+file+audio+video->text |
| Context Window | 1,048,576 tokens |
| Max Input | 983,040 tokens |
| Max Output | 65,536 tokens |
| Input | Text, Image, Video, Audio, PDF |
| Output | Text |
| Vision | Supported |
| Function Calling | Supported |
| Structured Outputs | Supported |
| Audio Input | Supported |
| Thinking Levels | minimal, low, medium, high |
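The listed limits are consistent with each other: 983,040 input tokens plus 65,536 output tokens equals the 1,048,576-token context window. A small pre-flight check along these lines can catch oversized requests before they are sent. This is only a sketch: `fits_limits` is a hypothetical helper, and the token counts are assumed to come from whatever tokenizer or usage estimate you already have.

```python
# Limits copied from the specification list above.
CONTEXT_WINDOW = 1_048_576
MAX_INPUT_TOKENS = 983_040
MAX_OUTPUT_TOKENS = 65_536


def fits_limits(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within the published token limits."""
    if not 0 <= input_tokens <= MAX_INPUT_TOKENS:
        return False
    if not 0 < requested_output_tokens <= MAX_OUTPUT_TOKENS:
        return False
    # Input and output share the same context window.
    return input_tokens + requested_output_tokens <= CONTEXT_WINDOW


print(fits_limits(900_000, 8_192))
print(fits_limits(990_000, 8_192))
```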

Pricing

| Token Type | Cost |
| --- | --- |
| Input | $0.25 per million tokens |
| Output | $1.50 per million tokens |
| Cached Input | $0.025 per million tokens |
| Cache Write | $0.083333 per million tokens |
| Reasoning Output | $1.50 per million tokens |
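The per-million rates above translate into a simple cost estimate. The sketch below assumes that each token is billed under exactly one category (for example, a cached input token is billed at the cached rate instead of the input rate); the page does not spell out the billing rules, so treat the result as an approximation. `estimate_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

# USD per one million tokens, copied from the pricing list above.
PRICE_PER_MILLION = {
    "input": Decimal("0.25"),
    "output": Decimal("1.50"),
    "cached_input": Decimal("0.025"),
    "cache_write": Decimal("0.083333"),
    "reasoning_output": Decimal("1.50"),
}


def estimate_cost_usd(**token_counts: int) -> Decimal:
    """Sum the cost of each token category; unknown categories raise KeyError."""
    total = Decimal("0")
    for token_type, count in token_counts.items():
        total += PRICE_PER_MILLION[token_type] * count / Decimal(1_000_000)
    return total


# Example: a long prompt where most of the input is served from cache.
cost = estimate_cost_usd(input=200_000, cached_input=800_000, output=10_000)
print(f"Estimated cost: ${cost}")
```

`Decimal` is used instead of floats so that small per-token amounts add up without rounding drift.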

How to Use

API Integration

Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: google/gemini-3.1-flash-lite
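For clients that do not use the OpenAI SDK, the three values above can be combined into a plain HTTP request. The following sketch uses only the Python standard library and builds the request without sending it. The `Authorization: Bearer` header follows the convention of OpenAI-compatible APIs and is an assumption here, as is the environment variable name `WAVESPEED_API_KEY`.

```python
import json
import os
import urllib.request

BASE_URL = "https://llm.wavespeed.ai/v1"
ENDPOINT = "chat/completions"
MODEL_ID = "google/gemini-3.1-flash-lite"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single-turn chat."""
    body = json.dumps(
        {"model": MODEL_ID, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{ENDPOINT}",
        data=body,
        headers={
            # ASSUMPTION: bearer-token auth, as with other OpenAI-compatible APIs.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# WAVESPEED_API_KEY is a hypothetical variable name chosen for this example.
request = build_request("Hello!", os.environ.get("WAVESPEED_API_KEY", "YOUR_API_KEY"))
print(request.get_method(), request.full_url)
```

To actually send it, pass `request` to `urllib.request.urlopen` and parse the JSON response body.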

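Information

Provider: google
Type: llm

Supported Features

Input: Text, Image, Audio
Output: Text
Context: 1,048,576
Max Output: 65,536
Vision: ✓ Supported
Function Calling: ✓ Supported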

Try Gemini 3.1 Flash Lite on WaveSpeedAI

Access Gemini 3.1 Flash Lite through our unified API: OpenAI-compatible, no cold starts, and transparent billing.

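Frequently Asked Questions About Gemini 3.1 Flash Lite

How much does the Gemini 3.1 Flash Lite API cost?

WaveSpeedAI pricing is $0.25 per million input tokens and $1.50 per million output tokens. Prompt caching and batch processing are billed separately and can significantly reduce the effective cost of long-context, highly repetitive workloads.

How large is the Gemini 3.1 Flash Lite context window?

Gemini 3.1 Flash Lite supports up to 1,048,576 context tokens per request (about 1049K) and up to 65,536 output tokens (about 66K).

Is Gemini 3.1 Flash Lite OpenAI-compatible?

Yes. WaveSpeedAI serves Gemini 3.1 Flash Lite through an OpenAI-compatible endpoint at https://llm.wavespeed.ai/v1. Point the official OpenAI SDK's base URL at that address and use your WaveSpeedAI API key; no other code changes are needed.

How do I get started with Gemini 3.1 Flash Lite?

Log in to WaveSpeedAI, create an API key under Access Keys, and send requests to https://llm.wavespeed.ai/v1/chat/completions using the model ID shown above. New accounts receive free credits for trying Gemini 3.1 Flash Lite.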
