google/gemini-3.1-flash-lite

1,048,576 context · $0.25/M input tokens · $1.50/M output tokens

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, with a 1M-token context window and up to 64K output tokens. The model is designed for lightweight agentic workflows, simple data extraction, classification, summarization, document understanding, and responsive applications where API cost and speed are primary constraints. It supports thinking levels from minimal to high for fine-grained cost/performance control and is priced at half the cost of Gemini 3 Flash.

Pricing

Pay as you go

No upfront costs; pay only for what you use.

Input: $0.25 / M tokens
Output: $1.50 / M tokens
Cache Read: $0.03 / M tokens
Cache Write: $0.08 / M tokens

API Usage

Use the following code example to integrate with our API:

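from openai import OpenAI

# Point the official OpenAI SDK at WaveSpeedAI's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",  # replace with your WaveSpeedAI API key
    base_url="https://llm.wavespeed.ai/v1",
)

# Send a single-turn chat request to the model.
response = client.chat.completions.create(
    model="google/gemini-3.1-flash-lite",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

# Print the text of the first completion choice.
print(response.choices[0].message.content)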

Model Introduction

Google: Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic workflows, simple data extraction, classification, summarization, and applications where responsiveness and API cost are the primary constraints.


Why It Looks Great

  • High-efficiency multimodal model for text, image, video, audio, and PDF understanding
  • Optimized for low-latency, high-volume production workloads
  • Supports a 1M-token context window for long prompts, document analysis, and multi-turn workflows
  • Up to 64K output tokens for extended responses and structured generation
  • Thinking levels from minimal to high for cost, latency, and quality trade-offs
  • Priced at half the cost of Gemini 3 Flash
  • Strong fit for lightweight agents, simple extraction tasks, summarization, classification, and responsive app experiences

Key Features

  • Context Window: 1,048,576 tokens
  • Max Input: 983,040 tokens
  • Max Output: 65,536 tokens
  • Input: Text, Image, Video, Audio, PDF
  • Output: Text
  • Vision: Supported
  • Audio Input: Supported
  • Function Calling: Supported
  • Structured Outputs: Supported
  • Thinking Levels: minimal, low, medium, high
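
The three token limits listed above are consistent with each other: the maximum input (983,040) plus the maximum output (65,536) equals the context window (1,048,576). A minimal sketch of a pre-flight check built on those numbers follows. The function name `fits_limits` is hypothetical, token counts are assumed to come from your own tokenizer or from usage data returned by the API, and how the service actually enforces these limits is not stated on this page.

```python
# Limits copied from the Key Features list on this page.
CONTEXT_WINDOW = 1_048_576  # total tokens
MAX_INPUT = 983_040         # maximum input tokens
MAX_OUTPUT = 65_536         # maximum output tokens


def fits_limits(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within all three listed limits.

    Hypothetical helper: the caller supplies the token counts.
    """
    return (
        0 <= input_tokens <= MAX_INPUT
        and 0 <= requested_output_tokens <= MAX_OUTPUT
        and input_tokens + requested_output_tokens <= CONTEXT_WINDOW
    )
```

A check like this can help decide when a long document needs to be split before it is sent.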

Specifications

Specification | Value
--- | ---
Provider | google
Model Type | Chat Completions model
Architecture | text+image+file+audio+video->text
Context Window | 1,048,576 tokens
Max Input | 983,040 tokens
Max Output | 65,536 tokens
Input | Text, Image, Video, Audio, PDF
Output | Text
Vision | Supported
Function Calling | Supported
Structured Outputs | Supported
Audio Input | Supported
Thinking Levels | minimal, low, medium, high
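
Since the specifications list vision support on a Chat Completions model, an image prompt might be built as in the sketch below. It uses the content-part message format of the OpenAI Chat Completions API (`text` and `image_url` parts, with the image passed as a base64 data URL). That this endpoint accepts the same format for this model is an assumption and is not confirmed on this page. The helper name `build_image_message` is hypothetical.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes,
                        mime_type: str = "image/png") -> dict:
    """Build one user message containing a text part and an inline image.

    Assumption: the endpoint accepts OpenAI-style content parts.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # The image is embedded as a data URL instead of a remote link.
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary can be placed in the `messages` list of a `client.chat.completions.create(...)` call in place of a plain text message.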

Pricing

Token Type | Cost
--- | ---
Input | $0.25 per million tokens
Output | $1.50 per million tokens
Cached Input | $0.025 per million tokens
Cache Write | $0.083333 per million tokens
Reasoning Output | $1.50 per million tokens
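
The per-token rates above can be turned into a rough per-request estimate. The sketch below is illustrative only. It assumes that cached input tokens are billed at the cached rate instead of the regular input rate, it ignores cache-write charges, and it treats reasoning tokens as ordinary output because both are listed at $1.50. The function name `estimate_cost_usd` is hypothetical, and actual billing may differ.

```python
# Rates in USD per million tokens, copied from the pricing table.
PRICE_INPUT = 0.25
PRICE_CACHED_INPUT = 0.025
PRICE_OUTPUT = 1.50  # reasoning output is listed at the same rate


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    cached_input_tokens is the portion of input_tokens served from cache
    (assumed to be billed at the cached rate; cache writes are ignored).
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * PRICE_INPUT
        + cached_input_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000
```

For example, 1,000,000 input tokens and 100,000 output tokens with no caching come to $0.25 + $0.15 = $0.40 under these assumptions.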

How to Use

API Integration

Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: google/gemini-3.1-flash-lite
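
For callers who prefer not to use an SDK, the base URL, endpoint, and model ID above can be combined into a plain HTTP request. The following is a sketch using only the Python standard library. The `Authorization: Bearer` header is an assumption based on the usual OpenAI-compatible convention and is not stated on this page. The helper names `build_request` and `send_request` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://llm.wavespeed.ai/v1"
ENDPOINT = "chat/completions"
MODEL_ID = "google/gemini-3.1-flash-lite"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single-turn chat."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{ENDPOINT}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(request: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `send_request(build_request("YOUR_API_KEY", "Hello!"))` would perform the same request as the SDK example, assuming the response follows the standard Chat Completions shape.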

Info

Provider: google
Type: llm

Supported Features

Input: Text, Image, Audio
Output: Text
Context: 1,048,576 tokens
Max Output: 65,536 tokens
Vision: ✓ Supported
Function Calling: ✓ Supported

Gemini 3.1 Flash Lite API

Input: $0.25 / M tokens
Output: $1.50 / M tokens
Context: 1049K tokens
Max Output: 66K tokens
Vision: Supported
Tool Use: Supported

Try Gemini 3.1 Flash Lite on WaveSpeedAI

Access Gemini 3.1 Flash Lite through our unified API: OpenAI-compatible, no cold starts, transparent pricing.

Frequently Asked Questions about Gemini 3.1 Flash Lite

How much does Gemini 3.1 Flash Lite cost via the API?

Pricing on WaveSpeedAI is $0.25 per million input tokens and $1.50 per million output tokens. Prompt caching and batch processing are billed separately and reduce the effective cost of long, repetitive workloads.

What is the context window of Gemini 3.1 Flash Lite?

Gemini 3.1 Flash Lite supports a context window of 1,048,576 tokens (about 1049K) with up to 65,536 output tokens (about 66K) per request.

Is Gemini 3.1 Flash Lite OpenAI-compatible?

Yes. WaveSpeedAI serves Gemini 3.1 Flash Lite through an OpenAI-compatible endpoint at https://llm.wavespeed.ai/v1. Point the official OpenAI SDK at this base URL with your WaveSpeedAI API key; no other code changes are needed.

How do I get started with Gemini 3.1 Flash Lite?

Sign in to WaveSpeedAI, create an API key under Access Keys, then send requests to https://llm.wavespeed.ai/v1/chat/completions with the model ID shown above. New accounts receive free credits to test Gemini 3.1 Flash Lite.
