Gemini 3.1 Flash Lite | Google Efficient LLM API

google/gemini-3.1-flash-lite

Release date: 2026-05-07

1,048,576 context · $0.25/M input tokens · $1.50/M output tokens

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, with a 1M-token context window and up to 64K output tokens. The model is designed for lightweight agentic workflows, simple data extraction, classification, summarization, document understanding, and responsive applications where API cost and speed are primary constraints. It supports thinking levels from minimal to high for fine-grained cost/performance control and is priced at half the cost of Gemini 3 Flash.

Pricing

Usage-based billing

Pay only for what you use, with no upfront payment.

Input: $0.25 / M tokens

Output: $1.50 / M tokens

Cache Read: $0.025 / M tokens

Cache Write: $0.083333 / M tokens


API Usage

Use the following code example to integrate with the API:

import OpenAI from 'openai';

// Fail fast if the API key is missing.
if (!process.env.WAVESPEED_API_KEY) throw new Error('Set WAVESPEED_API_KEY');

// Point the OpenAI SDK at WaveSpeedAI's OpenAI-compatible endpoint.
const client = new OpenAI({
  apiKey: process.env.WAVESPEED_API_KEY,
  baseURL: 'https://llm.wavespeed.ai/v1',
  timeout: 120_000, // request timeout in milliseconds
  maxRetries: 2, // SDK-level retries for transient failures
});

// Top-level await requires an ES module (e.g. "type": "module" or a .mjs/.mts file).
try {
  const response = await client.chat.completions.create({
    model: 'google/gemini-3.1-flash-lite',
    messages: [{ role: 'user', content: 'Hello!' }],
  });
  console.log(response.choices[0]?.message?.content ?? '');
} catch (error) {
  console.error('LLM request failed:', error);
  process.exitCode = 1;
}

Model Overview

Google: Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic workflows, simple data extraction, classification, summarization, and applications where responsiveness and API cost are the primary constraints.
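Multimodal inputs travel through the same Chat Completions interface. The sketch below builds a user message that pairs text with one inline image, assuming WaveSpeedAI accepts the standard OpenAI-style `image_url` content part carrying a base64 data URL. The helper name `imageMessage` and the default MIME type are illustrative assumptions, and video, audio, and PDF parts may need different field shapes, so check the provider documentation before relying on them.

```typescript
// Minimal sketch: a user message combining text and one inline image.
// ASSUMPTION: the OpenAI-style `image_url` content part is accepted as-is.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

interface UserMessage {
  role: 'user';
  content: ContentPart[];
}

// Hypothetical helper: encode raw image bytes as a base64 data URL.
function imageMessage(prompt: string, bytes: Uint8Array, mimeType = 'image/png'): UserMessage {
  const dataUrl = `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
  return {
    role: 'user',
    content: [
      { type: 'text', text: prompt },
      { type: 'image_url', image_url: { url: dataUrl } },
    ],
  };
}
```

The resulting object can be placed in the `messages` array of `client.chat.completions.create`, alongside ordinary text messages.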

Why It Looks Great

High-efficiency multimodal model for text, image, video, audio, and PDF understanding
Optimized for low-latency, high-volume production workloads
Supports a 1M-token context window for long prompts, document analysis, and multi-turn workflows
Up to 64K output tokens for extended responses and structured generation
Thinking levels from minimal to high for cost, latency, and quality trade-offs
Priced at half the cost of Gemini 3 Flash
Strong fit for lightweight agents, simple extraction tasks, summarization, classification, and responsive app experiences
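The thinking levels are a per-request setting, but this page does not say which request field selects them on WaveSpeedAI. The sketch below assumes the OpenAI-style `reasoning_effort` field is forwarded to the model; treat that field name as an assumption to verify against the current model catalog. It only builds the request body, so it runs offline, and the `minimal` default is an illustrative choice.

```typescript
// The four thinking levels listed for this model.
const THINKING_LEVELS = ['minimal', 'low', 'medium', 'high'] as const;
type ThinkingLevel = (typeof THINKING_LEVELS)[number];

interface ChatBody {
  model: string;
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[];
  // ASSUMPTION: the OpenAI-compatible endpoint accepts `reasoning_effort`.
  reasoning_effort?: ThinkingLevel;
}

// Runtime guard for values read from config files or CLI flags.
function isThinkingLevel(value: string): value is ThinkingLevel {
  return (THINKING_LEVELS as readonly string[]).includes(value);
}

// Hypothetical helper: build a request body at the chosen thinking level.
function chatBody(prompt: string, level: ThinkingLevel = 'minimal'): ChatBody {
  return {
    model: 'google/gemini-3.1-flash-lite',
    messages: [{ role: 'user', content: prompt }],
    reasoning_effort: level,
  };
}
```

Lower levels favor latency and cost; higher levels spend more reasoning tokens, which the Pricing table bills at the same $1.50 per million rate as regular output.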

Key Features

Context Window: 1,048,576 tokens
Max Input: 983,040 tokens
Max Output: 65,536 tokens
Input: Text, Image, Video, Audio, PDF
Output: Text
Vision: Supported
Audio Input: Supported
Function Calling: Supported
Structured Outputs: Supported
Thinking Levels: minimal, low, medium, high
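The limits above fit together: the maximum input equals the context window minus the maximum output (1,048,576 - 65,536 = 983,040). A small pre-flight check can reject oversized requests before they are sent. This sketch assumes prompt and output tokens share the context window and that you already have a prompt token count (for example, an estimate from your own tokenizer); `fitsBudget` is a hypothetical helper.

```typescript
// Limits from the Key Features list.
const CONTEXT_WINDOW = 1_048_576;
const MAX_OUTPUT = 65_536;
// Max input is what remains of the window after reserving the full output.
const MAX_INPUT = CONTEXT_WINDOW - MAX_OUTPUT; // 983,040

// Hypothetical pre-flight check. ASSUMPTION: prompt and requested output
// share the context window, so checking both caps also bounds their sum.
function fitsBudget(promptTokens: number, maxOutputTokens: number): boolean {
  return promptTokens <= MAX_INPUT && maxOutputTokens <= MAX_OUTPUT;
}
```

Because the input cap already reserves room for the full 65,536-token output, any prompt that passes the input check can still request the maximum output.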

Specifications

Specification	Value
Provider	google
Model Type	Chat Completions model
Architecture	text+image+file+audio+video->text
Context Window	1,048,576 tokens
Max Input	983,040 tokens
Max Output	65,536 tokens
Input	Text, Image, Video, Audio, PDF
Output	Text
Vision	Supported
Function Calling	Supported
Structured Outputs	Supported
Audio Input	Supported
Thinking Levels	minimal, low, medium, high
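Function Calling is listed as supported. In the OpenAI-compatible format, the model replies with `tool_calls` whose `function.arguments` field is a JSON string; your code runs the matching function and returns each result as a `role: 'tool'` message tied to the call's `id`. The sketch below covers only that dispatch step and runs offline. The `get_weather` tool and its handler are hypothetical, and the interfaces are hand-written stand-ins rather than the `openai` package's own types.

```typescript
// Hand-written subset of the OpenAI-style tool-call shape.
interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface ToolMessage {
  role: 'tool';
  tool_call_id: string;
  content: string;
}

type Handler = (args: Record<string, unknown>) => unknown;

// Hypothetical local tool; a real one would call a weather service.
const handlers: Record<string, Handler> = {
  get_weather: (args) => ({ city: args.city, tempC: 21 }),
};

// Tool definition to send in the request's `tools` array.
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  },
];

// Run one call; report failures to the model instead of crashing the loop.
function runOne(call: ToolCall): string {
  const handler = handlers[call.function.name];
  if (!handler) return JSON.stringify({ error: `unknown tool: ${call.function.name}` });
  try {
    return JSON.stringify(handler(JSON.parse(call.function.arguments)));
  } catch (err) {
    return JSON.stringify({ error: String(err) });
  }
}

// Build the tool messages to append to the conversation for the next request.
function runToolCalls(calls: ToolCall[]): ToolMessage[] {
  return calls.map((call) => ({ role: 'tool' as const, tool_call_id: call.id, content: runOne(call) }));
}
```

In a full loop you would pass `tools` with the request, append the assistant message containing `tool_calls` plus the output of `runToolCalls` to `messages`, and call the model again.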

Pricing

Token Type	Cost
Input	$0.25 per million tokens
Output	$1.50 per million tokens
Cached Input	$0.025 per million tokens
Cache Write	$0.083333 per million tokens
Reasoning Output	$1.50 per million tokens
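To turn the table into a per-request estimate, multiply each token count by its rate and divide by one million. This sketch assumes every token falls into exactly one billing category (for example, cached prompt tokens are not also counted as regular input); confirm that against your actual usage reports. `estimateCostUSD` is a hypothetical helper.

```typescript
// Rates in USD per million tokens, copied from the Pricing table.
const RATES_PER_MILLION = {
  input: 0.25,
  cachedInput: 0.025,
  cacheWrite: 0.083333,
  output: 1.5,
  reasoningOutput: 1.5,
} as const;

type TokenKind = keyof typeof RATES_PER_MILLION;
type TokenCounts = Partial<Record<TokenKind, number>>;

// ASSUMPTION: categories do not overlap, so per-category costs simply add up.
function estimateCostUSD(counts: TokenCounts): number {
  let total = 0;
  for (const [kind, rate] of Object.entries(RATES_PER_MILLION)) {
    const tokens = counts[kind as TokenKind] ?? 0;
    total += (tokens / 1_000_000) * rate;
  }
  return total;
}
```

For example, a request with 20,000 uncached input tokens and 1,000 output tokens costs 20,000 × $0.25/M + 1,000 × $1.50/M = $0.005 + $0.0015 = $0.0065.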

How to Use

API Integration

Base URL: https://llm.wavespeed.ai/v1
API Endpoint: chat/completions
Model ID: google/gemini-3.1-flash-lite
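Without the SDK, the same call is an HTTPS POST to the base URL plus the endpoint path. One easy mistake: resolving `chat/completions` against `https://llm.wavespeed.ai/v1` with the `URL` constructor drops the `/v1` segment unless the base ends in a slash. The sketch builds the request without sending it. The bearer-token `Authorization` header follows the usual OpenAI-compatible convention used by the OpenAI SDK, and `endpointUrl` and `buildChatRequest` are hypothetical helpers.

```typescript
const BASE_URL = 'https://llm.wavespeed.ai/v1';

// Resolve an endpoint path against the base URL without losing `/v1`:
// new URL('chat/completions', 'https://host/v1') would give https://host/chat/completions.
function endpointUrl(base: string, path: string): string {
  const dir = base.endsWith('/') ? base : `${base}/`;
  return new URL(path.replace(/^\//, ''), dir).toString();
}

interface PostInit {
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) a Chat Completions request suitable for fetch().
function buildChatRequest(apiKey: string, prompt: string): { url: string; init: PostInit } {
  return {
    url: endpointUrl(BASE_URL, 'chat/completions'),
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: 'google/gemini-3.1-flash-lite',
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}
```

To send it: `const { url, init } = buildChatRequest(apiKey, 'Hello!'); const res = await fetch(url, init);`, then read `(await res.json()).choices[0].message.content`.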

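Info

Provider: google

Type: llm

Supported Features

Input: Text, Image, Video, Audio, PDF

Output: Text

Context: 1,048,576 tokens

Max Output: 65,536 tokens

Vision: ✓ Supported

Function Calling: ✓ Supported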

Frequently Asked Questions about Gemini 3.1 Flash Lite

How much does the Gemini 3.1 Flash Lite API cost?

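WaveSpeedAI pricing: $0.25 per million input tokens and $1.50 per million output tokens. Prompt caching and batch processing are billed separately and can lower the effective cost of long, repetitive workloads.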
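The effect of caching on input cost can be estimated as a blended rate. Let $h$ be the fraction of prompt tokens served from cache, $r_{\text{in}} = 0.25$ the uncached input rate, and $r_{\text{cache}} = 0.025$ the cached-input rate from the Pricing table (both in USD per million tokens). For simplicity this ignores cache-write charges, so it slightly understates the cost of filling the cache.

```latex
\begin{aligned}
r_{\text{eff}} &= (1 - h)\, r_{\text{in}} + h\, r_{\text{cache}} = (1 - h)(0.25) + h\,(0.025) \\
h = 0.8:\quad r_{\text{eff}} &= 0.2 \times 0.25 + 0.8 \times 0.025 = 0.05 + 0.02 = 0.07
\end{aligned}
```

At an 80% cache hit rate the effective input rate is about $0.07 per million tokens, 72% below the uncached $0.25.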

How large is the Gemini 3.1 Flash Lite context window?

Gemini 3.1 Flash Lite supports up to 1,048,576 context tokens (1M) and up to 65,536 output tokens (64K) per request.

Is Gemini 3.1 Flash Lite OpenAI-compatible?

WaveSpeedAI serves Gemini 3.1 Flash Lite through an OpenAI-compatible Chat Completions interface at https://llm.wavespeed.ai/v1. Most OpenAI SDK clients work after changing the base URL and API key; support for optional fields varies by model.

How do I get started with Gemini 3.1 Flash Lite?

Sign in to WaveSpeedAI, create an API key under Access Keys, and send requests to https://llm.wavespeed.ai/v1/chat/completions using the model ID shown above. Check the latest model catalog for availability, features, and pricing.