Google's model lineup now includes Video Extend - Faster, Cheaper Pricing - WaveSpeedAI

Google Cloud's Vertex AI platform offers a comprehensive suite of state-of-the-art AI models for image, video, audio, and speech generation. These models represent the cutting edge of generative AI technology, combining high performance with enterprise-grade reliability.

🎵 Lyric Series — AI Music Generation

Google's Lyric models bring high-fidelity AI music composition to the platform, enabling prompt-driven audio creation with professional production quality.

Lyric 3 Ultra — Flagship music generation model delivering studio-grade audio with rich instrumentation, dynamic range, and expressive composition control.

Lyric 3 Pro — Balanced music generation with strong quality and faster output, ideal for background scoring, social content, and rapid creative iteration.

🗣️ Text-to-Speech

Gemini 2.5 Pro / Text-to-Speech — Natural, expressive voice synthesis powered by Gemini 2.5 Pro, supporting multilingual output with human-like prosody and intonation control.

🧩 Veo 3.1 — Video Extend (Continue an existing Veo video)

Google's Video Extend lets you extend a previously Veo-generated video into a longer, continuous clip—preserving motion style, framing, lighting, and synchronized audio for seamless story continuation.

Veo 3.1 Video Extend — Extend an existing Veo video with cinematic continuity (scene, motion, and audio) for "what happens next" storytelling.

Veo 3.1 Fast Video Extend — High-speed, cost-efficient extend workflow for rapid iteration, previews, and multi-branch continuations.

💡 Both endpoints require a Veo-generated input video and return a single merged result containing the original clip plus the extension.
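As a rough illustration of the input requirement above, here is a minimal Python sketch that assembles a request body for an extend call. The model identifiers and the field names (`model`, `input`, `video`, `prompt`) are assumptions made for this example and are not taken from the WaveSpeedAI or Google API reference; check the official documentation for the real schema. The sketch only validates the URL shape and cannot verify that the clip was actually generated by Veo.

```python
from urllib.parse import urlparse

# Hypothetical model identifiers; the real endpoint names may differ.
EXTEND_MODELS = {
    False: "google/veo3.1/video-extend",
    True: "google/veo3.1-fast/video-extend",
}


def build_extend_request(video_url: str, prompt: str, fast: bool = False) -> dict:
    """Assemble an illustrative request body for a Video Extend call."""
    parsed = urlparse(video_url)
    # The input must point at an existing (Veo-generated) clip.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("video_url must be an http(s) URL to a Veo-generated video")
    if not prompt.strip():
        raise ValueError("prompt must describe what happens next")
    return {
        "model": EXTEND_MODELS[fast],
        # Field names below are assumptions, not a documented schema.
        "input": {"video": video_url, "prompt": prompt.strip()},
    }


request = build_extend_request(
    "https://example.com/clips/veo-original.mp4",
    "The camera follows the cyclist around the corner.",
    fast=True,
)
print(request["model"])
```

Choosing `fast=True` selects the Fast variant for previews and branching experiments; the standard variant would be the natural choice for a final render.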

🎬 Veo Series — Text & Image to Video

Google's Veo family brings cinematic storytelling to AI generation, combining realistic motion, synchronized audio, and true-to-life lighting.

Veo 3.1 — Generates cinematic motion with native dialogue, spatial sound, and realistic scene continuity.

Veo 3.1 Fast — 30% faster and 62.5% cheaper than the base model, while preserving high visual fidelity.

Veo 3.1 I2V — Turns a still image into smooth, lifelike motion with natural ambient audio.

Veo 3.1 Fast I2V — High-performance version for rapid testing, previews, and content iteration.

Veo 3.1 R2V — Transforms a single reference video into a new, high-fidelity scene while preserving motion style, framing, and cinematic tone.

Veo 3.1 Lite — Lightweight start-and-end frame guided generation; define the opening and closing shots and let Veo fill in the cinematic middle.

Veo 3 — Flagship text-to-video model from DeepMind, supporting native dialogue, ambient sound, and realistic motion.

Veo 3 Fast — 30% faster and 62.5% cheaper than Veo 3; optimized for short-form and social content.

Veo 3 I2V — Converts still images into smooth, lifelike motion with synchronized audio.

Veo 3 Fast I2V — High-speed, cost-efficient version for rapid iteration.

Veo 2 I2V — Image-to-video generation with nostalgic or stylized motion characteristics.

Veo 2 Fast — Streamlined text-to-video generation optimized for speed and cost-efficient short-form output.

💡 All Veo models include synchronized audio (speech, ambiance, and music) and support up to 1080p output.
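To make the "30% faster and 62.5% cheaper" figures quoted for the Fast variants concrete, the following Python sketch applies them to a base price and generation time. The base values (3.20 and 100 seconds) are made-up placeholders, not real prices or benchmarks, and "30% faster" is interpreted here as 30% less generation time, which is an assumption since the page does not define it.

```python
from decimal import Decimal

PRICE_REDUCTION = Decimal("0.625")  # "62.5% cheaper"
TIME_REDUCTION = Decimal("0.30")    # "30% faster", read as 30% less time (assumption)


def fast_variant_estimate(base_price: str, base_seconds: str):
    """Estimate Fast-variant price and generation time from base-model values."""
    price = Decimal(base_price) * (1 - PRICE_REDUCTION)
    seconds = Decimal(base_seconds) * (1 - TIME_REDUCTION)
    return price, seconds


# Placeholder inputs for illustration only.
price, seconds = fast_variant_estimate("3.20", "100")
print(f"price: {price:.2f}, seconds: {seconds:.0f}")  # price: 1.20, seconds: 70
```

In other words, a 62.5% reduction leaves 37.5% of the base price, so a run that costs 3.20 on the base model would cost 1.20 on the Fast variant under these assumptions.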

🖼️ Imagen Series — Text & Image Generation

The Imagen series excels in realism, lighting control, and precise text rendering, making it ideal for photography, design, and illustration.

Imagen 4 Ultra — Premium 2K photorealistic generation with advanced lighting and texture fidelity.

Imagen 4 Fast — Streamlined version offering strong quality with faster, lower-cost output.

Imagen 4 — Standard high-fidelity generation with excellent text handling and composition accuracy.

Imagen 3 Fast — Lightweight, fast model ideal for lifestyle or blog-style imagery.

Imagen 3 — Balanced base model for portraits, scenery, and artistic concept generation.

🪄 Nano-Banana & Gemini — Lightweight Creative Tools

For quick everyday creation, Google's lightweight models deliver expressive results with speed and efficiency.

Nano-Banana-2 / Text-to-Image — Generate high-fidelity 4K images with Pro-level quality at Flash-tier speed.

Nano-Banana-2 / Text-to-Image Fast — Ultra-fast variant of Nano-Banana-2 for rapid prototyping and high-volume generation at reduced cost.

Nano-Banana-2 / Edit — Transform images with context-aware editing, camera controls, and multilingual text rendering.

Nano-Banana / Text-to-Image — Create quick, expressive visuals from text prompts.

Nano-Banana / Edit — Modify or enhance existing images with natural language instructions.

Gemini 2.5 Flash Text-to-Image — Generate soft, detailed visuals through Google's Gemini integration.

Gemini 2.5 Flash Edit — Smart, context-aware photo editing with lighting consistency.

Nano-Banana Pro / Text-to-Image — Produce sharper, higher-fidelity images with improved prompt control for production use.

Nano-Banana Pro / Edit — Apply precise, region-aware edits that preserve identity, lighting, and overall composition.

Nano-Banana Pro / Ultra — Generate ultra-detailed, high-resolution visuals for hero shots, key art, and premium campaigns.

Nano-Banana Pro / Multi — Combine multiple reference images or styles to build complex, consistent characters and scenes.

📝 Notes

Please ensure your prompts comply with Google's Safety Guidelines. If an error occurs, review your prompt for restricted content, adjust it, and try again.
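The review-adjust-retry advice above can be sketched as a small loop over prompt revisions. Everything here is hypothetical: `GenerationError` stands in for whatever error a real client raises when a prompt is rejected, and `fake_generate` is a local stub used only so the example runs without any network call.

```python
class GenerationError(Exception):
    """Hypothetical error raised when a prompt is rejected."""


def generate_with_revisions(prompts, generate):
    """Try each prompt revision in order and return the first that succeeds."""
    last_error = None
    for prompt in prompts:
        try:
            return prompt, generate(prompt)
        except GenerationError as err:
            last_error = err  # remember why the latest attempt failed
    raise RuntimeError("every prompt revision was rejected") from last_error


def fake_generate(prompt):
    """Local stub: rejects prompts containing a placeholder restricted word."""
    if "restricted" in prompt:
        raise GenerationError("prompt contains restricted content")
    return f"result for: {prompt}"


used_prompt, result = generate_with_revisions(
    ["a restricted scene at dusk", "a quiet harbor at dusk"],
    fake_generate,
)
print(used_prompt)
```

The revisions are written by a person after reviewing the rejected prompt; the loop simply keeps the attempts ordered and reports a clear failure if none is accepted.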
