WaveSpeed.ai

ACE-Step

wavespeed-ai/ace-step

ACE-Step generates music with lyrics, up to 4 minutes long, from text prompts with high acoustic fidelity. It supports voice cloning, lyric editing, and remixing. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

ACE-Step — Text-to-Audio

ACE-Step Text-to-Audio is an AI music generation model that composes complete songs, including vocals, instrumentals, and lyrics, directly from text descriptions. It produces tracks up to 4 minutes long from simple style tags and optional lyrics.

Why It Stands Out

  • Text-to-music generation: Transform style tags into coherent music tracks with melody, rhythm, and vocals.
  • Style tag control: Enter multiple tags to guide genre, tempo, and energy.
  • Vocal and lyric creation: Generates original vocals and synchronized lyrics that fit your prompt's tone.
  • Fine-grained acoustic fidelity: Maintains dynamic balance, spatial quality, and instrument clarity.
  • Flexible duration: Adjustable from a few seconds to 4 minutes (240 seconds).
  • Reproducibility: Reuse the same seed with the same inputs to recreate a result.

Parameters

Parameter | Required | Description
tags | Yes | List of genres or styles (e.g., lofi, hiphop, drum and bass, chill)
lyrics | No | Provide custom lyrics or leave blank for auto-generated ones.
duration | No | Music length in seconds, up to 240 (default: 240).
seed | No | Set for reproducibility; -1 for random.
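
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the helper name `build_payload` is hypothetical, passing `tags` as one comma-separated string is an assumption based on the examples on this page, and the 1-second lower bound is an assumption (the page only says "a few seconds").

```python
from typing import Optional

MAX_DURATION_SECONDS = 240  # upper limit stated in the parameter table


def build_payload(tags: str, lyrics: Optional[str] = None,
                  duration: int = 240, seed: int = -1) -> dict:
    """Validate the four documented parameters and return a request body.

    Hypothetical helper: the field names follow the parameter table, but
    the exact JSON shape expected by the service is an assumption.
    """
    if not tags or not tags.strip():
        raise ValueError("tags is required")
    # Lower bound of 1 second is an assumption; the upper bound is documented.
    if not 1 <= duration <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
        )
    payload = {"tags": tags.strip(), "duration": duration, "seed": seed}
    if lyrics:
        # Only send lyrics when provided; blank means auto-generated lyrics.
        payload["lyrics"] = lyrics
    return payload
```

Leaving `seed` at -1 keeps the documented random behaviour, and leaving `duration` unset keeps the documented default of 240 seconds.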

How to Use

  1. Enter style tags — add genres and moods like "lofi, hiphop, chill, trap."
  2. Add lyrics (optional) — provide custom lyrics or leave blank for AI-generated ones.
  3. Set duration — choose length from a few seconds up to 240 seconds (4 minutes).
  4. Set a seed (optional) for reproducible results.
  5. Click Run and wait for your music to generate.
  6. Preview and download the result.
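
The same steps can in principle be driven through the REST API instead of the web form. The sketch below only builds the HTTP request and does not send it. The endpoint URL is a placeholder, and the Bearer-token header and JSON body shape are assumptions rather than details taken from this page, so check the API documentation for the real values.

```python
import json
import urllib.request


def make_request(endpoint_url: str, api_key: str,
                 payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one generation job."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; not confirmed by this page.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = make_request(
    "https://example.invalid/ace-step",  # placeholder, not the real endpoint
    "YOUR_API_KEY",                      # placeholder credential
    {"tags": "lofi, hiphop, chill, trap", "duration": 30, "seed": 42},
)
# Once the real URL and key are known, urllib.request.urlopen(req) would
# submit the job; how the result is retrieved depends on the actual API.
```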

Best Use Cases

  • Music Production & Songwriting — Generate complete demos or backing tracks instantly.
  • Film, Game & Media Scoring — Create mood-specific tracks with precise control.
  • Advertising & Content Creation — Design catchy audio for short-form content.
  • Education & Experimentation — Teach structure, genre, or lyric composition.
  • Soundtrack Prototyping — Preview musical direction before full studio production.

Pricing

Duration | Price
30 seconds | $0.006
60 seconds | $0.012
120 seconds | $0.024
240 seconds | $0.048

Billing Rules

  • Billed per second at $0.0002
  • Maximum duration: 240 seconds (4 minutes)
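
The price table follows directly from the per-second rate. A small sketch of the arithmetic, where the function name `estimate_cost` is hypothetical and `Decimal` is used to avoid floating-point rounding in the sums:

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.0002")  # USD, from the billing rules
MAX_DURATION_SECONDS = 240            # documented maximum duration


def estimate_cost(duration_seconds: int) -> Decimal:
    """Return the estimated price in USD for one run of the given length."""
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
        )
    return PRICE_PER_SECOND * duration_seconds


# Reproduce the rows of the pricing table.
for seconds in (30, 60, 120, 240):
    print(f"{seconds:>3} s -> ${estimate_cost(seconds)}")
```

At the documented default of 240 seconds a run costs $0.048, so setting a shorter `duration` while testing keeps costs proportionally lower.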

Pro Tips for Best Quality

  • Use multiple style tags to define genre, mood, and energy level.
  • Combine contrasting tags (e.g., "chill, trap") for unique blends.
  • Provide structured lyrics with line breaks for better vocal synchronization.
  • Start with shorter durations to test style combinations.
  • Fix the seed when iterating to compare different tag or lyric variations.
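
One way to apply the tips above is to prepare a batch of short test requests that share a seed and differ only in tags or lyrics. This is an illustrative sketch: the tag sets, sample lyrics, seed value, and the helper `variation_payloads` are all made up for the example, and the payload field names simply mirror the parameter table.

```python
from itertools import product

FIXED_SEED = 1234    # any fixed value other than -1, which means random
TEST_DURATION = 30   # short clips keep test runs quick and cheap

tag_sets = ["lofi, chill", "chill, trap", "drum and bass, chill"]
# None means "let the model write the lyrics"; the second entry uses
# line breaks to give the lyrics structure.
lyric_variants = [None, "City lights are fading\nI keep on walking"]


def variation_payloads(tag_sets, lyric_variants,
                       seed=FIXED_SEED, duration=TEST_DURATION):
    """Return one payload per (tags, lyrics) pair, all sharing seed and duration."""
    payloads = []
    for tags, lyrics in product(tag_sets, lyric_variants):
        payload = {"tags": tags, "duration": duration, "seed": seed}
        if lyrics is not None:
            payload["lyrics"] = lyrics
        payloads.append(payload)
    return payloads


batch = variation_payloads(tag_sets, lyric_variants)
```

Because the seed and duration are held constant, differences between the resulting tracks can be attributed to the tag or lyric change rather than to random variation.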

Notes

  • Processing time varies based on duration and current queue load.
  • Please ensure your content complies with usage guidelines.