ACE-Step Text-to-Audio is a next-generation AI music generation model that composes complete songs (including vocals, instrumentals, and lyrics) directly from text descriptions. It enables creators to produce professional-quality music up to 4 minutes long from a simple prompt and a few style tags.
Text-to-Music Generation: Transform plain descriptions into coherent music tracks with melody, rhythm, and lyrics. Example: "A soulful R&B song with emotional vocals and smooth piano chords."
Style Tag Control: Enter multiple tags such as lofi, hiphop, drum and bass, trap, chill to guide genre, tempo, and energy.
Vocal & Lyric Creation: Generates original vocals and synchronized lyrics that fit your prompt's tone and rhythm.
Voice Cloning & Remixing (Advanced): Optionally replicate vocal tone or remix existing musical ideas using the same control interface.
Fine-Grained Acoustic Fidelity: Maintains dynamic balance, spatial quality, and instrument clarity for professional-grade sound.
Flexible Duration: Adjustable from a few seconds to 4 minutes (240 seconds), ideal for everything from jingles to full songs.
Parameter | Description |
---|---|
tags* | List of genres or styles (e.g., lofi, hiphop, drum and bass, chill) |
lyrics | (Optional) Provide custom lyrics or leave blank for auto-generated ones |
duration | Music length in seconds (up to 240) |
seed | Set a fixed value for reproducibility, or randomize for new variations |
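
As a rough illustration of how the parameters above fit together, the sketch below assembles and validates a request payload in Python. Only the four field names and the 240-second limit come from the table. The helper name `build_request`, the 60-second default, sending tags as one comma-separated string, and leaving out `lyrics` and `seed` to get auto-generated lyrics and a random seed are all assumptions made for illustration, and the asterisk on `tags` is read here as "required". No endpoint or client library is shown because the page does not specify one.

```python
from typing import Optional, Sequence

MAX_DURATION_SECONDS = 240  # upper limit stated in the parameter table


def build_request(
    tags: Sequence[str],
    lyrics: Optional[str] = None,
    duration: int = 60,  # arbitrary default chosen for this sketch
    seed: Optional[int] = None,
) -> dict:
    """Assemble a payload from the documented parameters (hypothetical helper)."""
    cleaned = [t.strip() for t in tags if t.strip()]
    if not cleaned:
        # Assumption: the asterisk on `tags` marks it as required.
        raise ValueError("at least one tag is required")
    if not 0 < duration <= MAX_DURATION_SECONDS:
        raise ValueError(f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds")

    # Assumption: tags are sent as a single comma-separated string.
    payload = {"tags": ", ".join(cleaned), "duration": duration}
    if lyrics:
        payload["lyrics"] = lyrics  # omitted here when lyrics should be auto-generated
    if seed is not None:
        payload["seed"] = seed  # omitted here when a random seed is wanted
    return payload


if __name__ == "__main__":
    print(build_request(["lofi", "hiphop", "chill"], duration=90, seed=42))
```

Validating the duration locally avoids submitting a request that exceeds the documented 240-second limit.
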
Metric | Price |
---|---|
Per second of generated audio | $0.0002 / s |
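
The per-second rate makes cost a simple multiplication. The sketch below works through it, assuming billing is strictly linear in the requested duration with no minimum charge or rounding, which the page does not confirm. The function names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.0002")  # USD per second, from the pricing table
MAX_DURATION_SECONDS = 240            # maximum track length stated on the page


def estimate_cost(duration_seconds: int) -> Decimal:
    """Estimated USD cost of one generation, assuming purely linear billing."""
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError("duration must be between 1 and 240 seconds")
    return PRICE_PER_SECOND * duration_seconds


def seconds_for_budget(budget_usd: str) -> Decimal:
    """Total seconds of audio a budget covers at the listed rate."""
    return Decimal(budget_usd) / PRICE_PER_SECOND


if __name__ == "__main__":
    print(estimate_cost(30))          # 30-second jingle
    print(estimate_cost(240))         # full 4-minute track
    print(seconds_for_budget("1"))    # seconds of audio per dollar
```

Under these assumptions a full 240-second track costs $0.048, and $1 covers 5000 seconds of audio, or about 20 full-length tracks.
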
ACE-Step Text-to-Audio empowers musicians, content creators, and storytellers to compose songs from words alone, blending lyrical intelligence, genre control, and acoustic quality into one seamless creative tool.