
Commit f92e49b

update inspiremusic
1 parent fede711 commit f92e49b

1 file changed

Lines changed: 4 additions & 2 deletions

inspiremusic/index.html
@@ -43,8 +43,10 @@ <h2>InspireMusic: A Unified Framework for Controlled High-Fidelity Long-Form Mus
 <p><b>Alibaba Group</b></p>
 </div>
 <p><b>Abstract</b>
-
-Recent advances in generative modeling have transformed the landscape of music and audio generation. In this work, we introduce <b>InspireMusic</b>, a unified framework designed to generate high-fidelity music, songs, and audio, which integrates an autoregressive transformer with a super-resolution flow-matching model. This framework enables the direct generation of high-fidelity long-form audio at 48kHz from both text and audio modalities. Unlike prior systems that focus solely on symbolic or raw audio generation, our approach employs dual audio tokenizers to capture both the global musical structure and the fine-grained acoustic details, allowing for high quality audio generation with long-form coherence. This framework represents a significant advancement in music generation by directly modeling raw audio, ensuring both diversity and high-fidelity output.</p>
+We introduce <b>InspireMusic</b>, a unified framework designed to generate high-fidelity music, songs, and audio, which integrates an autoregressive transformer with a super-resolution flow-matching model.
+This framework enables the direct generation of high-fidelity long-form audio at 48kHz from both text and audio modalities. Unlike previous approaches, our model uses dual audio tokenizers: a high-bitrate compression audio tokenizer that contains richer semantic information,
+thereby reducing training costs and enhancing efficiency, and an acoustic codec that preserves fine-grained acoustic details during flow-matching model training. This combination enables high-quality audio generation with long-form coherence.
+</p>
 </p>
 
 <p><b>Highlights</b>

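The abstract names a super-resolution flow-matching model but does not spell out its training objective. The following is a minimal, generic sketch of conditional flow matching with a linear interpolation path, in plain Python. It illustrates the technique in general and is not InspireMusic's implementation. The names `interpolate`, `cfm_loss`, `euler_sample`, and `oracle_velocity` are hypothetical. `cond` stands in for whatever conditioning the real model receives (for example, the coarse output of the autoregressive stage). The toy `oracle_velocity` assumes the target is exactly `cond`, so the sketch can be checked without training a network.

```python
import random


def interpolate(x0, x1, t):
    """Linear path between noise x0 and data x1 (generic CFM, assumed here).

    Returns the point x_t on the path and the target velocity u_t = x1 - x0,
    which a flow-matching network is trained to regress.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - a for a, b in zip(x0, x1)]
    return x_t, u_t


def cfm_loss(velocity_fn, x1, cond, rng):
    """Single-sample conditional flow-matching loss (mean squared error)."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = rng.random()                         # t ~ U[0, 1)
    x_t, u_t = interpolate(x0, x1, t)
    v = velocity_fn(x_t, t, cond)
    return sum((vi - ui) ** 2 for vi, ui in zip(v, u_t)) / len(x1)


def euler_sample(velocity_fn, cond, dim, steps, rng):
    """Integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 (data)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t < 1 on every step, so 1 - t > 0 below
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(x, t, cond):
    # Hypothetical toy "model": if the high-resolution target is exactly
    # `cond`, the ideal velocity points straight at it.
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.5, -1.0, 2.0]
    print(cfm_loss(oracle_velocity, target, target, rng))  # close to zero
    print(euler_sample(oracle_velocity, target, 3, 50, rng))
```

In a real system, `oracle_velocity` would be replaced by a learned network trained to minimize `cfm_loss` over many data samples. Sampling then runs the same Euler loop from noise, conditioned on the upstream representation.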