Two major AI music updates shipped this week, and neither came from Suno.
ElevenLabs, the Polish-founded voice AI company sitting at an $11 billion valuation after a $500 million Series D in February, launched Music v2. Stability AI—the Stable Diffusion people—dropped Stable Audio 3.0, a four-model family with open weights and tracks that run past six minutes.
The backdrop is the Recording Industry Association of America copyright suits from 2024 against Suno and Udio, which made "trained on licensed data" the most important phrase in any AI music announcement. Both ElevenLabs and Stability are leaning on that hard, pitching it as assurance that you won’t run into legal trouble over the outputs you generate.
Music v2: One track, opera to heavy metal, no breakdown
Music v2 is ElevenLabs' second music model, arriving roughly 10 months after the first. The core pitch is coherence under pressure. According to ElevenLabs, a single track can shift from opera to heavy metal and back, hold together through fast rap, and embed non-musical sound effects, all without the composition coming apart.
Generative audio tends to fall apart exactly when prompts get complicated, so this is the thing worth watching, especially in longer compositions.
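Inpainting is now actually useful: select a section, regenerate it, leave everything else untouched. Users can also build songs section by section (intro, verse, chorus), with the model maintaining continuity throughout rather than treating each clip as an independent generation. Multilingual support has improved too, though ElevenLabs didn't publish specifics.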
The model spans three platforms: ElevenMusic for creators, ElevenAPI for developers, and ElevenCreative for brands. It is live now on ElevenMusic and ElevenCreative; API access is early access via the sales team.
ElevenLabs also cut Music v1 and v2 pricing by up to 50% for ElevenAPI and up to 40% for ElevenCreative self-serve. The company hit $500 million in annual recurring revenue in April 2026. Music is still a small slice of that—but ElevenMusic, which launched as a consumer app in April, is a direct shot at Suno's user base.
Stable Audio 3.0: Open weights, on-device, and actually longer
Stable Audio 2.0 topped out at three minutes and was already behind Suno when it launched in 2024. Stable Audio 3.0 ships four models: Small SFX (on-device sound effects), Small (full music composition on-device), Medium (up to 6:20, stronger hardware), and Large (API-only). Three of the four have open weights on Hugging Face.
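The Small models run at 459 million parameters each, with no GPU needed. (Parameters are essentially a rough measure of an AI model's capacity.) Medium has 1.4 billion parameters and generates a 6:20 output in about 1.31 seconds on an H200 GPU. Large comes in at 2.7 billion and is available only to organizations with more than $1 million in revenue. Per-second generation granularity means you can get exactly the track length you need rather than an approximation.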
It’s also supported in ComfyUI for local setups.
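The architecture is new: a semantic-acoustic autoencoder that Stability calls SAME, designed to keep melodies coherent across longer outputs. LoRA fine-tuning is supported, so artists can adapt the model to their own catalogs. So is inpainting: single-segment, multi-segment, and causal continuation, which extends a track past its original endpoint.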
For context, a LoRA (low-rank adaptation) is like a mini-model that steers how the full model generates its output. If you train a LoRA on blues, the model will produce blues; if you train it on B.B. King's blues, the model will produce songs that sound like B.B. King. Inpainting means the model can fix small mistakes in what it has created. So, for example, if the model hallucinates something at the 2:30 mark, you can select a few seconds of the song, ask the model to change it into whatever you want, and the model will generate a piece of the song that fits perfectly in that timeframe and blends with the actual song as a whole.
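For readers who want to see why a LoRA is so much smaller than the model it steers, here is a minimal sketch of the standard low-rank adaptation formula, W + (alpha / r) * B·A, in plain Python. It is not Stability's implementation: the layer size, rank, scaling value, and function names are all hypothetical, chosen only for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), the weight the adapted layer uses."""
    rank = len(a)          # A has shape (rank, d_in)
    scale = alpha / rank
    delta = matmul(b, a)   # shape (d_out, d_in), same as W
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

random.seed(0)
# Hypothetical sizes for one layer; real models have many, much larger layers.
d_out, d_in, rank, alpha = 64, 64, 4, 8.0

# Frozen base weight: never updated during LoRA fine-tuning.
w = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors. B starts at zero, so the adapter changes
# nothing until training moves it.
a = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]

full_params = d_out * d_in            # what full fine-tuning would update
lora_params = rank * (d_in + d_out)   # what the LoRA actually trains
untrained_is_noop = lora_effective_weight(w, a, b, alpha) == w
print(full_params, lora_params, untrained_is_noop)  # 4096 512 True
```

In this toy layer the adapter trains 512 numbers instead of 4,096, and the base weights stay untouched, which is why a blues LoRA can be shipped and swapped as a small add-on rather than a whole new model.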
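Stability has been technically credible in AI music for years without a commercial breakthrough. The open-weights play is the Stable Diffusion playbook applied to audio: seed the developer community and see what gets built. The licensing is cleaner than anything Stable Audio has shipped before, with partnerships in place with Universal Music Group and Warner Music Group.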
The target: Suno, the AI music king
If ChatGPT is the king of AI text, Suno is the king of AI music. The company behind the model hit a $2.45 billion valuation in November 2025, crossed $300 million in annual recurring revenue, and has been used by roughly 100 million people.
It generates around 7 million songs per day. Warner Music settled with Suno in November 2025; Sony and UMG are still in federal court.
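To stay out of those copyright wars, ElevenLabs signed licensing deals with Believe, Kobalt, and Merlin. Stability has Warner and Universal. Udio struck deals with all three majors and is now a walled garden: nothing you generate can leave the platform.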
Stable Audio 3.0 Small and Medium are available now on Hugging Face. Large is live via the Stability AI API. Music v2 is free for ElevenMusic users, with commercial tiers available through ElevenCreative and ElevenAPI.
