Microsoft Open-Sources Cutting-Edge Voice AI Family VibeVoice: Processes 90-Minute Multi-Speaker Conversations in One Go, Rapidly Gains 27K Stars on GitHub

marsbit | Published on 2026-03-30 | Last updated on 2026-03-30

Abstract

Microsoft has open-sourced VibeVoice, a cutting-edge family of speech AI models for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). The project, gaining 27K stars on GitHub, offers powerful long-audio processing, multi-speaker dialogue generation, and real-time capabilities under an MIT license for local deployment. Key models include:

- **VibeVoice-ASR-7B**: Processes up to 60 minutes of audio, outputs structured transcriptions with speaker identification, timestamps, and supports over 50 languages.
- **VibeVoice-TTS-1.5B**: Generates expressive, 90-minute multi-speaker (up to 4 voices) conversations with natural flow and emotional nuance.
- **VibeVoice-Realtime-0.5B**: Enables real-time TTS with ~300ms latency for interactive applications like voice assistants.

The framework addresses limitations in long-sequence processing, speaker consistency, and naturalness. It includes safety features like audio watermarking and has sparked community-developed tools (e.g., a voice input method). Available on GitHub and Hugging Face, VibeVoice aims to advance innovation in content creation, accessibility, and voice interaction.

Microsoft recently open-sourced a cutting-edge voice AI model family named VibeVoice, which encompasses capabilities such as automatic speech recognition (ASR) and text-to-speech (TTS). The project has quickly garnered attention in the developer community due to its powerful long-audio processing, multi-speaker natural conversation generation, and real-time low-latency features. It has already gained approximately 27K Stars on GitHub.

As an open-source research framework, VibeVoice uses the MIT license, supports local deployment, requires no cloud subscription fees, and aims to promote collaboration and innovation in the field of speech synthesis. The model family mainly includes three core members, each with its own focus, collectively addressing the pain points of traditional voice AI in long-sequence processing, speaker consistency, and natural fluency.

VibeVoice-ASR-7B: A Structured Speech-to-Text Tool for Up to 60 Minutes

VibeVoice-ASR-7B is a unified speech-to-text model capable of processing audio files up to 60 minutes long in one go, directly outputting structured transcription results. The output includes not only "who is speaking" (speaker identification) and "when they speak" (precise timestamps), but also "what was said" (detailed content), and supports custom hotwords to effectively improve the recognition accuracy of proper nouns or technical terms. The model supports over 50 languages and is suitable for complex scenarios like long meeting recordings and podcast transcriptions.
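The structured output described above (who, when, what) can be pictured as a list of speaker-tagged, timestamped segments. The sketch below is only an illustration: the `Segment` class, its field names, and the sample values are hypothetical and are not taken from the VibeVoice output schema.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed utterance (hypothetical schema, not VibeVoice's own)."""
    speaker: str   # who is speaking
    start: float   # when the utterance starts, in seconds
    end: float     # when the utterance ends, in seconds
    text: str      # what was said


def format_timestamp(seconds: float) -> str:
    """Render whole seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_transcript(segments: list[Segment]) -> str:
    """Turn segments into readable lines ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(
        f"[{format_timestamp(s.start)}-{format_timestamp(s.end)}] {s.speaker}: {s.text}"
        for s in ordered
    )


def speaking_time(segments: list[Segment]) -> dict[str, float]:
    """Total seconds spoken per speaker."""
    totals: dict[str, float] = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals


# Made-up sample data for illustration.
sample = [
    Segment("Speaker 2", 65.0, 72.5, "Thanks, glad to be here."),
    Segment("Speaker 1", 0.0, 65.0, "Welcome to the show."),
]
print(render_transcript(sample))
print(speaking_time(sample))
```

Keeping speaker, time span, and text together in one record makes downstream tasks such as per-speaker summaries or subtitle export a matter of simple filtering and sorting.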

Community developers have already built practical tools on top of this model, such as Vibing, a voice input method for macOS and Windows. User feedback indicates that it is fast and accurate, noticeably improving the efficiency of everyday voice input.

VibeVoice-TTS-1.5B: Expressive Speech Generation for 90-Minute Multi-Speaker Content

VibeVoice-TTS-1.5B is a core model focused on text-to-speech, capable of producing continuous audio up to 90 minutes long in a single generation, supporting natural dialogue simulation with up to 4 different speakers. The generated speech is expressive, sounds natural and fluent, and can simulate realistic pauses, emphasis, and emotional transitions, making it very suitable for producing podcasts, long-form audio narratives, audiobooks, or multi-character dialogue content.

Compared to many traditional TTS models that only support 1-2 speakers, VibeVoice-TTS has achieved significant breakthroughs in long-form, multi-speaker consistency. Its underlying architecture uses continuous speech tokenizers (an acoustic tokenizer and a semantic tokenizer) combined with a low frame rate of 7.5 Hz. Fewer frames per second of audio means a shorter sequence for the same duration, which greatly improves computational efficiency on long sequences.
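A rough count shows why the frame rate matters for long audio: at 7.5 frames per second, 90 minutes is 90 × 60 × 7.5 = 40,500 frames. The sketch below does that arithmetic. The 50 Hz comparison rate is an assumed figure chosen only for illustration, not one from the source, and the count ignores text tokens and any other overhead.

```python
def frame_count(duration_minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover the given audio duration."""
    return round(duration_minutes * 60 * frame_rate_hz)


VIBEVOICE_RATE_HZ = 7.5    # frame rate stated in the article
COMPARISON_RATE_HZ = 50.0  # assumed rate, for comparison only

low = frame_count(90, VIBEVOICE_RATE_HZ)
high = frame_count(90, COMPARISON_RATE_HZ)
print(low, high, round(high / low, 2))
```

Under these assumptions the lower rate yields a sequence several times shorter for the same 90 minutes of audio.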

VibeVoice-Realtime-0.5B: Real-Time TTS with ~300ms Latency

VibeVoice-Realtime-0.5B focuses on real-time scenarios, supporting streaming text input with an initial audio output latency of approximately 300 milliseconds, while also being able to generate long-form speech of about 10 minutes. This model is particularly suitable for interactive applications requiring immediate responses, such as real-time voice assistants or live broadcast dubbing scenarios.
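For streaming TTS, the latency figure that matters is time to first audio: how long after text is submitted the first chunk arrives. The sketch below measures that for any iterator of audio chunks. The `fake_stream` generator is a stand-in and not the VibeVoice API; its delays and chunk size are invented, and the clock is injectable so the measurement can be checked deterministically.

```python
import time
from typing import Callable, Iterable, Iterator


def time_to_first_chunk(
    stream: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[float, bytes]:
    """Return (seconds until the first chunk arrived, that chunk)."""
    started = clock()
    first = next(iter(stream))  # blocks until the producer yields a chunk
    return clock() - started, first


def fake_stream(first_delay_s: float, chunks: int = 3) -> Iterator[bytes]:
    """Stand-in for a streaming synthesizer (hypothetical, not a real API)."""
    time.sleep(first_delay_s)      # simulated delay before the first audio
    for _ in range(chunks):
        yield b"\x00" * 320        # placeholder bytes, not real audio
        time.sleep(0.01)           # simulated gap between chunks


latency, chunk = time_to_first_chunk(fake_stream(0.05))
print(f"first chunk after {latency * 1000:.0f} ms, {len(chunk)} bytes")
```

Measuring only up to the first chunk, rather than the full generation time, matches how an interactive application experiences the delay: playback can begin as soon as that chunk arrives.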

Additionally, the project introduces experimental speaker support, including multilingual voices and various English style variants, providing developers with more customization options.

AIbase Review: Microsoft's open-sourcing of VibeVoice not only lowers the barrier to using high-performance voice AI but also provides a complete solution for local deployment. The project was briefly taken down due to potential misuse risks but was later re-released with safety mechanisms such as embedded watermarks and audible disclaimers, reflecting the principles of responsible AI development. Developers can currently obtain the model weights from the GitHub repository and Hugging Face, and quickly try them out on platforms like Colab.

With continued contributions from the open-source community (such as optimized forks for Apple Silicon), VibeVoice is expected to accelerate adoption in fields like content creation, accessibility tools, and voice interaction. Interested developers can visit the official Microsoft project page to explore further.

Project address: https://github.com/microsoft/VibeVoice
