Microsoft Open-Sources Cutting-Edge Voice AI Family VibeVoice: Processes 90-Minute Multi-Speaker Conversations in One Go, Rapidly Gains 27K Stars on GitHub

marsbit | Published 2026-03-30 | Updated 2026-03-30

Summary

Microsoft has open-sourced VibeVoice, a cutting-edge family of speech AI models for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). The project, gaining 27K stars on GitHub, offers powerful long-audio processing, multi-speaker dialogue generation, and real-time capabilities under an MIT license for local deployment. Key models include:

- **VibeVoice-ASR-7B**: Processes up to 60 minutes of audio, outputs structured transcriptions with speaker identification and timestamps, and supports over 50 languages.
- **VibeVoice-TTS-1.5B**: Generates expressive, 90-minute multi-speaker (up to 4 voices) conversations with natural flow and emotional nuance.
- **VibeVoice-Realtime-0.5B**: Enables real-time TTS with ~300ms latency for interactive applications like voice assistants.

The framework addresses limitations in long-sequence processing, speaker consistency, and naturalness. It includes safety features like audio watermarking and has sparked community-developed tools (e.g., a voice input method). Available on GitHub and Hugging Face, VibeVoice aims to advance innovation in content creation, accessibility, and voice interaction.

Microsoft recently open-sourced VibeVoice, a cutting-edge family of voice AI models covering automatic speech recognition (ASR) and text-to-speech (TTS). The project has quickly drawn attention in the developer community for its long-audio processing, natural multi-speaker conversation generation, and low-latency real-time features. It has already gained approximately 27K stars on GitHub.

As an open-source research framework, VibeVoice is released under the MIT license, supports local deployment, and requires no cloud subscription fees. It aims to promote collaboration and innovation in speech synthesis. The family has three core models, each with its own focus, which together address the pain points of traditional voice AI: long-sequence processing, speaker consistency, and natural fluency.

VibeVoice-ASR-7B: A Structured Speech-to-Text Tool for Up to 60 Minutes

VibeVoice-ASR-7B is a unified speech-to-text model capable of processing audio files up to 60 minutes long in one go, directly outputting structured transcription results. The output includes not only "who is speaking" (speaker identification) and "when they speak" (precise timestamps), but also "what was said" (detailed content), and supports custom hotwords to effectively improve the recognition accuracy of proper nouns or technical terms. The model supports over 50 languages and is suitable for complex scenarios like long meeting recordings and podcast transcriptions.
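To make the "who, when, what" structure concrete, here is a minimal Python sketch of how such transcript entries could be represented and rendered. The `Segment` class, its field names, and the helper functions are hypothetical and chosen for illustration; they are not VibeVoice's actual output schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One transcript entry (hypothetical schema, not VibeVoice's own)."""
    speaker: str   # "who is speaking"
    start: float   # "when they speak": seconds from the start of the audio
    end: float
    text: str      # "what was said"


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render(segments: list[Segment]) -> str:
    """Turn segments into a readable transcript ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(
        f"[{format_timestamp(s.start)} - {format_timestamp(s.end)}] "
        f"{s.speaker}: {s.text}"
        for s in ordered
    )


def speaking_time(segments: list[Segment]) -> dict[str, float]:
    """Sum the seconds spoken by each speaker."""
    totals: dict[str, float] = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals
```

Keeping speaker, time span, and text together in one record is what makes downstream tasks such as meeting minutes or per-speaker statistics straightforward.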

Community developers have already built practical tools on top of this model, such as Vibing, a voice input method for macOS and Windows. User feedback indicates that its recognition is fast and accurate, noticeably improving the efficiency of everyday voice input.

VibeVoice-TTS-1.5B: Expressive Speech Generation for 90-Minute Multi-Speaker Content

VibeVoice-TTS-1.5B is a core model focused on text-to-speech, capable of producing continuous audio up to 90 minutes long in a single generation, supporting natural dialogue simulation with up to 4 different speakers. The generated speech is expressive, sounds natural and fluent, and can simulate realistic pauses, emphasis, and emotional transitions, making it very suitable for producing podcasts, long-form audio narratives, audiobooks, or multi-character dialogue content.
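As an illustration of preparing multi-speaker input, the sketch below parses a dialogue script and enforces the four-speaker limit mentioned above. The `Name: utterance` line format and the `parse_script` helper are assumptions made for this example; the project's actual input format may differ.

```python
import re

MAX_SPEAKERS = 4  # limit stated in the article

# Assumed line format: "Name: utterance" (hypothetical, for illustration only).
LINE_PATTERN = re.compile(r"^(?P<speaker>[^:]+):\s*(?P<text>.+)$")


def parse_script(script: str) -> list[tuple[str, str]]:
    """Parse 'Name: utterance' lines into (speaker, text) turns."""
    turns = []
    for number, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line {number} is not in 'Name: text' form")
        turns.append((match["speaker"].strip(), match["text"].strip()))

    # Reject scripts that use more distinct voices than the stated limit.
    speakers = {speaker for speaker, _ in turns}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"{len(speakers)} speakers found; at most {MAX_SPEAKERS} are supported"
        )
    return turns
```

Validating the script before generation catches formatting mistakes early, which matters when a single run may produce up to 90 minutes of audio.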

Compared to many traditional TTS models that support only one or two speakers, VibeVoice-TTS makes significant progress in long-form, multi-speaker consistency. Its underlying architecture uses continuous speech tokenizers (an acoustic tokenizer and a semantic tokenizer) operating at a low frame rate of 7.5 Hz, which greatly improves computational efficiency when handling long sequences.
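A quick back-of-the-envelope calculation shows why the frame rate matters for long audio. The sketch below counts the frames needed to cover 90 minutes at 7.5 Hz and, for contrast, at 50 Hz. The 50 Hz figure is an assumed comparison point rather than a number from the article, and the count ignores any per-frame details of the real model.

```python
def frames_for(duration_minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover the given duration."""
    return round(duration_minutes * 60 * frame_rate_hz)


VIBEVOICE_RATE_HZ = 7.5   # frame rate quoted in the article
BASELINE_RATE_HZ = 50.0   # assumed comparison point, not from the article

low = frames_for(90, VIBEVOICE_RATE_HZ)
high = frames_for(90, BASELINE_RATE_HZ)

print(low)    # 40500 frames for 90 minutes at 7.5 Hz
print(high)   # 270000 frames for 90 minutes at 50 Hz
print(f"{high / low:.2f}x more frames at the higher rate")
```

Under these assumptions, a 90-minute recording takes 40,500 frames at 7.5 Hz, compared with 270,000 at 50 Hz, so the sequence the model must attend over is several times shorter.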

VibeVoice-Realtime-0.5B: Real-Time TTS with ~300ms Latency

VibeVoice-Realtime-0.5B focuses on real-time scenarios, supporting streaming text input with an initial audio output latency of approximately 300 milliseconds, while also being able to generate long-form speech of about 10 minutes. This model is particularly suitable for interactive applications requiring immediate responses, such as real-time voice assistants or live broadcast dubbing scenarios.

Additionally, the project introduces experimental speaker support, including multilingual voices and various English style variants, providing developers with more customization options.

AIbase Review: Microsoft's open-sourcing of VibeVoice not only lowers the barrier to using high-performance voice AI but also provides a complete solution for local deployment. The project was briefly taken down due to potential misuse risks but was later re-released with safety mechanisms such as embedded watermarks and audible disclaimers, reflecting the principles of responsible AI development. Currently, developers can obtain model weights on the GitHub repository and Hugging Face, and quickly try them out on platforms like Colab.

With continued contributions from the open-source community (such as optimized forks for Apple Silicon), VibeVoice is expected to accelerate adoption in fields like content creation, accessibility tools, and voice interaction. Interested developers can visit the official Microsoft project page to explore further.

Project address: https://github.com/microsoft/VibeVoice

Related Questions

Q: What is the name of the open-source voice AI model family recently released by Microsoft, and how many stars has it received on GitHub?

A: The open-source voice AI model family is called VibeVoice, and it has received approximately 27,000 stars on GitHub.

Q: What are the three core models in the VibeVoice family and their primary capabilities?

A: The three core models are: 1) VibeVoice-ASR-7B, which handles automatic speech recognition for up to 60 minutes of audio; 2) VibeVoice-TTS-1.5B, which generates expressive speech for up to 90 minutes with multiple speakers; and 3) VibeVoice-Realtime-0.5B, which provides real-time text-to-speech with about 300ms latency.

Q: What is a key feature of the VibeVoice-ASR-7B model regarding its output?

A: A key feature is its ability to output structured transcriptions that include speaker identification (who is speaking), precise timestamps (when they speak), and the detailed content (what was said).

Q: How does the VibeVoice-TTS-1.5B model achieve efficient long-sequence processing?

A: It uses continuous speech tokenizers (acoustic and semantic tokenizers) combined with a low frame rate design (7.5 Hz), which significantly improves computational efficiency for long-sequence processing.

Q: What safety measures were implemented in the VibeVoice project to address potential misuse risks?

A: The project implemented embedded audio watermarks and audible disclaimer mechanisms as safety measures to address potential misuse risks.
