Microsoft Open-Sources Cutting-Edge Voice AI Family VibeVoice: Processes 90-Minute Multi-Speaker Conversations in One Go, Rapidly Gains 27K Stars on GitHub

marsbit, published 2026-03-30, updated 2026-03-30

Summary

Microsoft has open-sourced VibeVoice, a cutting-edge family of speech AI models for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). The project, gaining 27K stars on GitHub, offers powerful long-audio processing, multi-speaker dialogue generation, and real-time capabilities under an MIT license for local deployment. Key models include:

- **VibeVoice-ASR-7B**: Processes up to 60 minutes of audio, outputs structured transcriptions with speaker identification, timestamps, and supports over 50 languages.
- **VibeVoice-TTS-1.5B**: Generates expressive, 90-minute multi-speaker (up to 4 voices) conversations with natural flow and emotional nuance.
- **VibeVoice-Realtime-0.5B**: Enables real-time TTS with ~300ms latency for interactive applications like voice assistants.

The framework addresses limitations in long-sequence processing, speaker consistency, and naturalness. It includes safety features like audio watermarking and has sparked community-developed tools (e.g., a voice input method). Available on GitHub and Hugging Face, VibeVoice aims to advance innovation in content creation, accessibility, and voice interaction.

Microsoft recently open-sourced VibeVoice, a family of voice AI models covering automatic speech recognition (ASR) and text-to-speech (TTS). The project has quickly drawn attention in the developer community for its long-audio processing, natural multi-speaker conversation generation, and low-latency real-time features, and it has already gained approximately 27K stars on GitHub.

As an open-source research framework, VibeVoice uses the MIT license, supports local deployment, requires no cloud subscription fees, and aims to promote collaboration and innovation in the field of speech synthesis. The model family mainly includes three core members, each with its own focus, collectively addressing the pain points of traditional voice AI in long-sequence processing, speaker consistency, and natural fluency.

VibeVoice-ASR-7B: A Structured Speech-to-Text Tool for Up to 60 Minutes

VibeVoice-ASR-7B is a unified speech-to-text model capable of processing audio files up to 60 minutes long in one go, directly outputting structured transcription results. The output includes not only "who is speaking" (speaker identification) and "when they speak" (precise timestamps), but also "what was said" (detailed content), and supports custom hotwords to effectively improve the recognition accuracy of proper nouns or technical terms. The model supports over 50 languages and is suitable for complex scenarios like long meeting recordings and podcast transcriptions.
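To make the "who / when / what" structure concrete, here is a minimal Python sketch of how a consumer might represent and render such a transcript. The `Segment` class, its field names, and the sample values are illustrative assumptions, not VibeVoice's actual output schema.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed utterance (hypothetical schema, not the model's real format)."""
    speaker: str   # who is speaking
    start: float   # when the utterance starts, in seconds
    end: float     # when the utterance ends, in seconds
    text: str      # what was said


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, rounding down to whole seconds."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render(segments: list[Segment]) -> str:
    """Produce one readable line per segment, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(
        f"[{format_timestamp(s.start)}-{format_timestamp(s.end)}] {s.speaker}: {s.text}"
        for s in ordered
    )


# Made-up sample data for illustration only.
sample = [
    Segment("Speaker 2", 65.0, 70.5, "Sounds good, let's start."),
    Segment("Speaker 1", 0.0, 4.2, "Welcome to the meeting."),
]
print(render(sample))
```

Keeping speaker, timing, and text together in one record is what makes downstream tasks such as meeting minutes or podcast show notes straightforward to build on top of the raw transcription.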

Community developers have already built practical tools based on this model, such as a voice input method named Vibing, which supports macOS and Windows platforms. User feedback indicates that its recognition speed and accuracy perform well, significantly improving daily voice input efficiency.

VibeVoice-TTS-1.5B: Expressive Speech Generation for 90-Minute Multi-Speaker Content

VibeVoice-TTS-1.5B is a core model focused on text-to-speech, capable of producing continuous audio up to 90 minutes long in a single generation, supporting natural dialogue simulation with up to 4 different speakers. The generated speech is expressive, sounds natural and fluent, and can simulate realistic pauses, emphasis, and emotional transitions, making it very suitable for producing podcasts, long-form audio narratives, audiobooks, or multi-character dialogue content.

Whereas many traditional TTS models support only one or two speakers, VibeVoice-TTS is designed to keep up to four voices consistent across long-form audio. Its underlying architecture uses continuous speech tokenizers (acoustic and semantic) combined with a low frame rate of 7.5 Hz, which greatly improves computational efficiency on long sequences.
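A quick back-of-the-envelope calculation shows why the low frame rate matters. The sketch below counts how many frames would be needed to cover a 90-minute recording at 7.5 Hz versus a 50 Hz baseline. The 50 Hz figure is an assumed comparison point chosen for illustration, not a number from the article, and the frame count is only a rough proxy for the sequence length a model actually processes.

```python
def frame_count(minutes: float, frame_rate_hz: float) -> int:
    """Number of frames needed to cover `minutes` of audio at a given frame rate."""
    return round(minutes * 60 * frame_rate_hz)


LOW_RATE_HZ = 7.5        # frame rate cited for VibeVoice's tokenizers
BASELINE_RATE_HZ = 50.0  # assumed comparison rate, for illustration only

low = frame_count(90, LOW_RATE_HZ)
baseline = frame_count(90, BASELINE_RATE_HZ)

print(f"90 min at {LOW_RATE_HZ} Hz: {low:,} frames")
print(f"90 min at {BASELINE_RATE_HZ} Hz: {baseline:,} frames")
print(f"Reduction factor: {baseline / low:.1f}x")
```

Under these assumptions, 90 minutes of audio comes to 40,500 frames at 7.5 Hz versus 270,000 at 50 Hz, which gives a sense of how much shorter the sequence becomes.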

VibeVoice-Realtime-0.5B: Real-Time TTS with ~300ms Latency

VibeVoice-Realtime-0.5B focuses on real-time scenarios, supporting streaming text input with an initial audio output latency of approximately 300 milliseconds, while also being able to generate long-form speech of about 10 minutes. This model is particularly suitable for interactive applications requiring immediate responses, such as real-time voice assistants or live broadcast dubbing scenarios.
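The latency figure quoted here is a "time to first audio" measurement: how long a caller waits between starting a stream and receiving the first chunk of sound. The sketch below shows one way to measure that in Python. The `fake_streaming_tts` generator is a hypothetical stand-in used only so the example runs on its own; it is not the VibeVoice API, and the numbers it produces say nothing about the real model.

```python
import time
from typing import Callable, Iterable, Iterator


def fake_streaming_tts(text_chunks: Iterable[str]) -> Iterator[bytes]:
    """Stand-in for a streaming TTS engine: yields one chunk per text chunk.

    Hypothetical placeholder; a real engine would return synthesized audio.
    """
    for chunk in text_chunks:
        time.sleep(0.01)  # pretend synthesis takes a little time
        yield chunk.encode("utf-8")  # placeholder bytes, not real audio


def time_to_first_audio(
    audio_stream: Iterator[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[float, bytes]:
    """Return (seconds until the first chunk arrives, that first chunk)."""
    start = clock()
    first_chunk = next(audio_stream)  # blocks until the engine yields audio
    return clock() - start, first_chunk


latency, first = time_to_first_audio(fake_streaming_tts(["Hello", " world"]))
print(f"First audio after {latency * 1000:.0f} ms ({len(first)} bytes)")
```

Because the generator is lazy, the simulated synthesis time falls inside the measured window, which mirrors how an interactive application would experience the delay. The `clock` parameter is there so the measurement can be checked with a deterministic clock.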

Additionally, the project introduces experimental speaker support, including multilingual voices and various English style variants, providing developers with more customization options.

AIbase Review: Microsoft's open-sourcing of VibeVoice not only lowers the barrier to using high-performance voice AI but also provides a complete solution for local deployment. The project was briefly taken down due to potential misuse risks but was later re-released with safety mechanisms such as embedded watermarks and audible disclaimers, reflecting the principles of responsible AI development. Currently, developers can obtain model weights on the GitHub repository and Hugging Face, and quickly try them out on platforms like Colab.

With continued contributions from the open-source community (such as optimized forks for Apple Silicon), VibeVoice is expected to accelerate adoption in fields like content creation, accessibility tools, and voice interaction. Interested developers can visit the official Microsoft project page to explore further.

Project address: https://github.com/microsoft/VibeVoice

Related Questions

Q: What is the name of the open-source voice AI model family recently released by Microsoft, and how many stars has it received on GitHub?

A: The open-source voice AI model family is called VibeVoice, and it has received approximately 27,000 stars on GitHub.

Q: What are the three core models in the VibeVoice family and their primary capabilities?

A: The three core models are: 1) VibeVoice-ASR-7B, which handles automatic speech recognition for up to 60 minutes of audio; 2) VibeVoice-TTS-1.5B, which generates expressive speech for up to 90 minutes with multiple speakers; and 3) VibeVoice-Realtime-0.5B, which provides real-time text-to-speech with about 300 ms latency.

Q: What is a key feature of the VibeVoice-ASR-7B model regarding its output?

A: A key feature is its ability to output structured transcriptions that include speaker identification (who is speaking), precise timestamps (when they speak), and the detailed content (what was said).

Q: How does the VibeVoice-TTS-1.5B model achieve efficient long-sequence processing?

A: It uses continuous speech tokenizers (acoustic and semantic tokenizers) combined with a low frame rate design (7.5 Hz), which significantly improves computational efficiency for long-sequence processing.

Q: What safety measures were implemented in the VibeVoice project to address potential misuse risks?

A: The project implemented embedded audio watermarks and audible disclaimer mechanisms as safety measures to address potential misuse risks.
