Microsoft Open-Sources Cutting-Edge Voice AI Family VibeVoice: Processes 90-Minute Multi-Speaker Conversations in One Go, Rapidly Gains 27K Stars on GitHub

marsbit | Published 2026-03-30 | Last updated 2026-03-30

Introduction

Microsoft has open-sourced VibeVoice, a cutting-edge family of speech AI models for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). The project, which has gained about 27K stars on GitHub, offers long-audio processing, multi-speaker dialogue generation, and real-time capabilities under an MIT license for local deployment. Key models include:

- **VibeVoice-ASR-7B**: Processes up to 60 minutes of audio, outputs structured transcriptions with speaker identification and timestamps, and supports over 50 languages.
- **VibeVoice-TTS-1.5B**: Generates expressive, 90-minute multi-speaker (up to 4 voices) conversations with natural flow and emotional nuance.
- **VibeVoice-Realtime-0.5B**: Enables real-time TTS with ~300ms latency for interactive applications like voice assistants.

The framework addresses limitations in long-sequence processing, speaker consistency, and naturalness. It includes safety features like audio watermarking and has sparked community-developed tools (e.g., a voice input method). Available on GitHub and Hugging Face, VibeVoice aims to advance innovation in content creation, accessibility, and voice interaction.

Microsoft recently open-sourced a cutting-edge voice AI model family named VibeVoice, which encompasses capabilities such as automatic speech recognition (ASR) and text-to-speech (TTS). The project has quickly garnered attention in the developer community due to its powerful long-audio processing, multi-speaker natural conversation generation, and real-time low-latency features. It has already gained approximately 27K Stars on GitHub.

As an open-source research framework, VibeVoice uses the MIT license, supports local deployment, requires no cloud subscription fees, and aims to promote collaboration and innovation in the field of speech synthesis. The model family mainly includes three core members, each with its own focus, collectively addressing the pain points of traditional voice AI in long-sequence processing, speaker consistency, and natural fluency.

VibeVoice-ASR-7B: A Structured Speech-to-Text Tool for Up to 60 Minutes

VibeVoice-ASR-7B is a unified speech-to-text model capable of processing audio files up to 60 minutes long in one go, directly outputting structured transcription results. The output includes not only "who is speaking" (speaker identification) and "when they speak" (precise timestamps), but also "what was said" (detailed content), and supports custom hotwords to effectively improve the recognition accuracy of proper nouns or technical terms. The model supports over 50 languages and is suitable for complex scenarios like long meeting recordings and podcast transcriptions.
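To make the "who, when, what" structure concrete, here is a minimal Python sketch of how such a transcript could be represented and rendered once it has been obtained. The `Segment` fields, the speaker labels, and the helper names are illustrative assumptions. They are not VibeVoice's actual output schema or API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment (hypothetical schema, not VibeVoice's own)."""
    speaker: str   # who is speaking, e.g. "Speaker 1"
    start: float   # when the segment starts, in seconds
    end: float     # when the segment ends, in seconds
    text: str      # what was said


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments: list[Segment]) -> str:
    """Return one line per segment, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(
        f"[{format_timestamp(s.start)} - {format_timestamp(s.end)}] "
        f"{s.speaker}: {s.text}"
        for s in ordered
    )


def talk_time(segments: list[Segment]) -> dict[str, float]:
    """Sum the speaking time in seconds for each speaker."""
    totals: dict[str, float] = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals
```

Keeping speaker, timing, and text in one record is what makes downstream tasks such as meeting minutes or per-speaker statistics straightforward.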

Community developers have already built practical tools based on this model, such as a voice input method named Vibing, which supports macOS and Windows platforms. User feedback indicates that its recognition speed and accuracy perform well, significantly improving daily voice input efficiency.

VibeVoice-TTS-1.5B: Expressive Speech Generation for 90-Minute Multi-Speaker Content

VibeVoice-TTS-1.5B is a core model focused on text-to-speech, capable of producing continuous audio up to 90 minutes long in a single generation, supporting natural dialogue simulation with up to 4 different speakers. The generated speech is expressive, sounds natural and fluent, and can simulate realistic pauses, emphasis, and emotional transitions, making it very suitable for producing podcasts, long-form audio narratives, audiobooks, or multi-character dialogue content.
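As an illustration of the four-speaker limit, the sketch below parses a dialogue script and rejects it if it uses more speakers than the article says the model supports. The `Name: text` line format, `MAX_SPEAKERS`, and `parse_script` are assumptions made for this example. The article does not describe the input format the model actually expects.

```python
import re

MAX_SPEAKERS = 4  # upper bound stated in the article

# Hypothetical script format: one turn per line, written as "Name: text".
LINE_PATTERN = re.compile(r"^(?P<speaker>[^:]+):\s*(?P<text>.+)$")


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split a script into (speaker, text) turns and enforce the speaker limit."""
    turns: list[tuple[str, str]] = []
    for number, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line {number} is not in 'Name: text' form")
        turns.append((match["speaker"].strip(), match["text"].strip()))

    speakers = {speaker for speaker, _ in turns}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"script uses {len(speakers)} speakers, limit is {MAX_SPEAKERS}"
        )
    return turns
```

Validating the script before generation avoids spending a long synthesis run on input the model cannot voice consistently.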

Compared to many traditional TTS models that only support 1-2 speakers, VibeVoice-TTS has achieved significant breakthroughs in long-form, multi-speaker consistency. Its underlying architecture uses continuous speech tokenizers (acoustic and semantic tokenizers) combined with a low frame rate design (7.5Hz), greatly improving computational efficiency for long-sequence handling.
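A quick back-of-the-envelope calculation shows why the low frame rate matters for long audio. The sketch assumes one token frame per time step, and the 50 Hz comparison rate is a hypothetical figure chosen for illustration. Only the 7.5 Hz rate and the 90-minute duration come from the article.

```python
def frames_for(minutes: float, frame_rate_hz: float) -> int:
    """Number of frames needed to cover `minutes` of audio at a given frame rate."""
    return round(minutes * 60 * frame_rate_hz)


# 90 minutes at the 7.5 Hz frame rate reported for VibeVoice.
vibevoice_frames = frames_for(90, 7.5)

# The same duration at a hypothetical 50 Hz tokenizer, for comparison only.
comparison_frames = frames_for(90, 50.0)

print(vibevoice_frames)   # 40500
print(comparison_frames)  # 270000
print(round(comparison_frames / vibevoice_frames, 2))  # 6.67
```

Under these assumptions, a 90-minute session spans roughly 40,500 frames instead of 270,000, which is the kind of reduction that keeps a very long generation within a manageable sequence length.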

VibeVoice-Realtime-0.5B: Real-Time TTS with ~300ms Latency

VibeVoice-Realtime-0.5B focuses on real-time scenarios, supporting streaming text input with an initial audio output latency of approximately 300 milliseconds, while also being able to generate long-form speech of about 10 minutes. This model is particularly suitable for interactive applications requiring immediate responses, such as real-time voice assistants or live broadcast dubbing scenarios.
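The latency figure refers to how long a caller waits for the first audio after sending text. The sketch below shows one way to measure that time-to-first-chunk for any streaming synthesizer. `fake_streaming_tts` is a stand-in that emits placeholder bytes and is not the VibeVoice API. The function names are assumptions for this example.

```python
import time
from typing import Callable, Iterable, Iterator


def fake_streaming_tts(text_chunks: Iterable[str], delay_s: float = 0.0) -> Iterator[bytes]:
    """Stand-in synthesizer: yields one placeholder chunk per text chunk."""
    for chunk in text_chunks:
        time.sleep(delay_s)          # simulate synthesis time per chunk
        yield chunk.encode("utf-8")  # placeholder bytes, not real audio


def time_to_first_chunk(
    audio_stream: Iterator[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[float, bytes]:
    """Return (seconds until the first chunk arrived, the first chunk)."""
    start = clock()
    first = next(audio_stream)  # blocks until the synthesizer yields
    return clock() - start, first


if __name__ == "__main__":
    stream = fake_streaming_tts(["Hello", " world"], delay_s=0.05)
    latency, first = time_to_first_chunk(stream)
    print(f"first chunk after {latency * 1000:.0f} ms")
```

Because the generator only starts working when the first chunk is requested, the measured interval covers the synthesis delay rather than just the function call. The injectable `clock` makes the measurement easy to test without real waiting.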

Additionally, the project introduces experimental speaker support, including multilingual voices and various English style variants, providing developers with more customization options.

AIbase Review: Microsoft's open-sourcing of VibeVoice not only lowers the barrier to using high-performance voice AI but also provides a complete solution for local deployment. The project was briefly taken down due to potential misuse risks but was later re-released with safety mechanisms such as embedded watermarks and audible disclaimers, reflecting the principles of responsible AI development. Currently, developers can obtain model weights on the GitHub repository and Hugging Face, and quickly try them out on platforms like Colab.

With continued contributions from the open-source community (such as optimized forks for Apple Silicon), VibeVoice is expected to accelerate adoption in fields like content creation, accessibility tools, and voice interaction. Interested developers can visit the official Microsoft project page to explore further.

Project address: https://github.com/microsoft/VibeVoice

Related Questions

Q: What is the name of the open-source voice AI model family recently released by Microsoft, and how many stars has it received on GitHub?

A: The open-source voice AI model family is called VibeVoice, and it has received approximately 27,000 stars on GitHub.

Q: What are the three core models in the VibeVoice family and their primary capabilities?

A: The three core models are: 1) VibeVoice-ASR-7B, which handles automatic speech recognition for up to 60 minutes of audio; 2) VibeVoice-TTS-1.5B, which generates expressive speech for up to 90 minutes with multiple speakers; and 3) VibeVoice-Realtime-0.5B, which provides real-time text-to-speech with about 300ms latency.

Q: What is a key feature of the VibeVoice-ASR-7B model regarding its output?

A: A key feature is its ability to output structured transcriptions that include speaker identification (who is speaking), precise timestamps (when they speak), and the detailed content (what was said).

Q: How does the VibeVoice-TTS-1.5B model achieve efficient long-sequence processing?

A: It uses continuous speech tokenizers (acoustic and semantic tokenizers) combined with a low frame rate design (7.5Hz), which significantly improves computational efficiency for long-sequence processing.

Q: What safety measures were implemented in the VibeVoice project to address potential misuse risks?

A: The project implemented embedded audio watermarks and audible disclaimer mechanisms as safety measures to address potential misuse risks.

