Microsoft Open-Sources Cutting-Edge Voice AI Family VibeVoice: Processes 90-Minute Multi-Speaker Conversations in One Go, Rapidly Gains 27K Stars on GitHub

marsbit, published 2026-03-30, last updated 2026-03-30

Introduction

Microsoft has open-sourced VibeVoice, a cutting-edge family of speech AI models for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). The project, which has gained about 27K stars on GitHub, offers long-audio processing, multi-speaker dialogue generation, and real-time capabilities under an MIT license for local deployment. Key models include:

- **VibeVoice-ASR-7B**: Processes up to 60 minutes of audio and outputs structured transcriptions with speaker identification and timestamps; supports over 50 languages.
- **VibeVoice-TTS-1.5B**: Generates expressive multi-speaker conversations (up to 4 voices) of up to 90 minutes, with natural flow and emotional nuance.
- **VibeVoice-Realtime-0.5B**: Enables real-time TTS with ~300ms latency for interactive applications like voice assistants.

The framework addresses limitations in long-sequence processing, speaker consistency, and naturalness. It includes safety features like audio watermarking and has sparked community-developed tools (e.g., a voice input method). Available on GitHub and Hugging Face, VibeVoice aims to advance innovation in content creation, accessibility, and voice interaction.

Microsoft recently open-sourced a cutting-edge voice AI model family named VibeVoice, which encompasses capabilities such as automatic speech recognition (ASR) and text-to-speech (TTS). The project has quickly garnered attention in the developer community due to its powerful long-audio processing, multi-speaker natural conversation generation, and real-time low-latency features. It has already gained approximately 27K Stars on GitHub.

As an open-source research framework, VibeVoice uses the MIT license, supports local deployment, requires no cloud subscription fees, and aims to promote collaboration and innovation in the field of speech synthesis. The model family mainly includes three core members, each with its own focus, collectively addressing the pain points of traditional voice AI in long-sequence processing, speaker consistency, and natural fluency.

VibeVoice-ASR-7B: A Structured Speech-to-Text Tool for Up to 60 Minutes

VibeVoice-ASR-7B is a unified speech-to-text model capable of processing audio files up to 60 minutes long in one go, directly outputting structured transcription results. The output includes not only "who is speaking" (speaker identification) and "when they speak" (precise timestamps), but also "what was said" (detailed content), and supports custom hotwords to effectively improve the recognition accuracy of proper nouns or technical terms. The model supports over 50 languages and is suitable for complex scenarios like long meeting recordings and podcast transcriptions.
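
To make the "who, when, what" structure concrete, here is a minimal sketch of how such a transcription could be represented and rendered. The `Segment` schema, its field names, and the sample utterances are assumptions made for illustration; the article does not specify the model's actual output format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed utterance (hypothetical schema, not the model's real output)."""
    speaker: str   # who is speaking, e.g. "Speaker 1"
    start: float   # when the utterance starts, in seconds
    end: float     # when the utterance ends, in seconds
    text: str      # what was said


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments: list[Segment]) -> str:
    """Turn segments into readable lines, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(
        f"[{format_timestamp(s.start)} - {format_timestamp(s.end)}] {s.speaker}: {s.text}"
        for s in ordered
    )


# Made-up sample data for a recording just under the 60-minute limit.
segments = [
    Segment("Speaker 2", 3540.0, 3546.5, "Let's wrap up with the roadmap."),
    Segment("Speaker 1", 12.0, 18.4, "Welcome to the weekly sync."),
]
print(render_transcript(segments))
```

Keeping speaker, timing, and text in one record per utterance is what makes downstream steps such as meeting minutes or per-speaker search straightforward.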

Community developers have already built practical tools based on this model, such as a voice input method named Vibing, which supports macOS and Windows platforms. User feedback indicates that its recognition speed and accuracy perform well, significantly improving daily voice input efficiency.

VibeVoice-TTS-1.5B: Expressive Speech Generation for 90-Minute Multi-Speaker Content

VibeVoice-TTS-1.5B is a core model focused on text-to-speech, capable of producing continuous audio up to 90 minutes long in a single generation, supporting natural dialogue simulation with up to 4 different speakers. The generated speech is expressive, sounds natural and fluent, and can simulate realistic pauses, emphasis, and emotional transitions, making it very suitable for producing podcasts, long-form audio narratives, audiobooks, or multi-character dialogue content.

Compared to many traditional TTS models that only support 1-2 speakers, VibeVoice-TTS has achieved significant breakthroughs in long-form, multi-speaker consistency. Its underlying architecture uses continuous speech tokenizers (acoustic and semantic tokenizers) combined with a low frame rate design (7.5Hz), greatly improving computational efficiency for long-sequence handling.
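
A quick back-of-the-envelope calculation shows why the low frame rate matters for long audio. The 7.5 Hz figure comes from the article; the 50 Hz comparison rate is an assumed value chosen only to illustrate the scaling, not a figure from the source.

```python
def frame_count(duration_minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover the given duration."""
    return round(duration_minutes * 60 * frame_rate_hz)


LOW_RATE_HZ = 7.5         # frame rate stated in the article
COMPARISON_RATE_HZ = 50   # assumed rate for comparison only, not from the article

low = frame_count(90, LOW_RATE_HZ)          # 90 min * 60 s * 7.5 Hz
high = frame_count(90, COMPARISON_RATE_HZ)  # 90 min * 60 s * 50 Hz
print(f"{low} frames at {LOW_RATE_HZ} Hz, {high} frames at {COMPARISON_RATE_HZ} Hz")
```

Under these assumptions, 90 minutes of audio is about 40,500 frames at 7.5 Hz, so the sequence the model must handle grows far more slowly with duration than it would at a higher frame rate.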

VibeVoice-Realtime-0.5B: Real-Time TTS with ~300ms Latency

VibeVoice-Realtime-0.5B focuses on real-time scenarios, supporting streaming text input with an initial audio output latency of approximately 300 milliseconds, while also being able to generate long-form speech of about 10 minutes. This model is particularly suitable for interactive applications requiring immediate responses, such as real-time voice assistants or live broadcast dubbing scenarios.
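
The latency figure above refers to the time until the first audio arrives, which can be measured on any streaming generator. The sketch below uses a hypothetical stand-in, `fake_streaming_tts`, rather than the real model or its API, so the numbers it prints say nothing about VibeVoice itself.

```python
import time
from typing import Callable, Iterable, Iterator


def fake_streaming_tts(text_chunks: Iterable[str], delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS engine: one audio chunk per text chunk."""
    for chunk in text_chunks:
        time.sleep(delay_s)          # simulate synthesis work
        yield chunk.encode("utf-8")  # placeholder bytes, not real audio


def time_to_first_chunk(
    stream: Iterator[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[float, bytes]:
    """Return (seconds until the first chunk arrived, that chunk)."""
    start = clock()
    first = next(stream)  # blocks until the engine produces its first output
    return clock() - start, first


latency, first = time_to_first_chunk(fake_streaming_tts(["Hello, ", "world."]))
print(f"first chunk after {latency * 1000:.1f} ms")
```

Passing the clock as a parameter keeps the measurement testable with a deterministic fake clock.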

Additionally, the project introduces experimental speaker support, including multilingual voices and various English style variants, providing developers with more customization options.

AIbase Review: Microsoft's open-sourcing of VibeVoice not only lowers the barrier to using high-performance voice AI but also provides a complete solution for local deployment. The project was briefly taken down due to potential misuse risks but was later re-released with safety mechanisms such as embedded watermarks and audible disclaimers, reflecting the principles of responsible AI development. Currently, developers can obtain model weights on the GitHub repository and Hugging Face, and quickly try them out on platforms like Colab.

With continued contributions from the open-source community (such as optimized forks for Apple Silicon), VibeVoice is expected to accelerate adoption in fields like content creation, accessibility tools, and voice interaction. Interested developers can visit the official Microsoft project page to explore further.

Project address: https://github.com/microsoft/VibeVoice

Related Questions

Q: What is the name of the open-source voice AI model family recently released by Microsoft, and how many stars has it received on GitHub?

A: The open-source voice AI model family is called VibeVoice, and it has received approximately 27,000 stars on GitHub.

Q: What are the three core models in the VibeVoice family and their primary capabilities?

A: The three core models are: 1) VibeVoice-ASR-7B, which handles automatic speech recognition for up to 60 minutes of audio; 2) VibeVoice-TTS-1.5B, which generates expressive speech for up to 90 minutes with multiple speakers; and 3) VibeVoice-Realtime-0.5B, which provides real-time text-to-speech with about 300ms latency.

Q: What is a key feature of the VibeVoice-ASR-7B model regarding its output?

A: A key feature is its ability to output structured transcriptions that include speaker identification (who is speaking), precise timestamps (when they speak), and the detailed content (what was said).

Q: How does the VibeVoice-TTS-1.5B model achieve efficient long-sequence processing?

A: It uses continuous speech tokenizers (acoustic and semantic tokenizers) combined with a low frame rate design (7.5Hz), which significantly improves computational efficiency for long-sequence processing.

Q: What safety measures were implemented in the VibeVoice project to address potential misuse risks?

A: The project implemented embedded audio watermarks and audible disclaimer mechanisms as safety measures to address potential misuse risks.
