Google Officially Declares War

链捕手 · Published 2026-05-21 · Updated 2026-05-21

Introduction

Google Declares War with AI-First I/O 2026

At its 2026 I/O developer conference, Google launched an aggressive, multi-pronged offensive, embedding AI across its ecosystem and challenging rivals on performance and price. The event showcased three major releases: Gemini 3.5 Flash, the video-centric Gemini Omni Flash, and the system-level AI assistant Spark. Gemini 3.5 Flash, despite being a smaller "Flash" model, outperforms its Pro counterpart in key benchmarks like mathematical reasoning (GSM8K) and coding (SWE-bench). Google attributes this to "extreme knowledge distillation" from a larger teacher model and a novel, highly granular MoE (Mixture of Experts) architecture with 256 experts, achieving sub-65ms response times. The native multimodal model, Gemini Omni Flash, offers real-time video understanding with 120ms latency, enabling applications like preventing a cup from overfilling. The new Spark assistant gains deep Android system integration, allowing it to automate complex multi-app workflows based on voice commands. Complementing these, Google unveiled lightweight AI glasses featuring Micro-OLED displays and on-device Gemini chips for instant, offline translation and scene analysis. CEO Sundar Pichai announced Gemini has reached 900 million monthly active users, leveraged through integration into Chrome, Android, and Workspace. Google also slashed prices dramatically: the Gemini 3.5 Flash API is priced at a fraction of competitor rates. This price war is enabled ...

The 2026 Google I/O developer conference left only one impression: arrogance.

Not only did Google force-feed AI agents into every core traffic portal (search, browsers, phones, and smart glasses), it also dropped three bombshells in a row: Gemini 3.5 Flash, the video model Omni, and the brand-new AI assistant Spark.

After this show of strength, Sundar Pichai boasted that Gemini's monthly active users had surpassed 900 million and, in the same breath, announced a significant price cut.

The message couldn't be clearer: I'm stronger and cheaper than you.

If that's not a declaration of war, what is?

01

Undoubtedly, the most dazzling highlight of the conference was the debut of Gemini 3.5 Flash.

Normally, "Pro" stands for core strength, while "Flash" stands for light weight and speed.

In terms of model parameters, 3.5 Flash is indeed smaller than 3.1 Pro, yet its performance surpassed the latter in almost all reasoning and coding benchmarks:

In the GSM8K test for complex mathematical reasoning, 3.5 Flash scored 95.8%, surpassing 3.1 Pro's 93.2%. In the full version of SWE-bench, which measures code generation capability, 3.5 Flash achieved a solve rate of 38.4%, far exceeding 3.1 Pro's 32.1%...

Why?

According to the "Gemini 3.5 Technical Report" released by DeepMind, two key technologies are responsible.

Extreme Knowledge Distillation: Google did not simply rely on brute-force compute to train Flash this time. Instead, it used a never-before-disclosed "Gemini 3.5 Ultra" as a teacher model and distilled its capabilities down into Flash.

According to a tweet analysis by Jeff Dean, Chief Scientist at DeepMind, the proportion of fine-tuning on high-quality reasoning chain datasets for 3.5 Flash increased by 400% compared to the previous generation.

This means it inherited the "logical brain" of a super-large model, rather than a memorized "knowledge base."
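The article does not describe Google's actual training recipe, so the following is only a minimal sketch of the standard soft-target distillation loss: a KL divergence between temperature-softened teacher and student distributions. The logits, the temperature, and the function names are hypothetical and chosen purely for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over three candidate answers.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different answer

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

A student whose logits track the teacher's gets a loss near zero, while a student that ranks the answers differently is penalized much more heavily. That is the sense in which the student learns the teacher's output distribution rather than only the correct labels.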

A Novel MoE Architecture (Mixture of Experts): Inside 3.5 Flash, Google employed more fine-grained expert networks.

Traditional MoE might only have 8 or 16 experts, activating only 1-2 per task, sufficient for supporting trillion-parameter scale models.

However, according to an analysis from a16z's 2026 AI Infrastructure Investment Memo, 3.5 Flash employs 256 micro-experts, activating the most efficient 4 of them during each inference.

This is why it can cover an extremely vast multimodal feature space while maintaining an extremely low number of activated parameters.
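To make the routing idea concrete, here is a rough sketch of top-k expert selection using the 256-expert, 4-active figures cited above. The router scores and toy experts are invented for illustration, and renormalizing the gate weights with a softmax over the selected experts is a common convention assumed here, not a confirmed detail of 3.5 Flash.

```python
import math
import random

NUM_EXPERTS = 256  # figure cited in the article
TOP_K = 4          # figure cited in the article

def route(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits):
    """Run only the selected experts and blend their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route(router_logits))

rng = random.Random(0)
# Hypothetical toy experts: each one just scales its input by a fixed factor.
experts = [(lambda x, s=rng.uniform(0.5, 1.5): s * x) for _ in range(NUM_EXPERTS)]
# Hypothetical router scores for a single token.
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(logits)
print("experts used:", len(selected))
print("fraction of expert pool active:", TOP_K / NUM_EXPERTS)
print("output:", moe_forward(2.0, experts, logits))
```

With 4 of 256 experts active, only about 1.6% of the expert pool runs per inference in this toy setup. This is the mechanism behind a large total parameter count paired with a small number of activated parameters.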

On the TTFT (Time to First Token) metric, 3.5 Flash has dropped below 65 milliseconds.

A human blink takes 100-150 milliseconds.

In short, when it operates as an agent, from a human physiological perspective, there is virtually no perceptible delay.
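For readers who want to check a latency figure like this themselves, TTFT can be measured against any streaming response by timing how long the first chunk takes to arrive. The sketch below uses a hypothetical `fake_token_stream` generator in place of a real model API, so the 50 ms delay is a made-up stand-in, not a measurement of Gemini.

```python
import time

def fake_token_stream(first_token_delay_s, tokens):
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(first_token_delay_s)  # simulated queueing + prefill time
    for tok in tokens:
        yield tok

def measure_ttft(stream):
    """Return (seconds until the first token arrived, the first token)."""
    start = time.perf_counter()
    first = next(stream)  # blocks until the stream produces its first token
    return time.perf_counter() - start, first

ttft, token = measure_ttft(fake_token_stream(0.05, ["Hello", ",", " world"]))
print(f"TTFT: {ttft * 1000:.1f} ms, first token: {token!r}")
```

The timer starts before the first `next()` call because a generator does no work until it is first advanced. With a real client, the same pattern applies to whatever iterator the streaming API returns.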

For developers requiring frequent tool calls, multi-round reflection, and extremely low latency, this is the perfect super-agent foundation.

Only with such extreme engineering optimization can "device-side deployment" dominance be established in a fiercely competitive environment.

First, the native multimodal Gemini Omni Flash.

Omni means all-around, targeting the earlier GPT-4o. Just from the name, you can feel the intensity of the rivalry.

At least in terms of performance, Gemini Omni Flash is far more deserving of the "o" character than GPT-4o.

Early models like Sora or Gemini 1.5 were essentially patchwork, converting speech to text and then text to vision.

But the newly released Omni features true native end-to-end multimodal alignment. It natively understands temporal coherence and physical laws within videos, and its latency has dropped from the industry average of 400-600 milliseconds to 120 milliseconds.

An example from the keynote: a user wearing a camera pours water; as the cup nearly fills, Omni can say "Stop stop stop!" 0.5 seconds before the water overflows.

This kind of real-time inference about the physical state of the real world seems simple but is profoundly significant: AI has officially evolved from a chatbot on a screen into a real-world auxiliary tool.

Although still in its early stages.

Second, the intelligent assistant Spark.

According to an interview in The Verge with an Android engineering VP, Spark has been granted native API control at the system layer of Android 17.

In short, the complex workflows that previously required you to open many apps can now be handled without lifting a finger. Just tell Spark what you need, and it can handle everything for you: sending messages in your tone, sorting emails, summarizing schedules, tracking webpage updates, identifying hidden charges on bills, batch-processing documents, and so on...

In other words, with an AI assistant in the future, we might hardly need apps anymore; any complex operation is simplified into a single command.

Third, smart glasses.

Why glasses again?

At least in Google's view, seamless access to vision and hearing is the ultimate host for multimodal large models.

These glasses appear without any fancy aesthetics, focusing entirely on utility:

4-gram Micro-OLED full-color waveguide lenses with a light transmittance as high as 85%;

Equipped with a self-developed lightweight Gemini edge-side chip, local inference latency ≤12ms, capable of real-time translation, image recognition, and scene analysis without an internet connection;

Natively integrated with the Spark agent, syncing with phone and cloud data to deliver personalized services like schedule reminders, real-time translation, and environmental alerts.

In short, it's about bypassing the phone screen and embedding the agent into the human first-person perspective through glasses.

There's simply too much content. Google seems to have dumped all its trump cards at once, proclaiming a truth to the market:

An algorithm without an entry point is nothing.

The era of chasing model parameters and benchmark scores is over. Pure model providers no longer have a moat. The future is a four-dimensional space battle of "device + cloud + ecosystem + hardware."

Cramming AI into its product suite is reshaping the entire internet's traffic distribution logic: from "users actively searching and clicking" to "AI agents actively distributing services."

For the vast majority of developers and small-to-medium enterprises, this is excellent news because the underlying compute and models become extremely cheap, allowing everyone to focus on application-layer innovation.

But other competitors right now probably just want to curse out loud.

02

When Sundar Pichai casually announced on stage that "Gemini's monthly active users have officially surpassed 900 million," it caused quite a stir in the audience.

900 million—more than all the MAUs of its US competitors combined.

How was this achieved?

The answer is simple and brutal: Forced integration.

Unlike independent AI companies, Google doesn't need to spend on advertising to acquire users. It just needs to add an icon next to Chrome's address bar, integrate a shortcut in the bottom navigation bar of 3 billion Android phones, push a full update within Google Workspace...

The customer acquisition cost is basically zero.

More crucially, in the coming period, the gaze of 900 million active users as they look at products with smart glasses, the logical adjustments made when using Spark to handle tasks, and the interactions with the Omni visual model will generate a massive amount of high-quality, multimodal real-world feedback data, all of which will become nourishment for Gemini 4.

This forms an extremely robust barrier: the better the model -> the more people use it -> the more data generated -> the better the model becomes.

To rapidly strengthen this loop, Google directly declared a price war on all competitors: The AI Ultra subscription was slashed from $249.99/month to $99.9/month.

The input price for 1 million tokens for 3.5 Flash was driven down to $0.02, and the output price for 1 million tokens is $0.08.

What kind of magical price is this?

For comparison, the industry average prices for similar-tier models are $0.15-0.2 and $0.6-1, respectively.
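As a quick sanity check on how large the gap is, the quoted prices can be turned into ratios. This is only arithmetic on the figures given in the article; the dictionary layout and function name are illustrative.

```python
# Prices in USD per 1 million tokens, as quoted in the article.
flash = {"input": 0.02, "output": 0.08}
industry = {"input": (0.15, 0.20), "output": (0.60, 1.00)}

def price_ratio_range(kind):
    """How many times more expensive the quoted industry range is than 3.5 Flash."""
    low, high = industry[kind]
    return low / flash[kind], high / flash[kind]

for kind in ("input", "output"):
    lo, hi = price_ratio_range(kind)
    print(f"{kind}: {lo:.1f}x to {hi:.1f}x")
```

On these numbers, similar-tier models cost roughly 7.5 to 10 times as much for input tokens and 7.5 to 12.5 times as much for output tokens.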

Sundar Pichai calculated: top customers process about 1 trillion tokens per day. Shifting 80% of the workload to Gemini 3.5 Flash for a year can save over $1 billion.
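The article does not say which baseline price this calculation assumes, but the figures given are enough to back out the average saving per million tokens that the claim implies. The sketch below assumes a 365-day year and treats "over $1 billion" as exactly $1 billion; it is arithmetic on the quoted numbers, not a reconstruction of Google's own calculation.

```python
TOKENS_PER_DAY = 1e12      # "about 1 trillion tokens per day" (article)
SHIFTED_SHARE = 0.80       # "80% of the workload" (article)
CLAIMED_SAVINGS_USD = 1e9  # "over $1 billion", taken as a lower bound
DAYS_PER_YEAR = 365        # assumption

def shifted_million_tokens_per_year():
    """Yearly volume moved to 3.5 Flash, in units of 1 million tokens."""
    return TOKENS_PER_DAY * SHIFTED_SHARE * DAYS_PER_YEAR / 1e6

def implied_saving_per_million():
    """Average USD saved per 1 million tokens needed to reach the claimed total."""
    return CLAIMED_SAVINGS_USD / shifted_million_tokens_per_year()

print(f"{shifted_million_tokens_per_year():.3g} million tokens per year")
print(f"${implied_saving_per_million():.2f} saved per million tokens")
```

That works out to about 292 million units of 1 million tokens per year, and an implied saving of about $3.42 per million tokens. Since this exceeds the $0.15-1 range quoted for similar-tier models, the claim presumably compares against higher-priced models, though the article does not specify which.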

Why dare to sell AI at cabbage prices?

Its biggest advantage: vertically integrated computing infrastructure.

Even giants like OpenAI and Anthropic, despite their apparent success, are essentially "compute tenants" that must buy computing power from Microsoft and Amazon, who in turn pay Jensen Huang (Nvidia).

Google has its own TPUs, coupled with the incredibly efficient sparse activation of 3.5 Flash's MoE architecture, compressing compute costs to the extreme.

It can leverage its heavy-asset advantage to deliver a dimensional blow to pure algorithm companies.

The logic is clear.

Foundation models are rapidly becoming commoditized, like water and electricity. Have you ever seen a water utility making obscene profits?

Google isn't afraid of the model itself not being profitable, because it can make money back through search ads, cloud services, and fees from the Android ecosystem.

But for pure model API sellers like OpenAI, Anthropic, Cohere, and Mistral, this is not feasible.

Investors probably want to press Sam Altman's head and ask: "Google's API price is only one-tenth of yours, and its performance is better. Tell me, how does your business model work?"

Competitive landscapes across multiple industries will thus enter an accelerated reshuffling period.

AI vendors, needless to say, must quickly find cheaper compute sources or venture into chipmaking themselves.

Next is Apple, still developing behind closed doors.

The combination of smart glasses + the Omni video model + Spark's native system-level control undoubtedly already threatens the iPhone.

According to Macquarie's "Consumer Electronics Trend Forecast Report": Within the next three years, the proportion of time spent on screenless interaction based on vision/voice is expected to jump from the current 8% to 35%.

If users get accustomed to using glasses and voice for daily work and entertainment, screen time will inevitably be significantly reduced.

If Apple cannot counter with sufficiently impressive wearable devices (Vision Pro is too heavy and expensive, destined to be a toy for a minority), its monopoly on mobile internet entry points will face an unprecedented challenge.

This is not an iteration; it's a revolution.

Google, with its three blades—technology, traffic, and price—has issued a declaration of war to all its rivals.

Is there anyone still laughing at its corporate bureaucracy now?

Related Questions

Q: What are the core technological advancements behind Gemini 3.5 Flash that allow it to outperform Pro models despite being smaller?

A: Gemini 3.5 Flash leverages two key technologies: 1) Extreme Knowledge Distillation: It uses a powerful, previously undisclosed "Gemini 3.5 Ultra" model as a teacher to distill high-quality logical reasoning capabilities, with a 400% increase in fine-tuning on high-quality reasoning-chain data. 2) A novel fine-grained MoE (Mixture of Experts) architecture with 256 experts, activating only the most efficient 4 per inference, allowing it to cover a vast multimodal feature space while keeping the number of activated parameters low.

Q: How does Google's new AI strategy, as presented at I/O 2026, fundamentally shift the competitive landscape for AI companies like OpenAI?

A: Google's strategy moves beyond pure model competition into a four-dimensional "device + cloud + ecosystem + hardware" war. By deeply integrating AI (like Spark) into its massive user base via Chrome, Android, and Workspace, it achieves near-zero user acquisition costs and massive data collection. This, combined with aggressive price cuts on APIs enabled by its vertically integrated TPU infrastructure, challenges pure-play AI API companies. Their business model is threatened because Google can offer superior performance at a fraction of the cost and subsidize its AI through other revenue streams like ads.

Q: What are the three major product announcements from Google I/O 2026 that signify a direct challenge to competitors?

A: The three major announcements are: 1) Gemini Omni Flash: A native end-to-end multimodal video model with significantly reduced latency (120ms), capable of real-time inference about the physical world. 2) Spark: A native AI assistant built into Android 17 with deep system-level API access to automate complex app workflows. 3) AI Smart Glasses: A lightweight, locally capable device integrating Micro-OLED displays and Gemini for seamless, screenless, first-person AI interaction.

Q: According to the article, what is the significant implication of Gemini's monthly active users reaching 900 million?

A: Reaching 900 million MAUs (more than all US competitors combined) creates a powerful and difficult-to-break flywheel effect: a better model attracts more users, which generates more high-quality, multimodal real-world interaction data, which in turn is used to train even better models. This massive user base and the data it generates form a significant competitive moat for Google.

Q: Why is the launch of Google's AI smart glasses considered a potential threat to Apple's iPhone?

A: The smart glasses represent a move towards screenless, voice- and vision-based interaction. According to the Macquarie report cited in the article, such interaction could rise from 8% to 35% of total usage time within three years. If users adopt glasses and AI assistants like Spark for daily tasks, screen time on devices like the iPhone would be significantly reduced. This challenges Apple's dominance of the core mobile interface unless it can counter with compelling wearable devices of its own.
