# Inference Related Articles

The HTX News Center provides the latest articles and most in-depth analysis on "Inference", covering market trends, project updates, technological developments, and regulatory policy in the crypto sector.

The First Year of Computing Power Inflation: The Cheaper DeepSeek Gets, the Harder It Is to Stop This Round of Price Hikes

The year 2026 marks the beginning of "computing power inflation." While AI inference costs have dropped by over 80% globally in 18 months, China's three major cloud providers (Alibaba Cloud, Baidu AI Cloud, and Tencent Cloud) simultaneously announced price hikes of 20–30%. This reflects a deeper structural shift driven by the Jevons Paradox: as unit costs fall (e.g., via models like DeepSeek-R1), demand explodes, especially with the rise of reasoning models and AI agents that consume 10–50x more tokens per task. Although DeepSeek open-sourced its model weights, it did not release its inference optimization stack, leaving a significant engineering-efficiency gap between the major cloud providers and smaller players. The big three are leveraging this advantage to reposition: Alibaba focuses on high-margin premium clients, Baidu filters out low-value users, and Tencent capitalizes on ecosystem lock-in. Meanwhile, ByteDance's Volcano Engine is adopting a more moderate pricing strategy to capture displaced customers. Unexpectedly, the price surge is pushing large enterprises toward self-built computing once their cloud bills exceed a certain threshold. While cloud providers aim to boost profitability, they risk driving away innovative startups and accelerating competition from GPU leasing and domestic hardware providers such as Huawei. The price-hike trend is expected to persist for 2–3 years, fueled by rising token consumption from reasoning models, AI agent adoption, and NVIDIA export restrictions. The inflection point depends on whether domestic chips can match NVIDIA's efficiency, likely around 2027–2028. Until then, cloud providers will retain pricing power, and the key for AI companies is to optimize token usage, the real moat of this era.

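The dynamic is easy to check with the article's own figures. A minimal back-of-the-envelope sketch (Python, with hypothetical baseline prices and task sizes) shows how per-task spend can rise even as the unit price of tokens collapses:

```python
# Illustrative arithmetic only. The 80% unit-cost drop and the 10-50x
# token-consumption multipliers come from the article; the absolute
# baseline price and task size are hypothetical placeholders.

baseline_price_per_1m_tokens = 10.0  # hypothetical 2024 price, USD
baseline_tokens_per_task = 2_000     # hypothetical chat-style task

new_price = baseline_price_per_1m_tokens * (1 - 0.80)  # 80% cheaper per token
old_cost = baseline_price_per_1m_tokens * baseline_tokens_per_task / 1e6

for multiplier in (10, 50):  # reasoning models / AI agents
    new_cost = new_price * baseline_tokens_per_task * multiplier / 1e6
    print(f"{multiplier}x tokens per task: spend ${old_cost:.4f} -> "
          f"${new_cost:.4f} ({new_cost / old_cost:.0f}x)")
```

Even at the low end of the 10–50x range, per-task spend doubles despite the 80% cheaper tokens, which is exactly the demand pressure the article says is driving the hikes.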
marsbit · Yesterday 01:16

The DeepSeek You've Been Waiting For Has Long Changed

The article discusses the delayed release of DeepSeek V4, a highly anticipated AI model in China, and explores the reasons behind its slowed development. Initially a leader in the global AI race, DeepSeek has fallen behind competitors like OpenAI, Anthropic, and Google, which release major updates every few months. A key factor is DeepSeek's shift in focus due to national strategic priorities. In early 2025, the Chinese government encouraged the company to use Huawei’s Ascend processors instead of NVIDIA’s GPUs, aligning with broader efforts to achieve technological self-reliance. DeepSeek attempted to train its models on Huawei’s Ascend 910C chips but faced technical challenges, including instability and communication issues during distributed training. As a result, the company continued using NVIDIA hardware for training while only using Ascend chips for inference. In 2026, DeepSeek prioritized adapting V4 to Huawei’s new Ascend 950PR and Cambricon chips, aiming for a full migration from NVIDIA’s CUDA to Huawei’s CANN framework. This adaptation process, particularly ensuring precision alignment across hardware, consumed significant time and resources, slowing down model iteration. The delay also reflects DeepSeek’s evolving role from a purely market-driven entity to a "national mission-oriented" company. This shift has come at a cost: the model now lags behind competitors in areas like code generation and multimodal capabilities, and the company has faced talent drain, with key researchers leaving for better-paying opportunities at larger tech firms. Despite these challenges, V4’s release is seen as a potential milestone for China’s AI industry, demonstrating that advanced models can run on domestic hardware ecosystems. While it may not be a groundbreaking model in terms of performance, its success could validate China’s broader strategy for AI independence.

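The "precision alignment" work described above is essentially cross-backend numerical verification: running the same layers on both stacks and flagging divergence. A minimal sketch of that idea, with synthetic arrays standing in for per-layer activations (the tolerances and backend names are assumptions, not DeepSeek's actual tooling):

```python
# Hedged sketch of a cross-backend precision-alignment check (e.g. CUDA
# vs. CANN outputs for the same layer). Synthetic data stands in for
# real activations; tolerances are illustrative.

import numpy as np

def check_alignment(reference: np.ndarray, candidate: np.ndarray,
                    rtol: float = 1e-3, atol: float = 1e-5) -> bool:
    """Return True if the candidate backend's output matches the reference."""
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    aligned = np.allclose(reference, candidate, rtol=rtol, atol=atol)
    print(f"max |diff| = {max_abs_diff:.3e} -> "
          f"{'aligned' if aligned else 'DIVERGED'}")
    return aligned

# Synthetic stand-ins for one layer's activations on each backend:
rng = np.random.default_rng(0)
cuda_out = rng.standard_normal((4, 1024)).astype(np.float32)
cann_out = cuda_out + rng.normal(0, 1e-6, cuda_out.shape).astype(np.float32)
check_alignment(cuda_out, cann_out)
```

Repeating this layer by layer across an entire model, for every operator and precision mode, is what makes the migration so time-consuming.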
marsbit · 2 days ago 10:32

Stop Staring at GPUs: CPUs Are Becoming the 'New Bottleneck' in the AI Era

In the AI era, while GPUs have long been the focus for computational power, the narrative is shifting as CPUs increasingly become the new bottleneck. By 2026, system performance is more dependent on execution and scheduling capabilities, with CPUs playing a critical role in enabling AI operations. A supply crisis is emerging, with server CPU prices rising about 30% in Q4 2025 due to high demand and production constraints, as GPU orders compete for limited semiconductor capacity. Companies like Google and Intel have deepened collaborations, and Elon Musk is investing in custom CPU solutions for his ventures, highlighting the strategic importance of CPU infrastructure. The shift is driven by the rise of agentic AI, where CPUs handle tasks such as multi-step reasoning, API calls, and data I/O, accounting for 50–90.6% of total latency in intelligent workloads. Expanding context windows in AI models further strain GPU memory, necessitating CPU offloading for key-value cache management. Major players are adopting varied strategies: Intel is strengthening its Xeon processor line and partnerships; AMD is benefiting from increased demand, with server CPUs now accounting for over 40% of its revenue; and NVIDIA is designing CPUs like Grace to optimize GPU-CPU synergy through high-speed interconnects. The industry is witnessing a rebalancing of compute infrastructure, with CPUs gaining prominence as essential enablers of scalable AI agent systems. By 2030, the CPU market is projected to double to $60 billion, driven largely by AI demands. The focus is now on overcoming system-level bottlenecks to maximize the efficiency and economic viability of AI deployments.

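To make the 50–90.6% figure concrete, here is an entirely hypothetical timing breakdown of a single agent step, showing how CPU-bound phases (orchestration, tool calls, I/O) can dominate end-to-end latency even when the GPU forward pass itself is fast:

```python
# Hypothetical latency breakdown for one AI-agent step. The sleep-based
# phases are placeholders for real work; only the overall pattern (CPU
# phases dominating) reflects the article's claim.

import time

def timed_phase(label: str, seconds: float):
    """Simulate a phase of known duration and measure it."""
    start = time.perf_counter()
    time.sleep(seconds)
    return label, time.perf_counter() - start

phases = [
    timed_phase("CPU: parse task / plan", 0.020),
    timed_phase("CPU: tool/API call + JSON handling", 0.120),
    timed_phase("GPU: model forward pass", 0.060),
    timed_phase("CPU: KV-cache offload / result assembly", 0.040),
]

total = sum(d for _, d in phases)
for label, d in phases:
    print(f"{label:42s} {d * 1e3:7.1f} ms ({d / total:5.1%})")

cpu_share = sum(d for label, d in phases if label.startswith("CPU")) / total
print(f"CPU share of end-to-end latency: {cpu_share:.1%}")
```

With these placeholder numbers the CPU share lands at roughly 75%, inside the article's 50–90.6% band; the point is that speeding up the GPU alone barely moves the total.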
marsbit · 04/13 00:57

Running Gemma 4 Locally on iPhone Goes Viral: How Far Are We from the Zero Token Era?

Google's newly open-sourced Gemma 4 model, built on the same architecture as Gemini 3, has gained significant attention for its ability to run locally on mobile devices like the iPhone and Samsung Galaxy. With smaller versions such as E2B (2.3B parameters) and E4B (4.5B parameters), it supports native multimodal capabilities and offers a 128K context window. Users report impressive speeds—over 40 tokens per second on Apple chips with MLX optimization—making it feel "like magic." The model is accessible via Google’s official AI Edge Gallery app, ensuring ease of use and security. While Gemma 4 excels in tasks like text generation, coding, and image understanding, it struggles with more complex agent-based workflows, such as tool calling and structured outputs, where models like Qwen3-coder perform better. Despite some limitations in reasoning, Gemma 4’s local performance hints at a future where everyday AI tasks—chat, coding, reasoning—can be handled offline, reducing reliance on cloud-based token services. Although cloud models still lead in advanced reasoning and large-scale multi-agent tasks, the trend suggests that as hardware and quantization improve, on-device models will increasingly handle high-frequency simple tasks. This shift could disrupt the AI industry’s reliance on token sales and API subscriptions, pushing providers to focus on more complex, data-intensive capabilities. Gemma 4 is just the beginning of this transformation.

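For readers who want to reproduce the pattern, a minimal sketch using the open-source mlx-lm package on Apple Silicon follows. The checkpoint identifier is a placeholder assumption, not a confirmed repository name:

```python
# Minimal on-device generation sketch with Apple's MLX stack
# (pip install mlx-lm; requires Apple Silicon). The model id below is
# a hypothetical placeholder for an MLX-converted Gemma checkpoint.

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-4-E4B")  # hypothetical repo id
prompt = "Explain KV caching in two sentences."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```

A setup along these lines is what produces the 40+ tokens-per-second figures users report on recent Apple chips, with no network round-trip or per-token billing.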
marsbit · 04/06 05:53

NVIDIA's Market Share in China Drops Below 60%, Domestic AI Chips Seize Market with 1.65 Million Units Delivered Annually

Nvidia's market share in China's AI accelerator card market has declined significantly, dropping from approximately 95% to 55% in 2025, according to IDC data. During the same period, domestic Chinese manufacturers collectively captured 41% of the market, shipping 1.65 million units out of a total market of 4 million units. Huawei led the domestic suppliers with 812,000 units shipped, representing nearly half of the local market share. This shift is driven by both U.S. export controls and China’s aggressive domestic substitution policies. In November 2025, Beijing mandated that state-funded data centers must use domestic AI chips, accelerating the adoption of local alternatives. Huawei recently launched the Atlas 350 accelerator card, claiming 2.87 times the inference performance of Nvidia’s H20 in low-precision computing, though direct comparisons are complicated by architectural differences. While Chinese chips still lag behind in training large-scale AI models—estimated to be 5-10 years behind Nvidia—they have reached a "good enough" level for many commercial applications like inference tasks. The main challenge remains software ecosystem development, as Nvidia’s CUDA platform remains the industry standard. Chinese firms are responding with compatibility efforts and open-source initiatives. Several domestic AI chip companies are now pursuing IPOs, and Huawei continues heavy R&D spending to reduce foreign dependency. Even if U.S. export policies ease, the structural move toward domestic AI chips appears irreversible.

marsbit · 04/03 05:51
