# Inference Related Articles

The HTX News Center provides the latest articles and most in-depth analysis on "Inference", covering market trends, project updates, technological developments, and regulatory policy in the crypto sector.

The First Year of Computing Power Inflation: The Cheaper DeepSeek Gets, the Harder It Is to Stop This Round of Price Hikes

The year 2026 marks the beginning of "computing power inflation." While AI inference costs have dropped by over 80% globally in 18 months, China's three major cloud providers (Alibaba Cloud, Baidu AI Cloud, and Tencent Cloud) simultaneously announced price hikes of 20–30%. This reflects a deeper structural shift driven by the Jevons Paradox: as unit costs fall (e.g., via models like DeepSeek-R1), demand explodes, especially with the rise of reasoning models and AI agents that consume 10–50x more tokens per task. Although DeepSeek open-sourced its model weights, it did not release its inference optimization stack, leaving a significant engineering-efficiency gap between the major cloud providers and smaller players. The big three are leveraging this advantage to reposition: Alibaba focuses on high-margin premium clients, Baidu filters out low-value users, and Tencent capitalizes on ecosystem lock-in. Meanwhile, ByteDance's Volcano Engine is adopting a more moderate pricing strategy to capture displaced customers. Unexpectedly, the price surge is pushing large enterprises toward self-built computing once their cloud bills exceed a certain threshold. While cloud providers aim to boost profitability, they risk driving away innovative startups and accelerating competition from GPU leasing and domestic hardware providers such as Huawei. The price-hike trend is expected to persist for 2–3 years, fueled by rising token consumption from reasoning models, AI agent adoption, and NVIDIA export restrictions. The inflection point depends on whether domestic chips can match NVIDIA's efficiency, likely around 2027–2028. Until then, cloud providers will retain pricing power, and the key for AI companies is to optimize token usage, the real moat of this era.

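The dynamic is easy to check with the article's own figures. A minimal back-of-the-envelope sketch (Python, with hypothetical baseline prices and task sizes) shows how per-task spend can rise even as the unit price of tokens collapses:

```python
# Illustrative arithmetic only. The 80% unit-cost drop and the 10-50x
# token-consumption multipliers come from the article; the absolute
# baseline price and task size are hypothetical placeholders.

baseline_price_per_1m_tokens = 10.0  # hypothetical 2024 price, USD
baseline_tokens_per_task = 2_000     # hypothetical chat-style task

new_price = baseline_price_per_1m_tokens * (1 - 0.80)  # 80% cheaper per token
old_cost = baseline_price_per_1m_tokens * baseline_tokens_per_task / 1e6

for multiplier in (10, 50):  # reasoning models / AI agents
    new_cost = new_price * baseline_tokens_per_task * multiplier / 1e6
    print(f"{multiplier}x tokens per task: spend ${old_cost:.4f} -> "
          f"${new_cost:.4f} ({new_cost / old_cost:.0f}x)")
```

Even at the low end of the 10–50x range, per-task spend doubles despite the 80% cheaper tokens, which is exactly the demand pressure the article says is driving the hikes.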
marsbit · Yesterday 01:16

The DeepSeek You've Been Waiting For Has Long Changed

The article discusses the delayed release of DeepSeek V4, a highly anticipated AI model in China, and explores the reasons behind its slowed development. Initially a leader in the global AI race, DeepSeek has fallen behind competitors like OpenAI, Anthropic, and Google, which release major updates every few months. A key factor is DeepSeek's shift in focus due to national strategic priorities. In early 2025, the Chinese government encouraged the company to use Huawei’s Ascend processors instead of NVIDIA’s GPUs, aligning with broader efforts to achieve technological self-reliance. DeepSeek attempted to train its models on Huawei’s Ascend 910C chips but faced technical challenges, including instability and communication issues during distributed training. As a result, the company continued using NVIDIA hardware for training while only using Ascend chips for inference. In 2026, DeepSeek prioritized adapting V4 to Huawei’s new Ascend 950PR and Cambricon chips, aiming for a full migration from NVIDIA’s CUDA to Huawei’s CANN framework. This adaptation process, particularly ensuring precision alignment across hardware, consumed significant time and resources, slowing down model iteration. The delay also reflects DeepSeek’s evolving role from a purely market-driven entity to a "national mission-oriented" company. This shift has come at a cost: the model now lags behind competitors in areas like code generation and multimodal capabilities, and the company has faced talent drain, with key researchers leaving for better-paying opportunities at larger tech firms. Despite these challenges, V4’s release is seen as a potential milestone for China’s AI industry, demonstrating that advanced models can run on domestic hardware ecosystems. While it may not be a groundbreaking model in terms of performance, its success could validate China’s broader strategy for AI independence.

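The "precision alignment" work described above is essentially cross-backend numerical verification: running the same layers on both stacks and flagging divergence. A minimal sketch of that idea, with synthetic arrays standing in for per-layer activations (the tolerances and backend names are assumptions, not DeepSeek's actual tooling):

```python
# Hedged sketch of a cross-backend precision-alignment check (e.g. CUDA
# vs. CANN outputs for the same layer). Synthetic data stands in for
# real activations; tolerances are illustrative.

import numpy as np

def check_alignment(reference: np.ndarray, candidate: np.ndarray,
                    rtol: float = 1e-3, atol: float = 1e-5) -> bool:
    """Return True if the candidate backend's output matches the reference."""
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    aligned = np.allclose(reference, candidate, rtol=rtol, atol=atol)
    print(f"max |diff| = {max_abs_diff:.3e} -> "
          f"{'aligned' if aligned else 'DIVERGED'}")
    return aligned

# Synthetic stand-ins for one layer's activations on each backend:
rng = np.random.default_rng(0)
cuda_out = rng.standard_normal((4, 1024)).astype(np.float32)
cann_out = cuda_out + rng.normal(0, 1e-6, cuda_out.shape).astype(np.float32)
check_alignment(cuda_out, cann_out)
```

Repeating this layer by layer across an entire model, for every operator and precision mode, is what makes the migration so time-consuming.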
marsbit · 2 days ago 10:32

Stop Staring at GPUs: CPUs Are Becoming the 'New Bottleneck' in the AI Era

In the AI era, while GPUs have long been the focus for computational power, the narrative is shifting as CPUs increasingly become the new bottleneck. By 2026, system performance is more dependent on execution and scheduling capabilities, with CPUs playing a critical role in enabling AI operations. A supply crisis is emerging, with server CPU prices rising about 30% in Q4 2025 due to high demand and production constraints, as GPU orders compete for limited semiconductor capacity. Companies like Google and Intel have deepened collaborations, and Elon Musk is investing in custom CPU solutions for his ventures, highlighting the strategic importance of CPU infrastructure. The shift is driven by the rise of agentic AI, where CPUs handle tasks such as multi-step reasoning, API calls, and data I/O, accounting for 50–90.6% of total latency in intelligent workloads. Expanding context windows in AI models further strain GPU memory, necessitating CPU offloading for key-value cache management. Major players are adopting varied strategies: Intel is strengthening its Xeon processor line and partnerships; AMD is benefiting from increased demand, with server CPUs now accounting for over 40% of its revenue; and NVIDIA is designing CPUs like Grace to optimize GPU-CPU synergy through high-speed interconnects. The industry is witnessing a rebalancing of compute infrastructure, with CPUs gaining prominence as essential enablers of scalable AI agent systems. By 2030, the CPU market is projected to double to $60 billion, driven largely by AI demands. The focus is now on overcoming system-level bottlenecks to maximize the efficiency and economic viability of AI deployments.

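To make the 50–90.6% figure concrete, here is an entirely hypothetical timing breakdown of a single agent step, showing how CPU-bound phases (orchestration, tool calls, I/O) can dominate end-to-end latency even when the GPU forward pass itself is fast:

```python
# Hypothetical latency breakdown for one AI-agent step. The sleep-based
# phases are placeholders for real work; only the overall pattern (CPU
# phases dominating) reflects the article's claim.

import time

def timed_phase(label: str, seconds: float):
    """Simulate a phase of known duration and measure it."""
    start = time.perf_counter()
    time.sleep(seconds)
    return label, time.perf_counter() - start

phases = [
    timed_phase("CPU: parse task / plan", 0.020),
    timed_phase("CPU: tool/API call + JSON handling", 0.120),
    timed_phase("GPU: model forward pass", 0.060),
    timed_phase("CPU: KV-cache offload / result assembly", 0.040),
]

total = sum(d for _, d in phases)
for label, d in phases:
    print(f"{label:42s} {d * 1e3:7.1f} ms ({d / total:5.1%})")

cpu_share = sum(d for label, d in phases if label.startswith("CPU")) / total
print(f"CPU share of end-to-end latency: {cpu_share:.1%}")
```

With these placeholder numbers the CPU share lands at roughly 75%, inside the article's 50–90.6% band; the point is that speeding up the GPU alone barely moves the total.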
marsbit · 04/13 00:57

Running Gemma 4 Locally on iPhone Goes Viral: How Far Are We from the Zero Token Era?

Google's newly open-sourced Gemma 4 model, built on the same architecture as Gemini 3, has gained significant attention for its ability to run locally on mobile devices like the iPhone and Samsung Galaxy. With smaller versions such as E2B (2.3B parameters) and E4B (4.5B parameters), it supports native multimodal capabilities and offers a 128K context window. Users report impressive speeds—over 40 tokens per second on Apple chips with MLX optimization—making it feel "like magic." The model is accessible via Google’s official AI Edge Gallery app, ensuring ease of use and security. While Gemma 4 excels in tasks like text generation, coding, and image understanding, it struggles with more complex agent-based workflows, such as tool calling and structured outputs, where models like Qwen3-coder perform better. Despite some limitations in reasoning, Gemma 4’s local performance hints at a future where everyday AI tasks—chat, coding, reasoning—can be handled offline, reducing reliance on cloud-based token services. Although cloud models still lead in advanced reasoning and large-scale multi-agent tasks, the trend suggests that as hardware and quantization improve, on-device models will increasingly handle high-frequency simple tasks. This shift could disrupt the AI industry’s reliance on token sales and API subscriptions, pushing providers to focus on more complex, data-intensive capabilities. Gemma 4 is just the beginning of this transformation.

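For readers who want to reproduce the pattern, a minimal sketch using the open-source mlx-lm package on Apple Silicon follows. The checkpoint identifier is a placeholder assumption, not a confirmed repository name:

```python
# Minimal on-device generation sketch with Apple's MLX stack
# (pip install mlx-lm; requires Apple Silicon). The model id below is
# a hypothetical placeholder for an MLX-converted Gemma checkpoint.

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-4-E4B")  # hypothetical repo id
prompt = "Explain KV caching in two sentences."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```

A setup along these lines is what produces the 40+ tokens-per-second figures users report on recent Apple chips, with no network round-trip or per-token billing.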
marsbit · 04/06 05:53

NVIDIA's Market Share in China Drops Below 60%, Domestic AI Chips Seize Market with 1.65 Million Units Delivered Annually

Nvidia's market share in China's AI accelerator card market has declined significantly, dropping from approximately 95% to 55% in 2025, according to IDC data. During the same period, domestic Chinese manufacturers collectively captured 41% of the market, shipping 1.65 million units out of a total market of 4 million units. Huawei led the domestic suppliers with 812,000 units shipped, representing nearly half of the local market share. This shift is driven by both U.S. export controls and China’s aggressive domestic substitution policies. In November 2025, Beijing mandated that state-funded data centers must use domestic AI chips, accelerating the adoption of local alternatives. Huawei recently launched the Atlas 350 accelerator card, claiming 2.87 times the inference performance of Nvidia’s H20 in low-precision computing, though direct comparisons are complicated by architectural differences. While Chinese chips still lag behind in training large-scale AI models—estimated to be 5-10 years behind Nvidia—they have reached a "good enough" level for many commercial applications like inference tasks. The main challenge remains software ecosystem development, as Nvidia’s CUDA platform remains the industry standard. Chinese firms are responding with compatibility efforts and open-source initiatives. Several domestic AI chip companies are now pursuing IPOs, and Huawei continues heavy R&D spending to reduce foreign dependency. Even if U.S. export policies ease, the structural move toward domestic AI chips appears irreversible.

marsbit · 04/03 05:51
