Articles Related to Inference

The HTX News Center offers the latest articles and in-depth analysis on "Inference", covering market trends, project updates, technological developments, and regulatory policies in the crypto industry.

The Computing Power Dilemma in the Sino-US AI Rivalry

The Sino-US AI rivalry faces a fundamental bottleneck: the widening compute power gap. While Chinese AI chip companies have seen investment surges, their current focus remains largely on the less demanding inference market. The real challenge lies in the high-end training chip sector, crucial for developing cutting-edge large language models (LLMs), where Nvidia holds a near-monopoly. The compute disparity is stark. US tech giants like Meta, Google, and xAI command massive GPU clusters, enabling them to train trillion-parameter models rapidly. Estimates suggest US data center count and total compute capacity significantly outstrip China's. This "brute force" advantage allows for faster model iteration and exploration of larger parameter scales, with top US models reportedly leading their Chinese counterparts by 8 to 15 months. Chinese alternatives, such as Huawei's Ascend and others from companies like Moore Threads and Biren, are emerging. They show promise in inference and some training scenarios, closing the performance gap with mid-range Nvidia products. However, the core hurdle extends beyond raw chip performance to the entrenched software ecosystem, exemplified by Nvidia's CUDA platform. The path forward involves "walking on two legs": navigating import restrictions while heavily investing in the domestic chip industry. Though still in a catch-up phase, China's vast market, talent pool, and capital are fostering progress. The ultimate test is whether Chinese firms can build a competitive hardware-software ecosystem to power the next generation of AI.

marsbit 4 hours ago

Idle Macs Can Also Make Money? An Overview of Eigen Labs' Decentralized AI Inference Network Darkbloom

AI inference is becoming a crucial layer of internet infrastructure, yet it remains largely dependent on costly, capacity-limited centralized systems with potential security risks. Meanwhile, millions of powerful computers sit idle globally. Eigen Labs' Darkbloom network aims to utilize this idle capacity by enabling distributed AI inference on Mac computers, specifically those with Apple Silicon chips. Darkbloom's architecture consists of three components: users who send inference requests, a coordinator (operated by Eigen Labs) that routes these requests, and providers (Mac owners) whose machines run the models and return outputs without being able to see the request content. The system prioritizes privacy through a hardened provider process, software integrity checks, and hardware-supported attestation based on Apple's security architecture to ensure verifiable privacy. Economically, Darkbloom differs from traditional models. It leverages existing hardware, with marginal costs primarily driven by electricity, allowing it to offer pricing roughly 50% lower than major API aggregators. Providers keep 100% of the inference revenue, and the project does not rely on token subsidies; earnings come solely from real AI inference demand. However, early-stage earnings are modest, with top providers currently earning under $6 per day, influenced by factors like hardware specs, uptime, and network demand. The network currently supports models like Google's Gemma 4 and OpenAI's GPT-OSS via OpenRouter. To participate as a provider, users need an Apple Silicon Mac running macOS 14 or later, must install the Darkbloom provider software, and keep the machine online with a stable internet connection.

marsbit 7 hours ago

CPU Makes a Comeback to the Table, A $170 Billion "Power Seizure" Drama Begins

A new era is dawning for the server CPU (Central Processing Unit), driven by the shift from AI model training to large-scale inference and the rise of Agentic AI. This article explores how the CPU is reclaiming a central role in the AI data center. For years, the focus has been on the GPU (Graphics Processing Unit) for AI training. However, as AI moves to the inference and Agent phase, where tasks involve complex, multi-step reasoning, tool calls, and data management, the workload balance is flipping. Studies show CPUs now handle over 70% of the workload in Agentic AI, up from 10-30% in training. This is because Agent tasks generate massive intermediate data (KV Cache) that exceeds GPU memory, forcing it to be offloaded to the CPU's larger, more scalable memory pools. This increased importance is translating into market changes. Major players are taking note: NVIDIA launched its first standalone CPU line, Vera, based on ARM architecture and optimized for Agent performance. AMD doubled its server CPU market forecast to over $120 billion by 2030. Analyst reports project the total server CPU market could reach $170 billion by 2030, with AI-driven demand being a primary driver. Furthermore, the classic ratio of CPUs to GPUs in AI servers is rapidly changing, converging from 1:8 toward 1:1 for Agent deployments. This surge in demand has led to a rare industry-wide price increase of 10-15% for server CPUs from Intel and AMD, breaking a decade-long trend of "more performance for the same price." Demand is bifurcating into high-core-count CPUs for in-rack GPU support and moderate-core CPUs for standalone Agent task orchestration. In China, this global trend presents an opportunity for domestic CPU manufacturers like Hygon (海光信息) and Huawei Kunpeng, who are bolstered by both growing AI infrastructure needs and national policies promoting technological self-reliance ("xin chuang"). The maturity of their software ecosystems is also accelerating, evidenced by faster adaptation to new AI models. In conclusion, the narrative is shifting from a GPU-centric view to one where CPU-GPU synergy is critical. The CPU is no longer a peripheral component but a performance-defining bottleneck and a key growth driver in the AI hardware stack, opening a massive new market estimated in the hundreds of billions of dollars.
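The claim that the KV cache can outgrow GPU memory can be checked with a rough size estimate. The sketch below uses the standard per-token accounting for a transformer (one key and one value tensor per layer). The model shape, context length, batch size, and the 80 GiB GPU are all hypothetical values chosen for illustration, not figures from the article.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Estimate KV cache size in bytes for a transformer decoder.

    The leading factor of 2 accounts for storing both keys and values
    in every layer. bytes_per_value=2 assumes 16-bit storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model and workload (assumptions, not from the article).
cache = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                       seq_len=128_000, batch_size=16)
gpu_memory_bytes = 80 * 1024**3  # assumed 80 GiB accelerator

print(f"KV cache: {cache / 1024**3:.1f} GiB")
print("Exceeds GPU memory:", cache > gpu_memory_bytes)
```

Under these assumptions the cache alone is several times larger than the accelerator's memory, before counting model weights, which is the situation in which offloading to host memory becomes attractive.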

marsbit 06/19 13:41

BitTorrent Launches BTTInferGrid: The Decentralized Infrastructure Layer for Scalable AI Inference

BitTorrent has launched BTTInferGrid, a decentralized GPU computing network designed to meet the surging demand for AI inference workloads. The platform aggregates global idle GPU resources into an open-access, verifiable, and pay-as-you-go infrastructure, aiming to solve the cost, scalability, and supply bottlenecks of traditional centralized cloud providers. BTTInferGrid addresses a key market shift, as industry forecasts indicate over 70% of future AI compute will be for inference—a continuous operational cost. It tackles centralization issues like inflexible resource allocation during volatile demand, prohibitive GPU pricing, and the underutilization of fragmented global compute capacity. The platform establishes a direct corridor between AI developers and idle hardware. On the supply side, it allows providers to monetize underutilized GPUs through tokenized incentives. On the demand side, it offers developers cost-efficient, on-demand inference with on-chain verification. Key differentiators include permissionless access for providers, verifiable service quality through blockchain validation, and a sustainable, demand-driven economic model. Built on BitTorrent's proven DePIN expertise from the BitTorrent File System (BTFS), BTTInferGrid follows a phased roadmap. It begins with network bootstrapping in 2026, focusing on scaling GPU nodes, and aims to evolve into a foundational Web3 AI infrastructure layer by 2028, supporting diverse model architectures and decentralized fine-tuning.

TheNewsCrypto 06/18 07:33

Bernstein Report: Agentic AI Will Transform CPU from Supporting Role to Leading Role, Bullish on Hygon Information

Bernstein analysts led by David Dai argue that AI is transitioning from the chatbot era to the agentic AI era. Unlike simple query-response models, agentic AI involves complex workflows including retrieval, planning, tool calling, and multi-step reasoning. This shift dramatically increases the demand for CPU compute to orchestrate these tasks, manage memory, and prevent expensive GPU idling. The report forecasts that the GPU-to-CPU ratio in inference clusters will shift from 8:1 in 2025 to 1:1 by 2029. In agentic AI workloads, CPUs could account for 50% of the compute, on par with GPUs. Consequently, the server CPU Total Addressable Market (TAM) is projected to surge from $37 billion in 2025 to $223 billion by 2030, representing a 6x expansion. Arm is identified as a key beneficiary due to its superior performance-per-watt and a strategic shift from IP licensing to designing its own chips, targeting $15 billion in chip revenue by 2030. Bernstein raises Arm's price target to $500. For x86 vendors, the report is Overweight on AMD (target $600) and Hygon Information (target CNY 450), citing leadership and strong growth in the Chinese market respectively. Intel's target is raised to $100, reflecting upgraded earnings assumptions. The analysis acknowledges significant supply-side risks, questioning whether foundry and memory capacity can support such rapid CPU growth. The optimistic demand forecast also relies heavily on Nvidia's guidance for over $1 trillion in annual AI infrastructure spend by 2027.
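The TAM projection above can be restated as an annual growth rate. This is a minimal arithmetic sketch using only the two endpoints quoted in the summary ($37 billion in 2025, $223 billion in 2030). The function name is illustrative, and the smooth compounding it assumes is a simplification, not something the report states.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


tam_2025 = 37.0   # USD billions, from the summary
tam_2030 = 223.0  # USD billions, from the summary

multiple = tam_2030 / tam_2025
cagr = implied_cagr(tam_2025, tam_2030, years=5)

print(f"Expansion multiple: {multiple:.2f}x")
print(f"Implied CAGR: {cagr:.1%}")
```

The multiple comes out at roughly 6x, matching the summary, and the implied growth rate is a little over 40% per year sustained for five years.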

marsbit 06/17 09:46

How Much of the Subscription Fee You Pay to Claude Can Optical Module Companies Get?

How much of your $20 Claude Pro subscription actually goes to AI model companies like Anthropic? A viral breakdown image highlights the fundamental valuation challenge for AI applications versus traditional SaaS. Unlike SaaS with high software margins, AI subscriptions face variable "inference costs": every user query consumes GPU time, power, and cloud resources. This creates a tension between fixed subscription fees and usage-driven expenses. While the specific dollar splits are illustrative, the core question is whether AI revenue can achieve SaaS-like margins as usage scales. Currently, infrastructure providers (cloud platforms, GPU makers like Nvidia, HBM suppliers, power/data centers) capture more certain revenue from growing AI usage. Their financials reflect pricing power and faster earnings validation. The bullish case hinges on efficiency improvements: model optimization, caching, smaller models, and custom chips could lower per-token costs over time. The key debate is whether cost declines can outpace increases in user workload complexity and volume. Ultimately, for AI companies to command high SaaS-like valuations, they must demonstrate not just user growth but also improving gross margins after accounting for inference costs. Investors will scrutinize not just subscriber numbers, but usage patterns, enterprise pricing tiers, and real efficiency gains.
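The tension between a fixed fee and usage-driven cost can be made concrete with a small margin calculation. In the sketch below, only the $20 monthly fee comes from the summary. The token volumes and the blended cost of $2 per million tokens are hypothetical placeholders, and real costs vary by model, caching, and hardware.

```python
def gross_margin(monthly_fee, tokens_per_month, cost_per_million_tokens):
    """Gross margin on one subscriber after variable inference cost."""
    inference_cost = tokens_per_month / 1_000_000 * cost_per_million_tokens
    return (monthly_fee - inference_cost) / monthly_fee


def break_even_tokens(monthly_fee, cost_per_million_tokens):
    """Monthly token volume at which inference cost equals the fee."""
    return monthly_fee / cost_per_million_tokens * 1_000_000


FEE = 20.0            # USD per month, from the summary
COST_PER_MTOK = 2.0   # USD per million tokens, hypothetical

light_user = gross_margin(FEE, 2_000_000, COST_PER_MTOK)
heavy_user = gross_margin(FEE, 15_000_000, COST_PER_MTOK)

print(f"Light user margin: {light_user:.0%}")
print(f"Heavy user margin: {heavy_user:.0%}")
print(f"Break-even tokens: {break_even_tokens(FEE, COST_PER_MTOK):,.0f}")
```

With these placeholder numbers a light user is highly profitable while a heavy user costs more than the subscription brings in, which is why the summary stresses usage patterns and per-token efficiency rather than subscriber counts alone.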

marsbit 06/17 03:43

AMD Launches Compact AI Host, Directly Challenging NVIDIA DGX Spark

In June 2026, AMD announced the Ryzen AI Halo, a compact AI developer desktop to rival NVIDIA's DGX Spark. Both feature 128GB unified memory for running 200B+ parameter models locally. Priced from $2,949 to $3,999, AMD undercuts NVIDIA's $3,999+ DGX Spark. The core divergence lies in architecture and philosophy. Ryzen AI Halo uses an x86-based Ryzen AI Max+ 395 APU (CPU+GPU+NPU), runs standard Windows/Linux, and emphasizes general-purpose PC flexibility. DGX Spark uses an ARM-based Grace Blackwell Superchip, runs a custom DGX OS, and includes a high-speed ConnectX-7 NIC for cluster prototyping, anchoring it to NVIDIA's full-stack CUDA ecosystem. AMD's ROCm software has improved, with simpler installation and support for major frameworks, but still lags behind CUDA's 17-year maturity in community support and cutting-edge library availability. AMD's broader strategy focuses on becoming a viable second-source supplier. Key moves include acquiring design capabilities via ZT Systems (while outsourcing manufacturing) and securing two major 6GW GPU supply deals with OpenAI and Meta in late 2025/early 2026. These contracts validate AMD's role in diversifying the AI supply chain, rather than outright beating NVIDIA. NVIDIA counters with a tightly integrated stack from desktop (DGX Spark) to data center, emphasizing seamless scalability and enterprise software subscriptions (AI Enterprise). In summary, Ryzen AI Halo represents AMD's pragmatic path: offering a cost-effective, open-ecosystem alternative for developers wary of vendor lock-in, while its large data center contracts aim to capture share from customers seeking a second GPU supplier. The choice boils down to a familiar, flexible PC environment with potential software gaps (AMD) versus a premium, optimized, but locked-in ecosystem (NVIDIA).
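The pairing of 128GB of unified memory with 200B-parameter models implies quantized weights. The sketch below estimates the weight footprint at a few precisions. It counts weights only and ignores the KV cache, activations, and operating system overhead, so it is a lower bound under those stated assumptions rather than a sizing guide for either product.

```python
def weight_footprint_gb(num_params, bits_per_param):
    """Approximate memory for model weights, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 200_000_000_000  # 200B parameters, from the summary
UNIFIED_MEMORY_GB = 128       # from the summary

for bits in (16, 8, 4):
    size = weight_footprint_gb(NUM_PARAMS, bits)
    fits = size <= UNIFIED_MEMORY_GB
    print(f"{bits:>2}-bit weights: {size:.0f} GB, fits in 128 GB: {fits}")
```

Only the 4-bit case fits, at about 100 GB, which leaves limited headroom for context and runtime buffers.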

marsbit 06/16 09:14

2026 Landscape of Decentralized AI: Why is Blockchain the Inevitable "Antidote" for AI?

Decentralized AI addresses fundamental bottlenecks of centralized AI: scarce and expensive computational resources, excessive control concentration, unverifiable model outputs, and increasing difficulty in acquiring training data due to privacy and regulation. Blockchain offers a path to make intelligence open, verifiable, and economically accessible. The technical stack comprises three layers:

1. **Applications & Services**: The main crypto use cases are "Agentic Finance" (converting natural language into on-chain actions) and "Agentic Payments" for machine-to-machine commerce. Projects like Giza, Infinit Labs, Coinvest AI, and x402 (handling 173M+ transactions) are key players.
2. **Middleware**: This coordination layer enables agents to discover, identify, and transact. Notable projects include Gokite AI (specialized L1), Virtuals (an OS for the agent economy), and especially Bittensor, a network of specialized subnets forming competitive AI micro-economies.
3. **Infrastructure**: The capital-intensive layer providing raw resources. It includes decentralized compute (Akash, Render, Aethir), verifiable inference (Venice AI, OpenGradient), distributed training (Prime Intellect, Templar AI), decentralized storage (Filecoin, Walrus), and privacy/verification layers (Nillion, Arcium, Phala Network) using technologies like ZKPs, MPC, and TEEs.

The outlook for 2026-2027 indicates AI demand outpacing infrastructure, with AI agents as a primary growth engine. Computation is becoming an asset class, with on-chain markets as its financial layer. Tokenomics is emerging as a structural advantage for coordinating capital, compute, and data in decentralized AI networks. While still early, with adoption uneven and revenue often trailing token incentives, projects like Bittensor, NEAR, and Virtuals demonstrate a shift from speculative narrative to a new model for coordinating intelligence.

marsbit 06/12 02:40

The 2026 Landscape of Decentralized AI: Why Blockchain is the Inevitable 'Antidote' for AI?

Centralized AI faces structural bottlenecks (expensive compute, concentrated control, unverifiable outputs, and difficult data access) that cannot be solved by capital or code alone. Blockchain offers a path to make intelligence open, verifiable, and economically accessible. The decentralized AI stack comprises:

* **Infrastructure:** The foundation with compute, verifiable inference, distributed training, data/storage, and privacy/verification layers. Projects like Akash, Render, and Filecoin provide cheaper, decentralized alternatives for raw resources.
* **Middleware:** The coordination layer for agent discovery, identity, and commerce. Key players include Bittensor (a network of specialized AI subnets), Virtuals (an agent economy OS), and frameworks providing agent identity and tooling.
* **Applications & Services:** Dominated by Agentic Finance (AI agents executing on-chain actions based on natural language) and Agentic Payments (machine-to-machine transactions using blockchain as a settlement layer). Projects like Giza, Infinit Labs, and x402 are enabling these use cases.

Key trends for 2026-2027 show AI demand outgrowing infrastructure, compute becoming an asset class, and tokenomics emerging as a structural advantage for coordinating capital, compute, and data. While still early, with adoption uneven and revenue often trailing token incentives, projects like Bittensor, NEAR, and Venice demonstrate decentralized AI is evolving from a narrative into a new model for coordinating intelligence.

Foresight News 06/11 10:02

When Inference Becomes a Scarce Resource, Who Captures the Value?

The core AI bottleneck has shifted from model training to inference (runtime execution). While concerns persisted about an "AI compute gap" (initially a $200B problem, now a $600B one), the market is now recognizing that the solution and value lie in the inference layer. Nvidia's financial restructuring around "serving tokens" and Cerebras's successful IPO highlight this shift. Inference is a recurring, usage-based cost, estimated to be 10-50x larger than the one-time training market, especially with the rise of agentic AI. The inference stack spans six layers: silicon (e.g., Nvidia), bare metal (e.g., CoreWeave), GPU rental/aggregation, deployment/optimization, model APIs, and end applications. Most companies operate in one layer. However, Hyperbolic uniquely spans three layers (GPU rental, deployment, and model APIs) without owning any hardware. It aggregates fragmented GPU supply from multiple cloud providers into a standardized pool, offering developers the cheapest available compute through intelligent routing. Its multi-cloud aggregation creates a data moat and a flywheel: more supply leads to better pricing data and liquidity, attracting more developers and providers. In contrast, applications like Venice operate at the top of the stack, reselling privacy-wrapped inference but remaining dependent on and constrained by the underlying compute costs they purchase. As inference demand explodes, value accrues not just to consumer applications but increasingly to the aggregation and routing layer that captures their cost of revenue. The coming potential GPU oversupply reinforces this dynamic. While hardware owners may suffer from depreciation, asset-light aggregators like Hyperbolic benefit from price arbitrage, routing workloads to the cheapest available capacity. The ultimate winner in the inference economy may not be the entity with the most GPUs, but the one that can most efficiently discover, aggregate, and route the world's fragmented compute.
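The idea of routing a workload to the cheapest available capacity across several suppliers can be sketched in a few lines. This is a toy illustration under stated assumptions, not Hyperbolic's actual routing logic. The `Offer` type, the provider names, and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Offer:
    """One supplier's listing in the aggregated pool (hypothetical schema)."""
    provider: str
    gpu_type: str
    price_per_gpu_hour: float
    available_gpus: int


def route(offers: Iterable[Offer], gpu_type: str,
          gpus_needed: int) -> Optional[Offer]:
    """Pick the cheapest offer that matches the GPU type and has capacity."""
    candidates = [
        o for o in offers
        if o.gpu_type == gpu_type and o.available_gpus >= gpus_needed
    ]
    if not candidates:
        return None  # no supplier can serve this request right now
    return min(candidates, key=lambda o: o.price_per_gpu_hour)


# Hypothetical pool of aggregated supply.
pool = [
    Offer("cloud-a", "H100", 2.90, 8),
    Offer("cloud-b", "H100", 2.40, 2),
    Offer("cloud-c", "H100", 2.65, 16),
    Offer("cloud-c", "A100", 1.30, 32),
]

choice = route(pool, gpu_type="H100", gpus_needed=4)
print(choice)
```

Here the lowest-priced H100 listing is skipped because it lacks capacity for four GPUs, so the request goes to the next cheapest supplier. A real aggregator would also weigh latency, reliability, and data locality, which this sketch leaves out.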

ChainCatcher 06/08 15:39
