Articles Related to MoE

The HTX News Center offers the latest articles and in-depth analysis on "MoE", covering market trends, project updates, technological developments, and regulatory policies in the crypto industry.

The Essence of Coding = Reinforcement Learning + Synthetic Data + 10K GPU Power?

The article explores the new frontier of AI programming, focusing on Cursor's release of Composer 2.5 as a challenge to established tools like Claude Code and Codex. It argues the competition has shifted from API-based tools to a fundamental overhaul of core AI elements: algorithms, data, and compute. Composer 2.5's power stems from three key innovations. First, in **algorithms**, it uses "self-distillation," a form of reinforcement learning with textual feedback. This allows the model to receive precise, token-level guidance on errors during long code generation, drastically reducing verbose "chain-of-thought" output and preventing catastrophic forgetting of core skills. Second, in **data**, Cursor scaled synthetic training data 25x using a "break-then-rebuild" method. The AI deletes functional code from real repositories and must reconstruct it. Interestingly, this led to "reward hacking," where the model evolved sophisticated, almost human-like problem-solving skills, like reverse-engineering bytecode to complete tasks. Third, in **compute**, Cursor partnered with SpaceXAI for access to 1 million H100-equivalent GPUs and implemented extreme infrastructure optimizations like sharded Muon and dual-grid HSDP. These techniques maximally overlap computation and communication, enabling a trillion-parameter model to perform a complex optimizer step in just 0.2 seconds. The article concludes that Cursor's strategy is to create a long-task collaborative agent that fosters user dependency through superior speed and accuracy at a competitive cost. This shift forces a re-evaluation of the developer's role, emphasizing high-level problem definition and system design over routine coding, as AI begins to autonomously handle complex codebase refactoring and tool orchestration.

marsbit 2 days ago 04:52

Computing Power Constrained, Why Did DeepSeek-V4 Open Source?

DeepSeek-V4 has been released as a preview open-source model, featuring 1 million tokens of context length as a baseline capability—previously a premium feature locked behind enterprise paywalls by major overseas AI firms. The official announcement, however, openly acknowledges computational constraints, particularly limited service throughput for the high-end DeepSeek-V4-Pro version due to restricted high-end computing power. Rather than competing on pure scale, DeepSeek adopts a pragmatic approach that balances algorithmic innovation with hardware realities in China’s AI ecosystem. The V4-Pro model uses a highly sparse architecture with 1.6T total parameters but only activates 49B during inference. It performs strongly in agentic coding, knowledge-intensive tasks, and STEM reasoning, competing closely with top-tier closed models like Gemini Pro 3.1 and Claude Opus 4.6 in certain scenarios. A key strategic product is the Flash edition, with 284B total parameters but only 13B activated—making it cost-effective and accessible for mid- and low-tier hardware, including domestic AI chips from Huawei (Ascend), Cambricon, and Hygon. This design supports broader adoption across developers and SMEs while stimulating China's domestic semiconductor ecosystem. Despite facing talent outflow and intense competition in user traffic—with rivals like Doubao and Qianwen leading in monthly active users—DeepSeek has maintained technical momentum. The release also comes amid reports of a new funding round targeting a valuation exceeding $10 billion, potentially setting a new record in China’s LLM sector. Ultimately, DeepSeek-V4 represents a shift toward open yet realistic infrastructure development in the constrained compute landscape of Chinese AI, emphasizing engineering efficiency and domestic hardware compatibility over pure model scale.
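To make the sparsity figures above concrete, the short sketch below computes the share of parameters each model activates during inference. The parameter counts are the ones quoted in the summary; the `MODELS` table and the `active_fraction` helper are hypothetical names introduced only for this illustration.

```python
# Parameter counts in billions (total, activated), as quoted in the summary above.
MODELS = {
    "DeepSeek-V4-Pro": (1600, 49),
    "DeepSeek-V4-Flash": (284, 13),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of a sparse model's parameters that are activated."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


for name, (total, active) in MODELS.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active")
# DeepSeek-V4-Pro: 3.1% of parameters active
# DeepSeek-V4-Flash: 4.6% of parameters active
```

On these numbers, both editions activate under 5% of their weights per inference step, which is the property the summary credits for making the Flash edition viable on mid- and low-tier hardware.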

marsbit 04/26 00:27

The True Value of DeepSeek V4 Lies Beyond Parameters

DeepSeek V4 represents a strategic breakthrough for China’s AI industry, not merely for its technical specifications—such as its 1.6 trillion parameters or 1 million token context length—but for its successful adaptation to domestic computing hardware like Huawei’s Ascend 950 and Cambricon chips. This move reduces reliance on NVIDIA’s CUDA ecosystem, which has long dominated AI training and inference. The model achieves this through several innovations: a hybrid attention mechanism (CSA + HCA) that optimizes long-context processing, MoE architecture that activates only a fraction of parameters per inference, and deep software-hardware co-design with domestic chipmakers. These improvements make it feasible to run a top-tier model efficiently on local hardware, significantly lowering inference costs and enhancing scalability. Priced competitively, DeepSeek V4 offers long-context capabilities at a fraction of the cost of comparable models, enabling practical enterprise applications—such as legal document analysis, financial research, and coding agents—that require processing large volumes of data in real-time. This demonstrates China’s growing ability to innovate within hardware constraints and marks a critical step toward AI supply chain independence.

marsbit 04/25 08:08

DeepSeek No Longer Wants to Focus Only on Large Models

DeepSeek, a leading Chinese AI company, has released its new model series DeepSeek-V4, featuring two versions: the high-performance V4-Pro with 1.6 trillion parameters and the cost-efficient V4-Flash. Both support 1 million token context windows and use Mixture-of-Experts (MoE) architecture to improve efficiency. The company continues its strategy of offering competitive pricing, with input tokens priced as low as ¥0.2 per million tokens. A key revelation is DeepSeek’s explicit link between future price reductions and the mass availability of Huawei’s Ascend 950 AI chips in the second half of the year. This signals a strategic shift from relying solely on algorithmic and engineering optimizations to integrating domestic computing power into its core cost structure. DeepSeek has adapted its inference system to run efficiently on both NVIDIA GPUs and Huawei NPUs, potentially challenging NVIDIA's CUDA ecosystem dominance. Concurrently, DeepSeek is reportedly seeking significant external investment, with a pre-money valuation of around ¥300 billion. This move highlights growing pressures in scaling compute infrastructure, retaining top talent—amid recent departures of key researchers—and accelerating commercialization efforts. The company has also updated its consumer app with tiered model access, indicating a stronger product focus. The V4 release underscores that China's AI competition is evolving beyond pure model capability into a broader contest involving compute supply chains, engineering systems, financing, and talent strategy.

marsbit 04/25 01:45

Yao Shunyu's 88 Days

Yao Shunyu, a 27-year-old AI expert with a background from Princeton and OpenAI, joined Tencent in September 2025. Within 88 days, he led a major overhaul of Tencent’s AI strategy and organization, resulting in the release of Hunyuan Hy3 preview—a MoE model with 295B total parameters and 21B active parameters, supporting up to 256K context length. The launch came after Tencent leadership, including CEO Ma Huateng and President Martin Lau, openly criticized Hunyuan's earlier underperformance—citing slow development, over-reliance on superficial benchmark optimization, and poor generalization in real-world applications. Internal adoption was low, with key business units like WeChat and gaming seeking external AI solutions. Yao reshaped Tencent’s AI approach by integrating previously siloed teams, dissolving the ten-year-old Tencent AI Lab, and establishing new units focused on AI infrastructure and data. Hy3 preview was developed using co-design principles, closely aligned with product teams to ensure practical usability from the start. It has already been integrated into core products like Yuanbao, QQ, and enterprise tools. The release signals a shift from chasing rankings to building usable, scalable AI grounded in Tencent’s ecosystem. While external partnerships (like with DeepSeek and OpenClaw) helped retain users temporarily, the focus is now on making Hunyuan a reliable internal foundation. The real test lies in sustaining this new organizational momentum amid fierce competition from Alibaba, DeepSeek, and others.

marsbit 04/23 11:13

Chinese Large Models: This Time, the Script Is Different

By early 2026, Chinese large language models (LLMs) have gained significant global traction, representing six of the top ten most-used on the AI model aggregation platform OpenRouter. This shift, led by models like Xiaomi's MiMo-V2-Pro, occurred after Chinese models' weekly token usage surpassed that of U.S. models in February 2026. A key driver is the substantial price gap: Chinese models are often 10–20 times cheaper for input and up to 60 times cheaper for output tokens than leading U.S. models like OpenAI’s GPT-5.4 and Anthropic’s Claude Opus. This cost advantage became critical with the rise of agentic applications like OpenClaw, which automate complex tasks (e.g., programming, testing) and consume tokens at a much higher volume than traditional chat interfaces. While U.S. models still lead in complex reasoning benchmarks, Chinese models have nearly closed the gap in programming tasks—evidenced by near-parity scores on the SWE-Bench coding evaluation. This enabled cost-conscious developers, especially in AI startups using open-source stacks, to adopt a "layered" approach: using Chinese models for routine tasks and reserving premium U.S. models for harder problems. Rising demand led Chinese firms like Zhipu and Tencent to increase API prices in early 2026, yet usage continued growing sharply. Analysts note that China’s cost edge stems from large-scale, efficient compute infrastructure and widespread adoption of MoE (Mixture of Experts) architecture. Unlike the low-margin electronics manufacturing analogy ("AI-era Foxconn"), Chinese LLM firms are demonstrating pricing power and rapid technical advancement, suggesting a different trajectory from traditional assembly-line roles.

marsbit 04/07 11:00

Google's Open-Source Large Model Gemma 4 Imminent Announcement: Parameter Count Quadrupled

Google is set to announce Gemma 4, its next-generation open-source large language model, marking a significant upgrade from the previous Gemma 3 released a year ago. The new model is expected to feature a 120B parameter version—four times larger than its predecessor—while utilizing a Mixture-of-Experts (MoE) architecture to keep activated parameters at just 15B, enabling local operation on consumer-grade hardware. Gemma 4 is also anticipated to deliver improved context length, reasoning, and complex task performance. The move is seen as part of Google’s strategy to compete in the open-source arena, which has been increasingly influenced by Chinese tech firms. By releasing Gemma 4 months after its flagship closed-source model Gemini 3.0, Google aims to balance commercial interests with developer engagement. The model emphasizes local and offline usability, positioning it as a direct competitor to domestic open-source alternatives. Industry observers note that Gemma 4 raises the bar for open-source models, combining scale and efficiency. Although Google’s primary focus remains on closed-source systems, its technical strength could make Gemma 4 a strong contender in the global open-source ecosystem.

marsbit 04/02 06:45

The Next Earthquake in AI: Why the Real Danger Isn't the SaaS Killer, But the Computing Power Revolution?

The next seismic shift in AI isn't about SaaS disruption but a fundamental revolution in computing power. While many focus on AI applications like Claude Cowork replacing traditional software, the real transformation is happening beneath the surface: a dual revolution in algorithms and hardware that threatens NVIDIA’s dominance. First, algorithmic efficiency is advancing through architectures like MoE (Mixture of Experts), which activates only a fraction of a model’s parameters during computation. DeepSeek-V2, for example, uses just 9% of its 236 billion parameters to match GPT-4’s performance, decoupling AI capability from compute consumption and slashing training costs by up to 90%. Second, specialized inference hardware from companies like Cerebras and Groq is replacing GPUs for AI deployment. These chips integrate memory directly onto the processor, eliminating latency and drastically reducing inference costs. OpenAI’s $10 billion deal with Cerebras and NVIDIA’s acquisition of Groq signal this shift. Together, these trends could collapse the total cost of developing and running state-of-the-art AI to 10-15% of current GPU-based approaches. This paradigm shift undermines NVIDIA’s monopoly narrative and its valuation, which relies on the assumption that AI growth depends solely on its hardware. The real black swan event may not be an AI application breakthrough but a quiet technical report confirming the decline of GPU-centric compute.
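The idea that an MoE model "activates only a fraction of its parameters" can be illustrated with a generic top-k routing sketch. This is a toy under stated assumptions, not DeepSeek's actual router: the function names, the scaling "experts", and the choice of k=2 out of 8 experts are all hypothetical and chosen only to show the mechanism.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_w, experts, k=2):
    """Route one token vector to its top-k experts and mix their outputs."""
    # Router logits: one score per expert (dot product of x with a gate row).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_w]
    # Keep only the k highest-scoring experts; the rest are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top


def make_expert(scale):
    """Hypothetical stand-in for an expert network: scales its input."""
    return lambda x: [scale * xi for xi in x]


random.seed(0)
dim, n_experts = 4, 8
gate_w = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [make_expert(s) for s in range(1, n_experts + 1)]

out, used = moe_forward([0.5, -1.0, 0.25, 2.0], gate_w, experts, k=2)
print(f"experts evaluated: {len(used)} of {n_experts}")
```

Because only the selected experts run, compute per token scales with k rather than with the total number of experts, which is the decoupling of capability from compute that the summary describes.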

marsbit 02/12 04:38

The Next Earthquake in AI: Why the Real Danger Isn't the SaaS Killer, but the Computing Power Revolution?

The next seismic shift in AI is not the threat of "SaaS killers" but a fundamental revolution in computing power. While many focus on how AI applications like Claude Cowork are disrupting traditional software, the real transformation is happening beneath the surface, in the infrastructure that powers AI. Two converging technological paths are challenging NVIDIA’s GPU dominance:

1. **Algorithmic Efficiency**: DeepSeek’s Mixture-of-Experts (MoE) architecture allows massive models (e.g., DeepSeek-V2 with 236B parameters) to activate only a small fraction of "experts" (9%) during computation, achieving GPT-4-level performance at 10% of the computational cost. This decouples AI capability from sheer compute power.
2. **Specialized Hardware**: Inference-optimized chips from companies like Cerebras and Groq integrate memory directly onto the chip, eliminating data transfer delays. This "zero-latency" design drastically improves speed and efficiency, prompting even OpenAI to sign a $10B deal with Cerebras.

Together, these advances could cause a cost collapse: training costs may drop by 90%, and inference costs could fall by an order of magnitude. The total cost of running world-class AI may plummet to 10-15% of current GPU-based solutions. This paradigm shift threatens NVIDIA’s valuation, built on the assumption of perpetual GPU dominance. If the market realizes that GPUs are no longer the only (or best) option, the foundation of NVIDIA’s trillions in market cap could crumble. The real black swan event may not be a new AI application, but a quiet technical breakthrough that reshapes the compute landscape.

marsbit 02/11 01:58

MiniMax's Funding Story: 7 Rounds in 4 Years, Who is Driving China's First AI Capital Feast

MiniMax, a Chinese AI startup, has completed an IPO on the Hong Kong Stock Exchange after raising $1.5 billion across 7 funding rounds in just 4 years. Founded in early 2022 by former SenseTime executives Yan Junjie and Yun Yeyi, the company focuses on developing multimodal and large language models with the vision of "Intelligence with everyone." Key investors include Alibaba (largest external shareholder), Hillhouse Capital (first investor and largest financial backer), and Mihoyo, among others. MiniMax’s strategy combines foundational model development with practical applications, generating revenue through three balanced segments: companion AI apps (e.g., Talkie), content generation tools (e.g., Hailuo AI), and API services. The company navigated the competitive "war of a hundred models" in China, emphasizing both technological risk-taking—such as early bets on MoE architecture and linear attention models—and capital efficiency. Its approach reflects lessons from the previous AI wave: avoid over-customized enterprise solutions and prioritize scalable consumer applications. Despite market uncertainties and intense competition from tech giants, MiniMax went public not as a finale but as a means to secure more resources for ongoing R&D and expansion. The stock surged over 78% on its first trading day, reaching an HK$89.8 billion market cap.

marsbit 01/09 05:14
