Artículos Relacionados con MoE

El Centro de Noticias de HTX ofrece los artículos más recientes y un análisis profundo sobre "MoE", cubriendo tendencias del mercado, actualizaciones de proyectos, desarrollos tecnológicos y políticas regulatorias en la industria de cripto.

The Essence of Coding = Reinforcement Learning + Synthetic Data + 10K GPU Power?

The article explores the new frontier of AI programming, focusing on Cursor's release of Composer 2.5 as a challenge to established tools like Claude Code and Codex. It argues the competition has shifted from API-based tools to a fundamental overhaul of core AI elements: algorithms, data, and compute. Composer 2.5's power stems from three key innovations. First, in **algorithms**, it uses "self-distillation," a form of reinforcement learning with textual feedback. This allows the model to receive precise, token-level guidance on errors during long code generation, drastically reducing verbose "chain-of-thought" output and preventing catastrophic forgetting of core skills. Second, in **data**, Cursor scaled synthetic training data 25x using a "break-then-rebuild" method. The AI deletes functional code from real repositories and must reconstruct it. Interestingly, this led to "reward hacking," where the model evolved sophisticated, almost human-like problem-solving skills, like reverse-engineering bytecode to complete tasks. Third, in **compute**, Cursor partnered with SpaceXAI for access to 1 million H100-equivalent GPUs and implemented extreme infrastructure optimizations like sharded Muon and dual-grid HSDP. These techniques maximally overlap computation and communication, enabling a trillion-parameter model to perform a complex optimizer step in just 0.2 seconds. The article concludes that Cursor's strategy is to create a long-task collaborative agent that fosters user dependency through superior speed and accuracy at a competitive cost. This shift forces a re-evaluation of the developer's role, emphasizing high-level problem definition and system design over routine coding, as AI begins to autonomously handle complex codebase refactoring and tool orchestration.

marsbitHace 2 días 04:52

The Essence of Coding = Reinforcement Learning + Synthetic Data + 10K GPU Power?

marsbitHace 2 días 04:52

Computing Power Constrained, Why Did DeepSeek-V4 Open Source?

DeepSeek-V4 has been released as a preview open-source model, featuring 1 million tokens of context length as a baseline capability—previously a premium feature locked behind enterprise paywalls by major overseas AI firms. The official announcement, however, openly acknowledges computational constraints, particularly limited service throughput for the high-end DeepSeek-V4-Pro version due to restricted high-end computing power. Rather than competing on pure scale, DeepSeek adopts a pragmatic approach that balances algorithmic innovation with hardware realities in China’s AI ecosystem. The V4-Pro model uses a highly sparse architecture with 1.6T total parameters but only activates 49B during inference. It performs strongly in agentic coding, knowledge-intensive tasks, and STEM reasoning, competing closely with top-tier closed models like Gemini Pro 3.1 and Claude Opus 4.6 in certain scenarios. A key strategic product is the Flash edition, with 284B total parameters but only 13B activated—making it cost-effective and accessible for mid- and low-tier hardware, including domestic AI chips from Huawei (Ascend), Cambricon, and Hygon. This design supports broader adoption across developers and SMEs while stimulating China's domestic semiconductor ecosystem. Despite facing talent outflow and intense competition in user traffic—with rivals like Doubao and Qianwen leading in monthly active users—DeepSeek has maintained technical momentum. The release also comes amid reports of a new funding round targeting a valuation exceeding $10 billion, potentially setting a new record in China’s LLM sector. Ultimately, DeepSeek-V4 represents a shift toward open yet realistic infrastructure development in the constrained compute landscape of Chinese AI, emphasizing engineering efficiency and domestic hardware compatibility over pure model scale.

marsbit04/26 00:27

Computing Power Constrained, Why Did DeepSeek-V4 Open Source?

marsbit04/26 00:27

DeepSeek No Longer Wants to Focus Only on Large Models

DeepSeek, a leading Chinese AI company, has released its new model series DeepSeek-V4, featuring two versions: the high-performance V4-Pro with 1.6 trillion parameters and the cost-efficient V4-Flash. Both support 1 million token context windows and use Mixture-of-Experts (MoE) architecture to improve efficiency. The company continues its strategy of offering competitive pricing, with input tokens priced as low as ¥0.2 per million tokens. A key revelation is DeepSeek’s explicit link between future price reductions and the mass availability of Huawei’s Ascend 950 AI chips in the second half of the year. This signals a strategic shift from relying solely on algorithmic and engineering optimizations to integrating domestic computing power into its core cost structure. DeepSeek has adapted its inference system to run efficiently on both NVIDIA GPUs and Huawei NPUs, potentially challenging NVIDIA's CUDA ecosystem dominance. Concurrently, DeepSeek is reportedly seeking significant external investment, with a pre-money valuation of around ¥300 billion. This move highlights growing pressures in scaling compute infrastructure, retaining top talent—amid recent departures of key researchers—and accelerating commercialization efforts. The company has also updated its consumer app with tiered model access, indicating a stronger product focus. The V4 release underscores that China's AI competition is evolving beyond pure model capability into a broader contest involving compute supply chains, engineering systems, financing, and talent strategy.

marsbit04/25 01:45

DeepSeek No Longer Wants to Focus Only on Large Models

marsbit04/25 01:45

Yao Shunyu's 88 Days

Yao Shunyu, a 27-year-old AI expert with a background from Princeton and OpenAI, joined Tencent in September 2025. Within 88 days, he led a major overhaul of Tencent’s AI strategy and organization, resulting in the release of Hunyuan Hy3 preview—a MoE model with 295B total parameters and 21B active parameters, supporting up to 256K context length. The launch came after Tencent leadership, including CEO Ma Huateng and President Martin Lau, openly criticized Hunyuan's earlier underperformance—citing slow development, over-reliance on superficial benchmark optimization, and poor generalization in real-world applications. Internal adoption was low, with key business units like WeChat and gaming seeking external AI solutions. Yao reshaped Tencent’s AI approach by integrating previously siloed teams, dissolving the ten-year-old Tencent AI Lab, and establishing new units focused on AI infrastructure and data. Hy3 preview was developed using co-design principles, closely aligned with product teams to ensure practical usability from the start. It has already been integrated into core products like Yuanbao, QQ, and enterprise tools. The release signals a shift from chasing rankings to building usable, scalable AI grounded in Tencent’s ecosystem. While external partnerships (like with DeepSeek and OpenClaw) helped retain users temporarily, the focus is now on making Hunyuan a reliable internal foundation. The real test lies in sustaining this new organizational momentum amid fierce competition from Alibaba, DeepSeek, and others.

marsbit04/23 11:13

Yao Shunyu's 88 Days

marsbit04/23 11:13

Chinese Large Models: This Time, the Script Is Different

By early 2026, Chinese large language models (LLMs) have gained significant global traction, representing six of the top ten most-used on the AI model aggregation platform OpenRouter. This shift, led by models like Xiaomi's MiMo-V2-Pro, occurred after Chinese models' weekly token usage surpassed that of U.S. models in February 2026. A key driver is the substantial price gap: Chinese models are often 10–20 times cheaper for input and up to 60 times cheaper for output tokens than leading U.S. models like OpenAI’s GPT-5.4 and Anthropic’s Claude Opus. This cost advantage became critical with the rise of agentic applications like OpenClaw, which automate complex tasks (e.g., programming, testing) and consume tokens at a much higher volume than traditional chat interfaces. While U.S. models still lead in complex reasoning benchmarks, Chinese models have nearly closed the gap in programming tasks—evidenced by near-parity scores on the SWE-Bench coding evaluation. This enabled cost-conscious developers, especially in AI startups using open-source stacks, to adopt a "layered" approach: using Chinese models for routine tasks and reserving premium U.S. models for harder problems. Rising demand led Chinese firms like Zhipu and Tencent to increase API prices in early 2026, yet usage continued growing sharply. Analysts note that China’s cost edge stems from large-scale, efficient compute infrastructure and widespread adoption of MoE (Mixture of Experts) architecture. Unlike the low-margin electronics manufacturing analogy ("AI-era Foxconn"), Chinese LLM firms are demonstrating pricing power and rapid technical advancement, suggesting a different trajectory from traditional assembly-line roles.

marsbit04/07 11:00

Chinese Large Models: This Time, the Script Is Different

marsbit04/07 11:00

The Next Earthquake in AI: Why the Real Danger Isn't the SaaS Killer, But the Computing Power Revolution?

The next seismic shift in AI isn't about SaaS disruption but a fundamental revolution in computing power. While many focus on AI applications like Claude Cowork replacing traditional software, the real transformation is happening beneath the surface: a dual revolution in algorithms and hardware that threatens NVIDIA’s dominance. First, algorithmic efficiency is advancing through architectures like MoE (Mixture of Experts), which activates only a fraction of a model’s parameters during computation. DeepSeek-V2, for example, uses just 9% of its 236 billion parameters to match GPT-4’s performance, decoupling AI capability from compute consumption and slashing training costs by up to 90%. Second, specialized inference hardware from companies like Cerebras and Groq is replacing GPUs for AI deployment. These chips integrate memory directly onto the processor, eliminating latency and drastically reducing inference costs. OpenAI’s $10 billion deal with Cerebras and NVIDIA’s acquisition of Groq signal this shift. Together, these trends could collapse the total cost of developing and running state-of-the-art AI to 10-15% of current GPU-based approaches. This paradigm shift undermines NVIDIA’s monopoly narrative and its valuation, which relies on the assumption that AI growth depends solely on its hardware. The real black swan event may not be an AI application breakthrough but a quiet technical report confirming the decline of GPU-centric compute.

marsbit02/12 04:38

The Next Earthquake in AI: Why the Real Danger Isn't the SaaS Killer, But the Computing Power Revolution?

marsbit02/12 04:38

The Next Earthquake in AI: Why the Real Danger Isn't the SaaS Killer, but the Computing Power Revolution?

The next seismic shift in AI is not the threat of "SaaS killers" but a fundamental revolution in computing power. While many focus on how AI applications like Claude Cowork are disrupting traditional software, the real transformation is happening beneath the surface—in the infrastructure that powers AI. Two converging technological paths are challenging NVIDIA’s GPU dominance: 1. **Algorithmic Efficiency**: DeepSeek’s Mixture-of-Experts (MoE) architecture allows massive models (e.g., DeepSeek-V2 with 236B parameters) to activate only a small fraction of "experts" (9%) during computation, achieving GPT-4-level performance at 10% of the computational cost. This decouples AI capability from sheer compute power. 2. **Specialized Hardware**: Inference-optimized chips from companies like Cerebras and Groq integrate memory directly onto the chip, eliminating data transfer delays. This "zero-latency" design drastically improves speed and efficiency, prompting even OpenAI to sign a $10B deal with Cerebras. Together, these advances could cause a cost collapse: training costs may drop by 90%, and inference costs could fall by an order of magnitude. The total cost of running world-class AI may plummet to 10-15% of current GPU-based solutions. This paradigm shift threatens NVIDIA’s valuation, built on the assumption of perpetual GPU dominance. If the market realizes that GPUs are no longer the only—or best—option, the foundation of NVIDIA’s trillions in market cap could crumble. The real black swan event may not be a new AI application, but a quiet technical breakthrough that reshapes the compute landscape.

marsbit02/11 01:58

The Next Earthquake in AI: Why the Real Danger Isn't the SaaS Killer, but the Computing Power Revolution?

marsbit02/11 01:58

活动图片