# Cost Optimization Related Articles

HTX News Center provides the latest articles and in-depth analysis on "Cost Optimization", covering market trends, project updates, tech developments, and regulatory policies in the crypto industry.

Rubin Ultra Makes Major Cuts, Even Nvidia Can't Handle Memory Price Hikes?

NVIDIA's Rubin Ultra, the top-tier variant of the newly announced Rubin AI accelerators, has reportedly seen significant specification downgrades, according to an industry report from SemiAnalysis. Initially designed with four compute dies (4-die), the Rubin Ultra is now said to be reduced to a 2-die design. Key changes highlighted in the report include:

* **No increase in peak theoretical compute performance**, remaining at 35 PFLOPs like the standard Rubin.
* **Severe reduction in memory capacity** to 192GB using 8-Hi HBM stacks, which is less than the standard Rubin's 288GB using 12-Hi stacks.
* **Negligible memory bandwidth improvement** of only 1 TB/s.
* **Slightly higher chip-level power consumption**.
* The **primary upgrade is a massive increase in scale-up interconnect capacity**, supporting connections for up to 576 GPUs via NVLink, compared to 72 for the standard Rubin.

The report suggests the redesign is primarily a cost-optimization move driven by the sharp rise in HBM (High-Bandwidth Memory) prices. By reducing the expensive HBM content and shifting investment towards enhanced system-scale networking, NVIDIA aims to maintain the platform's value for large-scale AI training clusters while managing soaring material costs. The news reportedly triggered a sell-off in South Korean memory stocks, with SK Hynix and Samsung shares falling around 8%, as markets grew concerned that NVIDIA, a major HBM buyer, might be reducing its reliance on high-capacity memory, potentially capping future pricing power for memory makers.

Odaily星球日报 · 12h ago

Token Devours One-Third of Payroll, Silicon Valley's AI Bill is Spinning Out of Control

The article discusses the dual reality of AI token costs in Silicon Valley. While research firm SemiAnalysis reports spending 30% of its employee salary budget on internal LLM tokens—translating to massive productivity gains like converting complex Excel models in minutes—other giants are struggling with ballooning, uncontrolled AI bills. Uber exhausted its annual AI budget in months after rapid engineer adoption, and Microsoft is cutting third-party AI tools due to high costs. NVIDIA's CEO argues tokens are becoming "means of production" and plans substantial AI budgets per engineer. Despite current cost concerns, the analysis emphasizes that cost collapse is just beginning. Through software optimizations (like 14x throughput boosts) and next-gen hardware (e.g., GB300 NVL72 with 17-32x H100 performance), real token costs can fall far below list prices. Anthropic's gross margins reportedly soared as token prices dropped. Gartner predicts a >90% inference cost drop by 2030. The piece highlights a split: massive AI capex ($740B announced) contrasts with tech layoffs and minimal measured economic impact so far. The transition mirrors past infrastructure shifts—investment precedes widespread productivity. For early adopters like SemiAnalysis, tokens already deliver high leverage; for others, the choice is to adopt now or risk falling behind.

marsbit · 07/06 00:15

When US Giants Collectively "Defect" to Chinese AI Models

A surprising trend is emerging: major U.S. tech companies are significantly reducing AI costs by switching to Chinese models. Coinbase, the largest U.S. cryptocurrency exchange, reportedly halved its AI spending after migrating to China's GLM-5.2 and Kimi 2.7 models, despite increasing usage. It achieved this through a three-part strategy: implementing an automatic routing system to select the most cost-effective model per task, boosting cache hit rates from 5% to 60% to reuse computations, and employing "context engineering" to provide AI with more precise, less cluttered information.

Coinbase is not alone. AI startup Lindy switched from Claude to DeepSeek, saving millions, while Snowflake's tests found GLM-5.2 solved 66% of coding tasks compared to Claude Opus's 67%, but at a fraction of the cost (output pricing is 5-7 times lower). While the top Western models may offer slightly better stability, the massive price differential is leading many businesses to reconsider their value proposition. This shift signals a deeper change in the AI industry, moving beyond pure performance benchmarks to a fierce cost competition. As pressure mounts, even OpenAI and Anthropic have begun slashing prices. For users, this means more choices, lower costs, and a crucial lesson: using multiple models based on task complexity, optimizing with caching, and keeping contexts lean are now key to leveraging AI efficiently and affordably.

marsbit · 07/03 16:15
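
To make the caching point in the summary above concrete, the following is a minimal sketch of how a higher cache hit rate lowers the blended price of input tokens. The 5% and 60% hit rates come from the summary; the normalized base price and the 10% cached-token rate are assumptions chosen for illustration, not figures reported for Coinbase or any provider.

```python
def effective_input_cost(base_price: float, hit_rate: float, cached_discount: float) -> float:
    """Blended price per input token when a fraction `hit_rate` is served from cache.

    `cached_discount` is the fraction of the base price billed for cached tokens.
    It is a hypothetical parameter; real providers publish their own rates.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return base_price * ((1.0 - hit_rate) + hit_rate * cached_discount)


BASE_PRICE = 1.0       # normalized price per input token (assumption)
CACHED_DISCOUNT = 0.1  # cached tokens billed at 10% of base (assumption)

before = effective_input_cost(BASE_PRICE, 0.05, CACHED_DISCOUNT)  # 5% hit rate
after = effective_input_cost(BASE_PRICE, 0.60, CACHED_DISCOUNT)   # 60% hit rate
savings = 1.0 - after / before

print(f"blended cost before: {before:.3f}, after: {after:.3f}, saved: {savings:.1%}")
```

Under these assumed prices, the change in hit rate alone cuts input spend by roughly half; the actual figure depends on the provider's cached-token pricing and on how much of the bill is output tokens.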

The Art of Saving in the AI Era: How to Spend Every Token Wisely

In the AI era, tokens are the new currency, and efficiency is paramount. This article outlines strategies to minimize token usage while maximizing value. Key principles include prioritizing high signal-to-noise ratio inputs by removing unnecessary content like greetings, repetitive context, or verbose instructions before processing. Converting files (e.g., PDFs to clean Markdown) and compressing images drastically reduce token consumption. Avoid conversational, multi-turn interactions; instead, provide clear, concise, and complete instructions upfront to prevent costly back-and-forth. Output costs are higher than input, so eliminate AI pleasantries and enforce structured responses (e.g., JSON) over verbose explanations. Use system prompts to mandate direct answers and disable unnecessary features like "extended thinking" for simple tasks. Manage context efficiently: start new conversations for new tasks, compress long histories, and leverage prompt caching to reuse fixed instructions at lower costs. Employ model tiering—assigning complex tasks to premium models (e.g., Claude Opus) and simpler subtasks to cheaper ones (e.g., Claude Haiku)—to optimize cost and performance. Ultimately, the most effective saving is questioning whether a task requires AI at all. Human judgment remains a critical filter to avoid unnecessary token expenditure, ensuring that AI complements rather than replaces human efficiency.

marsbit · 04/03 03:22
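
The model-tiering idea in the summary above can be sketched as a small routing function. This is an illustration only: the tier names, the relative costs, the keyword list, and the word-count threshold are all hypothetical, and a production router would likely use a classifier or task metadata rather than keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str             # hypothetical model label, not a real model ID
    relative_cost: float  # illustrative cost ratio, not a published price


CHEAP = Tier("small-model", 1.0)
PREMIUM = Tier("large-model", 15.0)

# Hypothetical signals that a task needs the stronger model.
COMPLEX_HINTS = ("analyze", "design", "prove", "refactor")


def pick_tier(task: str, max_simple_words: int = 40) -> Tier:
    """Route long or analysis-heavy tasks to the premium tier, everything else to the cheap one."""
    text = task.lower()
    if len(text.split()) > max_simple_words:
        return PREMIUM
    if any(hint in text for hint in COMPLEX_HINTS):
        return PREMIUM
    return CHEAP


print(pick_tier("Translate 'hello' to French").name)
print(pick_tier("Design a sharding scheme for our database").name)
```

Defaulting to the cheap tier and escalating only on explicit signals keeps the common case inexpensive, which matches the summary's advice to reserve premium models for complex work.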

OpenClaw Token Saving Ultimate Guide: Use the Strongest Model, Spend the Least Money / Includes Prompts

This guide provides strategies to reduce OpenClaw token usage by 60-85% when using expensive models like Claude Opus. The main costs come not just from your input and the model's output, but from hidden overhead: a fixed System Prompt (~3000-5000 tokens), injected context files like AGENTS.md and MEMORY.md (~3000-14000 tokens), and conversation history. Key strategies include:

1. **Model Tiering:** Use the cheaper Claude Sonnet for 80% of daily tasks (chat, simple Q&A, cron jobs) and reserve Opus for complex tasks like writing and deep analysis.
2. **Context Slimming:** Drastically reduce the token count in injected files (AGENTS.md, SOUL.md, MEMORY.md) and remove unnecessary files from `workspaceFiles`.
3. **Cron Optimization:** Lower the frequency, merge tasks, and downgrade non-critical cron jobs to Sonnet. Configure deliveries for notifications only when necessary.
4. **Heartbeat Tuning:** Increase the interval (e.g., 45-60 minutes), set a silent period overnight, and slim down the HEARTBEAT.md file.
5. **Precise Retrieval with QMD:** Implement the local, zero-cost qmd tool for semantic search. This allows the agent to retrieve only specific relevant paragraphs from documents instead of reading entire files, saving up to 90% of tokens per query.
6. **Memory Search Selection:** For small memory files, use local embedding; for larger or multi-language needs, consider Voyage AI's free tier.

By implementing these changes (model switching, context reduction, and smarter retrieval), users can significantly cut costs while maintaining performance for most tasks.

marsbit · 02/11 00:35
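
The hidden overhead described in the summary above can be made concrete with a short calculation. The token ranges are the approximate figures quoted in the summary; the 200-token user message is an assumed example, and the function names are illustrative rather than part of OpenClaw.

```python
# Approximate ranges quoted in the summary (low, high), in tokens.
SYSTEM_PROMPT_TOKENS = (3000, 5000)
INJECTED_FILE_TOKENS = (3000, 14000)


def fixed_overhead(history_tokens: int = 0) -> tuple[int, int]:
    """Return the (low, high) token count sent with every request before the user's own input."""
    low = SYSTEM_PROMPT_TOKENS[0] + INJECTED_FILE_TOKENS[0] + history_tokens
    high = SYSTEM_PROMPT_TOKENS[1] + INJECTED_FILE_TOKENS[1] + history_tokens
    return low, high


def overhead_share(user_tokens: int, overhead_tokens: int) -> float:
    """Fraction of the input tokens that is overhead rather than the user's message."""
    return overhead_tokens / (overhead_tokens + user_tokens)


low, high = fixed_overhead()
user_message_tokens = 200  # assumed size of a short user message

print(f"fixed overhead per request: {low}-{high} tokens")
print(f"overhead share at the low end: {overhead_share(user_message_tokens, low):.1%}")
```

Even at the low end of the quoted ranges, a short message is dominated by overhead, which is why the guide's context-slimming and heartbeat-tuning steps target the injected files rather than the user's prompts.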
