Jensen Huang Announces 8 New Products in 1.5 Hours, NVIDIA Fully Bets on AI Inference and Physical AI

marsbit · Published on 2026-01-06 · Last updated on 2026-01-06

Abstract

NVIDIA CEO Jensen Huang unveiled eight major announcements during his CES 2026 keynote, focusing on advancing AI inference and physical AI technologies. The centerpiece was the NVIDIA Vera Rubin POD AI supercomputer, which integrates six custom chips (Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-X CPO) co-designed to work as one system. The Rubin GPU offers 5x higher inference and 3.5x higher training performance than Blackwell, with support for HBM4 memory. The Vera Rubin NVL72 system delivers 3.6 EFLOPS of NVFP4 inference performance in a single rack, with enhanced memory bandwidth. NVIDIA also introduced the Spectrum-X Ethernet CPO for improved power efficiency, an Inference Context Memory Storage Platform that optimizes KV cache storage and reduces recomputation, and a DGX SuperPOD based on the Rubin architecture that cuts token costs for large MoE models to 1/10. On the software side, NVIDIA expanded its open-source offerings, including new models and datasets, and emphasized the rise of physical AI. The company open-sourced the Alpamayo model for autonomous driving, enabling reasoning-based decision-making, and announced that the NVIDIA DRIVE platform is now in production in Mercedes-Benz vehicles. Partnerships with Siemens and robotics firms like Boston Dynamics were highlighted, underscoring NVIDIA's full-stack approach to AI infrastructure and real-world AI applications.

Author | ZeR0 JunDa, Zhidongxi

Editor | Moying

LAS VEGAS, January 5, 2026 (Zhidongxi) — NVIDIA founder and CEO Jensen Huang delivered his first keynote of 2026 at CES 2026. Wearing his trademark leather jacket, Huang announced 8 major releases within 1.5 hours, giving an in-depth introduction to the entire new-generation platform, from chips and racks to network design.

In the fields of accelerated computing and AI infrastructure, NVIDIA released the NVIDIA Vera Rubin POD AI supercomputer, NVIDIA Spectrum-X co-packaged optics for Ethernet, NVIDIA Inference Context Memory Storage Platform, and the NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72.

The NVIDIA Vera Rubin POD utilizes six major NVIDIA self-developed chips, covering CPU, GPU, Scale-up, Scale-out, storage, and processing capabilities. All parts are co-designed to meet the demands of advanced models and reduce computing costs.

Among them, the Vera CPU adopts a custom Olympus core architecture. The Rubin GPU introduces a Transformer engine, achieving up to 50 PFLOPS of NVFP4 inference performance. NVLink bandwidth per GPU reaches 3.6 TB/s. It supports third-generation Universal Confidential Computing (the first rack-level TEE), providing a complete trusted execution environment across CPU and GPU domains.

These chips have already taped out. NVIDIA has validated the entire NVIDIA Vera Rubin NVL72 system, and partners have begun running their internally integrated AI models and algorithms. The entire ecosystem is preparing for the deployment of Vera Rubin.

Among other releases, the NVIDIA Spectrum-X co-packaged optics for Ethernet significantly optimize power efficiency and application uptime. The NVIDIA Inference Context Memory Storage Platform redefines the storage stack to reduce redundant computation and improve inference efficiency. The NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72 reduces the token cost of large MoE models to 1/10th.

Regarding open models, NVIDIA announced an expansion of its open-source model lineup, releasing new models, datasets, and libraries. This includes new additions to the NVIDIA Nemotron open-source model series: an Agentic RAG model, security models, and voice models. It also released new open models for all types of robots, though Jensen Huang did not go into detail during the speech.

In Physical AI, Huang declared that the ChatGPT moment for Physical AI has arrived. NVIDIA's full-stack technology enables the global ecosystem to transform industries through AI-driven robotics. NVIDIA's extensive AI tool library, including the new Alpamayo open-source model portfolio, enables the global transportation industry to reach safe L4 driving quickly. The NVIDIA DRIVE autonomous driving platform is now in production, shipping in all new Mercedes-Benz CLA vehicles for L2++ AI-defined driving.

01. New AI Supercomputer: 6 Self-Developed Chips, Single Rack Computing Power Reaches 3.6 EFLOPS

Jensen Huang believes that every 10 to 15 years, the computer industry undergoes a comprehensive reshaping. But this time, two platform transformations are happening simultaneously: from CPU to GPU, and from "programming software" to "training software." Accelerated computing and AI are reconstructing the entire computing stack. The computing industry, which has generated some $10 trillion in value over the past decade, is undergoing a modernization overhaul.

At the same time, the demand for computing power is soaring dramatically. Model size grows 10x annually, the number of tokens used for model thinking grows 5x annually, and the cost per token decreases 10x annually.
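Combining those three trends gives a rough sense of scale. The arithmetic below is a back-of-the-envelope sketch (the multiplication and the linear-scaling assumption are mine, not figures from the keynote):

```python
# Back-of-the-envelope: compounding the three trends Huang cited.
# Assumes compute demand scales roughly linearly with model size and
# with tokens per query; a simplification for illustration only.

model_growth = 10          # model size: ~10x per year
token_growth = 5           # tokens used for "thinking": ~5x per year
cost_per_token_drop = 10   # cost per token: falls ~10x per year

raw_demand = model_growth * token_growth      # ~50x more work per year
net_spend = raw_demand / cost_per_token_drop  # ~5x more spend per year

print(f"Raw compute demand grows ~{raw_demand}x per year")
print(f"Net spend at constant usage grows ~{net_spend:.0f}x per year")
```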

To meet this demand, NVIDIA has decided to release new computing hardware every year. Huang revealed that Vera Rubin has now fully entered production.

The new NVIDIA Vera Rubin POD AI supercomputer utilizes six self-developed chips: Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 (CX9) SuperNIC, BlueField-4 DPU, and Spectrum-X 102.4T CPO.

Vera CPU: Designed for data movement and agent processing, it features 88 NVIDIA custom Olympus cores, 176-thread NVIDIA spatial multithreading, 1.8 TB/s NVLink-C2C supporting CPU:GPU unified memory, system memory up to 1.5 TB (3x that of Grace CPU), SOCAMM LPDDR5X memory bandwidth of 1.2 TB/s, and supports rack-level confidential computing, doubling data processing performance.

Rubin GPU: Introduces a Transformer engine, achieving NVFP4 inference performance up to 50 PFLOPS, 5x that of the Blackwell GPU, with backward compatibility that preserves inference accuracy while improving performance across BF16 to FP4 precisions; NVFP4 training performance reaches 35 PFLOPS, 3.5x that of Blackwell.

Rubin is also the first platform to support HBM4, with HBM4 bandwidth reaching 22 TB/s, 2.8x that of the previous generation, providing the required performance for demanding MoE models and AI workloads.

NVLink 6 Switch: Single-lane rate increased to 400 Gbps, using SerDes technology for high-speed signal transmission. Each GPU achieves 3.6 TB/s of full interconnect communication bandwidth, 2x the previous generation; total switch bandwidth is 28.8 TB/s; in-network computing performance reaches 14.4 TFLOPS at FP8 precision; and the tray is 100% liquid-cooled.

NVIDIA ConnectX-9 SuperNIC: Provides 1.6 Tb/s bandwidth per GPU, optimized for large-scale AI, with fully software-defined, programmable, accelerated data paths.

NVIDIA BlueField-4: 800 Gbps DPU, used for SmartNICs and storage processors, equipped with a 64-core Grace CPU and paired with the ConnectX-9 SuperNIC to offload network and storage-related computing tasks while enhancing network security. Computing performance is 6x the previous generation, memory bandwidth is 3x, and GPU access to data storage is 2x faster.

NVIDIA Vera Rubin NVL72: Integrates all the above components into a single-rack processing system at the system level, featuring 2 trillion transistors, NVFP4 inference performance of 3.6 EFLOPS, and NVFP4 training performance of 2.5 EFLOPS.

The system's LPDDR5X memory capacity reaches 54 TB, 2.5x the previous generation; total HBM4 memory is 20.7 TB, 1.5x the previous generation; HBM4 bandwidth is 1.6 PB/s, 2.8x the previous generation; total scale-up bandwidth reaches 260 TB/s, exceeding the aggregate bandwidth of the global internet.
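These rack-level totals are consistent with the per-GPU figures quoted earlier. A quick sanity check, assuming 72 Rubin GPUs per NVL72 rack (as the product name implies):

```python
# Sanity check: NVL72 rack totals vs. the per-GPU figures quoted above.
# Assumes 72 GPUs per rack, as the "NVL72" name implies.

gpus = 72
nvfp4_inference_pflops = 50   # per GPU, NVFP4 inference
nvlink_bw_tbs = 3.6           # per GPU, NVLink 6 interconnect (TB/s)
hbm4_bw_tbs = 22              # per GPU, HBM4 bandwidth (TB/s)

print(f"Inference: {gpus * nvfp4_inference_pflops / 1000:.1f} EFLOPS")  # ~3.6 EFLOPS
print(f"Scale-up bandwidth: {gpus * nvlink_bw_tbs:.0f} TB/s")           # ~259, i.e. ~260 TB/s
print(f"HBM4 bandwidth: {gpus * hbm4_bw_tbs / 1000:.2f} PB/s")          # ~1.58, i.e. ~1.6 PB/s
```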

The system is based on the third-generation MGX rack design. The compute tray features a modular, hostless, cableless, fanless design, making assembly and maintenance 18x faster than GB200. Assembly work that originally took 2 hours now takes about 5 minutes. While the original system used about 80% liquid cooling, it now uses 100% liquid cooling. The single system itself weighs 2 tons, and with coolant, it can reach 2.5 tons.

The NVLink Switch tray enables zero downtime maintenance and fault tolerance; the rack can still operate when a tray is removed or partially deployed. The second-generation RAS engine enables zero downtime health checks.

These features improve system uptime and throughput, further reducing training and inference costs, meeting data center requirements for high reliability and high maintainability.

Over 80 MGX partners are ready to support the deployment of Rubin NVL72 in hyperscale networks.

02. Three Major New Releases Drastically Improve AI Inference Efficiency: New CPO Device, New Context Storage Layer, New DGX SuperPOD

Simultaneously, NVIDIA released three important new products: NVIDIA Spectrum-X co-packaged optics for Ethernet, NVIDIA Inference Context Memory Storage Platform, and the NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72.

1. NVIDIA Spectrum-X Co-Packaged Optics for Ethernet

The NVIDIA Spectrum-X co-packaged optics for Ethernet is based on the Spectrum-X architecture, uses a 2-chip design, employs 200 Gbps SerDes, and each ASIC can provide 102.4 Tb/s bandwidth.

This switching platform includes a 512-port high-density system and a 128-port compact system, with each port running at 800 Gb/s.

The CPO (Co-Packaged Optics) switching system achieves 5x improvement in energy efficiency, 10x improvement in reliability, and 5x improvement in application uptime.

This means more tokens can be processed per day, further reducing the total cost of ownership (TCO) of data centers.

2. NVIDIA Inference Context Memory Storage Platform

The NVIDIA Inference Context Memory Storage Platform is a POD-level AI-native storage infrastructure for storing KV Cache. Based on BlueField-4 and Spectrum-X Ethernet acceleration, and tightly coupled with NVIDIA Dynamo and NVLink, it achieves coordinated context scheduling across memory, storage, and network.

This platform treats context as a first-class data type, achieving 5x inference performance and 5x better energy efficiency.

This is crucial for improving long-context applications like multi-turn conversations, RAG, and Agentic multi-step reasoning. These workloads highly depend on the ability to efficiently store, reuse, and share context throughout the system.

AI is evolving from chatbots to Agentic AI, which reasons, calls tools, and maintains long-term state. Context windows have expanded to millions of tokens. This context is stored in the KV Cache. Recomputing it every step wastes GPU time and creates huge latency, hence the need for storage.

But GPU memory, while fast, is scarce. Traditional network storage is too inefficient for short-term context. The AI inference bottleneck is shifting from computation to context storage. Therefore, a new memory layer between GPU and storage, optimized for inference, is needed.

This layer is no longer an afterthought patch but must be co-designed with network storage to move context data with minimal overhead.

As a new storage tier, the NVIDIA Inference Context Memory Storage Platform does not reside directly in the host system but is connected externally to the computing devices via BlueField-4. Its key advantage is the ability to scale the storage pool size more efficiently, thereby avoiding redundant computation of KV Cache.
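A toy sketch of how such a tier avoids recomputation (the class, method names, and caching policy are illustrative assumptions, not NVIDIA's actual API):

```python
# Toy sketch of a context-memory tier between GPU memory and bulk storage.
# Names and structure are illustrative assumptions, not NVIDIA's actual API.

def compute_kv(tokens):
    """Stand-in for the expensive GPU prefill that builds KV entries."""
    return [f"kv({t})" for t in tokens]

class ContextMemoryTier:
    """Caches each session's KV blocks so only new tokens get prefilled."""

    def __init__(self):
        self.sessions = {}  # session_id -> list of KV blocks, one per token

    def extend(self, session_id, tokens):
        cached = self.sessions.get(session_id, [])
        new_tokens = tokens[len(cached):]         # only the unseen suffix
        cached = cached + compute_kv(new_tokens)  # prefill just the delta
        self.sessions[session_id] = cached
        return cached

tier = ContextMemoryTier()
tier.extend("chat-1", ["sys", "hello"])                      # full prefill: 2 blocks
kv = tier.extend("chat-1", ["sys", "hello", "hi", "there"])  # only 2 new blocks computed
print(f"{len(kv)} KV blocks cached; only 2 new blocks computed on turn 2")
```

The point of the sketch: each conversation turn reuses the prefix already cached off-GPU instead of re-running prefill over the whole context, which is exactly the redundant computation the platform is designed to eliminate.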

NVIDIA is working closely with storage partners to bring the NVIDIA Inference Context Memory Storage Platform to the Rubin platform, enabling customers to deploy it as part of a fully integrated AI infrastructure.

3. NVIDIA DGX SuperPOD Built on Vera Rubin

At the system level, the NVIDIA DGX SuperPOD serves as a blueprint for large-scale AI factory deployment. It uses 8 sets of DGX Vera Rubin NVL72 systems, with NVLink 6 for scale-up networking and Spectrum-X Ethernet for scale-out networking, incorporates the NVIDIA Inference Context Memory Storage Platform, and is engineering-validated.

The entire system is managed by NVIDIA Mission Control software for ultimate efficiency. Customers can deploy it as a turnkey platform, completing training and inference tasks with fewer GPUs.

Thanks to deep co-design at the 6-chip, tray, rack, pod, data center, and software levels, the Rubin platform achieves a significant drop in training and inference costs. Compared to the previous-generation Blackwell, training MoE models of the same scale requires only 1/4 the number of GPUs; at the same latency, the token cost for large MoE models is reduced to 1/10th.

The NVIDIA DGX SuperPOD using the DGX Rubin NVL8 system was also announced.

Leveraging the Vera Rubin architecture, NVIDIA is working with partners and customers to build the world's largest, most advanced, and lowest-cost AI systems, accelerating the mainstream adoption of AI.


Rubin infrastructure will be available in the second half of this year through CSPs and system integrators, with Microsoft among the first to deploy.

03. Expanding the Open Model Universe: New Models, Data, Major Contributor to Open Source Ecosystem

On the software and model front, NVIDIA continues to increase its open-source investment.

Mainstream development platforms like OpenRouter show that AI model usage grew 20x over the past year, with about 1/4 of the tokens coming from open-source models.

In 2025, NVIDIA was the largest contributor of open-source models, data, and recipes on Hugging Face, releasing 650 open-source models and 250 open-source datasets.

NVIDIA's open-source models rank at the top of various leaderboards. Developers can not only use these open-source models but also learn from them, continue training them, expand the datasets, and use open-source tools and documented techniques to build AI systems.

Inspired by Perplexity, Jensen Huang observed that agents should be multi-model, multi-cloud, and hybrid-cloud. This is the basic architecture of Agentic AI systems, and it has been adopted by almost all startups.

With the open-source models and tools provided by NVIDIA, developers can now customize AI systems and tap the most cutting-edge model capabilities. NVIDIA has packaged this framework into "blueprints" and integrated them into SaaS platforms, so users can deploy rapidly with blueprints.

In a live demo case, this system can automatically judge whether a task should be handled by a local private model or a cloud frontier model based on user intent. It can also call external tools (like email API, robot control interface, calendar service, etc.) and achieve multimodal fusion, uniformly processing information like text, voice, images, and robot sensor signals.
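A minimal sketch of that routing pattern, assuming simple keyword heuristics (the function names, heuristics, and tool registry are hypothetical, reconstructed from the demo description rather than taken from NVIDIA's blueprint code):

```python
# Hypothetical sketch of the local-vs-cloud routing pattern from the demo.
# Heuristics, names, and the tool registry are illustrative assumptions.

SENSITIVE_HINTS = ("internal", "confidential", "patient", "payroll")
COMPLEX_HINTS = ("research", "analyze", "plan", "reason")

def route(task: str) -> str:
    """Keep private data on a local model; send hard open tasks to a frontier model."""
    text = task.lower()
    if any(h in text for h in SENSITIVE_HINTS):
        return "local-private-model"
    if any(h in text for h in COMPLEX_HINTS):
        return "cloud-frontier-model"
    return "local-private-model"

TOOLS = {
    "email": lambda args: f"email sent: {args}",        # stand-in for an email API
    "calendar": lambda args: f"event created: {args}",  # stand-in for a calendar service
}

def handle(task: str, tool: str | None = None, args: str = "") -> dict:
    result = TOOLS[tool](args) if tool else None
    return {"model": route(task), "tool_result": result}

print(handle("Summarize this confidential payroll report"))
print(handle("Plan a multi-city itinerary", tool="calendar", args="CES follow-ups"))
```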

Complex capabilities like these were unimaginable in the past but have now become trivial to build. Similar capabilities are available on enterprise platforms like ServiceNow and Snowflake.

04. Open-Sourcing the Alpamayo Model, Enabling Autonomous Vehicles to "Think"

NVIDIA believes that Physical AI and robotics will ultimately become the world's largest consumer electronics segment. Everything that can move will eventually become fully autonomous, powered by Physical AI.

AI has gone through the stages of perceptual AI, generative AI, and Agentic AI, and is now entering the era of Physical AI, where intelligence enters the real world. These models can understand physical laws and generate actions directly from perception of the physical world.

But to achieve this goal, Physical AI must learn the common sense of the world — object permanence, gravity, friction. The acquisition of these capabilities will rely on three computers: the training computer (DGX) for building AI models, the inference computer (robot/vehicle chip) for real-time execution, and the simulation computer (Omniverse) for generating synthetic data and verifying physical logic.

The core model among these is the Cosmos world foundation model, which aligns language, images, 3D, and physical laws, supporting the full pipeline from simulation to training data generation.

Physical AI will appear in three types of entities: structures (like factories, warehouses), robots, and autonomous vehicles.

Jensen Huang believes that autonomous driving will be the first large-scale application scenario for Physical AI. Such systems need to understand the real world, make decisions, and execute actions, requiring extremely high safety, simulation, and data requirements.

To this end, NVIDIA released Alpamayo, a complete system comprising open-source models, simulation tools, and Physical AI datasets, to accelerate the development of safe, reasoning-based Physical AI.

Its product portfolio provides basic building blocks for global automakers, suppliers, startups, and researchers to construct L4 autonomous driving systems.

Alpamayo is the industry's first model that truly enables autonomous vehicles to "think," and it is now open source. It works by breaking problems down into steps, reasoning about the possibilities, and choosing the safest path.

This reasoning-based task-action model enables the autonomous driving system to solve complex edge scenarios it has never encountered before, such as a busy intersection with failed traffic lights.

Alpamayo has 10 billion parameters, large enough to handle autonomous driving tasks yet lightweight enough to run on workstations built for autonomous driving researchers.

It can receive text, surround-view camera feeds, vehicle historical state, and navigation input, and output driving trajectories and reasoning processes, allowing passengers to understand why the vehicle took a certain action.
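The description suggests an interface along these lines. This is a hypothetical sketch of the inputs and outputs the article lists, not Alpamayo's real API:

```python
# Hypothetical sketch of Alpamayo's described I/O contract; the inputs and
# outputs come from the article, but names and types are assumptions.

from dataclasses import dataclass

@dataclass
class DrivingDecision:
    trajectory: list[tuple[float, float]]  # planned (x, y) waypoints
    reasoning: str                         # human-readable chain of reasoning

def alpamayo_step(text_prompt, camera_feeds, vehicle_history, navigation):
    """Stand-in for one reasoning step: consume multimodal context, emit a plan."""
    # A real model would reason over all inputs; this stub only shows the shape.
    return DrivingDecision(
        trajectory=[(0.0, 0.0), (1.5, 0.2), (3.0, 0.5)],
        reasoning="Yielding: pedestrian detected at crosswalk in front camera.",
    )

decision = alpamayo_step(
    text_prompt="proceed to the next intersection",
    camera_feeds={"front": b"...", "rear": b"...", "left": b"...", "right": b"..."},
    vehicle_history=[{"speed_mps": 8.3, "heading_deg": 92.0}],
    navigation={"next_turn": "left", "distance_m": 120},
)
print(decision.reasoning)
```

Returning the reasoning string alongside the trajectory is what lets passengers see why the vehicle took a given action, per the description above.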

In a video shown live, a vehicle driven by Alpamayo avoided pedestrians on its own and anticipated an oncoming left-turning vehicle, changing lanes to avoid it, all with zero interventions.

Huang announced that the Mercedes-Benz CLA equipped with Alpamayo is already in production and was just rated the world's safest car by NCAP. Every line of code, chip, and system is safety-certified. The system will launch in the US market and will gain stronger driving capabilities later this year, including hands-off highway driving and end-to-end autonomous driving in urban environments.


NVIDIA also released part of the dataset used to train Alpamayo, along with Alpha-Sim, an open-source simulation framework for evaluating reasoning models. Developers can fine-tune Alpamayo with their own data or use Cosmos to generate synthetic data, training and testing autonomous driving applications on a mix of real and synthetic data. Additionally, NVIDIA announced that the NVIDIA DRIVE platform is now in production.

NVIDIA announced that global robotics leaders, including Boston Dynamics, Franka Robotics, surgical robotics makers, LG Electronics, NEURA, XRLabs, and Zhiyuan Robotics, are all building on NVIDIA Isaac and GR00T.

Huang also announced the latest collaboration with Siemens. Siemens is integrating NVIDIA CUDA-X, AI models, and Omniverse into its portfolio of EDA, CAE, and digital twin tools and platforms. Physical AI will be widely used throughout the entire process from design and simulation to production manufacturing and operations.

05. Conclusion: Embracing Open Source with the Left Hand, Making Hardware Systems Irreplaceable with the Right Hand

As the focus of AI infrastructure shifts from training to large-scale inference, platform competition has evolved from single-point computing power to systems engineering covering chips, racks, networks, and software. The goal is shifting toward delivering maximum inference throughput at the lowest TCO. AI is entering a new stage of "factory-like operation."

NVIDIA places great emphasis on system-level design. Rubin improves both training and inference performance and economics, and it can serve as a drop-in replacement for Blackwell, enabling a seamless transition.

In terms of platform positioning, NVIDIA still believes training is crucial because only by quickly training the most advanced models can the inference platform truly benefit. Therefore, NVFP4 training was introduced in the Rubin GPU to further improve performance and reduce TCO.

Simultaneously, this AI computing giant continues to significantly strengthen network communication capabilities in both scale-up and scale-out architectures and treats context as a key bottleneck, achieving co-design of storage, network, and computation.

NVIDIA is vigorously pursuing open source on one hand, while on the other hand making its hardware, interconnects, and system design increasingly "irreplaceable." This strategic closed loop of continuously expanding demand, incentivizing token consumption, promoting inference scaling, and providing cost-effective infrastructure is building an even more impregnable moat for NVIDIA.

Related Questions

Q: What are the six self-developed chips used in the NVIDIA Vera Rubin POD AI supercomputer?

A: The six self-developed chips are: Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 (CX9) SuperNIC, BlueField-4 DPU, and Spectrum-X 102.4T CPO.

Q: What is the key feature of the NVIDIA Rubin GPU that significantly boosts its inference performance?

A: The key feature is the introduction of the Transformer engine, which delivers NVFP4 inference performance of up to 50 PFLOPS, a 5x increase over the Blackwell GPU.

Q: What is the primary purpose of the NVIDIA Inference Context Memory Storage Platform?

A: Its primary purpose is to store KV Cache, acting as a POD-level AI-native storage infrastructure that avoids repeated computation, thereby improving inference efficiency and performance for long-context applications like multi-turn conversations and Agentic AI.

Q: What major achievement did Jensen Huang announce regarding the application of Physical AI in the automotive industry?

A: He announced that the NVIDIA DRIVE autonomous driving platform is now in production, powering all new Mercedes-Benz CLA vehicles for L2++ AI-defined driving, and that a car equipped with the open-source Alpamayo model was rated the world's safest car by NCAP.

Q: According to the article, how does the cost of tokens for large Mixture-of-Experts (MoE) models change with the new DGX Vera Rubin NVL72-based SuperPOD?

A: The token cost for large MoE models is reduced to 1/10th of the previous cost under the same latency conditions.
