Crossing the 'Memory Wall': The Wafer-Level Revolution and Computing Power Routes in the AI Inference Era

marsbit · Published 2026-06-05 · Last updated 2026-06-05

Summary

In 2026, a historic shift occurred in AI as major cloud providers' inference spending surpassed training spending for the first time, signaling a move from "building large models" to "using large models." This shifts the core challenge from computing power to the "memory wall"—the bottleneck of data movement (model weights, activations, KV Cache) between external DRAM and processors, where energy and latency from data transfer far exceed computation itself. Companies like Nvidia face GPU idle time due to bandwidth limits. In contrast, Cerebras Systems adopts a radical "wafer-scale" approach with its Wafer-Scale Engine (WSE). Instead of cutting a silicon wafer into many chips, Cerebras uses almost the entire wafer as one massive chip (WSE-3). This design provides 44GB of on-chip SRAM, delivering memory bandwidth thousands of times higher than traditional HBM (e.g., 21 PB/s vs. Nvidia B200). For LLM inference, weights are streamed layer-by-layer from external MemoryX storage to the chip, avoiding HBM bottlenecks. This results in token generation speeds 1.5–5 times faster than Nvidia's B200 in some models and significant advantages in first-token latency and long-context tasks. Additionally, Cerebras's architecture offers much lower interconnect power consumption (0.15 pJ/bit vs. GPU's ~10 pJ/bit). However, Cerebras faces challenges: SRAM scaling has slowed with advanced nodes, limiting future capacity gains; the chip requires specialized liquid cooling and custom software sta...

In 2026, global AI development reached an inflection point: for the first time, hyperscale cloud vendors' capital expenditure on inference exceeded their spending on training. The industry's focus shifted from 'training large models' to 'using large models,' inverting the structure of computing demand.

In the training era, the core challenge of computing power was 'double-precision floating-point and cluster scale'; entering the inference era, the core challenge became 'memory bandwidth and communication latency.'

The bottleneck for large model inference is no longer computation alone but data movement: model weights, intermediate activations, and the KV Cache must move constantly between off-chip DRAM (such as HBM) and the GPU. The larger the model, the higher the energy and latency cost of that data transfer, until it far exceeds the energy cost of the computation itself. This is the memory wall.
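The memory-wall argument can be made concrete with a rough ceiling on decode speed. The sketch below assumes single-stream decoding in which every generated token reads all model weights once, and it ignores KV Cache traffic, batching, and compute time. The 70-billion-parameter model and the 8 TB/s bandwidth figure are illustrative assumptions, not numbers from the article.

```python
def decode_tokens_per_second(param_count, bytes_per_param, mem_bandwidth_bytes_per_s):
    """Upper bound on single-stream decode rate when each token reads all weights once."""
    weight_bytes = param_count * bytes_per_param
    return mem_bandwidth_bytes_per_s / weight_bytes


# Hypothetical workload: 70B parameters in FP16 (2 bytes each), 8 TB/s of memory bandwidth.
rate = decode_tokens_per_second(70e9, 2, 8e12)
print(f"{rate:.1f} tokens/s")  # prints "57.1 tokens/s"
```

Under these assumptions the ceiling is set entirely by bandwidth divided by model size, which is why adding compute alone does not raise it.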

NVIDIA GPUs have built a solid fortress with CUDA and NVLink, but still cannot avoid GPU idling caused by bandwidth bottlenecks.

Zhipu, a Chinese large-model company, ran a simple experiment: in a 512-card inference cluster, with the GPUs, model, and code unchanged, raising only the network bandwidth cap from 200GB/s to 400GB/s increased inference throughput by 10% and cut first-token latency by 19%. The principle is simple: widen the road, and the cars can run faster.

However, non-GPU architectures represented by Cerebras seem to be tearing an opening in this memory wall.

Size comparison between Cerebras WSE-3 chip and NVIDIA B200 GPU

The Essence of Cerebras: A Near-Memory Computer Based on SRAM

Cerebras Systems was founded in Silicon Valley by Andrew Feldman and others. The founding team all came from SeaMicro, a low-power microserver company later acquired by AMD. The timeline since then:

In 2015, the founding team established the 'wafer-scale computing' route.

In 2016, they completed registration and Series A financing, entering a stealth R&D phase.

In 2019, they released their first product, the WSE-1 chip and CS-1 system, based on TSMC's 16nm process.

In 2021, they released the second-generation product, based on TSMC's 7nm process.

In 2024, they released the third-generation product (WSE-3 / CS-3), based on TSMC's 5nm process. Both the chip and system are manufactured entirely in the USA, making it a genuinely pure US-made chip system.

CS-3 system configuration, containing 1 WSE-3 chip

Cerebras's Wafer-Scale Engine (WSE) architecture rests on a simple, direct idea that targets the pain point: trade extreme physical scale for an extreme reduction in data movement latency.

Ordinary chips are made by cutting a 300mm-diameter wafer into hundreds of small dies; NVIDIA GPUs follow this approach. Cerebras does the opposite: it leaves the wafer uncut and turns nearly the whole of it into one giant chip, the Wafer-Scale Engine (WSE).

The latest WSE-3 has 4 trillion transistors and 900,000 AI cores, each with 48KB of local SRAM, for 44GB of on-chip SRAM in total. It provides 21 PB/s of on-chip memory bandwidth and 214 Pb/s of fabric bandwidth, thousands of times the bandwidth of traditional HBM.
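As a quick consistency check on the capacity figure, the per-core SRAM can be multiplied out. This is a sketch that assumes "48KB" means 48 KiB (48 × 1024 bytes) and "44GB" is a decimal gigabyte figure; the article does not state either convention.

```python
CORES = 900_000          # advertised AI cores (from the article)
SRAM_PER_CORE_KIB = 48   # local SRAM per core, read here as KiB (assumption)

total_bytes = CORES * SRAM_PER_CORE_KIB * 1024
total_gb = total_bytes / 1e9  # decimal gigabytes
print(f"{total_gb:.1f} GB")  # prints "44.2 GB"
```

Under that reading, the per-core and whole-chip numbers agree with each other.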

Cerebras WSE's memory bandwidth is 2625 times that of NVIDIA's B200 packaged chip, breaking the memory bandwidth bottleneck in large model inference scenarios.
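The 2625× figure can be reproduced from the bandwidth numbers. The sketch assumes 8 TB/s of HBM bandwidth for the B200, a figure the article does not state. Note that it compares aggregate on-chip SRAM bandwidth with off-chip HBM bandwidth, so it is not a like-for-like measurement.

```python
WSE3_SRAM_BW = 21 * 10**15  # 21 PB/s on-chip SRAM bandwidth (from the article)
B200_HBM_BW = 8 * 10**12    # 8 TB/s HBM bandwidth (assumed, not stated in the article)

ratio = WSE3_SRAM_BW / B200_HBM_BW
print(ratio)  # prints 2625.0
```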

In Cerebras's architecture, model weights do not reside permanently in on-chip SRAM; they live in off-chip storage (MemoryX) and are transferred to the giant chip layer by layer. The approach separates the storage of neural network model weights from the computing units.

All model weights are stored externally in the MemoryX memory extension module. The weights required for computing each layer of the network are transmitted layer by layer to the CS-3 system on demand. Weights are stored in the DRAM and flash memory of MemoryX and transmitted to the CS-3 system at full bandwidth rates. These weights are not stored in the CS-3 system—not even temporarily cached—CS-3 relies on the core's underlying dataflow mechanism to complete computations.
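The layer-by-layer streaming pattern described above can be illustrated with a toy example. This is only a conceptual sketch in plain Python, not Cerebras's software interface: `stream_layers`, `matvec`, `forward`, and the tiny `store` list are hypothetical names that stand in for an external weight store and an on-chip compute step.

```python
from typing import Iterator, List

Matrix = List[List[float]]


def stream_layers(external_store: List[Matrix]) -> Iterator[Matrix]:
    """Yield one layer's weights at a time, standing in for an off-chip weight store."""
    for weights in external_store:
        yield weights


def matvec(weights: Matrix, x: List[float]) -> List[float]:
    """Multiply a weight matrix by an activation vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def forward(external_store: List[Matrix], x: List[float]) -> List[float]:
    """Run a forward pass that holds only the current layer's weights and the activations."""
    activations = x
    for weights in stream_layers(external_store):
        activations = matvec(weights, activations)
        # `weights` is replaced on the next iteration; nothing is cached across layers.
    return activations


store = [
    [[1.0, 0.0], [0.0, 2.0]],  # layer 0: 2x2
    [[1.0, 1.0]],              # layer 1: 1x2
]
result = forward(store, [3.0, 4.0])
print(result)  # prints [11.0]
```

Only the activations persist between layers, which is the property the article attributes to the CS-3: the compute side never needs capacity for the full model.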

Leveraging its wafer-scale architecture, Cerebras demonstrates barrier-breaking advantages in LLM inference constrained by memory bandwidth. When generating tokens sequentially, weights are streamed layer by layer from off-chip MemoryX to CS-3. For different models, the token rate is 1.5 to 5 times that of NVIDIA's B200.

NVIDIA DGX B200 GPU versus Cerebras CS-3 chip, token rate comparison when running different large models

Its core advantage: the 44GB of on-chip SRAM in the CS-3 provides 21 PB/s of bandwidth (2625 times that of the B200) and 214 Pb/s of interconnect bandwidth, freeing weight streaming from HBM interface limits. It therefore performs especially well in TTFT (Time To First Token), long-context, and agent workloads.

Although weights are held externally in MemoryX, loaded layer by layer on demand, and not cached on-chip, the CS-3 relies on the core's dataflow mechanism to perform lossless, full-precision FP16 computation in SRAM; with linear performance scaling, it also delivers very high total throughput in multi-user concurrent inference.

Besides bandwidth, there is also a power advantage. In a recent speech, Sutong Liu, Chairman of InnoLight, said that customers want optical modules at 1 pJ/bit, while the current level is 10 pJ/bit. In Cerebras chips, interconnect power consumption is only 0.15 pJ/bit, whereas current GPU interconnects consume about 10 pJ/bit.
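To put the per-bit figures in perspective, the sketch below converts them into energy for moving a fixed amount of data. The 1 TB payload is a hypothetical round number chosen for illustration; the pJ/bit values are the ones quoted in the article.

```python
PICOJOULE = 1e-12  # joules per picojoule


def transfer_energy_joules(num_bytes, pj_per_bit):
    """Energy needed to move num_bytes across a link at a given pJ/bit cost."""
    return num_bytes * 8 * pj_per_bit * PICOJOULE


payload_bytes = 1e12  # 1 TB of traffic (hypothetical)
on_wafer = transfer_energy_joules(payload_bytes, 0.15)  # Cerebras on-wafer interconnect
gpu_link = transfer_energy_joules(payload_bytes, 10.0)  # GPU interconnect
print(f"{on_wafer:.1f} J vs {gpu_link:.1f} J")  # prints "1.2 J vs 80.0 J"
```

At these figures, the same traffic costs roughly 67 times more energy over the GPU interconnect.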

Bandwidth and power consumption comparison between Cerebras interconnect and GPU interconnect architectures

Thus, if Cerebras's wafer-scale large-chip architecture becomes mainstream for AI inference or even training, it might significantly suppress and structurally alter the shipment volumes of traditional optical modules and CPO (Co-Packaged Optics). The core logic is: the high demand for optical modules and CPO essentially aims to solve the bandwidth bottleneck of 'chip-to-chip interconnect' and 'node-to-node interconnect' in GPU clusters; Cerebras's architecture precisely solves the problem by 'eliminating distributed interconnects.'

Counterintuitive: The 'True and False' Fatal Flaws of Wafer-Scale Large Chips

Chip design always comes down to trade-offs. In pursuing extreme on-chip SRAM bandwidth, Cerebras takes on some problems of its own.

Low Yield?

Quite the opposite. The size of a single AI core is reduced to 0.05 square millimeters (1% the size of a single H100 compute core), resulting in higher yield. By routing on-chip, defective cores can be disabled and bypassed, improving defect tolerance by 100 times compared to traditional multi-core processors. The chip actually has 1 million AI cores, but considering yield, it is advertised as having 900,000 AI cores.
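The yield argument can be illustrated with the standard Poisson yield model, in which the chance that a block is defect-free is exp(-area × defect density). This is a sketch under stated assumptions: the defect density and the 800 mm² comparison die are hypothetical values chosen for illustration, and the article does not say which yield model Cerebras uses.

```python
import math


def poisson_yield(area_mm2, defects_per_mm2):
    """Probability that a block of the given area contains zero defects (Poisson model)."""
    return math.exp(-area_mm2 * defects_per_mm2)


DEFECT_DENSITY = 0.001    # defects per mm^2 (hypothetical)
CORE_AREA_MM2 = 0.05      # single AI core area (from the article)
LARGE_DIE_MM2 = 800.0     # a large monolithic die for comparison (hypothetical)

core_yield = poisson_yield(CORE_AREA_MM2, DEFECT_DENSITY)
die_yield = poisson_yield(LARGE_DIE_MM2, DEFECT_DENSITY)

PHYSICAL_CORES = 1_000_000   # cores on the wafer (from the article)
ADVERTISED_CORES = 900_000   # cores advertised (from the article)
spare_cores = PHYSICAL_CORES - ADVERTISED_CORES
expected_bad_cores = PHYSICAL_CORES * (1 - core_yield)

print(f"core yield: {core_yield:.5f}, large-die yield: {die_yield:.3f}")
print(f"expected defective cores: {expected_bad_cores:.0f} of {spare_cores} spares")
```

With these illustrative numbers, a defect disables one tiny core rather than a whole die, and the expected number of bad cores is far below the 100,000 spares.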

Only Good at Inference, Not at Training?

In the years following Cerebras's founding, training was the mainstream topic, so the company focused heavily on training. It's only after inference demand surged that people realized its advantages in inference were more pronounced.

In reality, simplified distributed computing also brings advantages like reduced code complexity and lower communication overhead.

Training a 175-billion-parameter model on 4000 GPUs typically requires about 20,000 lines of distributed training code.

Cerebras achieves equivalent training with 565 lines of code—the entire model can be placed on the wafer without dealing with data parallelism complexity.

SRAM Scaling is Dead; Core Advantage Faces a Physical Ceiling.

The third-generation product is based on TSMC's 5nm, and its SRAM capacity only increased by 10% compared to the second-generation product based on TSMC's 7nm. Beyond 5nm, SRAM cell area hardly shrinks with process node advancement.

This means Cerebras can no longer significantly increase its core advantage (SRAM capacity) by upgrading TSMC's process nodes (e.g., from 5nm to 3nm) as it did before.

Limited by wafer size, cooling capability, and manufacturing costs, on-chip storage resources like SRAM are difficult to scale linearly with computing cores, encountering a resource ratio bottleneck. This almost blocks its evolutionary path.

Technical specifications of Cerebras's three product generations

The Triple Purgatory: Cooling, Process, and Ecosystem.

The entire wafer concentrates heat, producing high heat flux density and forcing reliance on customized data centers and dedicated liquid cooling systems. On the ecosystem side, customers must adapt to its custom software stack, which has weak compatibility with existing general-purpose programming frameworks such as CUDA, making software porting and adaptation expensive.

Low Off-Chip Bandwidth Creates an Expansion 'Island'.

Because of the constraints of wafer-scale physical design, very few I/O pins can be brought out from the edge of the WSE, leaving an I/O bandwidth of only 150GB/s. Next to NVIDIA NVLink's 1.8TB/s of bidirectional bandwidth, that is a crawl, and it makes high-speed scale-out extremely difficult for the WSE. Although Cerebras's SwarmX interconnect performs decently in multi-system configurations, for very large models that need high-speed multi-chip interconnection the extremely low off-chip bandwidth becomes a structural physical constraint.
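The gap between the two link speeds is easier to judge as transfer time. The sketch below uses the bandwidth figures quoted above and a hypothetical 350 GB payload (175 billion parameters at 2 bytes each). The NVLink figure is bidirectional, so this is a rough comparison rather than a measured one.

```python
WSE_IO_BW = 150e9   # 150 GB/s off-chip I/O (from the article)
NVLINK_BW = 1.8e12  # 1.8 TB/s NVLink, bidirectional (from the article)


def transfer_seconds(num_bytes, bandwidth_bytes_per_s):
    """Time to move num_bytes over a link, ignoring latency and protocol overhead."""
    return num_bytes / bandwidth_bytes_per_s


payload_bytes = 350e9  # hypothetical: 175B parameters in FP16
wse_time = transfer_seconds(payload_bytes, WSE_IO_BW)
nvlink_time = transfer_seconds(payload_bytes, NVLINK_BW)
print(f"WSE I/O: {wse_time:.2f} s, NVLink: {nvlink_time:.2f} s")  # prints "WSE I/O: 2.33 s, NVLink: 0.19 s"
```

The ratio between the two is 12×, independent of the payload size chosen.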

Route Competition: Big Tech In-House Development—How Much Window Does Cerebras Have Left?

Big tech companies have multiple parallel paths to address 'inference requiring higher bandwidth + lower latency,' not just wafer-scale. They are squeezing the technological lead of startups like Cerebras through three concurrent approaches.

1. In-House ASIC Development

Google TPU v8 has already split into training-specific and inference-specific versions; AWS Trainium 4 is on the way; Microsoft Maia is already in use within Azure, built on TSMC's 3nm process, with native FP8/FP4 tensor cores, a redesigned memory system equipped with 216GB HBM3e, and 272MB on-chip SRAM; even Anthropic has begun evaluating in-house inference chip development.

This path is highly likely to play out, and it would directly compress the TAM (Total Addressable Market) for 'third-party inference procurement' in 2028 by 10% to 25%.

2. Process Generalization of the Standard Packaging Route

This is the most direct and most asymmetric threat to Cerebras.

TSMC's SoW (System-on-Wafer) is already widely open to customers, and CoWoS 9.5x interposer will launch in 2027.

What these two products do—stitching multiple dies at the wafer level—essentially makes Cerebras's physical process generic and accessible to all.

NVIDIA's Vera Rubin will enter this ecosystem in the second half of 2026.

Although Cerebras's own cross-reticle stitching is exclusive, the exclusivity window lasts at most 2 to 3 years. Beyond 2027-2028, its process barrier will be diluted by TSMC's advanced packaging.

3. Breakthrough of Optical Interconnect/Optical Computing

The interconnect and memory wall of electronic chips have reached their limits. Photonics' high bandwidth, low latency, and zero crosstalk are the ultimate solution.

Optical routes represented by Lumentum are rising. The biggest advantage of wafer-scale is on-chip computing, but models will inevitably grow larger, making high-speed interconnect beyond wafer scale a necessity.

With the maturity of CPO (Co-Packaged Optics) and Optical Interconnects, it's highly likely we will see optical I/O directly introduced into WSE wafers, breaking the shackles of electrical interconnect. NVIDIA might also acquire companies with specific architectural advantages like LPU (e.g., Groq), combine them with optical interconnects, and develop wafer-scale systems compatible with existing NV super-node software.

Sprinting on the Cliff: Cerebras's Business and Delivery

Cerebras is currently facing a cliff-edge sprint forced by massive orders.

Deals with leading clients like OpenAI are forcing Cerebras to transform from a chip company into a new type of cloud service provider. It no longer just sells hardware but needs to lock in and build massive data center power and facilities in the short term.

According to contract requirements, Cerebras needs to deliver 250MW of data center capacity annually from 2026 to 2028. However, wafer-scale systems have extremely high requirements for data center rooms and cannot be directly placed into traditional air-cooled IDCs. Currently, Cerebras's progress in preparing data center capacity is significantly behind the contract requirements.

From tape-out to factory construction, from power approval to cooling system deployment—this is a quagmire of heavy assets and long cycles.

Epilogue: Left or Right?

Returning to the initial proposition, as the inflection point for inference computing power has arrived, the core of computing architecture always lies in trade-offs.

There is no absolute right or wrong, only the relatively optimal solution for the most important workloads. And workloads are already changing.

Cerebras goes left, choosing extreme physical optimization, trading the entire wafer and massive SRAM for extreme low latency for a single task, which is unbeatable in scenarios extremely sensitive to first-token latency.

NVIDIA goes right, choosing to maintain generality, using HBM + NVLink + massive cluster throughput to handle ever-changing workloads, responding with constancy to change.

The winds are shifting, and the road ahead is uncertain. It is precisely this dual uncertainty of technology and business that breeds the possibility of disruption. In the torrent of computing power flowing towards AGI, it is still too early to draw conclusions—because of uncertainty, there is opportunity.

This article is from the WeChat public account "Garlic Particle Machine Research Institute," author: Pili Youxia (Thunderbolt Ranger)

Related Questions

Q: What is the key structural shift in the global AI industry in 2026, as identified in the article, and what does it signify?

A: In 2026, the global AI industry reached a pivotal inflection point where for the first time, capital expenditure on inference by hyperscale cloud providers surpassed that on training. This marks a fundamental shift in the focus of the industry from 'forging large models' to 'using large models', fundamentally flipping the structure of computing demand.

Q: What is the 'memory wall' in the context of large model inference, and how does Cerebras' WSE-3 architecture attempt to overcome it?

A: In large model inference, the 'memory wall' refers to the bottleneck caused by the energy consumption and latency of frequently moving data (model weights, intermediate activations, KV Cache) between off-chip DRAM (like HBM) and the GPU, which eventually far exceeds the energy cost of computation itself. Cerebras' WSE-3 architecture attacks this by using an entire wafer as a single, massive chip, packing 44GB of on-chip SRAM. This provides 21 PB/s of on-chip memory bandwidth, which is 2625 times the bandwidth of Nvidia's B200 GPU, drastically reducing data movement latency and breaking the memory bandwidth bottleneck for inference.

Q: According to the article, what are the three main parallel paths that major tech companies are taking to compete with specialized solutions like Cerebras?

A: Major tech companies are pursuing three parallel paths to address the need for higher bandwidth and lower latency in inference, thereby challenging specialized players: 1) Developing their own ASIC chips (e.g., Google TPU v8, AWS Trainium 4, Microsoft Maia). 2) Adopting standardized packaging processes like TSMC's SoW (System-on-Wafer) and CoWoS, which essentially democratize wafer-scale integration techniques. 3) Exploring breakthroughs in optical interconnects/computing (e.g., CPO, Optical Interconnects) to overcome the limits of electrical interconnects.

Q: What are the main potential weaknesses or challenges of the Cerebras wafer-scale chip (WSE) architecture, as outlined in the article?

A: The article highlights several challenges for Cerebras' wafer-scale architecture: 1) A physical scaling limit for SRAM, as SRAM cell area barely shrinks with process nodes beyond 5nm, blocking a key path for increasing its core advantage. 2) Significant thermal management challenges requiring specialized liquid-cooled data centers. 3) A weak software ecosystem and compatibility with existing frameworks like CUDA, leading to high adaptation costs. 4) Very low off-chip I/O bandwidth (150GB/s) compared to alternatives like NVLink, making the system a potential 'island' and hindering high-speed scaling for very large models.

Q: What critical business challenge is Cerebras currently facing due to large customer contracts, according to the article?

A: Facing massive orders from leading customers like OpenAI, Cerebras is being forced into a 'cliff-side sprint' to transition from a chip company to a new type of cloud service provider. Its contracts reportedly require the delivery of 250MW of data center capacity annually from 2026 to 2028. However, building specialized data centers for its wafer-scale systems, which require unique power, cooling (liquid), and facility approvals, is a heavy-asset, long-cycle process where Cerebras' progress is already significantly lagging behind the contractual requirements.
