He Just Raised 2.7 Billion RMB, and Li Fei-Fei Also Invested

marsbit | Published on 2026-06-20 | Last updated on 2026-06-20

Abstract

Pete Florence, a former senior research scientist at Google DeepMind and a key contributor to the Vision-Language-Action (VLA) model architecture, is deliberately distancing his startup, Generalist AI, from the trendy "world model" label. He argues that the industry should prioritize concrete goals over buzzwords. His goal is to create robots that can perform a vast range of unseen tasks with high speed and success rates, without needing task-specific training data. Recently, his company raised $400 million (approximately ¥2.7 billion) at a $2 billion valuation. Notable investors include NVIDIA's NVentures, Bezos Expeditions, NFDG, as well as Xiaomi co-founder Bin Lin, Zoom founder Eric Yuan, and renowned AI scientist Fei-Fei Li. Florence's approach stems from his academic background at MIT under Professor Russ Tedrake, focusing on understanding the physical world. After joining DeepMind, he developed models like Transporter Network and co-created the VLA framework. He left in 2025 to found Generalist AI. The company has launched two models: GEN-0, which demonstrated that scaling laws apply to physical motion, and GEN-1. GEN-1 was trained on over 500,000 hours of physical interaction data collected via a specialized wearable device. It achieves a 99% success rate on precise mechanical tasks like folding boxes and runs about three times faster than its predecessor. Florence believes GEN-1 is reaching a commercial utility threshold similar to the GPT-3 inflection point. The su...

In today's venture capital market, 'World Model' is undoubtedly a buzzword among buzzwords. Almost daily, we see news of new 'World Model' companies completing financing rounds, with valuations soaring rapidly and investor rosters filled with big names. In the press releases accompanying these funding announcements, one claim is emphasized again and again: a qualified super-intelligent agent shouldn't rely solely on being fed data to gain capabilities; it should actively understand the physical world the way humans do.

However, after starting his venture, Pete Florence wrote a lengthy open letter, beginning with this statement: "Do not label my company as a world model."

This is a striking reversal, because Pete Florence is far more than just an 'entrepreneur.' Before starting his company, he worked on Google's DeepMind team, rising from a regular researcher to a senior research scientist. He was one of the core developers behind Gemini Robotics, DeepMind's robotics control model released in 2025. But perhaps his most influential achievement during this time came in 2023, when he and his colleagues introduced a novel robotics model architecture called "Vision-Language-Action Models" to the world.

(Pete Florence, Source: Social Media)

Yes, that's right. If 'World Model' or 'VLA' is currently the most cutting-edge and widely agreed-upon direction, then Pete Florence is undoubtedly a pioneer on this path. For someone like him to take the lead in discarding the 'world model' label is truly shocking.

And now, the shock factor has doubled. Recently, the embodied AI company Generalist AI, founded by Pete Florence, completed a new funding round totaling $400 million (approximately 2.7 billion RMB), with a valuation of $2 billion (approximately 13.55 billion RMB). Investors in this round include Nvidia's NVentures, the NFDG fund co-managed by prominent angel investors Nat Friedman and Daniel Gross, Bezos's family office Bezos Expeditions, Xiaomi co-founder Bin Lin, Zoom founder Eric Yuan, and Li Fei-Fei, one of the most representative scientists in the world model field.
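The dollar and RMB figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only divides the numbers given in the article; the variable names are illustrative, and the resulting exchange rates are implied by the article's rounded figures rather than taken from any official source.

```python
# Figures as quoted in the article (USD amounts and their approximate RMB equivalents).
usd_raised = 400e6        # $400 million round
rmb_raised = 2.7e9        # approximately 2.7 billion RMB
usd_valuation = 2e9       # $2 billion valuation
rmb_valuation = 13.55e9   # approximately 13.55 billion RMB

# Exchange rate implied by each pair of figures (RMB per USD).
rate_from_round = rmb_raised / usd_raised
rate_from_valuation = rmb_valuation / usd_valuation

print(f"Implied rate from round size: {rate_from_round:.3f} RMB/USD")
print(f"Implied rate from valuation:  {rate_from_valuation:.3f} RMB/USD")
```

The two implied rates differ slightly because the 2.7 billion RMB figure is rounded to one decimal place; both are consistent with a rate of roughly 6.8 RMB per dollar.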

"Goals" Are More Important Than "Labels"

Why does Pete Florence, as one of the main architects of the world model concept, so strongly resist the 'world model' label? Why would Li Fei-Fei, as a leading scholar in the world model domain, use real money to support such a publicly 'heretical' maverick? The story might begin in 2019.

At that time, Pete Florence was pursuing his Ph.D. in Computer Science at MIT, focusing on robot manipulation, computer vision, and natural language processing—from this background, Florence was 'orthodox,' with a conventional research direction and academic pedigree, not a 'free spirit' who needed to rely on being 'unique' to secure resources. The twist, however, was that MIT assigned him a supervisor named Russ Tedrake.

Who is Russ Tedrake? First, he is undoubtedly a top academic. In 2019, he held positions as an MIT professor of Electrical Engineering and Computer Science and Director of the Robot Locomotion Group at the Computer Science and Artificial Intelligence Laboratory (CSAIL). During the famous DARPA Robotics Challenge, he also led the MIT team. Outside academia, he served as Vice President of Robotics Research at the Toyota Research Institute. In short, Russ Tedrake is one of the top scholars in robotics, with ample resources to help the young Pete Florence realize his academic dreams.

However, by Russ Tedrake's own account, what fascinates him is not writing code but 'physics.' In a self-introduction, Tedrake recalled that his entry into computer science stemmed from seeing 'rich dynamic behaviors' while studying 'bipedal humanoid robots,' which sparked a deep interest in 'complex fluid dynamics control.' Therefore, unlike other researchers who might start by having robots pick apples or fold blankets, his initial research topics involved controlling 'aircraft in post-stall flight or flapping-wing flight' and 'high-speed navigation through dense obstacles.'

This background destined Russ Tedrake to place great importance on 'understanding the physical world.' The MIT website introduces his academic focus as: "His research is focused on finding elegant control solutions for interesting (underactuated, stochastic, and/or hard-to-model) dynamical systems, and on building those systems to experimentally verify his findings. He is particularly interested in the connection between mechanics (especially non-smooth mechanics) and machine learning/optimization theory, to achieve robust control design for complex mechanical systems."

Influenced by this environment, Pete Florence naturally became a 'physics faction' within computer science. For instance, his most representative doctoral work was a paper titled "Self-Supervised Correspondence in Vision and Robotic Manipulation." This paper proposed a method using imitation learning, allowing robots to complete challenging manipulation tasks with just 50 demonstrations, generalize to different object categories, and adapt to configurations of deformable objects. This paper consequently won the 2020 IEEE Robotics and Automation Society Best Paper Award.

Of course, belonging to any 'faction' isn't the key point; what matters is that Pete Florence developed a distinct way of thinking under this influence. Many researchers are accustomed to starting with existing technology, exploring its possibilities through experiments, and then determining its application scenarios. Pete Florence believes the correct order should be "first set a specific goal," then design the technological path.

After joining Google DeepMind, Pete Florence proceeded along this direction. His first major work was the initial robotics model architecture, Transporter Network, launched by Google in 2021. In the paper announcing the model, Florence stated that organizing items should be a fundamental skill, but for robots, completing this action involves "high-level and low-level perceptual reasoning," requiring consideration of where a book should be placed, the stacking order, and ensuring the edges align to form neat piles.

Transporter Network was precisely a model architecture launched to "make simple actions simple," enabling robots to perform various manipulations universally based on vision, with faster training speeds and lower dependency on training environments.

The subsequent co-development and release of the VLA architecture with the DeepMind team in 2023 was a natural extension of this line of thinking. In that seminal paper that ushered in the current 'world model' boom, the authors expressed their hope that the VLA architecture would "significantly enhance generalization to novel objects, interpret instructions not seen in the robot's training data (e.g., placing an object on a specific number or icon), and perform basic reasoning based on user instructions (e.g., picking up the smallest or largest object, or the one closest to another object)."

Returning to the opening question: Why does Pete Florence, as a main architect of world models, so strongly resist the label? The answer lies here: Pete Florence believes "goals" are more important than "labels."

In his view, much of the current enthusiasm around world models is actually 'concept-driven.' For instance, a significant portion of the excitement comes from the capital market's thrill at finding a contrarian bet inside a hot trend. Moreover, if the goal is truly to integrate robots into our work and lives to create productivity, then building a 'world model' is clearly not the objective in itself. The real goal should be robots achieving extremely high success rates and speeds on a wide variety of never-before-seen tasks, entirely without any task-specific data.

And this is precisely why Pete Florence decided to leave Google DeepMind to start his own company. At Nvidia's GTC conference in 2025, Pete Florence first appeared in the public eye as the co-founder and CEO of Generalist AI. He stated: "We are determined to build robots that can do anything... Imagine a world where the marginal cost of physical labor drops to zero."

99% Success Rate

Beyond his 'heretical' technical philosophy, Pete Florence's entrepreneurial journey also seems unconventional.

Theoretically, an entrepreneur with such a resume would undoubtedly be highly sought after by VCs in the current climate. Examples like Yann LeCun, Ilya Sutskever, and Mira Murati are proof—their companies raised over $1 billion in seed rounds upon founding (or even before registration). However, Generalist AI, at its initial stage, only accepted investments from a handful of institutions like Nvidia, Bezos's family office, and NFDG. If Nvidia's venture arm, NVentures, hadn't organized a 'portfolio company roundtable' at the 2025 GTC conference, few would have known he had left to start a company.

Why is this? The most likely answer is Pete Florence's active choice. As mentioned, Florence joined Google DeepMind right after graduation, working there from 2019 until 2025 without any other work experience. That means Generalist AI is his first entrepreneurial endeavor, warranting extreme caution.

Indeed, at his first public appearance as an entrepreneur at Nvidia's GTC 2025, Pete Florence clearly demonstrated this 'caution.' Beyond stating he was building 'robots,' he revealed no specific business direction, directly stating, "We are still in stealth mode."

It wasn't until November 2025 that people got their first glimpse of Generalist AI's specific work, when the company released its first-generation embodied AI model, GEN-0. In the official introduction, Generalist AI stated that GEN-0 combines the strengths of vision and language models while achieving a breakthrough: GEN-0 captures human-level reflexes and physical common sense.

In simple terms, its capabilities improve continuously with model scale and training data, breaking through the limitations of previous small models; it can think and act simultaneously like humans, making fast, natural reactions in real physical environments; it is natively adaptable to different robot types without extra modifications; more importantly, it leverages massive real-world operational data, no longer constrained by data scarcity, and allows flexible adjustment of training data composition. Many tech media outlets pointed out that GEN-0 demonstrated that the mathematical 'scaling laws' driving large language models like ChatGPT also apply to physical motion.
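To make the 'scaling laws' idea concrete: in the language-model literature, a scaling law is commonly written as a power law, where error falls as a fixed power of model size or data volume. The article does not give GEN-0's actual formula or measurements, so the sketch below is only an illustration under that assumption. The function name `fit_power_law` and every number in it are made up for the example.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-b), done as a line fit in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

# Hypothetical, synthetic data: NOT measurements from GEN-0 or any real model.
sizes = [1e8, 1e9, 1e10]                    # e.g. parameter counts
losses = [2.0 * s ** -0.1 for s in sizes]   # generated from a known power law

a, b = fit_power_law(sizes, losses)
predicted = a * (1e11) ** (-b)              # extrapolate to a 10x larger model

print(f"fitted a={a:.3f}, b={b:.3f}, predicted loss at 1e11: {predicted:.4f}")
```

If a relationship of this form holds for robot models, each tenfold increase in scale buys a predictable reduction in error, which is what makes scaling up an engineering plan rather than a gamble.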

However, GEN-0 was not perfect. For example, it did not solve the dataset problem plaguing the embodied AI field. Therefore, by April 2026, Generalist AI rapidly iterated to a new version, GEN-1.

("Robotic Hands," Source: Generalist AI Social Media)

To address the dataset issue, Generalist AI developed a wearable device to capture minute human hand movements and visual information during manual tasks. Generalist AI stated that during GEN-1's development, they collected over 500,000 hours of 'petabyte-scale physical interaction data' using this device to train their physical model. After sufficient training, Generalist AI claimed that GEN-1 achieved a success rate of 99% on repetitive but precise mechanical tasks like folding cardboard boxes, packing phones, and maintaining robot vacuum cleaners, at speeds about three times faster than the previous GEN-0 model, and that it needs only about an hour to reach this level of performance on a new task.

Thus, Generalist AI proudly announced that GEN-1's physical model is approaching an inflection point similar to GPT-3, with performance on some tasks beginning to "reach the level required for deployment in commercial utility environments," and "we can anticipate that each new generation of models will bring a growing suite of increasingly complex new tasks that can be mastered."

In the official blog, Pete Florence pointed out that the development process of GEN-1 is the best illustration of his personal technical philosophy: First, he set a rational goal—robots achieving extremely high success rates and speeds on a wide variety of never-before-seen tasks, entirely without task-specific data. Then, based on this goal, he designed a solution path that allows using a small amount of robot data for specific tasks (call it X) to achieve high-level execution, and continuously reduces X while improving performance.

With that, we now have an answer to the earlier question. Whether Generalist AI's product is called a 'world model' is no longer important. If you look at the embodied AI industry and believe in the large-scale integration of robots into actual production, then Generalist AI is indeed a choice worth betting on. And Generalist AI's latest funding round was indeed swiftly finalized within two months of GEN-1's release.

According to reports, existing investors Nvidia, Bezos Expeditions, and NFDG all chose to reinvest, and at increased amounts. Additionally, new investors include Xiaomi co-founder Bin Lin, Zoom founder Eric Yuan, Chinese-American scientist Li Fei-Fei, as well as institutional investors like Radical Ventures, 8VC, Union Square Ventures, Hanabi Capital, and Norwest.

In other words, as of June 2026, Pete Florence no longer needs to prove himself. At the very least, the bold claims he made over the years—like when he said on a podcast in 2025, shortly after starting the company: "A generalist robot isn't about being mediocre at everything, but being professional enough on real tasks to be genuinely useful"—are well on their way to being "fulfilled one by one."

This article is from the WeChat public account "投中网" (Touzhong Wang), author: Pu Fan

Related Questions

Q: Who is Pete Florence and why did he write a letter rejecting the 'World Model' label for his company?

A: Pete Florence is a former senior research scientist at Google DeepMind, a key developer of Gemini Robotics, and a pioneer of the Vision-Language-Action Models (VLA) architecture. He wrote a letter rejecting the 'World Model' label for his company, Generalist AI, because he believes focusing on a specific, tangible goal (creating robots that can perform any task with high success rates without task-specific data) is more important than aligning with popular but vague industry trends like 'World Models'.

Q: What was the recent funding round for Generalist AI, and which notable investors participated?

A: Generalist AI recently raised $400 million in a new funding round, valuing the company at $2 billion. Notable investors included NVentures (Nvidia), NFDG (managed by Nat Friedman and Daniel Gross), Bezos Expeditions, Xiaomi co-founder Bin Lin, Zoom founder Eric Yuan, and renowned AI scientist Fei-Fei Li.

Q: How did Pete Florence's academic background and his advisor Russ Tedrake influence his approach to robotics?

A: Pete Florence studied at MIT under advisor Russ Tedrake, a leading robotics researcher fascinated by 'physics' and the mechanics of complex dynamic systems. This influence shaped Florence's 'physics-centric' thinking, leading him to prioritize understanding the physical world and to adopt a goal-first approach. He focuses on setting clear objectives (like high-success-rate task completion) before designing the technical path, rather than starting with existing technology.

Q: What are the key capabilities of Generalist AI's GEN-1 model as described in the article?

A: Generalist AI's GEN-1 model achieves a 99% success rate on repetitive but delicate mechanical tasks like folding cardboard boxes, packing phones, and maintaining robot vacuums. It performs these tasks about three times faster than its predecessor, GEN-0, and can be adapted to new tasks in about an hour. It was trained on a massive dataset of over 500,000 hours of physical interaction data collected via a specialized wearable device.

Q: According to the article, why is Pete Florence's approach considered significant for the future of embodied AI and robotics?

A: Pete Florence's approach is significant because it shifts the focus from chasing trending labels like 'World Models' to solving the core, practical problem: creating general-purpose robots that can reliably and quickly perform a vast range of unseen tasks in the real world without needing tailored data for each one. GEN-0 demonstrated that scaling laws, similar to those behind large language models, can be applied to physical motion, and GEN-1 brings robots closer to commercial viability and the potential to reduce the marginal cost of physical labor to near zero.
