He Just Raised 2.7 Billion Yuan, and Li Fei-Fei Invested Too

marsbit, published 2026-06-20, updated 2026-06-20

Summary

Pete Florence, a former senior research scientist at Google DeepMind and a key contributor to the Vision-Language-Action (VLA) model architecture, is deliberately distancing his startup, Generalist AI, from the trendy "world model" label, arguing that the industry should prioritize concrete goals over buzzwords. His goal is to build robots that can perform a vast range of unseen tasks with high speed and high success rates, without task-specific training data. His company recently raised $400 million (about ¥2.7 billion) at a $2 billion valuation. Notable investors include NVIDIA's NVentures, Bezos Expeditions, NFDG, Xiaomi co-founder Lin Bin, Zoom founder Eric Yuan, and renowned AI scientist Fei-Fei Li. Florence's approach stems from his academic background at MIT under Professor Russ Tedrake, which centered on understanding the physical world. After joining DeepMind, he developed models such as Transporter Network and co-created the VLA framework. He left in 2025 to found Generalist AI. The company has launched two models: GEN-0, which demonstrated that scaling laws apply to physical motion, and GEN-1. GEN-1 was trained on over 500,000 hours of physical interaction data collected with a specialized wearable device. It achieves a 99% success rate on precise mechanical tasks such as folding boxes and runs about three times faster than its predecessor. Florence believes GEN-1 is reaching a commercial-utility threshold comparable to the GPT-3 inflection point. The su...

In today's venture capital market, 'World Model' is undoubtedly the buzzword of buzzwords. Almost every day brings news of another 'World Model' company closing a funding round, with valuations soaring and investor rosters full of big names. The press releases accompanying these announcements also tend to repeat one point: a truly capable super-intelligent agent shouldn't gain its abilities solely by being fed data; it should actively understand the physical world the way humans do.

However, after starting his venture, Pete Florence wrote a lengthy open letter, beginning with this statement: "Do not label my company as a world model."

This is a genuine reversal, because Pete Florence is far from an ordinary entrepreneur. Before founding his company, he worked at Google DeepMind, rising from researcher to senior research scientist. He was one of the core developers of Gemini Robotics, DeepMind's robot-control model released in 2025. But perhaps his most influential achievement there came in 2023, when he and his colleagues introduced a new robotics model architecture called "Vision-Language-Action Models" (VLA).

(Pete Florence, Source: Social Media)

Yes, that's right. If 'World Model' or 'VLA' is currently the most cutting-edge, consensus direction in the field, then Pete Florence is undoubtedly a pioneer on this path. For someone like him to be the first to discard the 'world model' label is truly shocking.

And now, the shock factor has doubled. Recently, the embodied AI company Generalist AI, founded by Pete Florence, completed a new funding round totaling $400 million (approximately 2.7 billion RMB), with a valuation of $2 billion (approximately 13.55 billion RMB). Investors in this round include Nvidia's NVentures, the NFDG fund co-managed by prominent angel investors Nat Friedman and Daniel Gross, Bezos's family office Bezos Expeditions, Xiaomi co-founder Bin Lin, Zoom founder Eric Yuan, and Li Fei-Fei, one of the most representative scientists in the world model field.

"Goals" Are More Important Than "Labels"

Why does Pete Florence, as one of the main architects of the world model concept, so strongly resist the 'world model' label? Why would Li Fei-Fei, as a leading scholar in the world model domain, use real money to support such a publicly 'heretical' maverick? The story might begin in 2019.

At that time, Pete Florence was pursuing his Ph.D. in Computer Science at MIT, focusing on robot manipulation, computer vision, and natural language processing. By this background, Florence was 'orthodox': his research direction and academic pedigree were conventional, and he was not a 'free spirit' who had to rely on being 'unique' to secure resources. The twist was his advisor, Russ Tedrake.

Who is Russ Tedrake? First, he is undoubtedly a top academic. In 2019, he held positions as an MIT professor of Electrical Engineering and Computer Science and Director of the Robot Locomotion Group at the Computer Science and Artificial Intelligence Laboratory (CSAIL). During the famous DARPA Robotics Challenge, he also led the MIT team. Outside academia, he served as Vice President of Robotics Research at the Toyota Research Institute. In short, Russ Tedrake is one of the top scholars in robotics, with ample resources to help the young Pete Florence realize his academic dreams.

However, in Russ Tedrake's own perception, what fascinates him is not programming code, but 'physics.' In a self-introduction, Tedrake recalled that his entry into computer science academia stemmed from seeing 'rich dynamic behaviors' while studying 'bipedal humanoid robots,' sparking a deep interest in 'complex fluid dynamics control.' Therefore, unlike other researchers who might start by having robots pick apples or fold blankets, his initial research topics involved controlling 'aircraft in post-stall flight or flapping-wing flight' and 'high-speed navigation through dense obstacles.'

This background destined Russ Tedrake to place great importance on 'understanding the physical world.' The MIT website introduces his academic focus as: "His research is focused on finding elegant control solutions for interesting (underactuated, stochastic, and/or hard-to-model) dynamical systems, and on building those systems to experimentally verify his findings. He is particularly interested in the connection between mechanics (especially non-smooth mechanics) and machine learning/optimization theory, to achieve robust control design for complex mechanical systems."

Influenced by this environment, Pete Florence naturally joined the 'physics faction' within computer science. His most representative doctoral work, for instance, was a paper titled "Self-Supervised Correspondence in Vision and Robotic Manipulation." It proposed an imitation-learning method that let robots complete challenging manipulation tasks from just 50 demonstrations, generalize across object categories, and adapt to different configurations of deformable objects. The paper went on to win the 2020 IEEE Robotics and Automation Society Best Paper Award.

Of course, belonging to any 'faction' isn't the key point; what matters is that Pete Florence developed a distinct way of thinking under this influence. Many researchers are accustomed to starting with existing technology, exploring its possibilities through experiments, and then determining its application scenarios. Pete Florence believes the correct order should be "first set a specific goal," then design the technological path.

After joining Google DeepMind, Pete Florence continued in this direction. His first major work was Transporter Network, a robot-manipulation model architecture Google launched in 2021. In the paper introducing it, Florence noted that arranging objects should be a fundamental skill, yet for a robot this action requires "high-level and low-level perceptual reasoning": deciding where a book should go, in what order to stack items, and how to align edges so the piles are neat.

Transporter Network was precisely a model architecture launched to "make simple actions simple," enabling robots to perform various manipulations universally based on vision, with faster training speeds and lower dependency on training environments.

The subsequent co-development and release of the VLA architecture with the DeepMind team in 2023 was a natural extension of this line of thinking. In that seminal paper that ushered in the current 'world model' boom, the authors expressed their hope that the VLA architecture would "significantly enhance generalization to novel objects, interpret instructions not seen in the robot's training data (e.g., placing an object on a specific number or icon), and perform basic reasoning based on user instructions (e.g., picking up the smallest or largest object, or the one closest to another object)."

Returning to the opening question: Why does Pete Florence, as a main architect of world models, so strongly resist the label? The answer lies here: Pete Florence believes "goals" are more important than "labels."

In his view, much of the current enthusiasm around world models is 'concept-driven.' A significant share of the excitement, for instance, comes from the capital market's thrill at spotting a non-consensus angle within a hot trend. Moreover, if the real aim is to integrate robots into our work and lives to create productivity, then building a 'world model' is not the objective in itself. The real goal should be robots that achieve extremely high success rates and speeds on a wide variety of never-before-seen tasks, entirely without task-specific data.

And this is precisely why Pete Florence decided to leave Google DeepMind to start his own company. At Nvidia's GTC conference in 2025, Pete Florence first appeared in the public eye as the co-founder and CEO of Generalist AI. He stated: "We are determined to build robots that can do anything... Imagine a world where the marginal cost of physical labor drops to zero."

99% Success Rate

Beyond his 'heretical' technical philosophy, Pete Florence's entrepreneurial journey also seems unconventional.

In theory, an entrepreneur with such a resume would be highly sought after by VCs in the current climate. Yann LeCun, Ilya Sutskever, and Mira Murati are proof: their companies raised over $1 billion in seed rounds at founding (or even before incorporation). Generalist AI, however, initially accepted investment from only a handful of backers such as Nvidia, Bezos's family office, and NFDG. Had Nvidia's venture arm, NVentures, not organized a 'portfolio company roundtable' at GTC 2025, few people would have known he had left to start a company.

Why? The most likely answer is that it was Pete Florence's own choice. As mentioned, Florence joined Google DeepMind right after graduating and worked there from 2019 until 2025, with no other work experience. Generalist AI is therefore his first entrepreneurial venture, which calls for extreme caution.

Indeed, at his first public appearance as an entrepreneur at Nvidia's GTC 2025, Pete Florence clearly demonstrated this 'caution.' Beyond stating he was building 'robots,' he revealed no specific business direction, directly stating, "We are still in stealth mode."

It wasn't until November 2025 that people got their first glimpse of Generalist AI's actual work, when the company released its first-generation embodied AI model, GEN-0. In the official introduction, Generalist AI stated that GEN-0 combines the strengths of vision and language models while achieving a breakthrough: it captures human-level reflexes and physical common sense.

In simple terms, its capabilities improve continuously with model scale and training data, breaking through the limitations of previous small models; it can think and act simultaneously like humans, making fast, natural reactions in real physical environments; it is natively adaptable to different robot types without extra modifications; more importantly, it leverages massive real-world operational data, no longer constrained by data scarcity, and allows flexible adjustment of training data composition. Many tech media outlets pointed out that GEN-0 demonstrated that the mathematical 'scaling laws' driving large language models like ChatGPT also apply to physical motion.

However, GEN-0 was not perfect. For example, it did not solve the dataset problem plaguing the embodied AI field. Therefore, by April 2026, Generalist AI rapidly iterated to a new version, GEN-1.

("Robotic Hands," Source: Generalist AI Social Media)

To address the dataset problem, Generalist AI developed a wearable device that captures fine human hand movements and visual information during manual tasks. According to the company, it collected over 500,000 hours of 'petabyte-scale physical interaction data' with these robotic hands while developing GEN-1 and used it to train the physical model. After sufficient training, Generalist AI claimed, GEN-1 reached a 99% success rate on repetitive but precise mechanical tasks such as folding cardboard boxes, packing phones, and servicing robot vacuum cleaners, ran about three times faster than GEN-0, and needed only about an hour to reach this level of performance.

Thus, Generalist AI proudly announced that GEN-1's physical model is approaching an inflection point similar to GPT-3, with performance on some tasks beginning to "reach the level required for deployment in commercial utility environments," and "we can anticipate that each new generation of models will bring a growing suite of increasingly complex new tasks that can be mastered."

In the official blog, Pete Florence wrote that GEN-1's development best illustrates his technical philosophy. First, he set a clear goal: robots achieving extremely high success rates and speeds on a wide variety of never-before-seen tasks, entirely without task-specific data. Then, working back from that goal, he designed a path in which a small amount of task-specific robot data (call it X) yields high-level execution, and X is steadily reduced while performance keeps improving.

This brings us back to the earlier question. Whether Generalist AI's product is called a 'world model' no longer matters. If you follow the embodied AI industry and believe robots will be integrated into real production at scale, Generalist AI is a bet worth making. Indeed, its latest funding round closed within two months of GEN-1's release.

According to reports, existing investors Nvidia, Bezos Expeditions, and NFDG all reinvested, each increasing its commitment. New investors include Xiaomi co-founder Bin Lin, Zoom founder Eric Yuan, Chinese-American scientist Li Fei-Fei, and institutions such as Radical Ventures, 8VC, Union Square Ventures, Hanabi Capital, and Norwest.

In other words, as of June 2026, Pete Florence no longer needs to prove himself. At the very least, the bold claims he made over the years—like when he said on a podcast in 2025, shortly after starting the company: "A generalist robot isn't about being mediocre at everything, but being professional enough on real tasks to be genuinely useful"—are well on their way to being "fulfilled one by one."

This article is from the WeChat public account "投中网" (Touzhong Wang), author: Pu Fan

Related Questions

Q: Who is Pete Florence, and why did he write a letter rejecting the 'World Model' label for his company?

A: Pete Florence is a former senior research scientist at Google DeepMind, a key developer of Gemini Robotics, and a pioneer of the Vision-Language-Action (VLA) model architecture. He wrote the letter rejecting the 'World Model' label for his company, Generalist AI, because he believes a specific, tangible goal (robots that can perform any task with high success rates without task-specific data) matters more than aligning with popular but vague industry trends like 'World Models'.

Q: What was Generalist AI's recent funding round, and which notable investors participated?

A: Generalist AI recently raised $400 million in a new funding round, valuing the company at $2 billion. Notable investors included NVentures (Nvidia), NFDG (managed by Nat Friedman and Daniel Gross), Bezos Expeditions, Xiaomi co-founder Bin Lin, Zoom founder Eric Yuan, and renowned AI scientist Fei-Fei Li.

Q: How did Pete Florence's academic background and his advisor Russ Tedrake influence his approach to robotics?

A: Pete Florence studied at MIT under advisor Russ Tedrake, a leading robotics researcher fascinated by 'physics' and the mechanics of complex dynamic systems. This influence shaped Florence's 'physics-centric' thinking, leading him to prioritize understanding the physical world and to adopt a goal-first approach. He focuses on setting clear objectives (like high-success-rate task completion) before designing the technical path, rather than starting with existing technology.

Q: What are the key capabilities of Generalist AI's GEN-1 model as described in the article?

A: Generalist AI's GEN-1 model achieves a 99% success rate on repetitive but precise mechanical tasks such as folding cardboard boxes, packing phones, and servicing robot vacuums. It performs these tasks about three times faster than its predecessor, GEN-0, and reaches this level of performance in about an hour. It was trained on over 500,000 hours of physical interaction data collected with a specialized wearable device.

Q: According to the article, why is Pete Florence's approach considered significant for the future of embodied AI and robotics?

A: Pete Florence's approach is significant because it shifts the focus from chasing trending labels like 'World Models' to solving the core, practical problem: creating general-purpose robots that can reliably and quickly perform a vast range of unseen tasks in the real world without tailored data for each one. GEN-0 showed that scaling laws similar to those behind large language models also apply to physical motion, and GEN-1 brings robots closer to commercial viability and to the potential of reducing the marginal cost of physical labor to near zero.
