In today's venture capital market, 'world model' is a buzzword among buzzwords. Almost daily, another 'world model' company announces a financing round, with a rapidly rising valuation and an investor roster full of big names. The press releases accompanying these announcements tend to stress the same point: a truly capable super-intelligent agent should not acquire its abilities solely by being fed data; it should actively understand the physical world, as humans do.
However, after starting his venture, Pete Florence wrote a lengthy open letter, beginning with this statement: "Do not label my company as a world model."
This is a striking reversal, because Pete Florence is far more than just an 'entrepreneur.' Before starting his company, he worked at Google DeepMind, rising from researcher to senior research scientist. He was one of the core developers of Gemini Robotics, the robot control model DeepMind released in 2025. His most influential achievement during this period, however, was probably the novel robotics model architecture that he and his colleagues introduced in 2023: the "Vision-Language-Action" (VLA) model.
(Pete Florence, Source: Social Media)
Yes, that's right. If 'world model' or 'VLA' is currently the most cutting-edge direction, and the one with the broadest consensus, then Pete Florence is undoubtedly a pioneer on this path. For someone like him to lead the way in discarding the 'world model' label is genuinely surprising.
And now there is a second surprise. Generalist AI, the embodied AI company founded by Pete Florence, recently completed a new funding round totaling $400 million (approximately 2.7 billion RMB) at a valuation of $2 billion (approximately 13.55 billion RMB). Investors in this round include Nvidia's NVentures; NFDG, the fund co-managed by prominent angel investors Nat Friedman and Daniel Gross; Bezos's family office, Bezos Expeditions; Xiaomi co-founder Bin Lin; Zoom founder Eric Yuan; and Li Fei-Fei, one of the most prominent scientists in the world model field.
"Goals" Are More Important Than "Labels"
Why does Pete Florence, as one of the main architects of the world model concept, so strongly resist the 'world model' label? Why would Li Fei-Fei, as a leading scholar in the world model domain, use real money to support such a publicly 'heretical' maverick? The story might begin in 2019.
At that time, Pete Florence was pursuing his Ph.D. in Computer Science at MIT, focusing on robot manipulation, computer vision, and natural language processing. Judged by this background, Florence was 'orthodox': his research direction and academic pedigree were conventional, and he was not a 'free spirit' who had to be 'unique' to secure resources. The twist, however, was that MIT assigned him Russ Tedrake as his supervisor.
Who is Russ Tedrake? First, he is undoubtedly a top academic. In 2019, he held positions as an MIT professor of Electrical Engineering and Computer Science and Director of the Robot Locomotion Group at the Computer Science and Artificial Intelligence Laboratory (CSAIL). During the famous DARPA Robotics Challenge, he also led the MIT team. Outside academia, he served as Vice President of Robotics Research at the Toyota Research Institute. In short, Russ Tedrake is one of the top scholars in robotics, with ample resources to help the young Pete Florence realize his academic dreams.
By Russ Tedrake's own account, however, what fascinates him is not code but 'physics.' In a self-introduction, Tedrake recalled that he entered computer science after seeing 'rich dynamic behaviors' while studying 'bipedal humanoid robots,' which sparked a deep interest in 'complex fluid dynamics control.' So while other researchers might start by having robots pick apples or fold blankets, his early research topics involved controlling 'aircraft in post-stall flight or flapping-wing flight' and 'high-speed navigation through dense obstacles.'
This background destined Russ Tedrake to place great importance on 'understanding the physical world.' The MIT website introduces his academic focus as: "His research is focused on finding elegant control solutions for interesting (underactuated, stochastic, and/or hard-to-model) dynamical systems, and on building those systems to experimentally verify his findings. He is particularly interested in the connection between mechanics (especially non-smooth mechanics) and machine learning/optimization theory, to achieve robust control design for complex mechanical systems."
Shaped by this environment, Pete Florence naturally became a member of the 'physics faction' within computer science. His most representative doctoral work, for instance, was a paper titled "Self-Supervised Correspondence in Vision and Robotic Manipulation." The paper proposed an imitation-learning method that lets robots complete challenging manipulation tasks from just 50 demonstrations, generalize across object categories, and adapt to different configurations of deformable objects. It went on to win the 2020 IEEE Robotics and Automation Society Best Paper Award.
Of course, belonging to any 'faction' isn't the key point; what matters is that Pete Florence developed a distinct way of thinking under this influence. Many researchers are accustomed to starting with existing technology, exploring its possibilities through experiments, and then determining its application scenarios. Pete Florence believes the correct order should be "first set a specific goal," then design the technological path.
After joining Google DeepMind, Pete Florence continued in this direction. His first major work there was Google's initial robotics model architecture, Transporter Network, launched in 2021. In the paper announcing the model, Florence noted that organizing items ought to be a fundamental skill, yet for a robot the action involves "high-level and low-level perceptual reasoning": deciding where a book should be placed, working out the stacking order, and making sure the edges align to form neat piles.
Transporter Network was precisely a model architecture launched to "make simple actions simple," enabling robots to perform various manipulations universally based on vision, with faster training speeds and lower dependency on training environments.
The subsequent co-development and release of the VLA architecture with the DeepMind team in 2023 was a natural extension of this line of thinking. In that seminal paper that ushered in the current 'world model' boom, the authors expressed their hope that the VLA architecture would "significantly enhance generalization to novel objects, interpret instructions not seen in the robot's training data (e.g., placing an object on a specific number or icon), and perform basic reasoning based on user instructions (e.g., picking up the smallest or largest object, or the one closest to another object)."
Returning to the opening question: Why does Pete Florence, as a main architect of world models, so strongly resist the label? The answer lies here: Pete Florence believes "goals" are more important than "labels."
In his view, much of the current enthusiasm around world models is 'concept-driven.' A significant share of the excitement, for instance, comes from the capital market's thrill at finding a non-consensus bet inside a hot trend. Moreover, if the goal is truly to bring robots into our work and lives to create productivity, then building a 'world model' is clearly not the objective in itself. The real goal should be robots that achieve extremely high success rates and speeds on a wide variety of never-before-seen tasks, without any task-specific data.
And this is precisely why Pete Florence decided to leave Google DeepMind to start his own company. At Nvidia's GTC conference in 2025, Pete Florence first appeared in the public eye as the co-founder and CEO of Generalist AI. He stated: "We are determined to build robots that can do anything... Imagine a world where the marginal cost of physical labor drops to zero."
99% Success Rate
Beyond his 'heretical' technical philosophy, Pete Florence's entrepreneurial journey also seems unconventional.
In theory, an entrepreneur with such a resume would be highly sought after by VCs in the current climate. Yann LeCun, Ilya Sutskever, and Mira Murati are proof: their companies raised over $1 billion in seed rounds upon founding (or even before registration). Generalist AI, however, initially accepted investment from only a handful of backers, such as Nvidia, Bezos's family office, and NFDG. Had Nvidia's venture arm, NVentures, not organized a 'portfolio company roundtable' at the 2025 GTC conference, few would have known that Florence had left to start a company.
Why is this? The most likely answer is that it was Pete Florence's own choice. As mentioned, Florence joined Google DeepMind right after graduation and worked there from 2019 until 2025, with no other work experience. Generalist AI is therefore his first entrepreneurial endeavor, which warrants extreme caution.
Indeed, at his first public appearance as an entrepreneur at Nvidia's GTC 2025, Pete Florence clearly demonstrated this 'caution.' Beyond stating he was building 'robots,' he revealed no specific business direction, directly stating, "We are still in stealth mode."
It was not until November 2025 that people got their first glimpse of Generalist AI's specific work, when the company released its first-generation embodied AI model, GEN-0. In the official introduction, Generalist AI stated that GEN-0 combines the strengths of vision and language models while achieving a breakthrough: GEN-0 captures human-level reflexes and physical common sense.
In simple terms, its capabilities improve continuously with model scale and training data, breaking through the limitations of earlier small models. It can think and act simultaneously, as humans do, reacting quickly and naturally in real physical environments. It adapts natively to different robot types without extra modifications. More importantly, it draws on massive amounts of real-world operational data, so it is no longer constrained by data scarcity, and the composition of its training data can be adjusted flexibly. Many tech media outlets pointed out that GEN-0 demonstrated that the mathematical 'scaling laws' driving the large language models behind products like ChatGPT also apply to physical motion.
However, GEN-0 was not perfect. For example, it did not solve the dataset problem plaguing the embodied AI field. Generalist AI therefore iterated quickly, and by April 2026 it had released a new version, GEN-1.
("Robotic Hands," Source: Generalist AI Social Media)
To address the dataset issue, Generalist AI developed a wearable device that captures minute human hand movements and visual information during manual tasks. The company stated that during GEN-1's development, it collected over 500,000 hours of 'petabyte-scale physical interaction data' with these wearable "robotic hands" to train its physical model. After sufficient training, Generalist AI claimed, GEN-1 achieved a 99% success rate on repetitive but precise mechanical tasks such as folding cardboard boxes, packing phones, and maintaining robot vacuum cleaners. It was also about three times faster than the previous GEN-0 model and required only about an hour to reach this level of performance.
Thus, Generalist AI proudly announced that GEN-1's physical model is approaching an inflection point similar to GPT-3, with performance on some tasks beginning to "reach the level required for deployment in commercial utility environments," and "we can anticipate that each new generation of models will bring a growing suite of increasingly complex new tasks that can be mastered."
In the official blog, Pete Florence pointed out that the development of GEN-1 is the best illustration of his personal technical philosophy. First, he set a rational goal: robots that achieve extremely high success rates and speeds on a wide variety of never-before-seen tasks, entirely without task-specific data. Then, working back from this goal, he designed a path that uses a small amount of task-specific robot data (call it X) to reach high-level execution, and that keeps reducing X while improving performance.
With this, we have an answer to the earlier question. Whether Generalist AI's product is called a 'world model' no longer matters. If you look at the embodied AI industry and believe that robots will be integrated into actual production at scale, then Generalist AI is a bet worth making. Indeed, Generalist AI's latest funding round was finalized within two months of GEN-1's release.
According to reports, existing investors Nvidia, Bezos Expeditions, and NFDG all chose to reinvest, and at increased amounts. Additionally, new investors include Xiaomi co-founder Bin Lin, Zoom founder Eric Yuan, Chinese-American scientist Li Fei-Fei, as well as institutional investors like Radical Ventures, 8VC, Union Square Ventures, Hanabi Capital, and Norwest.
In other words, as of June 2026, Pete Florence no longer needs to prove himself. At the very least, the bold claims he has made over the years are well on their way to being "fulfilled one by one." One example is what he said on a podcast in 2025, shortly after starting the company: "A generalist robot isn't about being mediocre at everything, but being professional enough on real tasks to be genuinely useful."
This article is from the WeChat public account "投中网" (Touzhong Wang), author: Pu Fan







