Yushu's old rival, Zhiyuan, has spun off another company.
With Mifeng Technology announcing an Angel+ strategic financing round worth hundreds of millions of RMB, this Zhiyuan-incubated data company is back in the spotlight. After the dexterous-hand company Línjiè Diǎn, Zhiyuan has spun off another core capability into an independent company that raises money and operates on its own.
When Zhiyuan comes up, many people instinctively think of it as Yushu's biggest rival.
After all, Yushu shipped more than 5,500 humanoid robots in 2025 alone and claims to be the world's largest shipper; this March, Zhiyuan announced that its 10,000th general-purpose embodied AI robot had officially rolled off the production line.
From production scale to commercial deployment, the two are constantly compared side by side.
This time, as one of Yushu's most direct competitors, Zhiyuan is placing its competitive bets beyond the robot platform itself.
The spin-off, Mifeng Technology, is tackling one of the hottest businesses in embodied AI right now: data collection, governance, and circulation. It has also set an ambitious target of tens of millions of hours of data production capacity by 2026.
We hear a lot about foundation models, computing power, and hardware, the terms most closely tied to embodied AI. But many may not realize how quickly data is growing in importance within the embodied AI industry.
Peng Zhihui, Zhiyuan's co-founder, president, and CTO, has even said frankly that the company is not short of money; what it lacks most right now is data.
Behind Zhiyuan's data shortage lies a 'data famine' across the entire embodied AI industry, a shortage that most people have not yet noticed but that is extremely urgent.
Something More Important Than Computing Power Is Emerging
In the era of embodied AI, the importance of data is approaching that of computing power in the era of large models.
Large models learn mainly from the internet, while robots must learn from the physical world. The former can draw training corpora from web pages, books, and papers; the latter must pick up cups, push open doors, and fold clothes to understand actions and their feedback in real environments.
Beyond visual information, robots also need multimodal information such as touch, force, and motion trajectories. In high-quality real-world data, each data point usually corresponds to an actual physical interaction.
According to estimates Mifeng shared at its launch event, training a GPT-5-level system requires a corpus on the order of tens of billions of hours, whereas the global stock of high-quality, usable data for embodied AI training is only about 500,000 hours.
On another front, the '2026 AI Index Report' released by Stanford HAI contrasts two results: on the RLBench simulated manipulation benchmark, the best robot success rate reaches 89.4%; on the BEHAVIOR-1K simulation benchmark, which targets real household needs with longer and more complex task chains, the best full-task success rate is only 12.4%.
These results come from different benchmarks, but they at least suggest that robots are progressing quickly on short, controlled tasks while still falling well short on complex household tasks.
Insufficient high-quality, diverse training data is one of the key reasons.
In other words, the capability gap of today's robots stems largely from having seen too little of the real world.
As a result, embodied AI data collection is quickly emerging as an industry in its own right.
The most common method today is real-world teleoperation: a human operator remotely controls a robot through a task while visual, action, and state information is recorded. Data quality is relatively high, but so is the cost.
Mifeng CEO Yao Maoqing has explained that one hour of real-world robot data in China typically costs 500 to 1,000 RMB, and that collection requires coordinating the robot platform, operators, and scenarios, which limits how fast it can scale.
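For a sense of scale, the quoted price can be turned into a rough budget. The sketch below is illustrative only: it assumes the 500 to 1,000 RMB per hour range applies uniformly, and it takes 10 million hours as a stand-in for "tens of millions of hours", as if every hour were collected by teleoperation. Neither assumption comes from Mifeng.

```python
# Back-of-the-envelope cost of collecting data purely by real-world teleoperation.
# Illustrative assumptions (not Mifeng figures): the quoted 500-1,000 RMB/hour
# range applies uniformly, and 10 million hours stands in for "tens of millions".

PRICE_RANGE_RMB_PER_HOUR = (500, 1000)  # low and high end of the quoted price
TARGET_HOURS = 10_000_000               # hypothetical lower bound of the target


def teleop_cost_rmb(hours: int, price_per_hour: int) -> int:
    """Total cost in RMB if every hour were collected via teleoperation."""
    return hours * price_per_hour


low, high = (teleop_cost_rmb(TARGET_HOURS, p) for p in PRICE_RANGE_RMB_PER_HOUR)
print(f"{low / 1e9:.0f} to {high / 1e9:.0f} billion RMB")  # prints "5 to 10 billion RMB"
```

Even at the low end of the range, fully teleoperated collection at this scale would run into billions of RMB, which helps explain why the industry is looking for cheaper collection routes.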
Another path is synthetic or simulated data. Companies use digital twins and physics engines to train robots on large numbers of tasks in virtual environments, which can lower collection costs. However, skills learned in simulation may not fully transfer to the real world, a long-standing challenge known as the 'Sim-to-Real gap'.
Once the data is collected, even more fundamental problems arise.
Different companies use different robot platforms, sensors, and data formats, so the same grasping action may be recorded in completely different data structures. Large volumes of raw data must also be cleaned, annotated, and structured before they can be used for model training.
As a result, many companies are still at the 'self-collect, self-use, self-train' stage, with data scattered across different companies and platforms.
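To make the format problem concrete, here is a minimal sketch in which two hypothetical robot platforms log the same grasp with different field names and units, and small adapter functions map both into one shared record. The `GraspStep` schema, the vendor field names, and the 80 mm gripper width are all invented for illustration; they do not describe Mifeng's or any real vendor's format.

```python
from dataclasses import dataclass


@dataclass
class GraspStep:
    """Hypothetical unified record for one timestep of a grasp."""
    t_s: float                            # timestamp, seconds
    ee_pos_m: tuple[float, float, float]  # end-effector position, metres
    gripper_open: float                   # 0.0 = closed, 1.0 = fully open


def from_vendor_a(rec: dict) -> GraspStep:
    # Hypothetical vendor A: milliseconds, millimetres, gripper width in mm (80 mm max).
    x, y, z = (v / 1000.0 for v in rec["pos_mm"])
    return GraspStep(
        t_s=rec["ts_ms"] / 1000.0,
        ee_pos_m=(x, y, z),
        gripper_open=rec["grip_mm"] / 80.0,
    )


def from_vendor_b(rec: dict) -> GraspStep:
    # Hypothetical vendor B: seconds, metres as separate keys, gripper as a string state.
    return GraspStep(
        t_s=rec["time"],
        ee_pos_m=(rec["x"], rec["y"], rec["z"]),
        gripper_open=1.0 if rec["gripper"] == "open" else 0.0,
    )


# The same fully open grasp pose, logged two different ways.
a = from_vendor_a({"ts_ms": 1500, "pos_mm": [300, 0, 250], "grip_mm": 80})
b = from_vendor_b({"time": 1.5, "x": 0.3, "y": 0.0, "z": 0.25, "gripper": "open"})
print(a == b)  # True
```

In this approach every new platform needs its own adapter, which is one reason scattered, self-collected data is hard to pool across companies.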
As the importance of data rises, competition is extending from the robot platform to infrastructure such as collection, governance, and circulation.
There is no consensus estimate of exactly how much data the industry lacks. What is certain is that a single company collecting and using its own data is unlikely to cover the complex scenarios a general-purpose robot must handle.
Whoever can first establish a standardized, large-scale data supply network will have a better chance of becoming the 'pickaxe seller' in this round of industrial expansion.
Mifeng Technology is targeting precisely this opportunity.
Turning Data into a Platform
Of course, data collection is important, but Mifeng Technology aspires to more than that.
Today, high-quality data collection still relies heavily on robot platforms. Companies must buy robots, set up scenarios, organize operators, and then collect data via teleoperation, and the robots themselves are among the most expensive components.
Mifeng keeps its real-robot data solution but has also launched its MEgo series of 'non-platform' collection products, including the MEgo View head-mounted collection device and the MEgo Gripper collection gripper.
An operator wearing or holding the device can record operations in real settings such as supermarkets, factories, or homes, without a robot taking part at any point in the collection.
Compared with real-world teleoperation, 'non-platform' collection makes it easier to cut costs and scale up. According to plans disclosed by Mifeng, 60% to 70% of its 2026 data production capacity will come from non-platform collection.
But collecting data is only the first step; whether it can be processed and fed into training largely determines its ultimate value.
Raw data often contains noise and invalid segments, and must go through steps such as time alignment, trajectory reconstruction, annotation, and quality screening. Even a company with vast amounts of raw data may not be able to turn it directly into an effective training dataset.
That is why Mifeng has put significant effort into data governance.
Its in-house data governance engine, MEgo Engine, covers data cleaning, 6D trajectory reconstruction, spatial perception reconstruction, quality verification, intelligent scoring, and automatic annotation. Mifeng says its automated annotation is more than 10 times as efficient as traditional methods, with the aim of getting collected data into the training pipeline faster.
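Time alignment and quality screening, two of the governance steps mentioned above, can be illustrated with a simple technique: pair each camera frame with the sensor sample nearest in time, and discard frames that have no sample close enough. Mifeng has not disclosed how MEgo Engine performs alignment; nearest-timestamp matching is a swapped-in illustration, and the function name, the 20 ms tolerance, and the sample values are hypothetical.

```python
from bisect import bisect_left


def align_nearest(frame_ts, sensor_ts, sensor_vals, max_gap_s=0.02):
    """Pair each camera frame with the sensor sample nearest in time.

    frame_ts and sensor_ts are sorted timestamps in seconds. Frames whose
    nearest sensor sample is more than max_gap_s away are dropped, a crude
    stand-in for quality screening. Returns (frame_time, sensor_value) pairs.
    """
    aligned = []
    for t in frame_ts:
        i = bisect_left(sensor_ts, t)
        # The nearest sample is either just before t or at/after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_ts)]
        if not candidates:
            continue  # no sensor data at all
        j = min(candidates, key=lambda k: abs(sensor_ts[k] - t))
        if abs(sensor_ts[j] - t) <= max_gap_s:
            aligned.append((t, sensor_vals[j]))
    return aligned


# Hypothetical data: ~30 fps camera frames and a force sensor whose samples stop at 0.070 s.
frames = [0.000, 0.033, 0.066, 0.100]
force_ts = [0.000, 0.030, 0.070]
force_n = [0.0, 1.2, 2.5]
print(align_nearest(frames, force_ts, force_n))
# [(0.0, 0.0), (0.033, 1.2), (0.066, 2.5)]  (the frame at 0.100 s is dropped)
```

Real pipelines may instead interpolate between samples or rely on hardware-synchronized clocks; simply dropping unmatched frames, as here, is the crudest form of screening.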
Beyond selling data, Mifeng also hopes to offer the capability to turn raw data into training datasets.
At a higher level, Mifeng has also built a data marketplace, hoping to standardize and package scattered data resources and open up supply to the entire industry.
This vision somewhat resembles early cloud computing: cloud vendors turned computing power into an on-demand service, while Mifeng hopes to make data a tradable, reusable foundational resource.
According to the company's plan, Mifeng aims to reach a data production capacity of tens of millions of hours by 2026 and, through its 'Hive Data Co-creation Initiative', to work with cloud vendors, scenario providers, and industry institutions toward a scale of tens of billions of hours by 2030.
For now these are capacity targets; whether they are met on schedule depends on hardware mass production, the collection network, and real orders.
Even so, capital is already willing to pay for this vision.
In February this year, Mifeng Technology closed Seed and Angel rounds totaling hundreds of millions of RMB, led by Sequoia Capital China;
In June, it closed an Angel+ strategic round of another hundreds of millions of RMB, led by Guofang Venture Capital, with follow-on investment from several industrial and state-owned investors;
Alibaba Cloud, Baidu Cloud, and JD Cloud have also formed strategic partnerships with Mifeng covering data ecosystems, scenario collaboration, and computing power support.
The two Zhiyuan spin-offs, Línjiè Diǎn and Mifeng, now each have a clear business focus:
Línjiè Diǎn targets the hardware component of dexterous hands, while Mifeng targets the data component of embodied AI.
Independent financing and operation give Mifeng room to serve external clients, but they do not automatically solve the problem of trust with its peers.
Would Zhiyuan's Rivals Dare to Use Mifeng?
The first issue Mifeng must address is neutrality.
Its proposed 'Hive Data Co-creation Initiative' is an attempt to build an industry-wide data network. But to attract more robotics companies, Mifeng must prove that clients' proprietary data will neither flow to Zhiyuan nor be misused by other competitors.
Yao Maoqing has publicly addressed this issue. He said Mifeng's data transactions fall into two modes, 'usage rights' and 'ownership rights'; for clients who buy ownership, the company transfers the asset and destroys its local copy of the data.
Even Zhiyuan can obtain Mifeng data only through market-based orders, with no free access, which at least establishes the principle of data isolation.
Still, to win long-term purchases from Zhiyuan's competitors, Mifeng will have to keep proving its neutrality through agreements, permission isolation, delivery processes, and third-party audits.
After all, for Zhiyuan's rivals, Mifeng is not a 'must-choose' option; it's not the only company eyeing the data business either.
JD has launched the JoyEgoCam collection terminal, embodied data infrastructure, and a data trading platform, and aims to accumulate more than 10 million hours of real-world scenario video data within the next two years.
Luming Robotics is also deploying non-platform collection; Lingchu Intelligence focuses on real human operation data; while Guanglun Intelligence concentrates on synthetic data and simulation infrastructure.
They are all competing over the same thing: turning scattered scenarios and raw data into datasets that can be continuously used for training.
Mifeng also faces the twin tests of scale and quality.
Tens of millions of hours is still only a capacity plan, not delivered data; both real-world and non-platform collection need continuous investment in equipment, personnel, and scenarios to scale. Unless data quality and generalization are addressed, even a vast dataset may amount to little more than repetitive samples.
What ultimately determines whether Mifeng can achieve network effects is still the trust of its peers.
Still, spinning Mifeng off from Zhiyuan as an independently financed and operated company has at least given this business room to serve external clients.
If the data stayed solely within Zhiyuan, it would only strengthen one company's models; only once it is standardized, commercialized, and accepted by other robotics manufacturers does it have a chance to become industry infrastructure.
Ultimately, for Mifeng, achieving tens of millions of hours of production capacity is just the threshold.
Only when Zhiyuan's competitors are also willing to buy from it long-term, and even entrust it with processing their core data, will this business truly be on solid ground.
This article is from the WeChat Official Account Blue Character Project. Author: Chester