How Did Yushu's Biggest Rival Spin Off Yet Another Company?

marsbit · Published on 2026-06-23 · Last updated on 2026-06-23

Abstract

Chinese humanoid robotics firm Zhiyuan has spun off another company, Mifeng Technology (Mbee), which focuses on data collection and processing for embodied AI. Amid a severe "data famine" in the embodied intelligence industry, high-quality real-world interaction data is seen as being as critical as computing power was for large language models. Teleoperated robots produce high-quality data, but the process is expensive. Mifeng aims to scale data collection through robot-free ("non-platform") wearable devices, and it also offers a data governance platform and a data marketplace. However, as a Zhiyuan affiliate, Mifeng must prove its neutrality to attract Zhiyuan's competitors, and it faces rival data-platform players such as JD.com. Whether it reaches its ambitious goal of tens of millions of hours of data production capacity by 2026 hinges on gaining industry trust and demonstrating the value of its standardized data supply.

Yushu's old rival, Zhiyuan, has spun off another company.

With Mifeng Technology announcing the completion of an Angel+ strategic financing round worth hundreds of millions of RMB, this Zhiyuan-incubated data company is back in the spotlight. Following the dexterous-hand company Línjiè Diǎn, Zhiyuan has now spun off another core capability as an independent company that raises funds and operates on its own.

When Zhiyuan is mentioned, many instinctively consider it Yushu's number one rival.

After all, Yushu shipped more than 5,500 pure humanoid robots in 2025 alone and claims to be the world's largest shipper; this March, Zhiyuan announced that its 10,000th general-purpose embodied AI robot had officially rolled off the production line.

From production scale to commercial deployment, the two are consistently compared side-by-side.

And this time, as one of Yushu's most direct competitors, Zhiyuan has extended its competitive bets beyond the robot platform itself.

That is because Mifeng Technology, the company Zhiyuan has spun off, is tackling one of the hottest businesses in embodied AI right now: data collection, governance, and circulation. Its stated goal is equally ambitious, aiming for a data production capacity of tens of millions of hours by 2026.

We hear a lot about foundation models, computing power, and hardware, all terms closely tied to embodied AI. But many may not realize how rapidly the importance of 'data' is rising within the embodied AI industry.

Even Zhiyuan's co-founder, president, and CTO, Peng Zhihui, previously stated frankly that Zhiyuan doesn't lack funding; what it lacks more at present is data.

Behind Zhiyuan's data shortage lies a 'data famine' affecting the entire embodied AI industry, a shortage that few have noticed but that is extremely urgent.

Something More Important Than Computing Power Is Emerging

In the era of embodied AI, the importance of data is approaching that of computing power in the era of large models.

Large models primarily learn from the internet, while robots must learn from the physical world. The former can draw training corpora from web pages, books, and papers; the latter must pick up cups, push open doors, and fold clothes to understand actions and feedback in real environments.

Beyond visual information, what robots need includes multimodal information such as touch, force, and motion trajectories. For high-quality real-world data, each data point often corresponds to a real physical interaction.

According to estimates shared by Mifeng at its launch event, training a GPT-5-level system requires data on the order of tens of billions of hours, whereas the global stock of high-quality, usable data for embodied AI training is only about 500,000 hours.

On another front, the '2026 AI Index Report' released by Stanford HAI highlights two contrasting results: the best robot success rate on the RLBench simulated manipulation benchmark reaches 89.4%, while on BEHAVIOR-1K, a simulation benchmark built around real household needs with more complex task chains, the best full-task success rate is only 12.4%.

These results come from different benchmarks, but they at least indicate that robots are progressing rapidly on short-horizon, controlled tasks while remaining far from adequate on complex household tasks.

Insufficient high-quality, diverse training data is one of the key reasons.

In other words, today's robots fall short largely because they have seen too little of the real world.

As a result, embodied AI data collection is rapidly emerging as an industry of its own.

The most common method today is real-world teleoperation, in which a human operator remotely controls a robot to complete tasks while visual, action, and state information is recorded. Data quality is relatively high, but so is the cost.

Mifeng CEO Yao Maoqing has explained that one hour of real-world robot data in China typically costs 500 to 1,000 RMB. Collecting it requires coordinating the robot platform, operators, and scenarios, which limits how fast it can scale.
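A quick calculation shows why that price limits expansion. The sketch below applies the quoted 500 to 1,000 RMB per hour to 10 million hours, a round lower bound for the "tens of millions of hours" Mifeng is targeting. It assumes a flat per-hour price with no volume discounts and assumes teleoperation alone would supply that volume, which Mifeng itself does not plan.

```python
# Back-of-envelope cost of scaling real-world teleoperation data.
# Assumptions: the quoted 500-1000 RMB/hour range applies uniformly (no volume
# discounts), and 10 million hours stands in for "tens of millions of hours".

PRICE_RMB_PER_HOUR = (500, 1000)  # low and high ends of the quoted range


def teleop_cost_rmb(hours: int) -> tuple[int, int]:
    """Return the (low, high) total cost in RMB for `hours` of teleoperation data."""
    low, high = PRICE_RMB_PER_HOUR
    return hours * low, hours * high


if __name__ == "__main__":
    low, high = teleop_cost_rmb(10_000_000)
    print(f"{low / 1e9:.0f}-{high / 1e9:.0f} billion RMB")  # prints "5-10 billion RMB"
```

Under these assumptions, teleoperation alone would cost roughly 5 to 10 billion RMB, which is one way to read the push toward cheaper, robot-free collection.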

Another path is synthetic/simulated data. Companies use digital twins and physics engines to train robots on large numbers of tasks in virtual environments, which can reduce collection costs. However, skills learned in the virtual world may still not fully transfer to the real world, a long-standing challenge known as the 'Sim-to-Real gap'.

After data is collected, there are even more fundamental issues.

Different companies use different robot platforms, sensors, and data formats; the same grasping action might be recorded as completely different data structures. Large amounts of raw data must also undergo cleaning, annotation, and structuring before entering model training.
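To make the format problem concrete, here is a minimal sketch of normalizing the same grasp, recorded by two different collection setups, into one common schema. Both input formats and the `GraspFrame` schema are hypothetical, chosen only to show typical mismatches in time units, length units, and gripper encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraspFrame:
    """Hypothetical common schema for one frame of a grasping episode."""
    timestamp_s: float
    position_m: tuple[float, float, float]  # end-effector position in meters
    gripper_closure: float  # 0.0 = fully open, 1.0 = fully closed


def from_vendor_a(rec: dict) -> GraspFrame:
    # Hypothetical format A: milliseconds, millimeters, gripper closure in percent.
    x, y, z = rec["ee_pos_mm"]
    return GraspFrame(rec["ts_ms"] / 1000.0,
                      (x / 1000.0, y / 1000.0, z / 1000.0),
                      rec["gripper_pct"] / 100.0)


def from_vendor_b(rec: dict) -> GraspFrame:
    # Hypothetical format B: seconds, meters, binary open/closed gripper state.
    return GraspFrame(float(rec["t"]),
                      tuple(float(v) for v in rec["pos"]),
                      1.0 if rec["grip_closed"] else 0.0)


if __name__ == "__main__":
    a = from_vendor_a({"ts_ms": 1500, "ee_pos_mm": [400, 0, 250], "gripper_pct": 100})
    b = from_vendor_b({"t": 1.5, "pos": [0.4, 0.0, 0.25], "grip_closed": True})
    print(a == b)  # True: the same grasp, now in one structure
```

Note that the conversion is lossy in one direction: a binary gripper state cannot be turned back into a percentage, which is one reason cleaning and structuring are not purely mechanical.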

Therefore, many companies are still in the 'self-collect, self-use, self-train' stage, with data scattered across different companies and platforms.

As the importance of data rises, competition is extending from the robot platform to infrastructure such as collection, governance, and circulation.

There is, however, no consensus estimate of exactly how much data the industry lacks. What is certain is that no single company collecting and using its own data is likely to cover the complex scenarios a general-purpose robot must handle.

Whoever can first establish a standardized, large-scale data supply network will have a better chance of becoming the 'pickaxe seller' in this round of industrial expansion.

Mifeng Technology is targeting precisely this opportunity.

Turning Data into a Platform

Of course, data collection is important, but Mifeng Technology aspires to more than that.

Currently, high-quality data collection in the industry still heavily relies on the robot platform. Companies need to purchase robots, deploy scenarios, organize operators, and then perform collection via teleoperation, with the robot platform being one of the costliest components.

Mifeng keeps its real-world teleoperation offering while also launching its MEgo series of 'non-platform' collection products that work without a robot, including the MEgo View head-mounted collection device and the MEgo Gripper collection gripper.

By wearing or holding the device, an operator can record operations in real scenarios such as supermarkets, factories, or homes without a robot being involved at any point in the collection.

Compared to real-world teleoperation, 'non-platform' collection makes it easier to reduce costs and scale up. According to plans disclosed by Mifeng, 60% to 70% of its 2026 data production capacity will come from non-platform collection.

But collecting data is just the first step; whether it can be processed and enter the training phase largely determines its ultimate value.

Raw data often contains noise and invalid content, requiring processes like time alignment, trajectory reconstruction, annotation, and quality screening. Even if a company possesses vast amounts of raw data, it may not be directly convertible into an effective training dataset.
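Time alignment is a good example of why raw data is not directly usable: a camera and a joint encoder sample at different rates, so each image has to be matched with the closest state reading. The sketch below shows one common approach, nearest-timestamp matching with a tolerance, and drops frames that have no match. The function name, stream rates, and tolerance are assumptions for illustration, not a description of Mifeng's pipeline.

```python
import bisect


def align_nearest(cam_ts: list[float], state_ts: list[float],
                  tol_s: float) -> list[tuple[int, int]]:
    """Pair each camera frame with the nearest state sample within `tol_s` seconds.

    Both timestamp lists must be sorted ascending. Camera frames with no state
    sample within the tolerance are dropped (a crude form of quality screening).
    Returns (camera_index, state_index) pairs.
    """
    pairs = []
    for i, t in enumerate(cam_ts):
        j = bisect.bisect_left(state_ts, t)
        # Candidates: the first sample at or after t, and the one just before it.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(state_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(state_ts[k] - t))
        if abs(state_ts[best] - t) <= tol_s:
            pairs.append((i, best))
    return pairs


if __name__ == "__main__":
    cam = [0.0, 0.033, 0.066, 0.5]  # ~30 Hz camera, plus one frame after a gap
    state = [0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07]  # 100 Hz encoder
    print(align_nearest(cam, state, tol_s=0.005))  # [(0, 0), (1, 3), (2, 7)]
```

Binary search keeps each lookup logarithmic, which matters once episodes run to hours of high-rate sensor data.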

Therefore, Mifeng has devoted significant effort to the data governance phase.

Its self-developed MEgo Engine for data governance covers steps such as data cleaning, 6D trajectory reconstruction, spatial perception reconstruction, quality verification, intelligent scoring, and automatic annotation. According to Mifeng, its automated annotation is more than 10x as efficient as traditional methods, with the aim of getting collected data into the training pipeline faster.

Beyond selling data, Mifeng also hopes to provide the capability to process raw data into training datasets.

At a higher level, Mifeng has also built a data marketplace, hoping to standardize and package scattered data resources, opening up supply to the entire industry.

This vision somewhat resembles early cloud computing: cloud vendors turned computing power into an on-demand service, while Mifeng hopes to make data a tradable, reusable foundational resource.

According to the company's plan, Mifeng will achieve a data production capacity of tens of millions of hours by 2026, and through its 'Hive Data Co-creation Initiative,' collaborate with cloud vendors, scenario providers, and industry institutions to target a scale of tens of billions of hours by 2030.

These are currently production capacity targets; whether they can be delivered on schedule depends on hardware mass production, collection networks, and real orders.
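The distance between the two targets is easy to understate. Reading both "tens of" as the same multiplier, going from tens of millions of hours in 2026 to tens of billions in 2030 is a 1,000-fold increase over four years. The sketch below converts that into a constant annual growth multiple; the inputs are round-number assumptions, not disclosed figures.

```python
def implied_annual_multiple(start: float, end: float, years: int) -> float:
    """Constant year-over-year multiple needed to grow from `start` to `end`."""
    return (end / start) ** (1 / years)


if __name__ == "__main__":
    # Assumption: "tens of millions" -> 1e7 hours in 2026 and
    # "tens of billions" -> 1e10 hours in 2030, i.e. four years of growth.
    m = implied_annual_multiple(1e7, 1e10, 2030 - 2026)
    print(f"{m:.2f}x per year")  # 5.62x per year
```

In other words, capacity would need to grow more than fivefold every year for four consecutive years.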

Even so, capital is already willing to pay for this vision.

In February this year, Mifeng Technology completed Seed and Angel rounds totaling hundreds of millions of RMB, led by Sequoia Capital China;

In June, it completed an Angel+ strategic financing round of another several hundred million RMB, led by Guofang Venture Capital, with follow-on investment from multiple industry and state-owned capital institutions;

Companies like Alibaba Cloud, Baidu Cloud, and JD Cloud have also established strategic partnerships with Mifeng, involving data ecosystems, scenario collaboration, and computing power support.

Thus, the two companies spun off from Zhiyuan, Línjiè Diǎn and Mifeng, now have their respective business directions:

Línjiè Diǎn targets the hardware component of dexterous hands, while Mifeng targets the data component of embodied AI.

However, while independent financing and operation give Mifeng room to serve external clients, they do not automatically solve the problem of trust among its peers.

Would Zhiyuan's Rivals Dare to Use Mifeng?

The first issue Mifeng must address is neutrality.

Its proposed 'Hive Data Co-creation Initiative' is an attempt to establish an industry-wide data network. But to get more robotics companies to participate, Mifeng needs to prove that clients' proprietary data will not flow to Zhiyuan, nor be improperly used by other competitors.

Yao Maoqing has publicly responded to this issue. He stated that Mifeng's data transactions come in two modes, 'usage rights' and 'ownership rights'; for clients purchasing ownership, the company completes the asset transfer and deletes its local copy of the data.

Even Zhiyuan can obtain Mifeng's data only through market-based orders, with no free access, which at least establishes the principle of data isolation.
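The two transaction modes Yao describes can be modeled as simple bookkeeping: a usage sale adds an access grant while the seller keeps the data, and an ownership sale hands over the asset and removes the seller's copy. The sketch below is hypothetical (`DataVault`, `Mode`, and their methods are not Mifeng's API, and the party names are placeholders). It shows only the bookkeeping; proving to outsiders that deletion and isolation actually happen depends on agreements, processes, and audits that code alone cannot demonstrate.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    USAGE = "usage"          # buyer may use the data; the seller keeps a copy
    OWNERSHIP = "ownership"  # asset transferred; the seller deletes its local copy


@dataclass
class DataVault:
    """Hypothetical seller-side store illustrating the two transaction modes."""
    datasets: dict[str, bytes] = field(default_factory=dict)
    grants: dict[str, set[str]] = field(default_factory=dict)  # dataset -> buyers

    def sell(self, dataset_id: str, buyer: str, mode: Mode) -> bytes:
        payload = self.datasets[dataset_id]
        if mode is Mode.OWNERSHIP:
            # Transfer the asset, then drop the local copy and any access grants.
            del self.datasets[dataset_id]
            self.grants.pop(dataset_id, None)
        else:
            self.grants.setdefault(dataset_id, set()).add(buyer)
        return payload

    def can_access(self, dataset_id: str, party: str) -> bool:
        # No implicit access for anyone, affiliates included: only paid grants count.
        return dataset_id in self.datasets and party in self.grants.get(dataset_id, set())


if __name__ == "__main__":
    vault = DataVault({"kitchen-grasp-01": b"...", "shelf-pick-02": b"..."})
    vault.sell("kitchen-grasp-01", "RobotCo", Mode.USAGE)
    print(vault.can_access("kitchen-grasp-01", "RobotCo"))  # True
    print(vault.can_access("kitchen-grasp-01", "Zhiyuan"))  # False: no order placed
    vault.sell("shelf-pick-02", "OtherCo", Mode.OWNERSHIP)
    print("shelf-pick-02" in vault.datasets)  # False: local copy deleted
```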

However, to get Zhiyuan's competitors to procure long-term, Mifeng will need to consistently prove its neutrality through agreements, permission isolation, delivery processes, and third-party audits.

After all, for Zhiyuan's rivals, Mifeng is not a 'must-choose' option; it's not the only company eyeing the data business either.

JD has launched the JoyEgoCam collection terminal, embodied data infrastructure, and a data trading platform, aiming to accumulate over 10 million hours of real-world scenario video data in the next two years.

Luming Robotics is also deploying non-platform collection; Lingchu Intelligence focuses on real human operation data; while Guanglun Intelligence concentrates on synthetic data and simulation infrastructure.

They are all competing over the same thing: turning scattered scenarios and raw data into datasets that can be continuously used for training.

Mifeng also faces the twin tests of scale and quality.

Tens of millions of hours is currently just a production capacity plan, not delivered data; both real-world and non-platform collection require continuous investment in equipment, personnel, and scenarios to scale up. If data quality and generalization issues aren't solved, even vast datasets might just be repetitive accumulation.

What ultimately determines whether Mifeng can achieve network effects is still the trust of its peers.

However, by spinning Mifeng off for independent financing and operation, Zhiyuan has at least secured room for this business to serve external clients.

If data remained solely within Zhiyuan, it would only enhance one company's model capabilities; only by being standardized, commercialized, and gaining recognition from other robotics manufacturers does it have a chance to become industry infrastructure.

Ultimately, for Mifeng, achieving tens of millions of hours of production capacity is just the threshold.

Only when Zhiyuan's competitors are also willing to procure from it long-term, even entrusting it with their core data for processing, will this business truly stand firm.

This article is from the WeChat Official Account: Blue Character Project, Author: Chester


Related Questions

Q: What is the main business of Mifeng Technology, the company recently spun off from Zhiyuan?

A: Mifeng Technology focuses on data collection, governance, and circulation for embodied AI. Its offerings include the MEgo series of hardware for non-platform (robot-free) data collection, a data governance engine, and a data marketplace.

Q: Why is data becoming increasingly critical in the field of embodied intelligence?

A: Embodied AI robots must learn from interactions with the physical world, which requires multimodal data (visual, tactile, force, and motion). High-quality, diverse data of this kind is currently extremely scarce, creating a 'data famine' that limits robots' ability to handle complex, real-world tasks.

Q: What are the two main methods for acquiring embodied AI training data mentioned in the article?

A: The two main methods are: 1. Real-world robot teleoperation, in which a human remotely controls a robot to collect high-quality but costly data. 2. Simulated data, generated in virtual environments, which is cheaper but must overcome the 'Sim-to-Real gap' to transfer effectively to the real world.

Q: What is the key challenge Mifeng Technology faces in attracting customers among its parent company Zhiyuan's competitors?

A: The key challenge is proving its neutrality. Competitors must be assured that their proprietary data, whether processed by or purchased from Mifeng, is fully isolated and will not be accessed or used by Zhiyuan. Mifeng says it addresses this through market-based transactions (even Zhiyuan must place paid orders) and by deleting its local copies when clients buy ownership rights.

Q: Besides Mifeng Technology, which other companies are mentioned as competitors in the embodied AI data infrastructure space?

A: The article mentions JD.com (with JoyEgoCam and its data infrastructure), Luming Robotics (non-platform collection), Lingchu Intelligence (real human operation data), and Guanglun Intelligence (synthetic data and simulation infrastructure).
