By Letters AI
Rumors suggest that ByteDance's video generation model Seedance 2.1 will be released soon, with generation quality expected to improve by 20% over version 2.0. ByteDance told Letters AI that this information is false.
Although Seedance 2.1 may not be released in the near future, it is true that Seedance 2.0's popularity has surged overseas.
The reason is that over the weekend, an article titled "Chinese AI groups pull ahead of US rivals in video generation race" went viral overseas.
Using Seedance 2.0 and Kuaishou's Kling 3.0 as core evidence, the article reached a surprising conclusion: "In the field of AI video generation, China not only leads the United States, but this advantage will last forever."
This judgment sounds somewhat counter-intuitive; it reads more like flattery of Chinese AI. After all, as we have all witnessed over the past few years, Silicon Valley has typically launched a product first, with similar Chinese products following.
But after reading the foreign media's viewpoint, I realized that my thinking was indeed too one-sided. In AI video generation, China truly is leading the United States.
The article specifically interviewed several American AI entrepreneurs and filmmakers using AI video generation technology. The result was unanimous: everyone agrees that Chinese AI video tools have comprehensively surpassed their American counterparts.
More importantly, this lead is not a temporary technological edge but a comprehensive one, extending from data all the way to practical application.
Not only that, this lead is of the "unbeatable" kind. That is to say, this leading position will be maintained indefinitely.
Has "leading by a wide margin" become reality?
Why Will Chinese AI Forever Lead American AI?
One argument in the article is that in the field of AI video generation, the gap at the algorithm level is rapidly narrowing.
Currently, the technical architectures of the various companies are already "more or less the same." Underlying techniques such as Transformers, diffusion models, and spatiotemporal attention mechanisms have become relatively transparent.
So the key question becomes: who possesses higher quality and larger quantities of training data?
This happens to be where ByteDance and Kuaishou excel. Douyin and Kuaishou are among the world's largest video production machines.
More importantly, this data comes with complete user behavior annotations.
Which videos are liked, favorited, and shared, and which have high completion rates: all of this is visible in the backend.
Moreover, these annotations do not require manual labeling; they are naturally generated from users' real behavior. This kind of high-quality, annotated data is something you might not be able to buy on the market even if you wanted to.
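To make the idea of "annotations generated by user behavior" concrete, here is a minimal sketch of how engagement counters could be turned into a weak quality label for training data. Everything in it is an assumption for illustration: the `VideoStats` schema, the weights, and the threshold are hypothetical and do not describe how ByteDance or Kuaishou actually process their data.

```python
from dataclasses import dataclass


@dataclass
class VideoStats:
    """Aggregate engagement counters for one video (hypothetical schema)."""
    views: int
    likes: int
    favorites: int
    shares: int
    completed_views: int  # views that were watched to the end


def engagement_score(stats: VideoStats) -> float:
    """Combine behavior signals into a score between 0 and 1.

    The weights are illustrative assumptions, not values from any platform.
    """
    if stats.views == 0:
        return 0.0
    like_rate = stats.likes / stats.views
    favorite_rate = stats.favorites / stats.views
    share_rate = stats.shares / stats.views
    completion_rate = stats.completed_views / stats.views
    score = (0.2 * like_rate + 0.2 * favorite_rate
             + 0.2 * share_rate + 0.4 * completion_rate)
    return min(score, 1.0)


def keep_for_training(stats: VideoStats, threshold: float = 0.3) -> bool:
    """Weak label: keep a video only if its score clears the threshold."""
    return engagement_score(stats) >= threshold


popular = VideoStats(views=1000, likes=300, favorites=100,
                     shares=100, completed_views=700)
ignored = VideoStats(views=1000, likes=5, favorites=1,
                     shares=0, completed_views=50)

print(keep_for_training(popular))  # the well-received video is kept
print(keep_for_training(ignored))  # the ignored video is filtered out
```

The point of the sketch is only that no human annotator appears anywhere in it: the label falls out of counters the platform already records.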
In contrast, OpenAI and Anthropic have no comparable store of video data of their own.
When OpenAI launched Sora, it primarily relied on publicly crawled video data from the internet and some licensed film and television materials.
The problem is that public videos on the internet are often of mixed quality, containing a large amount of duplicate content, low-quality content, and even secondarily processed content with watermarks and advertisements.
As a result, training on this data often takes more effort and yields less.
On the global evaluation platform Artificial Analysis, ByteDance's Seedance 2.0, Kuaishou's Kling 3.0, and Alibaba's HappyHorse together took the top spots in the text-to-video and image-to-video rankings.
This ranking is generated by real user votes, meaning that everyone generally finds the content generated by these three Chinese AI video tools to be better.
Google has YouTube as a data source and its own video generation model, Veo 3, but its problem lies in having too many constraints. Videos on YouTube are generally over 5 minutes long, but current GPUs cannot yet accommodate such long, high-definition videos as training data, which can cause the model to fail during training.
As a result, Veo 3 has not been well received by the market and falls short of Chinese AI video generation models like Seedance 2.0 and Kling 3.0.
"We've tried most American models, but they haven't performed well enough in video generation," said Ben Chiang, founder of Director AI. Therefore, he currently mainly uses Chinese tools like Kling, Seedance 2.0, and Halulu for creation.
Independent AI filmmaker George Won stated, "Seedance 2.0 is a game-changer. It can handle aggressive camera angles and speeds without losing facial details of characters or the contrast of light and shadow. Most AI models start to shake or drift during rapid movement."
Moreover, this data advantage can also enable products to undergo "self-reinforcement."
ByteDance has integrated Seedance 2.0 into creative tools like CapCut, allowing ByteDance to receive feedback data on over 50 million generated videos daily.
This way, ByteDance can know that "this video satisfied the user, this one did not."
Each piece of such feedback makes the development direction of the next-generation Seedance product a bit clearer.
This kind of continuous, large-scale feedback loop in real-world scenarios is something the lab environments of companies like OpenAI and Anthropic cannot match.
Even with massive resource investment, it is difficult to establish a similar data flywheel in the short term.
Technology can be caught up with, algorithms can be imitated, but the accumulation of ecosystems and data takes time, requires a user base, and needs a complete product cycle.
Application Scenarios
For companies developing AI video, there must be a "purpose."
A data advantage is just the starting point; what truly turns technology into competitiveness is finding profitable application scenarios. Only with real deployment scenarios do companies have the motivation to keep developing AI video generation.
In this dimension, ByteDance and Kuaishou also outperform American AI companies.
The first large-scale application scenario is e-commerce video.
In the past, the cost of shooting a professional video for a product could be as high as several thousand yuan, including photographer, lighting technician, venue rental, model fees, post-production editing, etc.
An ordinary Taobao store run by a small or medium-sized merchant might list hundreds of products; filming all of them would cost at least several hundred thousand yuan.
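The arithmetic behind that estimate is simple multiplication. The sketch below uses assumed example values, 3,000 yuan per video and 200 products, chosen only to fall inside the "several thousand yuan" and "hundreds of products" ranges mentioned above; they are not figures from the article.

```python
def catalog_video_cost(cost_per_video_yuan: int, product_count: int) -> int:
    """Total cost of shooting one traditional video for every product."""
    return cost_per_video_yuan * product_count


# Illustrative assumptions, not reported figures.
cost_per_video_yuan = 3_000
product_count = 200

total = catalog_video_cost(cost_per_video_yuan, product_count)
print(f"{product_count} products x {cost_per_video_yuan:,} yuan = {total:,} yuan")
```

With these assumed inputs the total comes to 600,000 yuan, which is consistent with the "several hundred thousand yuan" order of magnitude.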
AI video generation technology has changed this situation.
Vincent Yang, CEO of video infrastructure company Firework, said, "A retailer asked us to create 100,000 videos for their product pages. Without AI, this would be completely unfeasible in terms of cost. Now, each product can have its own video, and even multiple customized versions for different customers."
Data shows that product pages with videos have a conversion rate 30% to 80% higher than those with only images and text. Moreover, Douyin and Kuaishou are among China's largest e-commerce live-streaming and short video sales platforms.
Once AI generates the video, the merchant can launch an advertising campaign on the same platform right away.
Alibaba's HappyHorse model also explicitly positions e-commerce video as a core application scenario. It supports batch generation of product showcase short videos and virtual host talking videos. A merchant can upload product images and simple text descriptions, and the system can automatically generate multiple versions of sales videos, each targeting different audience groups with different scripts and presentation styles.
The second scenario is advertising.
The production cycle for a traditional TVC (television commercial) is too long.
A 30-second brand advertisement often takes several weeks from creative planning to filming and production.
With video generation models, dozens of different versions of advertising creatives can be generated in just a few minutes.
The third scenario is short dramas.
AI short dramas experienced explosive growth in 2026. Data shows that the number of AI short dramas airing in March 2026 increased by 138% compared to January, far exceeding the production speed of traditional film and television content.
Through AI video generation, a small team or even an individual creator can produce a short drama within a few days.
Furthermore, ByteDance's Hongguo Short Drama platform has integrated an "image search for same items" feature.
This feature is easy to understand: while watching a short drama, if you are interested in a character's outfit, furniture in a scene, or a car parked at the door, you can directly click on image search. The system will recommend the same or similar items, allowing you to purchase them directly.
This essentially turns short dramas into a commercial scenario that can generate conversions.
In contrast, in the American market, despite having content platforms like Netflix and YouTube, there is no comparable application and conversion mechanism.
American AI video tools remain more in the creative experimentation stage, with the only commercial application scenario being subscription memberships.
Moreover, in terms of product functionality, Chinese video generation models are also more suitable for commercial application.
Seedance 2.0 can combine multiple source photos, videos, and sounds into a single AI video. Sora cannot do this; it can only generate a video from an image and a text prompt.
This is not because Sora's technology is insufficient, but because it lacks a complete commercial ecosystem to leverage these technological capabilities.
The Computing Power Gap
However, Chinese video AI also faces an unavoidable hurdle: computing power.
Leading American AI companies treat computing power as gold, hoarding all the computing power available on the market.
Anthropic recently signed computing power agreements totaling over 10 gigawatts.
This figure includes leasing all the computing power of SpaceX's Colossus 1 data center, covering 220,000 NVIDIA GPUs; a 5-gigawatt agreement with Amazon; and 3.5-gigawatt agreements with Google and Broadcom.
OpenAI operates similarly.
Through its deep collaboration with Microsoft, OpenAI has gained access to hundreds of thousands of high-end GPUs, and Microsoft has specifically built several hyperscale data centers for OpenAI.
In comparison, although Chinese companies have made significant progress in algorithm efficiency optimization, there is still a gap in the absolute scale of computing power.
According to foreign media statistics, the United States had about 3 times China's AI computing power in 2023, and the gap had widened to about 8 times by early 2026.
Besides computing power, Chinese AI faces other challenges.
The first is copyright.
Taking Seedance 2.0 as an example, about a month after its release, six Hollywood giants including Disney, Warner Bros., Paramount, Skydance, and Netflix jointly sent a cease-and-desist letter to ByteDance. They claimed that Seedance 2.0 had used copyright-protected film and television materials on a large scale without authorization during its training phase.
ByteDance then hastily suspended the global release of Seedance 2.0 that had been planned for mid-March.
If you have been using Seedance 2.0 from February until now, you will find that IP characters that could be generated before are no longer available; only generic "passerby" figures can be used.
The second is that the commercialization threshold is rising.
American video generation AI, represented by Sora, often rejects generation requests due to usage policies. Chinese tools are more lenient, and their prices are also cheaper.
But this has also left Chinese AI companies with a good problem to have.
Since February, Seedance 2.0 has seen a surge in usage demand, and some users have already encountered quota limits and longer queue times.
Foreign media reported that ByteDance has adopted a heavier commercialization approach for some American enterprise clients, requiring them to prepay approximately $2 million in exchange for model access rights and usage quotas.
Kuaishou is in a similar situation; it is spinning off the Kling business and may pursue a separate listing for Kling in the future.
This indicates that Kling is an independent business with a potentially stronger growth story than Kuaishou's core business.
The bigger the growth story, the clearer the accounting needs to be.
However, the cost of AI video is higher. The computing power consumed behind generating a few seconds of video for a user is far higher than generating a piece of text.
The higher the quality and the longer the duration of the generated video, the higher the inference cost.
Many video generation models are like this: initially very cheap, even free, but once users flood in, they quickly start implementing limits, queues, and price increases.
It's not that companies don't want to scale up; as the saying goes, even the landlord has no surplus grain. The providers themselves are short on computing power.
So what Chinese video AI needs to face next is not just "whether it can create a good model," but "whether it can turn a good model into a good business."
If the price is too low, the faster the user growth, the greater the losses; if the price is too high, there are no users, which defeats the purpose.
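A rough sketch of this pricing dilemma, with every number a hypothetical assumption (amounts are in arbitrary currency cents, and the usage figures are invented for illustration): when the price per video sits below the inference cost per video, the loss grows in direct proportion to the number of users.

```python
def monthly_profit(users: int, videos_per_user: int,
                   price_per_video: int, cost_per_video: int) -> int:
    """Monthly profit in cents: volume times the per-video margin."""
    return users * videos_per_user * (price_per_video - cost_per_video)


# Hypothetical inputs: each video costs 80 cents of compute to generate.
COST = 80
VIDEOS_PER_USER = 20

# Priced below cost at 50 cents: ten times the users, ten times the loss.
small_base = monthly_profit(10_000, VIDEOS_PER_USER, 50, COST)
large_base = monthly_profit(100_000, VIDEOS_PER_USER, 50, COST)

# Priced far above cost at 200 cents, assuming users stay away entirely.
no_users = monthly_profit(0, VIDEOS_PER_USER, 200, COST)

print(small_base, large_base, no_users)
```

Under these assumptions the low price loses 6 million cents a month with 10,000 users and 60 million with 100,000, while the high price earns nothing because nobody pays it. A workable price has to sit between the two extremes.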
The third is the generational gap in model capabilities.
Ultimately, video generation capabilities are built upon language models.
No matter how powerful a video generation model is, it still needs language understanding capabilities as a foundation to understand user prompts. Then it uses reasoning capabilities to understand the logical relationships of scenes and characters and maintain coherence in the generated content.
According to foreign media assessments, OpenAI's ChatGPT 5.5 and Anthropic's Mythos are 9 months to 1 year ahead of models from Chinese AI companies.
This generational gap is reflected in multiple aspects, such as reasoning ability, contextual understanding, multi-turn dialogue, complex task handling, etc.
Although China leads American AI in vertical fields like AI video, a relatively noticeable gap can still be felt in general-purpose large models.
In summary, Chinese AI's lead in video generation is real, but it is not without worries. The gaps in computing power and foundation models remain a sword hanging overhead. But at least for now, we no longer have to trail behind Silicon Valley, staring at its back.








