The Creator of Kling Returns to Alibaba and Builds Another Dark Horse

marsbit | Published on 2026-04-13 | Last updated on 2026-04-13

Abstract

The article discusses the rise of HappyHorse-1.0, an AI video generation model developed by Alibaba, which topped the Artificial Analysis leaderboard in both text-to-video and image-to-video categories in April 2026. The model was created under the leadership of Zhang Di, who returned to Alibaba in November 2025 after working at Kuaishou, where he led the development of the Kling model. HappyHorse is open-source and commercially usable, like Alibaba's Qwen model. Zhang Di's background includes extensive experience in large-scale data systems and machine learning at Alibaba and Kuaishou, which contributed to the rapid development of HappyHorse within just five months. The model uses a 15-billion-parameter transformer architecture with native multimodal training, supporting multiple languages and lip-sync capabilities. It also focuses on reducing inference time and cost, making it practical for commercial use. The primary application of HappyHorse is in e-commerce, where it can generate product videos to enhance user engagement and conversion rates by creating contextual and personalized content. This aligns with Alibaba's strengths in commerce, advertising, and data feedback loops. The model's success with an open-source approach contrasts with the challenges faced by closed-source models like OpenAI's Sora (shut down due to high costs) and ByteDance's Seedance 2.0 (paused over copyright issues). HappyHorse represents a strategic move for Alibaba to integrate AI video generation into its commerce ecosystem.

By Alphabet AI

The AI video track has been a bit cold lately. Seedance 2.0 is embroiled in copyright disputes, and OpenAI shut down Sora, casting a shadow over this field.

Right at this moment, Alibaba brought out a dark horse.

In April 2026, HappyHorse-1.0 surged to the top of the Artificial Analysis leaderboard, outperforming rivals like ByteDance and Kuaishou in both text-to-video and image-to-video (without audio) tracks.

Zhang Di returned to Alibaba in November 2025, taking up the position of Head of Taotian Group's Future Life Laboratory and reporting directly to Zheng Bo, CTO of Alimama.

This means only about five months passed between Zhang Di's return and his making a name for himself.

The key point is that, like Alibaba's Qwen, HappyHorse was released as a commercially usable open-source model.

What is Qwen's status in Alibaba now? It is Alibaba Group's core general-purpose large model foundation, the absolute core carrier of its AI strategy. Everything Alibaba does now is centered around Qwen.

Therefore, the significance of HappyHorse to Alibaba might be far more than just a model that tops leaderboards to show off technology.

However, before understanding Alibaba's intentions, we should first talk about who Zhang Di is.

01 From Alibaba to Kuaishou and Back to Alibaba

Zhang Di graduated from Shanghai Jiao Tong University with a degree in Computer Science, completing a combined bachelor's and master's program. After graduating in 2010, he joined Alibaba, where he was long responsible for Alimama's big data and machine learning engineering architecture.

Alimama focuses on advertising, recommendation, search, and conversion, which involve large-scale data, massive distribution, and complex engineering systems. These things might not sound as exciting as large models, but they are precisely the places that later trained AI talent for Chinese internet companies.

Many people who can truly turn models into products did not purely come from laboratories. They earlier underwent training in systems like search, recommendation, advertising, and content distribution.

A few examples make this clear. Google CEO Sundar Pichai started out working on the search toolbar and YouTube content recommendations. Microsoft CEO Satya Nadella initially built the Bing search engine and Microsoft's advertising system.

These systems process vast amounts of user behavior daily and require models to run stably in real business scenarios. They don't let engineers stop at a flashy demo; they force you to build something truly useful while constantly balancing latency, cost, effectiveness, and feedback.

Zhang Di's ten years at Alibaba were largely spent in such an environment. At that time, the outside world hadn't yet started calling everything large models, but Alibaba internally already had a training ground centered around data, algorithms, and engineering.

In 2020, Zhang Di left Alibaba for Kuaishou.

At that time, short video platforms had already moved from traffic competition to technology competition. Zhang Di served at Kuaishou as Vice President of Technology, Head of the Large Model and Multimedia Technology Team, and later led the underlying architecture R&D and application deployment of the Kling large model.

Kling's significance to Kuaishou is very substantial.

Kling enabled Kuaishou to upgrade from a past "content distribution platform" to a "content production infrastructure provider," building a complete closed loop of "creative generation - video production - one-click distribution - traffic monetization - data iteration."

In April 2025, Kuaishou established the Kling AI Division and upgraded it to a first-level department reporting directly to CEO Cheng Yixiao, on par with the main short video business.

Therefore, when he briefly joined Bilibili in September 2025 and returned to Alibaba two months later, this move could hardly be seen as just ordinary talent mobility.

Bilibili needs video technology, and Alibaba also needs video technology, but Alibaba's needs are more complex.

For Kuaishou, doing video generation is essentially about distribution. But if Alibaba does video generation, it involves many more linked aspects: e-commerce, advertising, live streaming, cloud services, and overseas merchants.

As mentioned earlier, after returning to Alibaba in November 2025, Zhang Di took up the position of Head of Taotian Group's "Future Life Laboratory" at level P11.

This arrangement has a distinctly Alibaba flavor. The video model was not placed in a pure research department; instead, it sits close to Taotian, a transaction scene.

In other words, from its conception, HappyHorse was a product emphasizing deployment and bound to Alibaba's existing ecosystem.

Five months later, HappyHorse appeared.

This speed was indeed fast. Alibaba gave Zhang Di a new business scenario and team, and he once again opened up the video model route.

He neither started from scratch in AI video nor simply parachuted into Alibaba from outside.

His career path is like a line that went out and came back. He learned how large-scale commercial systems operate at Alibaba, then went to Kuaishou to turn video generation into a product, and then returned to Alibaba to integrate this capability into a larger commercial machine.

Many companies are scrambling for large model talent, but the scarce individuals are often those who can simultaneously understand models, business, and organization.

There are many people who only know how to train models, and many who only know how to talk strategy. What is difficult is finding someone who knows where every step might get stuck, from the technical route, to architecture design, to training and inference, to the product outlet, and finally to being used by merchants and users.

HappyHorse pushed Zhang Di back into the spotlight and also gave Alibaba's relatively dispersed AI narrative over the past few years a more concrete entry point through a person.

02 How an Open-Source Model Defeated Closed-Source Giants

The point that truly drew attention to HappyHorse is that it won too suddenly.

On the video generation track, overseas there are Runway, Pika, Luma, Google's Veo; domestically there are ByteDance's Seedance and Kuaishou's Kling. Alibaba wasn't even on the list.

So when HappyHorse first topped the charts, people were more willing to believe it came from some startup than from Alibaba.

HappyHorse is in the first tier in both text-to-video and image-to-video tracks, with an Elo rating of 1333 for text-to-video and 1392 for image-to-video.

The Artificial Analysis leaderboard shifts with ongoing user blind tests, and the scores have since been updated, but HappyHorse did outperform a group of previously famous closed-source models in user preference tests.

This is actually quite unusual. Generally speaking, video generation is one of the directions that consumes the most money, data, and computing power.

Closed-source large companies can hide data, model details, inference systems, and product experience within their platforms, continuously iterating internally.

Open-source models face more practical limitations. Their parameters must be public, inference must be runnable, the community must be able to reproduce them, and the results must withstand head-to-head comparison.

So before HappyHorse appeared, most open-source video models were toys. The output videos weren't stable enough, and characters often experienced drift.

HappyHorse has 15 billion parameters and a 40-layer unified self-attention Transformer architecture, and it jointly models text, video, and audio by placing tokens from all three modalities into the same sequence.

This approach is very similar to Qwen, which also explains why Zhang Di managed to produce HappyHorse in just 5 months—it reused the high-quality native multimodal training methods left by Qwen.
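The "native multimodal" idea described here, with text, video, and audio tokens sharing one self-attention sequence, can be sketched roughly as follows. This is an illustrative toy, not Alibaba's actual design: the `Token` structure, the toy token IDs, and the interleaving scheme are all assumptions, and the tokenizers and 40-layer transformer themselves are not shown.

```python
# Sketch: tokens from three modalities are tagged and interleaved into ONE
# sequence, so a single self-attention stack can attend across modalities
# (and learn, e.g., audio-video alignment for lip sync).
from dataclasses import dataclass

@dataclass
class Token:
    modality: str   # "text" | "video" | "audio"
    token_id: int
    timestep: int   # a shared timeline lets audio and video tokens align

def build_unified_sequence(text_ids, video_frames, audio_frames):
    """Merge three modality streams into one ordered token sequence."""
    seq = [Token("text", t, 0) for t in text_ids]  # prompt comes first
    # Interleave video and audio by timestep so tokens that co-occur in
    # time sit next to each other in the sequence.
    for step, (v, a) in enumerate(zip(video_frames, audio_frames), start=1):
        seq.append(Token("video", v, step))
        seq.append(Token("audio", a, step))
    return seq

seq = build_unified_sequence([101, 102], [7, 8, 9], [40, 41, 42])
print([(t.modality, t.timestep) for t in seq])
```

The point of the single sequence is that attention between an audio token and the video token at the same timestep is just an ordinary attention link, rather than a cross-model synchronization problem bolted on afterwards.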

Non-native multimodal video generation models like Sora often experience issues like characters' mouths moving but the sound being half a beat late. Sometimes character expressions are rich, but the tone is wrong. Characters might also move before the sound is emitted.

The reason for HappyHorse's high rating lies in its solution to this problem through native multimodality.

HappyHorse natively supports lip synchronization for multiple languages, including English, Mandarin, Cantonese, Japanese, Korean, German, and French. Its word error rate is also comparable to that of similar open-source models.

Why did Zhang Di do this? My understanding is that if Alibaba wants video generation technology to enter advertising, e-commerce, short dramas, education, and live streaming, it cannot rely solely on pretty pictures.

It must be able to speak, to dub, to make sound and picture hold together at the same time.

Another key point is cost and speed.

HappyHorse takes about 38 seconds to generate a 5-second 1080p video on a single H100 GPU and uses DMD-2 distillation technology to compress the denoising steps to 8.

This is an unavoidable hurdle for the commercialization of video generation. No matter how good the model effect is, if generating a short video costs too much or takes too long, it's hard to enter merchants' daily workflows.

Merchants won't wait half a day for each product, nor will they pay excessive costs for dozens of test materials.

So the significance of HappyHorse is not just "able to generate," but also its attempt to push generation speed and inference costs into the usable range.
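The article's figures, roughly 38 seconds for a 5-second 1080p clip on one H100 with the denoising schedule distilled down to 8 steps, imply a simple back-of-envelope argument: diffusion sampling cost scales roughly linearly with step count, so distillation buys a multiple, not a margin. The 50-step undistilled baseline and the constant per-step cost below are illustrative assumptions, not figures from the article.

```python
# Back-of-envelope: diffusion sampling time grows roughly linearly with
# the number of denoising steps, so cutting steps via distillation
# (e.g., DMD-style) shrinks latency by about the same factor.

def generation_time(steps, seconds_per_step):
    """Total wall-clock time for one clip, assuming constant per-step cost."""
    return steps * seconds_per_step

# If 8 steps take ~38 s end to end (the article's figure), each step
# costs ~4.75 s. A hypothetical undistilled 50-step run at the same
# per-step cost would take several times longer.
per_step = 38 / 8
distilled = generation_time(8, per_step)
undistilled = generation_time(50, per_step)
print(distilled, undistilled, undistilled / distilled)
```

Under these assumptions the distilled model is 6.25x faster per clip, which is the difference between a tool merchants batch-generate with and one they queue overnight.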

For developers, open source means they can self-host, fine-tune, and integrate it into their own products. For platforms, open source also brings more community feedback.

The progress of a closed-source model relies mainly on the company's internal team. An open-source model gets subjected to all sorts of unusual tests by developers; problems are exposed quickly, and directions for improvement multiply.

The Artificial Analysis video arena uses user preference voting: rather than a single technical indicator, it looks at which of two videos users prefer.
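Pairwise blind arenas of this kind typically aggregate votes into Elo ratings, which is presumably where figures like 1333 and 1392 come from. Below is a minimal sketch of the standard Elo update; the K-factor of 32 and the sample ratings are illustrative assumptions, not Artificial Analysis's actual parameters.

```python
# Standard Elo: each blind vote between two models nudges their ratings
# toward the observed outcome, weighted by how surprising the result was.

def expected_score(r_a, r_b):
    """Probability that A beats B implied by current ratings."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, a_won, k=32):
    """Return updated (rating_a, rating_b) after one pairwise vote."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)          # zero-sum: B loses what A gains
    return r_a + delta, r_b - delta

# A 1333-rated model beating a 1300-rated rival gains a bit under k/2,
# since it was already the slight favorite.
a, b = elo_update(1333, 1300, a_won=True)
print(round(a, 1), round(b, 1))
```

One consequence of this scheme is the caveat the article itself raises: ratings drift as votes accumulate, so a chart-topping score is a snapshot, not a fixed property of the model.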

Of course, Zhang Di cannot be too proud yet; topping the chart once does not mean leading forever.

Competitors won't stand still. HappyHorse has now only won one public test, not the entire war.

If HappyHorse were just a model that can top leaderboards, its significance would be limited. But if it can become the video generation foundation commonly used by Alibaba Cloud and Taotian's businesses, it will become an entry point.

Therefore, the most interesting part of HappyHorse defeating closed-source giants is not just the leading scores. What is truly worth paying attention to is that it allowed Alibaba to find a way to re-enter the video generation game.

It didn't first make a C-end user APP, nor did it only do internal demos. Instead, it directly subjected the open-source model to industry-wide scrutiny.

This victory might not last long, but Zhang Di changed the external perception of Alibaba's capabilities in video generation models.

The new question becomes: where does Alibaba plan to use this capability?

03 The Significance of HappyHorse for Alibaba

The most direct landing point for HappyHorse is e-commerce.

In the past, when people talked about AI video, the first things that came to mind were film, television, short dramas, advertising blockbusters, and creator tools. Admittedly, these are all substantial markets, but they are still some distance from Alibaba's main business.

Alibaba's strength does not lie in building a video community itself, nor in having ordinary users open an AI video app daily to kill time. Alibaba's real advantage is that it holds China's most concentrated collection of products, merchants, transactions, and advertising systems.

This is also why many people care that HappyHorse was born in Taotian Group's "Future Life Laboratory."

Taotian deals daily with how merchants sell goods, how products are seen, why users click in, and why they place orders. Placing HappyHorse here naturally raises the questions: can it improve the efficiency of product content production? Can it improve conversion? Can it help the platform do more business?

For an ordinary merchant, video content has always been a hassle.

To shoot a 30-second product video, you need to find a location, find a model, set up lighting, edit, and dub. Big brands can hire teams, but small and medium-sized merchants often have to make do on their own.

Many product selling points are not complicated; the problem is that no one films them. They look very ordinary against a white background, but once placed in a specific scene, users realize what they can be used for.

Recently, a solar fountain pump sold out overseas. It was originally just a small garden item with a so-so effect. But after AI video packaged it as a bird bath, a fish pond, and a children's bathtub with cool water-spraying toys, everyone went crazy for it.

AI didn't change the product itself, but it changed the way users understand the product. It turned "functional description" into "usage scenario."

This hits the pain point of e-commerce content squarely.

Users may not have the patience to read product pages filled with parameters, and may not believe hosts who talk at length. But a ten-odd-second video that clearly shows the scene might convert far more efficiently.

More importantly, AI videos can be generated in batches. Merchants can generate a children's version, a family version, a holiday version, and an outdoor version of the same product, or generate videos in different languages, with different characters and different scenes, for different countries.

This is more significant for Alibaba than simply making a video generation tool. Whether on Taobao or Tmall, there are a large number of merchants, along with a large amount of product data and transaction feedback.

If an AI video tool only knows how to generate pretty pictures, it will quickly become just another material-library software. If it can learn in what scenes a product is more likely to be clicked, what copy is more likely to lead to add-to-cart, and what kind of opening seconds are more likely to retain viewers, it approaches being part of an e-commerce operating system.

What Alibaba has that other video generation model companies lack is precisely this feedback loop.

Product images, detail pages, reviews, Q&A, search terms, click-through rates, add-to-cart rates, refund reasons, live stream dwell time—these things seem fragmented but are all fuel for training e-commerce content capabilities.

If HappyHorse connects to these feedback signals, it can evolve from "helping merchants generate a video" to "helping merchants generate videos more likely to sell goods."

Facing Taotian, it can handle main-image videos, product scenario short films, live stream clips, virtual hosts, and marketing materials.

In the past, when a merchant launched a new product, they might only upload a few pictures, at most shooting a rough short video. In the future, they can give the product images, selling points, reviews, and audience tags to the system, let the system generate multiple versions of videos, and then use real placement and transaction data to filter out the more effective one.

If this process runs smoothly, platform content supply will increase significantly, and the content threshold for small and medium-sized merchants will also decrease.

However, AI video selling also has risks. It can amplify selling points, but it can also amplify illusions. A fountain pump that sprays high in an AI video might not achieve that effect in reality.

Alibaba's opportunity should not be to let merchants use AI to sell dreams. The focus should be on using product parameters, real-shot materials, buyer reviews, and platform auditing to keep generated content within boundaries.

In late March, OpenAI announced the shutdown of the Sora standalone application and related APIs. The reason is realistic: video generation burns too much money, user retention cannot support the cost, and OpenAI needs to put computing power back into coding, enterprise services, and robotics directions.

Sora fell at the commercialization hurdle.

ByteDance also ran into trouble on another front. Although Seedance 2.0's results are likewise strong, copyright issues forced ByteDance to pause its global release.

The stronger a model is trained, the easier it is to step into the mire of copyright, portrait rights, and training data.

Looking back at HappyHorse, led by Zhang Di, it has a clear commercial scenario. Moreover, the product images, merchant materials, real-shot videos, and transaction feedback in Alibaba's hands are naturally more suitable for controlled generation than film and TV IP.

Therefore, the value of HappyHorse is not only in the leaderboard. It found a more stable landing point for AI video.

Related Questions

Q: Who is Zhang Di and what is his role in the development of HappyHorse?

A: Zhang Di is a computer science graduate from Shanghai Jiao Tong University who worked at Alibaba for a decade, later joining Kuaishou to lead the development of the Kling model. He returned to Alibaba in November 2025 as the head of Taotian Group's Future Life Laboratory, where he led the development of the HappyHorse video generation model.

Q: What makes HappyHorse stand out in the AI video generation field according to the article?

A: HappyHorse stands out because it topped the Artificial Analysis leaderboard in both text-to-video and image-to-video categories, uses a native multimodal architecture for better lip-sync and audio-video alignment, and is open-source with commercial use allowed, enabling faster iteration and community feedback.

Q: How does HappyHorse address the challenges of cost and speed in video generation?

A: HappyHorse reduces generation time and cost by requiring about 38 seconds to generate a 5-second 1080p video on a single H100 GPU and employing DMD-2 distillation technology to compress the denoising process to just 8 steps, making it more feasible for commercial use.

Q: What is the primary application scenario for HappyHorse within Alibaba's ecosystem?

A: The primary application for HappyHorse is in e-commerce, particularly within Taotian Group, where it is used to generate product videos, marketing materials, virtual hosts, and live stream clips to enhance product presentation, improve conversion rates, and lower content production barriers for merchants.

Q: What are some risks associated with AI-generated video content in e-commerce, and how might Alibaba mitigate them?

A: Risks include misleading exaggerations of product features and potential copyright issues. Alibaba can mitigate these by leveraging product parameters, actual product images, buyer reviews, and platform auditing mechanisms to ensure generated content remains accurate and trustworthy.

