👩🏻💻 Meet Vidu, China's Answer to Sora, SenseTime Surges on LLM Upgrade, Moonshot AI Founder Cashes Out, and Huawei's Clothes-Removing AI
Weekly China AI News from April 22, 2024 to April 28, 2024
Hello readers, this week I bring you three interesting stories, plus one concerning development, so this issue is a must-read. Just a day before I published this newsletter, a Tsinghua University spinoff unveiled a text-to-video model touted as China’s first equivalent to Sora. The demo video is truly impressive! SenseTime's stock surged 36% following a major LLM upgrade and a new collaboration with Huawei. The founder of Moonshot AI has reportedly cashed out tens of millions of dollars. On a more troubling note, Huawei’s latest photo-editing AI feature, which can partially remove clothing from images, has sparked significant concerns over potential misuse.
Meet Vidu, China’s Answer to Sora
What’s New: A Tsinghua University spinoff yesterday unveiled a Sora-like text-to-video platform, named Vidu, capable of creating high-definition videos up to 16 seconds long with a resolution of 1080p.
According to its demo video, Vidu produces high-quality visuals and effects that are almost as good as those in OpenAI’s Sora. It excels in visual storytelling and maintains consistent movement throughout. Vidu also includes Chinese cultural themes in its videos, such as pandas and dragons.
Technical Details: The research team hasn’t released a technical paper, but its press release says Vidu is built on U-ViT, an architecture the team proposed in September 2022 that merges diffusion models and Transformers, the key backbone architectures of visual models and LLMs, respectively (All are Worth Words: A ViT Backbone for Diffusion Models).
In March 2023, the team developed and open-sourced UniDiffuser, a text-to-image model with billions of parameters trained on LAION-5B. Just as Sora was built on top of DALL-E 3, Vidu builds on the foundation laid by UniDiffuser.
Who Are They: Shengshu Technology, a Tsinghua University spinoff founded in 2023, is an AI solution provider that focuses on multi-modal application products. The company has raised hundreds of millions of RMB from investors including Ant Group, Baidu Ventures, and Qiming Ventures.
First Impression: Even though Vidu isn’t as good as Sora, it’s probably the second-best text-to-video model out there (based on its demo), especially impressive since it’s only been two months since Sora was unveiled.
Vidu builds on the team’s text-to-image generator, UniDiffuser, which was trained on the LAION-5B dataset. As a result, it’s great at creating Western-style character animations, which makes it a lot like Sora, but it’s missing those unique Chinese touches.
SenseTime Surges on New LLM Upgrade & Collaboration with Huawei
What’s New: SenseTime’s stock soared by 36% following the unveiling of its next-generation SenseNova 5.0 model and a collaboration with Huawei on domain-specific models. The company said that SenseNova 5.0 is “generally comparable to GPT-4 Turbo” and showcases significant enhancements in areas such as knowledge acquisition, mathematics, reasoning, and coding.
Performance Insights: SenseNova 5.0 is a mixture-of-experts multimodal model with 600 billion parameters. It is trained on 10TB of data, which includes a huge amount of synthetic data. This model features a context window of up to 200,000 tokens.
SenseChat 5.0, a sub-LLM of SenseNova, has outperformed GPT-4 Turbo (released last November) and Llama 3 70B across multiple benchmarks, including MMLU and HumanEval. These evaluations were conducted on OpenCompass, an open-source LLM evaluation platform developed by the Shanghai AI Lab.
SenseChat Lite, a more compact version of the flagship with 1.8 billion parameters, excels in Chinese-specific tasks, surpassing Llama 2-7B and Gemma-7B.
InternVL-Chat-V1.5, a sub-VLM of SenseNova, approaches the performance of GPT-4V and Gemini Pro across various benchmarks such as MMMU, DocVQA, ChartQA, and MathVista.
Collaboration with Huawei: SenseTime also announced domain-specific models optimized for Huawei’s Atlas chips in sectors like finance, healthcare, governance, and coding. At last month’s Huawei China Partner event, a SenseTime executive revealed that the company now has 3,200 Atlas chips.
Challenges in 2023: In 2023, SenseTime’s revenue declined to RMB 3.4 billion, a decrease of 11% from the previous year. The company’s net losses also widened, increasing by 6.5% to reach RMB 6.44 billion.
Despite these challenges, SenseTime reported that its generative AI segment generated nearly RMB 1.2 billion in revenue. The company has expanded its GPUs to 45,000.
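For readers who want to check the numbers, here is a minimal sketch of the arithmetic implied by the figures above. It backs out the approximate 2022 revenue and net loss from the reported percentage changes, and estimates what share of 2023 revenue came from generative AI. The inputs are the rounded figures quoted in this issue, so the results are approximations rather than reported values, and the variable names are my own.

```python
# Figures quoted in this issue (RMB, billions). All are rounded in the source.
revenue_2023 = 3.4
revenue_decline = 0.11       # revenue fell 11% year over year
net_loss_2023 = 6.44
net_loss_increase = 0.065    # net loss widened by 6.5%
genai_revenue = 1.2          # "nearly RMB 1.2 billion"

# Back out the implied 2022 figures from the reported percentage changes.
implied_revenue_2022 = revenue_2023 / (1 - revenue_decline)
implied_net_loss_2022 = net_loss_2023 / (1 + net_loss_increase)

# Approximate share of 2023 revenue attributed to generative AI.
genai_share = genai_revenue / revenue_2023

print(f"Implied 2022 revenue: RMB {implied_revenue_2022:.2f} billion")
print(f"Implied 2022 net loss: RMB {implied_net_loss_2022:.2f} billion")
print(f"Generative AI share of 2023 revenue: {genai_share:.0%}")
```

On these rounded inputs, generative AI works out to roughly a third of SenseTime's 2023 revenue, which helps explain why the market reacted so strongly to the SenseNova upgrade.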
Moonshot AI Denies Founder Cashout Rumors
What’s New: Yang Zhilin, founder of the AI startup Moonshot AI, cashed out approximately $40 million from shares after a funding round, multiple sources familiar with the matter told Chinese media Jiemian.
Moonshot denied these claims and referenced a previously announced employee incentive plan, though the connection between this plan and the cashout rumor was not clarified.
How it Works: Moonshot has been in the spotlight since its inception in April 2023. Its AI chatbot, Kimi Chat, has gained popularity over the past two months thanks to a new feature that can process inputs of up to two million Chinese characters.
Moonshot raised $1 billion with a valuation reaching $2.5 billion. Investors included Sequoia China, Xiaohongshu, Meituan, Alibaba, and others.
In March, the company announced an employee stock buyback plan starting in 2024 to attract and retain talent, suggesting that changes in stock ownership could be tied to long-term incentive strategies rather than personal gain.
David Zhang, a renowned investor and co-founder of Matrix Partners China, defended the practice of founders cashing out a portion of their shares during rapid growth phases. He argued that it provides a more stable foundation for long-term commitment.
Huawei’s Photo-Editing AI Under Fire for Removing Clothes
What’s New: Huawei’s latest flagship Pura 70 series smartphones are facing scrutiny over an AI-powered photo editing feature that allows users to digitally remove clothing from images, potentially generating explicit content without consent.
How it Works: The Pura 70 series offers a photo editing AI feature akin to Google’s Magic Eraser, designed to remove unwanted elements from images. However, many users discovered that this feature can erase clothing from photos and reveal AI-generated body parts in their place.
The discovery has raised significant privacy and ethical concerns because the feature could be misused to create inappropriate pictures, particularly misogynistic content targeting women.
A Huawei customer service agent told Chinese media that the “undress” AI feature is an unexpected loophole and will be optimized. As of April 24, the latest software version has addressed the issue.
Why It Matters: This incident underscores the significant challenges that technology companies face in ensuring the responsible and ethical use of AI applications. While it is unlikely that Huawei intentionally designed this “clothing-removing” feature, there appears to be an oversight in their AI training pipeline. This could stem from either the dataset used to train the AI editing model or the lack of ethical evaluations prior to the product’s launch.
Weekly News Roundup
On Friday, April 26, U.S. Secretary of State Antony Blinken said during his visit to China that high-level U.S.-China discussions on AI will take place within the “coming weeks.” “Earlier today, we agreed to convene the inaugural U.S.-PRC talks on artificial intelligence in the coming weeks, where we will exchange views on the risks and safety issues posed by advanced AI and explore effective management strategies.” (CNBC)
Toyota and Tencent, as well as Nissan and Baidu, announced AI collaborations during the Beijing Auto Show. (Reuters)
Tsinghua University’s LLM evaluation shows that Baidu’s ERNIE 4.0 and startup Zhipu AI’s GLM-4 are the top-performing AI models in China, but they still lag behind OpenAI’s GPT-4 and Anthropic’s Claude 3. (SCMP)
On April 23, the Beijing Internet Court ruled on the nation’s first “AI voice infringement case,” recognizing the plaintiff’s voice rights in an AI-generated voice. The court found that the defendants, a Beijing-based technology company and a software company, used the plaintiff’s voice to develop a text-to-speech AI product without consent, which constitutes infringement. The verdict requires the defendants to issue a written apology to the plaintiff and pay RMB 250,000 in compensation for economic losses. (The Science Daily)
Shenzhen-based robot company Astribot’s AI robot, Astribot S1, demonstrated household capabilities in a video demo last week, including folding clothes, sorting items, flipping and frying in a pan, vacuuming, and competitive cup stacking.