📽️AI Video Generators to Watch, Chinese Smartphone Giants Love Gemini, and Startups Secure Big Cash
Weekly China AI News from August 5, 2024 to August 11, 2024
Hi, this is Tony! Welcome to this week’s issue of Recode China AI, a newsletter for China’s trending AI news and papers.
Three things to know
Chinese AI companies are advancing rapidly in video generation, with ByteDance’s Jimeng AI and Shengshu AI’s Vidu leading the way.
Chinese smartphone makers like Oppo, OnePlus, Honor, and Xiaomi are adopting Google’s Gemini AI to capture the favor of international users.
AI startups are also attracting significant investment, as seen with Moonshot AI and 01.AI securing substantial funding.
Chinese Companies Push Ahead in AI Video Generation
What’s New: While OpenAI’s AI video generator Sora remains under wraps, Chinese tech companies are racing to release their text-to-video models. In the last two weeks, in addition to Kuaishou’s Kling, which we talked about before, several new players have entered the market, expanding China’s footprint in AI video generation.
ByteDance’s Jimeng AI: ByteDance, the parent company of TikTok, launched a new text-to-video app called Jimeng AI on iOS and Android. The app lets users create short videos from text prompts and is currently available only in the Chinese market. Jimeng AI can generate both videos and images, and is designed to integrate seamlessly with social media platforms like Douyin, China’s version of TikTok.
Zhipu AI’s CogVideoX: Zhipu AI has open-sourced CogVideoX, a text-to-video generation model that shares its origins with Ying, the video generator app it released last month. CogVideoX-2B, the first model in the CogVideoX series, can generate videos with a resolution of 720x480 at 8 frames per second. Recent updates include the integration of CogVideoX into the diffusers library and the open-sourcing of the 3D Causal VAE used in CogVideoX-2B.
Shengshu AI’s Vidu: Vidu, from Shengshu AI, is another entrant in the text-to-video space and can generate realistic, detailed video content. Released worldwide on July 30, Vidu can produce short videos of 5 to 8 seconds from Chinese or English text prompts, as well as from image prompts, at 1080p resolution. The tool was co-developed with Tsinghua University, where Shengshu AI’s founders originated, and is powered by Baidu AI Cloud.
Alibaba’s Tora: Alibaba’s Tora is an open-sourced video generation framework built on the Diffusion Transformer (DiT) architecture, the same type of architecture that underpins OpenAI’s Sora. It is designed to generate videos that follow specified motion trajectories while accurately reproducing real-world physics.
Why It Matters: Unlike most Chinese chatbots, which are limited to mainland China, these video generators are targeting a global audience. Platforms like Kuaishou’s Kling and Shengshu AI’s Vidu are expanding access worldwide, directly challenging established players such as Runway and Luma.
As demand for efficient, high-quality video production tools grows, AI video generators are becoming essential for content creators, marketers, and educators.
Chinese Smartphone Makers Bet on Gemini to Win Global Favor
What’s New: Several Chinese smartphone makers are partnering with Google to incorporate its AI features into the devices they sell outside China.
Oppo and OnePlus: Oppo, the fourth-largest smartphone maker globally, and its subsidiary OnePlus plan to integrate Google’s Gemini AI model into their smartphones. The rollout of Gemini-based features, such as news summarization and multimodal content generation, will begin later this year.
Honor: Honor announced in May that it would bring Google’s AI features, including the Gemini AI model and Imagen-2, Google’s text-to-image model, to its upcoming devices. This partnership enables Honor to enhance user experience with features like AI-powered app opening via eye-tracking technology.
Xiaomi: Xiaomi recently announced that it is working with Google to integrate Gemini into its next flagship series for international markets.
Why Gemini, Not GPT: Choosing Gemini over GPT is a fairly easy decision for Chinese smartphone makers, whose phones run Android. Google’s AI features are seamlessly integrated and optimized for Android. For example, the smallest version, Gemini Nano, can run on-device using Android AICore, offering local processing of sensitive data, offline access, and reduced reliance on cloud-based AI. A multimodal version of Gemini Nano will be available later this year.
Not to mention the popular Circle to Search, a new way to search what's on your screen without switching apps, by simply circling target content.
Gemini on Android also serves as a generative AI assistant to help users be more creative and productive by understanding the context of their screen and the apps in use. It includes features like dragging and dropping generated content into apps like Gmail and Google Messages and providing real-time alerts for potential scams during phone calls.
AI Startups Secure Major Funding Amid Fierce Competition
What’s New: Two Chinese AI startups raised large funding rounds last week, underscoring the sustained interest and rapid growth within China’s AI sector.
Moonshot AI: Alibaba-backed Moonshot AI secured $300 million in a new funding round, raising the company’s valuation to $3.3 billion. Tencent Holdings and other investors participated, potentially setting the stage for deeper collaboration between Tencent’s WeChat super app and Moonshot AI’s chatbot.
01.AI: 01.AI, founded by Kai-Fu Lee, has also announced the completion of a new financing round that secures hundreds of millions of dollars. This round included participation from an international strategic investor, Southeast Asian financial groups, and several other institutions.
$2.5 billion: A $2.5 billion valuation has become the benchmark for Chinese AI startups aiming to lead the market. Four companies—Moonshot AI, Zhipu AI, MiniMax, and Baichuan AI—have reached this milestone. Other strong contenders include 01.AI and StepFun, the latter founded by former Microsoft executives, which is reportedly raising new funds at a $2 billion valuation.
Weekly News Roundup
Chinese cybersecurity company 360 put together 16 LLMs from leading tech firms in China, including Baidu, Alibaba, and Tencent, to develop an AI assistant that surpasses GPT-4o’s capabilities in most metrics. This collaborative model, using a Collaboration-of-Experts architecture, allows users to access and switch between multiple AI models seamlessly, providing tailored responses by matching tasks with the most suitable AI expert. (Qbit)
Mainland China saw a surge in AI-related companies, with over 1.67 million enterprises by the first half of 2024, including 237,000 new additions this year. (South China Morning Post)
Chinese researchers from Tsinghua University have developed Taichi-II, the world's first fully optical AI chip, which surpasses NVIDIA's H100 GPU in energy efficiency and accelerates AI training through innovative optical processes and fully forward mode (FFM) learning. (Interesting Engineering)
A sophisticated industry has emerged to smuggle Nvidia’s advanced AI chips into China, circumventing U.S. export controls through a network of shell companies and fake data centers. (The Information)
Meta Sota (秘塔搜索), an AI search company, recently raised over RMB 100 million (~$14 million) in a new funding round. Ant Group led the round, with Lightspeed China Partners also participating. The post-money valuation of the company has reached $150 million. (LatePost)
Trending Research
Researchers from Alibaba introduced Qwen2-Math, a series of LLMs designed specifically for mathematics that show improved capabilities in solving intricate mathematical problems compared to other open-source and closed-source models such as GPT-4o. Offered in three parameter sizes (1.5B, 7B, and 72B), these models have demonstrated superior performance on English and Chinese math benchmarks across zero-shot and few-shot learning scenarios.
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
High computational costs and the need for powerful cloud servers limit the real-world application of multimodal LLMs (MLLMs), particularly in mobile, offline, and privacy-sensitive scenarios. To address this, researchers from OpenBMB, founded by ModelBest (面壁智能) and TsinghuaNLP, introduced MiniCPM-V, a series of efficient MLLMs designed for deployment on end-side devices. The latest and most capable model in the MiniCPM-V series is MiniCPM-V2.6. Researchers said the 8B model surpasses GPT-4V in single-image, multi-image, and video understanding.
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Researchers from the Shanghai AI Laboratory, Shanghai Jiao Tong University, and others present MMIU, a new benchmark designed to assess the multimodal understanding abilities of large vision-language models. MMIU focuses on the models’ capacity to process sets of images with accompanying text, which is crucial for complex tasks such as storytelling, summarization, and question-answering. The benchmark includes 77,659 images, 7 types of image relationships, and 5 image modalities, along with 11,698 multiple-choice questions that require the models to perform complex reasoning, align visual and textual information, and handle various data distributions.