👏🏻Baidu's Chatbot Becomes 'ChatGPT + SearchGPT', Ant Group Introduces Personal Assistant, and MiniMax’s Video Generator Takes on Sora
Weekly China AI News from September 2, 2024 to September 8, 2024
Hi, this is Tony! Welcome to this week’s issue of Recode China AI, a newsletter for China’s trending AI news and papers.
Three things to know
Baidu upgraded its ERNIE Bot mobile app by integrating the chatbot with search and letting it remember users’ preferences.
Alibaba’s Ant Group released a new personal chatbot assistant that can order food and book a ride.
Chinese AI startup MiniMax has launched a new video generation model, abab-video-1, which is touted as “possibly the best video generator in China.”
(By the way, I was invited to write a guest post on AI Supremacy about Chinese AI startups. I hope you enjoy it.)
Baidu Upgrades Chatbot App With New Search and Memory Features
What’s New: Baidu last week upgraded its mobile chatbot, ERNIE Bot App, now rebranded as Wenxiaoyan (文小言). Marketed as a “New Search” smart assistant, the app lets users search for anything from music to map directions, chat with the camera on, have the chatbot remember their preferences, and schedule daily news written by AI.
How It Works: The name “Wenxiaoyan” adds a playful twist to ERNIE Bot’s Chinese name, “Wenxin Yiyan” (文心一言). It also plays on the phrase “Ask Xiaoyan” (问小言), a nickname users had already coined for ERNIE Bot.
Wenxiaoyan introduces a range of new features that set it apart from other AI chatbots:
Memory & Personalization: Wenxiaoyan can remember detailed user preferences — such as profession, hobbies, nicknames, interests, favorite celebrities — allowing for a highly personalized user experience.
Custom Subscriptions: Users can now ask Wenxiaoyan to deliver AI-generated content on a schedule. For example:
Every Monday at noon, send me 5 trending AI news articles.
Tell me the weather for Haidian, Beijing, every day at 8 a.m.
Search-Augmented Answers: Responses from Wenxiaoyan are backed by sources retrieved from search engines, pulling from a wide array of platforms, including news outlets, blogs, and more.
Digital Avatars: Users can interact with various virtual avatars designed for different purposes, such as practicing English or having casual conversations.
Multimedia Search: By leveraging Baidu’s search engines, Wenxiaoyan allows users to search articles, images, map directions, encyclopedia entries from Baidu Baike, and even music. Users are not limited to text input; they can also use speech, images, and documents to query the AI.
Image Generation & Editing: Beyond generating images, Wenxiaoyan offers advanced photo editing features, including expanding images, removing unwanted objects, and applying filters.
Specialized LLM Agents: Users can chat with various LLM agents, each specializing in different areas—from travel guides to financial advice.
According to a Baidu executive, Wenxiaoyan has already surpassed ten million monthly active users, with 70% of its user base being young people.
Why It Matters: The launch of Wenxiaoyan reflects the rising interest in AI-driven search engines, which have the potential to reshape search behaviors, with platforms like Perplexity and SearchGPT gaining traction. In response, traditional search giants like Google and Baidu are increasingly incorporating AI-generated answers into their search results. Google’s generative AI feature, AI Overviews, is reducing errors and expanding its user base, while Baidu recently reported that 18% of its search results now include an AI-generated summary. Wenxiaoyan represents Baidu’s most ambitious step yet in combining AI and search.
Ant Group Unveils an AI Assistant to Order Coffee and Pay Bills
What’s New: Ant Group launched a new AI mobile app, Zhixiaobao (支小宝), at the 2024 INCLUSION · Conference on the Bund in Shanghai. The app, marketed as an “AI life assistant,” integrates seamlessly with the company’s digital life platform, Alipay, which already hosts over 4 million mini-programs and 8,000 digital services. Zhixiaobao is now available for download on iOS and Android devices.
Why It Matters: For nearly two decades, Alipay has been a super app in China catering to all aspects of life, from paying bills and taxes to booking travel and even managing marriage certificates.
Now with Zhixiaobao, Ant Group is betting on AI to redefine people’s everyday experiences. Whether it’s ordering food, booking a ride, or finding local entertainment options, users can simply tell Zhixiaobao what they want, and it gets the job done. This is a major shift from users having to follow tedious step-by-step guides or click through endless mini-programs. However, when it comes to more complex tasks, such as purchasing a product from Taobao, the app falls short.
How It Works: The app features three sections (Moment, Chat, and Agents):
Moment: A personalized dashboard that changes based on time, location, and user habits. For example, in the morning, it might show a “5-Minute News Flash” for users who like to stay updated, or suggest a bike nearby for the commute. If you’re a frequent last-minute cab rider, it will offer a taxi reminder instead.
Chat: Powered by Ant Group’s proprietary model Bailing, Zhixiaobao can chat with you like other AI chatbots. You can ask it about local attractions, and it will comb through publicly available information to provide summaries, including user reviews and specific articles.
Agents: This section hosts specialized AI agents designed to handle specific tasks. These range from a health manager for medical advice to a Fitness Pro that can design workout plans.
At the INCLUSION Conference, Ant Group also introduced three additional AI-driven products: an AI Agent dev platform, an AI healthcare manager, and Ant Bridge, an open platform that leverages AI models and financial expertise to help insurance companies provide personalized customer responses in real time.
MiniMax Joins the Text-to-Video Race with New Model
What’s New: MiniMax, a Chinese AI startup, launched a new video generation model, abab-video-1, accessible via its chatbot Hailuo AI. Users can now try text-to-video generation on its website for free.
Although a latecomer to the trend that began with OpenAI’s Sora earlier this year, MiniMax is confident that its model stands out. CEO and Co-Founder Yan Junjie claims it “might be the best video generator in China.”
How It Works: MiniMax’s text-to-video tool is simple to use. Users input a prompt, and the model generates a video within 5 minutes.
According to MiniMax, the model excels in high compression, text-to-video alignment, and diverse style generation. It can produce 6-second videos at a resolution of 1280x720 and 25 fps, with a look that approaches cinematic quality. This advantage comes from solving complex issues related to token compression and optimizing the model’s training to handle high-dynamic content, according to the company.
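To put the quoted specs in perspective, here is a quick back-of-the-envelope calculation of how much raw data one clip represents. It only uses the resolution, frame rate, and length reported above; the 3-bytes-per-pixel figure is my own assumption (uncompressed 8-bit RGB), and it says nothing about how MiniMax actually encodes or tokenizes video.

```python
# Rough size of one clip at the specs MiniMax quotes.
WIDTH, HEIGHT = 1280, 720   # resolution reported by MiniMax
FPS = 25                    # frames per second reported by MiniMax
SECONDS = 6                 # clip length reported by MiniMax
BYTES_PER_PIXEL = 3         # assumption: uncompressed 8-bit RGB, no alpha

frames = FPS * SECONDS
pixels_per_frame = WIDTH * HEIGHT
raw_bytes = frames * pixels_per_frame * BYTES_PER_PIXEL

print(f"frames per clip:  {frames}")
print(f"pixels per frame: {pixels_per_frame:,}")
print(f"raw clip size:    {raw_bytes / 1e6:.1f} MB")
```

Even a 6-second clip works out to 150 frames and hundreds of megabytes of raw pixels, which is one way to see why any video model has to compress its input heavily before modeling it.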
The model uses the DiT architecture, similar to Sora, but the company has not disclosed further technical details. However, the CEO mentioned an innovative architecture known as Linear Attention.
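Since MiniMax has not published details, the snippet below is not its implementation. It is a minimal NumPy sketch of the generic linear attention idea from the research literature, using the common elu(x) + 1 feature map, shown next to standard softmax attention for contrast. All function names are my own. The point is that linear attention summarizes the keys and values once instead of building a full token-by-token matrix, so cost grows linearly with sequence length.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive feature map commonly used for linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def softmax_attention(q, k, v):
    # Standard attention: builds an (n, n) score matrix, quadratic in n.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def linear_attention(q, k, v):
    # Linear attention: never forms the (n, n) matrix.
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v                 # (d, d_v) summary of keys and values
    z = k.sum(axis=0)            # (d,) normalizer
    return (q @ kv) / (q @ z)[:, None]

def linear_attention_quadratic(q, k, v):
    # Same math written the slow way, used only to check the result.
    q, k = feature_map(q), feature_map(k)
    weights = q @ k.T
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
fast = linear_attention(q, k, v)
slow = linear_attention_quadratic(q, k, v)
print("shapes:", fast.shape, softmax_attention(q, k, v).shape)
print("linear forms agree:", np.allclose(fast, slow))
```

Note that linear attention is an approximation with a different weighting than softmax attention, so its outputs are not expected to match softmax attention numerically.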
I tested Hailuo AI with a few prompts and reviewed some video clips generated and shared on X. My first impression is that the model has a great understanding of text prompts, regardless of their complexity, and accurately reflects them in the generated videos. Although it currently only produces 6-second videos, it manages to include most elements specified in the prompt. Another remarkable feature is the range of styles the model can generate, from cinematic effects to anime.
Prompt: The Millennium Falcon soars through space, Jedi wield lightsabers against the Sith, stormtroopers engage in fierce battles, and the Death Star explodes.
Prompt: In a tropical rainforest, sunlight filters through the dense foliage and spills onto the forest floor. An explorer in a red jacket stands by a small stream, carefully observing a school of fish in the water. Suddenly, a brilliantly colored parrot flies overhead and perches on a branch, beginning to mimic the explorer’s whistle.
However, while Hailuo AI showcased some fascinating user-generated examples, I was unable to replicate their results using the same prompts. In fact, my generated videos were significantly inferior to their samples. I guess the free version may not be the most advanced model?
For example, using the same prompts, here is my generated video compared to the displayed example. Prompt: Dreamlike and surreal scenes with constantly changing locations in FPV flight: a city at night, a beach, underwater coral, and outer space. Dynamic motion, motion blur, time-lapse, ultra-high speed, 30x speed, cinematic effects, and soft tones.
Weekly News Roundup
Chinese AI startup StepFun last week launched Step-1X, a large-scale image generation model, as part of its open platform updates. The model is designed to align closely with the semantics of the prompt and to generate detailed images. It accepts prompts of up to 2000 characters and is optimized for Chinese cultural elements. (Leiphone)
Tencent last week announced a range of AI upgrades, proprietary innovations, and global solutions at the Tencent Global Digital Ecosystem Summit. Tencent unveiled Hunyuan Turbo, its latest foundation model based on the Mixture of Experts (MoE) architecture that doubles training efficiency and reduces inference costs by 50 percent. (Tencent)
Nvidia’s AI chips, despite US export restrictions, are more affordable to rent in China than in the US due to plentiful supply and a thriving black market. (Financial Times)
TIME included Zhuang Rongwen, Director of China’s Cyberspace Administration, in its annual TIME AI 100 list. The magazine praised Zhuang for being pivotal in regulating AI while balancing innovation, implementing historic AI regulations, and overseeing the development of AI models aligned with socialist values. (TIME)
China’s Ant Group, Tencent, Baidu, and US tech giants Microsoft, Google, and Meta have collaborated to establish the first international standard for securing large language models in AI supply chains, addressing the increasing need for AI governance. (SCMP)
Trending Research
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
Researchers from ByteDance and Zhejiang University introduced Loopy, an audio-driven video diffusion model for generating natural and diverse portrait avatar movements by leveraging long-term motion dependencies. The model can produce vivid motion details, such as subtle non-speech movements, emotion-driven facial expressions, and natural head movements, all while using only the first frame as a reference image and the accompanying audio.
Researchers from DeepSeek released DeepSeek-V2.5, which combines the chatbot and coder models of DeepSeek-V2. The model has been fine-tuned to align more closely with human preferences and shows improved performance across various benchmarks, including AlpacaEval 2.0, ArenaHard, and HumanEval Python. Additionally, the model supports function calling to interact with external tools, and it can generate outputs in JSON format or complete fill-in-the-middle (FIM) tasks.
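For readers unfamiliar with function calling, here is a minimal, generic sketch of the pattern: the model emits a JSON description of a tool call, and the application parses it and runs the matching local function. This is not DeepSeek's API or message format. The tool name, the JSON shape, and the hard-coded model output are all hypothetical and only illustrate the idea.

```python
import json

# Hypothetical local tool; the name and signature are made up for illustration.
def get_weather(city: str) -> str:
    return f"(stub) weather for {city}"

# Registry mapping tool names to the Python functions that implement them.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call and run the matching local function."""
    call = json.loads(model_output)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])

# Hard-coded stand-in for what a model might emit; no real API is called here.
example_output = '{"name": "get_weather", "arguments": {"city": "Beijing"}}'
result = dispatch(example_output)
print(result)
```

In a real application, the tool's return value would typically be sent back to the model so it can write the final answer for the user.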
Researchers from Tsinghua University and ModelBest released MiniCPM3-4B, the latest iteration in the MiniCPM series. The model surpasses the performance of larger models like Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, while offering a 32k context window and support for function calls and code interpretation.