ERNIE Bot vs ChatGPT; Alibaba's Text-to-Video AI; ChatGLM-6B; Caffe Creator's New AI Venture
Weekly China AI News from Mar 20 to Mar 26
Dear readers, in this week's newsletter, we will delve into Baidu’s ERNIE Bot and compare it to ChatGPT. We’ll also explore the impact of Tsinghua University’s open-source ChatGLM-6B, discuss Alibaba’s recent text-to-video model, and find out what Tencent’s president has to say about the company’s application of generative AI. Don't miss these AI updates!
Weekly News Roundup
ERNIE Bot vs ChatGPT
What’s new: Baidu’s dialogue AI, ERNIE Bot, has attracted over one million users to its waitlist within five days of its March 16 debut. Mixed reviews and comparisons with GPT-4 and New Bing are circulating on the Chinese internet. I’ve compiled a quick collection of online reviews and first-hand experiences.
Let’s look at opinions from top-tier media outlets:
Bloomberg: Ernie appears to be a big step up from conventional search engines like Baidu’s but isn’t quite able to simulate an authentic, human conversation.
Nikkei Asia: One area where Ernie wins out -- sort of -- is visuals. ChatGPT cannot produce drawings, while Baidu’s bot makes a decent effort.
SCMP: While Ernie Bot has a tendency to avoid certain political questions, it fares better than its OpenAI rival ChatGPT when it comes to providing up-to-date information.
Wired: One WeChat poster compared the Chinese bot’s demoed capabilities to those of ChatGPT and found it better at handling Chinese idioms and more accurate in some instances.
What about Chinese media?
CSDN: Although there is a gap between ERNIE Bot and ChatGPT, the overall performance is still commendable.
Pingwest (品玩): From a “fun” perspective, there’s no substantial difference between ERNIE Bot and ChatGPT.
Chaping (差评君): There is a certain gap between ERNIE Bot and Bing, but the gap is not too outrageous. In fact, ERNIE Bot’s performance on some issues is stronger than Bing’s.
Baidu Co-founder and CEO Robin Li in an interview with 36Kr:
“Two months ago, ERNIE Bot lagged 40 points behind ChatGPT (out of 100). The gap widened to 70 points a month later.”
“Li believes ERNIE Bot now matches ChatGPT’s November version and is close to its January version.”
Funny image generation:
Chinese netizens discovered a hilarious use for ERNIE Bot: generating humorous images.
Given text prompts for Chinese dishes, such as Braised Lion’s Head (红烧狮子头, a braised meatball dish) or Pork with Garlic Sauce (鱼香肉丝, minced pork with “fish flavor”), ERNIE Bot created literal illustrations based on the names. These issues have been gradually resolved over the past few days.
Baidu said last Thursday that the chatbot’s text-to-image capability was “entirely self-developed” after users raised concerns about potential copying from overseas sources.
My personal take (Disclaimer: while I work on Baidu’s global comms team, my comments in this newsletter don’t represent the views of Baidu):
ERNIE Bot understands Chinese slang, online buzzwords, history, and other Chinese-specific elements better than ChatGPT. For example, ERNIE Bot knows “一坤 is 2.5 years” or “龙场悟道” while ChatGPT has no clue.
ERNIE Bot shows basic logical reasoning but lacks stability. Check out how ERNIE Bot solves the “let’s find the diamond” question from The Verge.
ERNIE Bot can help write press releases, design travel itineraries, and create business plans, but with limited tokens.
ERNIE Bot seems to be less creative and fun compared to ChatGPT, and its performance declines in multiple-round conversations.
ERNIE Bot is improving rapidly. As Robin Li suggests, it’s crucial to release conversational AI like ERNIE Bot to the market for stress testing.
Tsinghua University Open Sources ChatGPT-Like Model Compatible with Consumer GPUs
What’s new: The last two weeks have been swarmed with big AI announcements: GPT-4, Microsoft 365’s AI upgrade, ERNIE Bot, Bard, Claude, Midjourney V5. Many might have missed a big one: Tsinghua University open-sourced ChatGLM-6B.
ChatGLM-6B is an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. What’s exhilarating is that users can deploy the model locally on consumer-grade graphics cards (only 6GB of GPU memory is required at the INT4 quantization level).
ChatGLM-6B uses technology similar to ChatGPT’s, optimized for Chinese QA and dialogue. The model was trained on about 1T tokens of Chinese and English corpora, supplemented by supervised fine-tuning, feedback bootstrapping, and reinforcement learning with human feedback.
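To put the 6GB figure in perspective, here is some back-of-the-envelope memory math. This is a rough sketch that assumes weight storage dominates and ignores activations and cache overhead, which is why real-world requirements run somewhat higher than the weights alone:

```python
# Rough weight-memory math for ChatGLM-6B at different precisions.
# Assumption: weights dominate memory usage; activation and cache
# overhead are ignored here.
PARAMS = 6.2e9  # ChatGLM-6B parameter count


def weight_gb(bits_per_param: float) -> float:
    """Gigabytes needed just to hold the model weights."""
    return PARAMS * bits_per_param / 8 / 1024**3


print(f"FP16: {weight_gb(16):.1f} GB")  # ~11.5 GB -- too big for most gaming GPUs
print(f"INT8: {weight_gb(8):.1f} GB")   # ~5.8 GB
print(f"INT4: {weight_gb(4):.1f} GB")   # ~2.9 GB -- leaves headroom on a 6GB card
```

At INT4, the weights alone fit in under 3GB, which is how a 6GB consumer card can host the model with room left for activations.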
Try the online demo on Huggingface Spaces.
Early reviews: ChatGLM-6B has reached the top spot on GitHub’s 7-day trending list and has received 9.4K stars. I collected some reviews from the Chinese ML community:
ChatGLM-6B might be the first ChatGPT-like model that requires only 6GB of memory and can be quickly deployed privately on gaming laptops.
This model is the best-performing Chinese model I have tried at the same parameter scale.
GLM-130B: ChatGLM-6B is derived from GLM-130B, one of the few open-source large language models (LLMs) with over 100 billion parameters and likely one of the best Chinese LLMs. Introduced in October 2022 and accepted at ICLR 2023, GLM-130B outperforms GPT-3 on popular English benchmarks and ERNIE 3.0 Titan on Chinese benchmarks. The model exhibits a unique scaling property that allows INT4 quantization without significant performance loss.
Alibaba’s Damo Academy Releases Text-to-Video Model
What’s new: Damo Academy, Alibaba’s research arm, released a text-to-video synthesis model last week on HuggingFace and its ModelScope Studio. Here is everything we know about this model:
This model is based on a multi-stage text-to-video generation diffusion model, which takes a text description as input and generates a video that matches the given description. It currently supports only English input.
The text-to-video generation diffusion model comprises three sub-networks: text feature extraction, a text-feature-to-video-latent-space diffusion model, and video-latent-space-to-video-visual-space mapping. The model has approximately 1.7 billion parameters in total. The diffusion model employs the Unet3D structure and generates videos through an iterative denoising process, starting from pure Gaussian noise.
This model currently supports inference only on GPUs and requires roughly 16GB of CPU RAM and 16GB of GPU RAM.
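The iterative-denoising idea behind such models can be sketched in a few lines. This is a toy illustration only: `fake_denoiser` is a hypothetical stand-in for the actual Unet3D noise predictor, and the real model operates in a learned video latent space rather than on raw pixels:

```python
import numpy as np

rng = np.random.default_rng(0)


def fake_denoiser(x: np.ndarray, t: int) -> np.ndarray:
    # Hypothetical stand-in for the Unet3D noise predictor: here it
    # simply treats a fixed fraction of the current sample as "noise."
    return 0.1 * x


def generate_video(shape=(16, 32, 32), steps=50) -> np.ndarray:
    """Start from pure Gaussian noise and iteratively denoise it."""
    x = rng.standard_normal(shape)   # (frames, height, width) of pure noise
    for t in reversed(range(steps)):
        x = x - fake_denoiser(x, t)  # one reverse-diffusion step
    return x


video = generate_video()
print(video.shape)  # (16, 32, 32)
```

The real model repeats this loop with a learned, timestep-conditioned noise predictor, which is what lets text guidance steer the noise toward a matching video.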
What Else Do You Need to Know?
💯 Yangqing Jia, creator of the deep learning framework Caffe, has departed Alibaba after four years as a corporate vice president. Jia is reportedly starting a new AI venture aimed at helping enterprises deploy AI more efficiently.
🤖 Yuanyu Intelligent, a startup that touts itself as the first Chinese ChatGPT developer, open-sourced ChatYuan large v2, a bilingual LLM that can run on mobile phones and generate up to 4,096 tokens. Code here.
🛻 Autonomous vehicle upstart Pony.ai has announced a strategic partnership with Meituan. Pony.ai will develop automotive-grade domain controllers for Meituan’s driverless-delivery operations.
🚙 Baidu and AutoX got the green light to test fully driverless robotaxis on public roads in Shanghai’s Pudong New Area.
🏎 Shenzhen-based self-driving startup DeepRoute introduced Driver 3.0, a production-ready self-driving solution that operates without HD maps.
🤺 Hongxia Yang, former head of Alibaba’s multimodal model M6, has joined ByteDance (reportedly for a one-year stint) to develop generative large language models.
🤓 Tencent President Martin Lau said it’s natural for Tencent to incorporate some of the generative AI technologies into Weixin and QQ.
Trending Research
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
Affiliations: University of Science and Technology of China, Microsoft Research Asia, Microsoft Azure AI
This paper introduces NUWA-XL, a novel Diffusion over Diffusion architecture for extremely long video generation. Instead of inefficient sequential generation, NUWA-XL uses a “coarse-to-fine” process that allows parallel video generation. It employs global and local diffusion models to generate keyframes and fill in content between frames. The method reduces the training-inference gap and significantly decreases average inference time.
RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset
Affiliations: SSE CUHKSZ, FNii CUHKSZ, Huawei Technologies, Tsinghua University
This paper introduces 3DBiCar, the first large-scale dataset of 3D biped cartoon characters, and RaBit, a corresponding parametric model built on an SMPL-like blend-shape model and a StyleGAN-based texture generator. The dataset enables applications such as single-view reconstruction, sketch-based modeling, and 3D cartoon animation, with a part-sensitive texture reasoner ensuring detailed local appearances are preserved.
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
Affiliations: Huawei Technologies
This paper introduces PanGu-Σ, a 1.085-trillion-parameter language model trained on Ascend 910 AI processors with the MindSpore framework. Using Random Routed Experts (RRE) and Expert Computation and Storage Separation (ECSS), the researchers achieved a 6.3x increase in training throughput. PanGu-Σ excels in zero-shot learning for Chinese NLP tasks and performs strongly in open-domain dialogue, question answering, machine translation, and code generation when fine-tuned.