🤩ByteDance Unveils ChatGPT Rival; Baidu Upgrades ERNIE Bot with Image and Video Features; Chinese Universities Open-Source Virtual AI Town
Weekly China AI News from August 14 to August 20
Dear readers, in this issue I will share my first impression of ByteDance’s ChatGPT clone Doubao (豆包). ERNIE Bot received a major upgrade with five plugins, including visual understanding and text-to-video. Meet AgentSims, an open-source sandbox for large language model (LLM) evaluation. And China’s leading dairy maker has just announced its own GPT model for health and nutrition.
ByteDance Quietly Releases its ChatGPT Rival
What’s new: ByteDance has quietly released its long-awaited ChatGPT rival, an AI chatbot named Doubao (豆包). Previously known as “Grace” during internal testing, Doubao is now available for Android, iOS, and the web. (Correction: There is currently a waitlist for Doubao, so I have revised my previous headline and description).
How it works: Like ChatGPT and other large language model chatbots, Doubao understands questions in natural language and generates human-like responses. It provides information, supports conversation, and breaks down complex problems through step-by-step reasoning, also known as “chain-of-thought.” According to Doubao, its training data was current as of April 2023.
Doubao offers four specialized bots: Doubao itself, an English teacher, a lively personality named Xiao Ning, and a writing assistant. Xiao Ning is designed with her own interests and hobbies. For example, while Doubao claims no number preference, Xiao Ning calls 10 her favorite. However, for unknown reasons, Xiao Ning seems less intelligent than Doubao, occasionally hallucinating answers.
In my early tests, Doubao exceeded my expectations. We played the Chinese game idiom solitaire (成语接龙), in which each four-character idiom must begin with the character that ended the previous one, and Doubao kept the chain going seemingly indefinitely. According to Chinese media, Doubao also produces better translations than Google and lively, emoji-rich social media posts.
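The rule of idiom solitaire is simple enough to state as code. Below is a minimal Python sketch, assuming the strict variant in which the linking character must match exactly (many players also accept homophones, which this check ignores). The function name `is_valid_chain` and the sample chain are illustrative only.

```python
def is_valid_chain(idioms: list) -> bool:
    """Return True if every idiom has four characters and each idiom
    starts with the character that ended the previous one (exact match only)."""
    if any(len(idiom) != 4 for idiom in idioms):
        return False
    # Compare each idiom's last character with the next idiom's first character.
    return all(prev[-1] == nxt[0] for prev, nxt in zip(idioms, idioms[1:]))


chain = ["一心一意", "意气风发", "发扬光大", "大公无私"]
print(is_valid_chain(chain))                   # True
print(is_valid_chain(["一心一意", "大公无私"]))  # False: 意 does not match 大
```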
I then tested Doubao with two tricky questions from the NYT article “ChatGPT VS ERNIE Bot.” The first, “Write a five-character quatrain about The New York Times,” was meant to assess its ability to generate Chinese text; Doubao misunderstood the question and produced a six-character poem instead. The second, “Here we have a book, nine eggs, a laptop, a bottle, and a nail. Please tell me how to stack them onto each other in a stable manner,” was designed to test the chatbot's human-level intuition. Surprisingly, Doubao responded that the task was impossible.
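Part of the first test is simply form: a five-character quatrain (五言绝句) has four lines of five characters each. The Python sketch below checks that form only, assuming lines are separated by punctuation or line breaks; `is_five_char_quatrain` is a hypothetical helper, and it ignores tonal patterns and rhyme entirely.

```python
import re


def is_five_char_quatrain(poem: str) -> bool:
    """Check only the shape of a 五言绝句: four lines of five characters each."""
    # Split on common Chinese/ASCII punctuation and whitespace, dropping empty pieces.
    lines = [s for s in re.split(r"[，。！？、；,.!?;\s]+", poem) if s]
    return len(lines) == 4 and all(len(line) == 5 for line in lines)


# Wang Zhihuan's "On the Stork Tower" fits the form.
print(is_five_char_quatrain("白日依山尽，黄河入海流。欲穷千里目，更上一层楼。"))  # True
```

A six-character poem like the one Doubao produced would fail this check; judging whether a poem is actually good, or about the requested subject, is beyond what a sketch like this can do.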
ERNIE Bot Levels Up as Versatile AI Assistant with New Video and Image Features
What’s new: Baidu has expanded the capabilities of ERNIE Bot, its LLM chatbot, with two new plugins that allow it to interact with images and convert text into videos, respectively. The upgrades were unveiled at Baidu's Wave Summit Developer Conference.
How it works: The “Visual Interaction” plugin enables ERNIE Bot to recognize celebrities, objects, and scenes in images and generate relevant text. The “Text-to-Video” plugin turns text descriptions into short, engaging videos by automatically selecting and editing clips and synthesizing voiceovers.
Baidu says that since ERNIE Bot’s debut in March, its training throughput has tripled and its inference speed has increased more than 30-fold. The other new plugins are Baidu Search, ChatFile, and Data Analytics & Visualization.
Why it matters: The image and video capabilities make ERNIE Bot a more versatile AI assistant that can be deployed across a wide range of applications. The plugins tailor it to diverse uses, while the training and inference gains improve its speed and performance.
Other major announcements from Baidu’s Wave Summit include updates to its PaddlePaddle deep learning platform, which now has 8 million developers; the launch of InfoFlow for AI-powered business insights; and the availability of the Baidu Comate AI coding assistant for enterprises.
Meet AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
What’s new: Researchers from multiple Chinese universities and Pennsylvania State University have created their own “Westworld”: a virtual AI town populated entirely by LLM-based agents that interact with each other and pursue different tasks, such as tasting delicious food to become a chef or running the town as mayor. Named AgentSims, this sandbox, which evaluates LLMs through goal-driven tasks in a simulated social world, has been open-sourced.
How it works: AgentSims allows users to construct an artificial town with buildings, interactive objects, and resident agents driven by LLMs. The LLM agents are equipped with three systems to enable more human-like behavior: a planning system that breaks down goals into coherent subtask prompts, a memory system that stores agent experiences in a vector DB for consistency, and a tool-use system that learns operation skills from equipment interactions. Users can design evaluation tasks by defining agent goals, configuring the town layout and objects, and observing whether agents successfully complete their goals.
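To make the three systems concrete, here is a toy Python sketch. It is not AgentSims’ code or API: the `MemoryStore` and `Agent` classes, the bag-of-words `embed` function (standing in for a real embedding model and vector database), and the rule that splits a goal on the word "then" (standing in for an LLM planner) are all hypothetical simplifications.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (hypothetical stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class MemoryStore:
    """In-memory stand-in for a vector DB: stores texts, retrieves the most similar."""
    items: list = field(default_factory=list)

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def recall(self, query: str, k: int = 2) -> list:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


@dataclass
class Agent:
    name: str
    goal: str
    memory: MemoryStore = field(default_factory=MemoryStore)
    skills: dict = field(default_factory=dict)  # equipment name -> what the agent learned

    def plan(self) -> list:
        """Planning system (hypothetical): split the goal into ordered subtasks.
        A real agent would prompt an LLM for this decomposition."""
        parts = [p.strip() for p in self.goal.split(" then ") if p.strip()]
        return [f"step {i}: {p}" for i, p in enumerate(parts, start=1)]

    def use_tool(self, equipment: str, outcome: str) -> None:
        """Tool-use system: record the outcome both as a skill and as a memory."""
        self.skills[equipment] = outcome
        self.memory.add(f"used {equipment}: {outcome}")


chef = Agent("Alice", "buy ingredients then cook dinner")
print(chef.plan())  # ['step 1: buy ingredients', 'step 2: cook dinner']
chef.use_tool("stove", "cooking takes ten minutes")
chef.memory.add("Bob likes spicy food")
print(chef.memory.recall("how long does cooking on the stove take", k=1))
# ['used stove: cooking takes ten minutes']
```

The point of the memory lookup is consistency: before acting, an agent can pull its most relevant past experiences into its next prompt instead of relying on the model to remember them.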
Why it matters: According to the paper, existing LLM benchmarks test a narrow range of abilities, use test data that is vulnerable to leakage, and rely on subjective metrics; AgentSims offers comprehensive, task-based evaluation as an alternative. The interactive graphical interface lets non-expert users, such as social scientists, easily build environments and design tasks through menus and drag-and-drop. The highly modular codebase lets AI experts and developers test different LLM support mechanisms by customizing the abstracted agent, planning, memory, and tool-use classes. Goal-driven evaluation yields an objective performance metric: the task success rate. Overall, AgentSims enables community-wide, multidisciplinary development of diverse, robust LLM benchmarks centered on goal-driven social simulations.
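The task success rate is simply the share of simulation runs in which the agent reached its goal. A minimal Python sketch, assuming each run is logged as a hypothetical `EpisodeResult` record with a task label and a boolean outcome (AgentSims’ own logging format may differ):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    task: str           # hypothetical task label, e.g. "mayor" or "chef"
    goal_reached: bool  # did the agent accomplish its assigned goal?


def success_rate(results: list) -> float:
    """Share of episodes in which the agent reached its goal."""
    if not results:
        raise ValueError("no episodes to score")
    return sum(r.goal_reached for r in results) / len(results)


def success_rate_by_task(results: list) -> dict:
    """Success rate broken down per task label."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r.task].append(r)
    return {task: success_rate(rs) for task, rs in grouped.items()}


runs = [
    EpisodeResult("mayor", True),
    EpisodeResult("mayor", False),
    EpisodeResult("chef", True),
    EpisodeResult("chef", True),
]
print(success_rate(runs))          # 0.75
print(success_rate_by_task(runs))  # {'mayor': 0.5, 'chef': 1.0}
```

Breaking the rate down per task matters because an overall average can hide a model that excels at simple roles but consistently fails at planning-heavy ones.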
Example tasks: Proposed tasks include placing LLMs in adversarial social scenarios to test theory of mind, appointing an LLM as the town’s mayor to test its planning abilities, and using AgentSims as a controllable environment for social science experiments.
Weekly News Roundup
🥛 Mengniu Dairy, China’s leading dairy maker, is jumping on the LLM bandwagon by announcing MENGNIU.GPT, which has passed 21 nutrition and health exams and can help consumers develop personalized meal and workout plans.
💰 Tencent’s AI model is among the leading foundation models in China, Tencent President Martin Lau said on the company’s second-quarter earnings call Wednesday. Read more on Bloomberg.
🤖 On August 15, iFlytek released Spark Cognitive Model V2.0, with upgraded abilities in code generation, image captioning and creation, and digital human generation. iFlytek also partnered with Huawei to release the Spark All-in-One machine, supporting training and inference with customizable optimization.
🎥 Alibaba Cloud launched a digital human video generation tool called Live Portrait. Users can generate a talking digital human video based on a photo and text/audio. Potential applications include live streaming, chatbots, and marketing.
🔎 Lenovo said it would invest another $1 billion in AI over three years to accelerate AI deployment, including developing AI hardware, infrastructure, and industry solutions.
⚖️ On August 15, the Chinese Academy of Social Sciences released the “Model AI Law 1.0 (Expert Proposal)” 《人工智能法示范法1.0(专家建议稿)》, which emphasizes pilot programs by state agencies to promote AI adoption. The proposal suggests a negative list system for AI risk management: activities on the list are subject to ex-ante regulation, while all other activities are subject to ex-post regulation.
Trending Research
Database administrators face challenges managing large systems. The authors propose D-Bot, an LLM-based assistant that acquires database maintenance knowledge from documents and tools. D-Bot applies tree-of-thought reasoning to analyze root causes and supports collaborative diagnosis among multiple LLMs. Experiments show that D-Bot diagnoses root causes efficiently. (Affiliations: Tsinghua University)
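Tree-of-thought reasoning explores several candidate explanations and prunes weak branches instead of committing to a single chain of reasoning. The Python sketch below illustrates that general idea as a beam search over a fixed tree of diagnostic hypotheses; it is not D-Bot’s implementation. The `Node` tree, the `diagnose` function, and the hard-coded scores (standing in for an LLM rating each hypothesis against the observed alerts) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    hypothesis: str
    children: List["Node"] = field(default_factory=list)


def diagnose(root: Node, score: Callable[[str], float], beam_width: int = 2) -> List[str]:
    """Beam search over a tree of hypotheses; returns the best-scoring path to a leaf."""
    beam = [[root]]  # each entry is a path from the root
    best_path, best_score = [root], float("-inf")
    while beam:
        next_beam = []
        for path in beam:
            node = path[-1]
            if not node.children:  # leaf: a concrete root-cause candidate
                s = score(node.hypothesis)
                if s > best_score:
                    best_path, best_score = path, s
            else:
                next_beam.extend(path + [child] for child in node.children)
        # Prune: keep only the most promising partial explanations.
        next_beam.sort(key=lambda p: score(p[-1].hypothesis), reverse=True)
        beam = next_beam[:beam_width]
    return [n.hypothesis for n in best_path]


tree = Node("slow queries", [
    Node("resource contention", [
        Node("CPU saturated by concurrent jobs"),
        Node("disk I/O bottleneck"),
    ]),
    Node("query plan issue", [Node("missing index on orders.customer_id")]),
])
fake_llm_scores = {  # hypothetical ratings an LLM might assign
    "resource contention": 0.3,
    "query plan issue": 0.8,
    "CPU saturated by concurrent jobs": 0.2,
    "disk I/O bottleneck": 0.1,
    "missing index on orders.customer_id": 0.9,
}
print(diagnose(tree, lambda h: fake_llm_scores.get(h, 0.0)))
# ['slow queries', 'query plan issue', 'missing index on orders.customer_id']
```

In a real system the children would be generated on the fly by the model rather than fixed in advance; the beam width trades diagnostic thoroughness against the number of LLM calls.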
WizardMath (70B/13B/7B) is a recently open-sourced family of LLMs that applies a method called Reinforced Evol-Instruct to mathematical reasoning. WizardMath 70B surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2, and Chinchilla on GSM8k; exceeds Text-DaVinci-002, GAL, PaLM, and GPT-3 on MATH; and outperforms all other open-source LLMs on both GSM8k and MATH by a substantial margin. (Affiliations: Microsoft, Chinese Academy of Sciences)
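GSM8k results like these are usually reported as exact-match accuracy on the final numeric answer. GSM8k reference solutions end with a line of the form `#### <answer>`; the Python sketch below assumes a simple "last number in the output" heuristic for parsing model responses, which is one common choice but not necessarily the one used in the WizardMath evaluation. The helper names and the two sample items are illustrative.

```python
import re
from typing import List, Optional


def gold_answer(solution: str) -> str:
    """GSM8k reference solutions end with a line like '#### 72'."""
    return solution.split("####")[-1].strip().replace(",", "")


def predicted_answer(output: str) -> Optional[str]:
    """Heuristic (an assumption): treat the last number in the output as the answer."""
    numbers = re.findall(r"-?\d[\d,]*\.?\d*", output)
    if not numbers:
        return None
    return numbers[-1].replace(",", "").rstrip(".")


def accuracy(outputs: List[str], solutions: List[str]) -> float:
    """Exact-match accuracy, comparing answers numerically."""
    correct = 0
    for out, sol in zip(outputs, solutions):
        pred = predicted_answer(out)
        if pred is not None and float(pred) == float(gold_answer(sol)):
            correct += 1
    return correct / len(solutions)


outputs = ["She has 3 + 4 = 7 apples. The answer is 7.", "The answer is 12."]
solutions = ["3 + 4 = <<3+4=7>>7 apples\n#### 7", "5 * 2 = <<5*2=10>>10\n#### 10"]
print(accuracy(outputs, solutions))  # 0.5
```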