💨 Baidu, Alibaba Intensify AI Battles; Huawei AI Forecasts Weather 10,000 Times Faster; AI Converts Dreams into Images

Weekly China AI News from July 3 to July 9

Jul 10, 2023

Dear readers, I have returned to the States after a four-week journey in China. I finally reunited with my family, friends, and colleagues after a three-year separation due to Covid. So what happened while I was away for the past two weeks? Wait, Meituan acquired Light Year for RMB 2 billion??!!

This week, I'll delve into the latest AI offerings from industry giants Baidu, Alibaba, and Huawei. AI can now transform your brain signals into images. Plus, did you know that this year alone, over 80 large language models have been introduced in China?

Baidu, Alibaba, Huawei Heat up AI Games with New Offerings

Alibaba Cloud Debuts AI Image Generation Model Tongyi Wanxiang

Baidu: Introducing ERNIE 3.5 and the New ERNIE Bot App

Last week, Baidu announced a significant upgrade to its large language model ERNIE, releasing version 3.5. The new version doubles training throughput and speeds up inference 30-fold compared with its predecessor, ERNIE 3.0. ERNIE 3.5 also adds new plugins and, according to China Science Daily, outperforms GPT-4 on some Chinese-language tasks in public benchmarks.

Alongside ERNIE 3.5, the company quietly launched the ERNIE Bot app on the App Store. While the app is free to download, access is limited to approved users. According to Chinese media outlet Qbitai, the app introduces over 120 new features, including some not found in the desktop version. These include chatbots impersonating public figures like Elon Musk or TV characters such as Zhenhuan (甄嬛), slide generation, and avatar image creation.

Alibaba: Tongyi Wanxiang as a Contender to Midjourney

Alibaba Cloud unveiled its innovative AI image generation model, Tongyi Wanxiang, at the World Artificial Intelligence Conference 2023. The model is now accessible for enterprise customers in China for beta testing.

Much like Midjourney and Stable Diffusion, Tongyi Wanxiang converts text prompts in Chinese and English into images, replicating a wide range of styles, including watercolor, oil painting, Chinese painting, animation, sketching, flat illustration, and 3D cartoons. The model is built on Composer, Alibaba's diffusion model, which gives users versatile control over the final image output.

Huawei: Introducing Pangu 3.0

Huawei's annual developer conference witnessed the much-awaited announcement of Huawei Cloud Pangu Models 3.0. The telecom titan aims to differentiate itself by unveiling five foundation models, encompassing NLP, CV, multimodal, prediction, and scientific computing, offered in varying sizes: 10 billion parameters, 38 billion parameters, 71 billion parameters, and 100 billion parameters.

Additionally, Huawei launched the Ascend AI cloud services. A single compute cluster can deliver 2,000 petaFLOPS of compute capacity, while a 1,000-card cluster can sustain training of a multi-billion-parameter model for 30 days without interruption.

Huawei’s Pangu-Weather Leverages AI for Rapid and Accurate Weather Forecasts

What is the Fujiwhara Effect? Super Typhoon Hinnamnor provided a textbook example

What’s New: Huawei Cloud has achieved a significant milestone in weather prediction with its AI model, “Pangu-Weather”. The model’s forecasts are more precise than those of traditional numerical methods, and it produces them 10,000 times faster. The paper, authored solely by Huawei, was published in Nature.

How it Works: Pangu-Weather is built on a 3D Earth-Specific Transformer (3DEST) architecture, which allows it to process complex, non-uniform, 3D meteorological data effectively. The model was trained on 43 years of weather data, and a hierarchical temporal aggregation strategy combines models trained for different forecast intervals.

As a result, Pangu-Weather can generate forecasts ranging from one hour to seven days, accurately predicting meteorological features such as humidity, wind speed, temperature, and sea level pressure in mere seconds.
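
To make the hierarchical temporal aggregation idea concrete, here is a minimal sketch. It assumes separate models with lead times of 1, 3, 6, and 24 hours, as I understand the Pangu-Weather paper to describe, and greedily chains the longest ones first so that a forecast needs as few model calls as possible. The function name and constant are hypothetical, and no actual model is run here.

```python
# Assumed lead times (in hours) of the separately trained forecast models.
LEAD_TIMES_HOURS = (24, 6, 3, 1)


def plan_forecast_steps(target_hours, lead_times=LEAD_TIMES_HOURS):
    """Greedily pick model lead times whose sum equals target_hours.

    Longer lead times are used first, which keeps the number of chained
    model calls (and the accumulated error) small.
    """
    if target_hours < 0:
        raise ValueError("target_hours must be non-negative")
    steps = []
    remaining = target_hours
    for lead in sorted(lead_times, reverse=True):
        count, remaining = divmod(remaining, lead)
        steps.extend([lead] * count)
    if remaining:
        raise ValueError("target cannot be reached with the given lead times")
    return steps


# A 56-hour forecast takes 5 model calls instead of 56 one-hour calls.
print(plan_forecast_steps(56))  # [24, 24, 6, 1, 1]
```

Under these assumptions, a seven-day (168-hour) forecast needs only seven calls to the 24-hour model.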

Why It Matters: The implications of this breakthrough stretch from everyday life to global economies. Faster and more accurate forecasts allow more time to prepare for and respond to severe weather events, which could save lives and reduce economic losses. In 2022 alone, typhoons caused 5.42 billion yuan in economic losses in China.

Pangu-Weather has already shown its capabilities in real-world applications. It successfully predicted the trajectory of Typhoon Mawar five days before it changed course in the eastern waters of Taiwan. These early and accurate predictions could drastically improve disaster management and preparedness efforts.

DreamDiffusion Transforms Brainwaves into Stunning Images

What's New: Researchers from Tsinghua Shenzhen International Graduate School, Tencent AI Lab, and Peng Cheng Laboratory have introduced DreamDiffusion, a groundbreaking method for generating high-quality images directly from brain electroencephalogram (EEG) signals. This innovative technique bypasses the need to translate thoughts into text, creating a pathway for a direct brain-to-image interface.

How It Works: Here's how the magic happens: The DreamDiffusion model leverages pre-trained text-to-image models and employs a process known as temporal masked signal modeling. This is used to pre-train the EEG encoder, which is responsible for creating effective and robust EEG representations. These representations are what get converted into images.
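
As a rough illustration of temporal masked signal modeling, the sketch below splits an EEG recording into fixed-length time tokens and zeroes out a random subset across all channels. During pre-training, an encoder would then be asked to reconstruct the hidden segments. The function name, token length, and mask ratio are illustrative assumptions, not values taken from the authors' code.

```python
import random


def mask_time_tokens(signal, token_len, mask_ratio, rng):
    """Zero out randomly chosen time tokens across all channels.

    signal: list of channels, each a list of samples of equal length.
    Returns a masked copy and the sorted indices of the masked tokens.
    Trailing samples that do not fill a whole token are left unmasked.
    """
    n_samples = len(signal[0])
    n_tokens = n_samples // token_len
    n_masked = round(n_tokens * mask_ratio)
    masked_tokens = set(rng.sample(range(n_tokens), n_masked))

    masked = [list(channel) for channel in signal]  # copy, keep the input intact
    for token in masked_tokens:
        start = token * token_len
        for channel in masked:
            channel[start:start + token_len] = [0.0] * token_len
    return masked, sorted(masked_tokens)


# Two channels, 16 samples each; hide half of the four 4-sample tokens.
rng = random.Random(0)
eeg = [[1.0] * 16 for _ in range(2)]
masked_eeg, hidden = mask_time_tokens(eeg, token_len=4, mask_ratio=0.5, rng=rng)
print(len(hidden))  # 2
```

Masking whole time segments rather than single samples forces the encoder to rely on longer-range temporal context instead of copying neighboring values.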

The method also uses the CLIP image encoder to provide extra supervision, helping to better align EEG, text, and image embeddings even when there is limited EEG-image paired data.
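
One way to picture this extra supervision is as a loss that pulls each EEG embedding toward the CLIP embedding of its paired image. The sketch below assumes the objective is cosine distance (one minus cosine similarity); the function name and the toy vectors are hypothetical, and a real setup would compute this on learned, high-dimensional embeddings.

```python
import math


def cosine_alignment_loss(eeg_embedding, image_embedding):
    """Return 1 - cosine similarity between two equal-length vectors.

    0.0 means the embeddings point the same way, 2.0 means opposite.
    """
    dot = sum(a * b for a, b in zip(eeg_embedding, image_embedding))
    norm_eeg = math.sqrt(sum(a * a for a in eeg_embedding))
    norm_img = math.sqrt(sum(b * b for b in image_embedding))
    if norm_eeg == 0.0 or norm_img == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_eeg * norm_img)


# Aligned embeddings give a loss near 0; orthogonal ones give a loss near 1.
aligned = cosine_alignment_loss([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_alignment_loss([1.0, 0.0], [0.0, 1.0])
```

Because the loss depends only on direction, not magnitude, it can align the two embedding spaces even when their scales differ.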

Why It Matters: Imagine a world where your dreams, thoughts, or ideas can be directly visualized. Struggling to explain that dream you had last night about a unicorn and a spaceship? A direct brain-to-image interface could simply show it. The technique could be particularly beneficial for people with communication difficulties. Further, it may have implications for fields such as psychology and neuroscience, where understanding and visualizing thought processes and dreams is a key research area.

Weekly News Roundup

UAE Awards China's WeRide The First National Self-Driving Vehicle License Ever | Carscoops

🚙 The United Arab Emirates has given Chinese autonomous driving company WeRide the first national license for self-driving vehicles. The permit allows WeRide to test its Level 4 autonomous vehicles on public roads throughout the country.

🥴 AI debate: Allen Zhu Xiaohu, managing partner at GSR Ventures, suggested at a Beijing conference that ChatGPT could pose a threat to AI startups due to its superior performance. Fu Sheng, CEO of Cheetah Mobile, countered that Zhu is “fearlessly ignorant” of the business opportunities that ChatGPT can provide. Read SCMP for more.

📖 China Telecom has released its LLM, called TeleChat. The model can generate code and draft speeches, among other things.

🤖 ByteDance is reportedly building robots with a team of around 50 members, planning to expand to hundreds by the end of the year. The robots are designed to meet the company's e-commerce fulfillment needs, capable of sorting and packaging goods in warehouses.

🔍 Bilibili is beta testing a “Search AI Assistant”. By inputting a question or a “?” in the search bar, users can experience the feature, which provides generated answers to user queries and adds related reference videos in the response.

🧑🏻‍⚖️ A team from Peking University has released ChatLaw, providing basic legal services to the public by generating legal documents.

🦜 The Institute of Computing Technology at the Chinese Academy of Sciences has developed an AI model named “Bai Ling”, with claimed performance comparable to GPT-3.5-turbo. The model has been deployed online at the Nanjing High-Speed Rail Academy.

🧠 Beijing-based AI startup “01.AI”, founded by renowned AI expert Kai-Fu Lee, has officially launched. The company is making strides in the development of AI 2.0 platforms and applications, already testing a model with tens of billions of parameters.

Planning-oriented Autonomous Driving (Best Paper Award at CVPR 2023)

Affiliations: OpenDriveLab and OpenGVLab, Shanghai AI Laboratory; Wuhan University; SenseTime Research
Modern autonomous driving systems use sequential modular tasks: perception, prediction, and planning. Current methods, either standalone or multi-task models, can face cumulative errors or insufficient coordination. The proposed Unified Autonomous Driving (UniAD) framework integrates all driving tasks into one network, prioritizing tasks to support planning. It uses unified query interfaces for task communication and offers global perspective agent interaction. Tested on the nuScenes benchmark, UniAD outperforms previous models, validating its philosophy. Code and models are publicly available.

M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models

Affiliations: DAMO Academy, Alibaba Group
M3Exam, a new benchmark sourced from real human exam questions, has been introduced to evaluate large language models (LLMs). It tests LLMs in a multilingual, multimodal, and multilevel context with 12,317 questions in nine languages across three education levels. Initial tests reveal that current models, including GPT-4, struggle with multilingual text, especially in low-resource, non-Latin languages, and complex multimodal questions. The M3Exam data and evaluation code are available online.

Training Transformers with 4-bit Integers

Affiliations: Tsinghua University
The authors propose a method for training transformers with INT4 (4-bit integer) arithmetic, exploiting the specific structures of activations and gradients. By introducing dedicated quantizers and techniques to accurately quantize gradients, the method achieves competitive accuracy across tasks such as natural language understanding, machine translation, and image classification. The approach runs up to 2.2 times faster than FP16 counterparts and can be implemented on current GPUs. The code is available online.
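
For readers new to low-bit arithmetic, the sketch below shows what mapping floating-point values onto a signed 4-bit grid (integers from -8 to 7) looks like. It uses plain symmetric round-to-nearest quantization with a single scale, which is a generic textbook scheme and not the dedicated quantizers the paper proposes. The function names are hypothetical.

```python
def quantize_int4(values):
    """Map floats to signed 4-bit codes in [-8, 7] using one shared scale."""
    if not values:
        return [], 1.0
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 1.0
    scale = max_abs / 7  # the largest magnitude maps to code +/-7
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int4(codes, scale):
    """Recover approximate floats from 4-bit codes."""
    return [c * scale for c in codes]


codes, scale = quantize_int4([7.0, -3.0, 0.0, 1.2])
print(codes)  # [7, -3, 0, 1]
approx = dequantize_int4(codes, scale)  # 1.2 comes back as 1.0
```

With only 16 representable levels, rounding error is large, which is why naive schemes like this one tend to hurt training accuracy and why purpose-built quantizers matter.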

Recode China AI
