Alibaba Launches "Model as a Service"; IDEA Open Sources Chinese Stable Diffusion; Pony.ai Delivers Trucks Despite Layoff Rumors
Weekly China AI News from Oct. 31 to Nov. 6
Dear readers, this week we shed light on Alibaba’s new “Model as a Service” AI platform, ModelScope. Shenzhen-based research institute IDEA has open-sourced a Chinese Stable Diffusion trained on Chinese datasets. Plus, Chinese autonomous driving upstart Pony.ai delivered 30 autonomous trucks while reportedly trimming its mapping and infrastructure teams.
Weekly News Roundup
Alibaba Cloud Launches ModelScope with 300 Open-Source Models
What’s new: Alibaba Cloud last week introduced an open-source AI platform and community named ModelScope. Branded as a “Model as a Service” platform that aims to lower the barrier to AI deployment, ModelScope has released over 300 off-the-shelf models for free public access, including more than 100 Chinese AI models and 150 state-of-the-art models.
You can find Alibaba’s text-to-image model Tongyi and its cross-modal model OFA on ModelScope. Also available on the platform are Chinese SOTA models from third-party institutes, including Chinese CLIP, as well as Uni-Fold for building protein-folding models beyond AlphaFold.
What is “Model as a Service”: Like “infrastructure as a service” or “software as a service,” MaaS refers to delivering models (machine learning models, in this context) as a service over the internet. Instead of building a model from scratch, developers and researchers can access the models on ModelScope for free, develop customized AI applications on top of them, and run them either on Alibaba Cloud or on-premises. Think of ModelScope as a Chinese counterpart to Hugging Face, the globally popular open-source AI community that offers free models and datasets.
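The MaaS pattern described above can be sketched in a few lines: a hub maps model IDs to ready-to-run pipelines, so callers fetch and run a model without ever training one. This is a self-contained toy, not ModelScope’s real SDK; the model ID, the `pipeline` function, and the toy classifier are all illustrative.

```python
# Minimal sketch of the "Model as a Service" pattern: a hub registry
# maps model IDs to factories that return ready-to-run inference
# functions. All names here are hypothetical, for illustration only.
from typing import Callable, Dict

# Hypothetical hub: model ID -> factory producing an inference callable.
MODEL_HUB: Dict[str, Callable[[], Callable[[str], str]]] = {
    "demo/text-classification-toy": lambda: (
        # Toy sentiment "model" standing in for a real pretrained one.
        lambda text: "positive" if "good" in text else "negative"
    ),
}

def pipeline(model_id: str) -> Callable[[str], str]:
    """Fetch an off-the-shelf model from the hub and return a callable."""
    if model_id not in MODEL_HUB:
        raise KeyError(f"model {model_id!r} not found on the hub")
    return MODEL_HUB[model_id]()

classifier = pipeline("demo/text-classification-toy")
print(classifier("the food was good"))  # -> positive
```

The point of the pattern is that the caller’s code stays the same whether the model runs in the cloud or on-premises; only the hub lookup changes.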
Why Alibaba is betting on MaaS: Large pre-trained models, such as GPT-3 for text generation and diffusion models for text-to-image creation, pack tens or even hundreds of billions of parameters. Training and serving them requires massive infrastructure investment, leaving universities and small and medium-sized enterprises with limited access. ModelScope aims to narrow that gap.
ModelScope is not the only Chinese platform that offers pre-trained models. Baidu’s PaddlePaddle and Huawei’s MindSpore are the top two Chinese deep learning platforms that provide foundational machine learning toolkits and models.
ModelScope is available at: http://www.modelscope.cn/.
Shenzhen Institute Led by Ex-Microsoft AI Chief Introduces Chinese Stable Diffusion
What’s new: IDEA, a Shenzhen-based research institute founded by Harry Shum, a former Microsoft executive who led its AI efforts, has released the first open-source Chinese Stable Diffusion. Unlike existing Chinese Stable Diffusion setups built on translation APIs in front of the English model, IDEA researchers retrained the Stable Diffusion model on Chinese datasets to better capture Chinese concepts.
In the researchers’ own words: “We use Noah-Wukong (100M) and Zero (23M) as our datasets, taking the image-text pairs with a CLIP score (based on IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese) greater than 0.2 as our training set. We use IDEA-CCNL/Taiyi-CLIP-RoBERTa-102M-ViT-L-Chinese as our initial text encoder. To keep the powerful generative capability of Stable Diffusion and align Chinese concepts with the images, we train only the text encoder and freeze the other parts of the stable-diffusion-v1-4 model. Training took 100 hours on 32 A100 GPUs. This model is a preliminary version, and we will continuously update and open-source it.”
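The data-curation step above, keeping only image-text pairs whose CLIP score exceeds 0.2, can be sketched as follows. The real pipeline scores pairs with Taiyi-CLIP embeddings; here `clip_score` is a stand-in heuristic, and all names are illustrative.

```python
# Sketch of the dataset-filtering step: keep only image-text pairs
# whose CLIP similarity exceeds a threshold (0.2 in IDEA's recipe).
# clip_score is a toy stand-in for the real Taiyi-CLIP similarity.
from typing import List, Tuple

def clip_score(image: str, caption: str) -> float:
    """Placeholder similarity; the real pipeline compares CLIP embeddings."""
    # Toy heuristic: "aligned" if the caption's first word appears
    # in the image filename. Illustration only.
    return 0.5 if caption.split()[0] in image else 0.1

def filter_pairs(
    pairs: List[Tuple[str, str]], threshold: float = 0.2
) -> List[Tuple[str, str]]:
    """Return only the pairs whose score clears the threshold."""
    return [(img, cap) for img, cap in pairs if clip_score(img, cap) > threshold]

pairs = [("cat_photo.jpg", "cat on a sofa"), ("noise.jpg", "luosifen noodles")]
print(filter_pairs(pairs))  # keeps only the well-aligned pair
```

Filtering by image-text similarity before training is a common way to trade raw dataset size for cleaner supervision, which matters here because only the text encoder is being retrained.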
Below is a direct comparison of AI-created paintings between Chinese Stable Diffusion (left) and Chinese-translated Stable Diffusion (right). For example, Chinese Stable Diffusion can recognize Luosifen, a pungent snail rice noodle dish widely popular in China, and create corresponding pictures, while the Chinese-translated Stable Diffusion outputs a Chinese-style garden instead.
Implementation in Chinese: Chinese Stable Diffusion is the latest in IDEA’s line of Chinese implementations of SOTA models. Earlier this year, IDEA released Chinese CLIP and Chinese Disco Diffusion.
Pony.ai Delivers Autonomous Trucks Despite Layoff Rumors
What’s new: Chinese autonomous vehicle (AV) upstart Pony.ai has delivered 30 autonomous heavy-duty trucks built in partnership with SANY Heavy Truck. The trucks went to a joint venture established by Pony.ai and Sinotrans, a state-owned logistics giant. Equipped with 20 sensors and Pony.ai’s third-generation autonomous truck software, the trucks can detect an obstacle vehicle 30 seconds ahead when driving at 90 km/h and perceive surrounding objects within 200 meters.
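A quick back-of-envelope check puts the quoted figures in perspective (no claim about how Pony.ai derives them): at 90 km/h, how much ground does a truck cover in that 30-second window?

```python
# Back-of-envelope arithmetic on the figures quoted above.
speed_kmh = 90
speed_ms = speed_kmh * 1000 / 3600   # convert km/h to m/s -> 25.0
lookahead_s = 30
distance_m = speed_ms * lookahead_s  # ground covered in 30 s -> 750.0

print(f"{speed_ms} m/s covers {distance_m} m in {lookahead_s} s")
```

Since 750 meters far exceeds the stated 200-meter perception range, the 30-second figure presumably reflects relative closing speed or prediction on top of perception, rather than raw sensor reach.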
Bad news: Local Chinese media reported that Pony.ai is downsizing its Infrastructure & Data department. Its Shanghai-based data team has been disbanded, and layoffs will reportedly extend to its mapping and other departments.
Meanwhile, Kelvin, head of the infrastructure and data department at Pony.ai’s California R&D center, and Feng Yi, the U.S. head of mapping, have also left.
In a statement to the media outlet, Pony.ai said it is adjusting its business structure. “The company's financial situation is good, and its business is operating normally.”
L4 malaise: The recent shutdown of Ford-backed robotaxi startup Argo AI and the leadership reshuffle at TuSimple have shaken the U.S. self-driving industry, adding uncertainty to the future of L4 autonomy, which requires no human intervention in driving. Amid a macroeconomic slowdown, investors favor businesses that can generate positive cash flow.
It is still unknown how the aftermath will affect Chinese robotaxi companies. WeRide.ai, a Guangzhou-based AV startup, seems unfazed: CEO Tony Han said on social media that the company has enough cash to operate for another six to seven years without generating any revenue.
Trending Research
Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems
Researchers from Sun Yat-sen University and Dark Matter AI proposed a simple but efficient method called Universal Expression Tree (UET), the first attempt to represent the equations of various math word problems (MWPs) uniformly. Experimental results on several MWP datasets show that the model can solve universal types of MWPs and outperforms several state-of-the-art models.
UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance
Researchers from Baidu proposed UPainting to unify simple- and complex-scene image generation. Through architecture improvements and diverse guidance schedules, UPainting integrates cross-modal guidance from a pretrained image-text matching model into a text-conditional diffusion model that uses a pretrained Transformer language model as its text encoder. UPainting greatly outperforms other models on caption similarity and image fidelity in both simple and complex scenes.
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding
Researchers from Tencent introduced ChiQA, a new image-based question answering dataset. It contains real-world queries issued by internet users, each paired with several related open-domain images. ChiQA includes more than 40K questions and more than 200K question-image pairs. Data analysis shows that ChiQA requires a deep understanding of both language and vision, including grounding, comparison, and reading.