🥸 Alibaba AI Turns Audio to Talking Head, Baidu’s ERNIE Expected to Generate Billions in 2024, State Media Launches AI Dataset, and Meet Open Sora

Weekly China AI News from February 20, 2024 to March 3, 2024

Mar 04, 2024

Hi everyone, I am back in Beijing and will stay for three weeks! In this issue, I discuss Alibaba’s latest audio-to-video AI model EMO, which went viral on social media. Baidu released its quarterly financial results last week, and CEO Robin Li predicted that ERNIE will drive several billion RMB in revenue in 2024. State-backed media outlet People’s Daily Online released a dataset that it claims will help train Sora-like AI models. Plus, a group of researchers from Peking University plans to reproduce Sora.

Alibaba Introduces Audio-to-Video Model EMO

What’s New: Researchers from Alibaba introduced a new AI model called EMO that can transform audio, such as speech or song, into a talking head video. With just a single reference image and audio clip, EMO can create videos featuring vivid facial expressions and varied head movements that match the audio’s length. The longest demo video reached 1 minute and 49 seconds.

How It Works: EMO operates in two primary stages: Frames Encoding and Diffusion Process.

In the Frames Encoding stage, ReferenceNet extracts features from the reference image and motion frames.
In the Diffusion Process stage, a pretrained audio encoder embeds the vocal audio, while a facial region mask combined with multi-frame noise governs the generation of facial imagery.
The Backbone Network uses Reference-Attention to preserve the character’s identity and Audio-Attention to keep movements synchronized with the audio.
Temporal Modules adjust the speed of motion across frames, enabling expressive, lifelike avatar videos.
The team collected over 250 hours of video and 15 million images into a vast and diverse audio-video dataset.
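To make the two attention mechanisms more concrete, here is a toy sketch in pure Python. It is not EMO’s code, which has not been released; the dimensions, the random features, and the function names (`reference_attention`, `audio_attention`) are hypothetical. It assumes, as in ReferenceNet-style designs such as AnimateAnyone, that Reference-Attention lets the frame’s latent tokens attend over themselves concatenated with the reference-image features, while Audio-Attention is a cross-attention in which latent tokens query the audio embeddings.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)


def reference_attention(latent: Matrix, reference: Matrix) -> Matrix:
    """Hypothetical: latent tokens attend to themselves plus reference-image
    features, so identity cues from the reference can flow into the frame."""
    context = latent + reference  # concatenate along the token axis
    return attention(latent, context, context)


def audio_attention(latent: Matrix, audio: Matrix) -> Matrix:
    """Hypothetical: cross-attention where latent tokens query audio embeddings."""
    return attention(latent, audio, audio)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8                                  # hypothetical feature size
    latent = random_matrix(16, dim, rng)     # noisy frame latent, 16 tokens
    reference = random_matrix(16, dim, rng)  # stand-in for ReferenceNet features
    audio = random_matrix(5, dim, rng)       # stand-in for audio-encoder embeddings
    x = reference_attention(latent, reference)
    x = audio_attention(x, audio)
    print(len(x), len(x[0]))  # prints "16 8": one updated vector per latent token
```

A real model would add learned query/key/value projections, multiple heads, and residual connections; this sketch keeps only the attention pattern so the flow of identity and audio information is easy to follow.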

EMO is set to be open-sourced; however, its GitHub repository is currently empty.

Why It Matters: EMO could open new doors for content creation. It keeps the animation synchronized with the audio even when lyrics or speech are fast-paced. Beyond music and speech, EMO can animate historical portraits, paintings, 3D models, and AI-generated content. It could also pave the way for movie characters to deliver monologues or performances in multiple languages and styles.

Like AnimateAnyone, Alibaba’s image-to-dance-video model, EMO will likely be added to Alibaba’s chatbot Tongyi Qianwen soon.

Baidu’s ERNIE Predicted to Drive Billions in Revenue in 2024


What’s New: Baidu CEO Robin Li forecasts that the company’s LLM ERNIE will generate several billion RMB in revenue in 2024. He made the projection alongside Baidu’s latest quarterly earnings report.

In 2023, Baidu reported revenue of $18.96 billion and non-GAAP net profit of $4.05 billion, up 39% year over year. In the fourth quarter, revenue rose 6% year over year to $4.92 billion, while non-GAAP net profit jumped 44% year over year to $1.09 billion.
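The growth rates above also imply the prior-year baselines and Baidu’s profit margins. The back-of-the-envelope sketch below derives a baseline as current / (1 + growth) and a margin as profit / revenue. It works only from the rounded USD figures quoted here, so the results are approximations rather than Baidu’s reported numbers; the helper names `implied_prior` and `margin_pct` are hypothetical.

```python
def implied_prior(current: float, growth_pct: float) -> float:
    """Back out the prior-period value from a current value and its % growth."""
    return current / (1 + growth_pct / 100)


def margin_pct(profit: float, revenue: float) -> float:
    """Profit as a percentage of revenue."""
    return 100 * profit / revenue


# Figures in USD billions, as quoted in the text above (rounded).
FY_REVENUE, FY_NONGAAP_PROFIT, FY_PROFIT_GROWTH = 18.96, 4.05, 39
Q4_REVENUE, Q4_REVENUE_GROWTH = 4.92, 6
Q4_NONGAAP_PROFIT, Q4_PROFIT_GROWTH = 1.09, 44

if __name__ == "__main__":
    print(f"Implied 2022 non-GAAP net profit: ${implied_prior(FY_NONGAAP_PROFIT, FY_PROFIT_GROWTH):.2f}B")
    print(f"Implied Q4 2022 revenue: ${implied_prior(Q4_REVENUE, Q4_REVENUE_GROWTH):.2f}B")
    print(f"Implied Q4 2022 non-GAAP net profit: ${implied_prior(Q4_NONGAAP_PROFIT, Q4_PROFIT_GROWTH):.2f}B")
    print(f"2023 non-GAAP net margin: {margin_pct(FY_NONGAAP_PROFIT, FY_REVENUE):.1f}%")
    print(f"Q4 2023 non-GAAP net margin: {margin_pct(Q4_NONGAAP_PROFIT, Q4_REVENUE):.1f}%")
```

Run as written, it prints roughly $2.91B, $4.64B, and $0.76B for the implied baselines, and margins of about 21.4% for the year and 22.2% for the quarter.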

How It Works: In the fourth quarter, the incremental revenue Baidu gained from ERNIE and ERNIE Bot came mainly from advertising and from cloud services for enterprise clients:

Ad revenue rose by several hundred million RMB thanks to improved ad bidding and targeting. ERNIE has also been used to automate the creation of advertising content, including posters and videos.
On the cloud side, ERNIE had more than 26,000 monthly active enterprise users, including major firms such as Samsung China and Honor, up 150% quarter over quarter.
Baidu AI Cloud’s revenue hit 8.4 billion RMB in Q4, with generative AI and foundation models contributing 656 million RMB.
ERNIE’s daily API calls jumped to 50 million in December, a 190% increase from the previous quarter.
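Two quick calculations help put the cloud figures above in perspective: what share of AI Cloud revenue came from generative AI, and what the quarter-over-quarter growth rates imply about the previous quarter. Note that a 150% increase means the current value is 2.5 times the earlier one, not 1.5 times. This is a rough sketch using only the numbers quoted in the list; the helper names are hypothetical, and because the user count is given as "over 26,000", the implied baseline is a lower bound.

```python
def share_pct(part: float, whole: float) -> float:
    """Part as a percentage of the whole."""
    return 100 * part / whole


def previous_quarter(current: float, increase_pct: float) -> float:
    """An X% increase means current = previous * (1 + X/100), so divide it back out."""
    return current / (1 + increase_pct / 100)


# Q4 2023 figures quoted in the text above.
AI_CLOUD_REVENUE_RMB_M = 8_400  # 8.4 billion RMB, in millions
GENAI_REVENUE_RMB_M = 656       # generative AI and foundation models, in millions
ENTERPRISE_USERS = 26_000       # "over 26,000", so treat as a lower bound
DAILY_API_CALLS_M = 50          # December daily API calls, in millions

if __name__ == "__main__":
    print(f"Gen-AI share of AI Cloud revenue: {share_pct(GENAI_REVENUE_RMB_M, AI_CLOUD_REVENUE_RMB_M):.1f}%")
    print(f"Implied previous-quarter enterprise users: ~{previous_quarter(ENTERPRISE_USERS, 150):,.0f}")
    print(f"Implied previous-quarter daily API calls: ~{previous_quarter(DAILY_API_CALLS_M, 190):.1f} million")
```

This yields a generative AI share of about 7.8%, roughly 10,400 enterprise users in the prior quarter, and about 17.2 million daily API calls before the December jump.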

Why It Matters: Baidu’s latest financial outcomes signal that investments in generative AI are starting to pay off. Li emphasized, “Looking ahead, our commitment to Gen-AI and foundation models remains unwavering, paving the way for the gradual creation of a new growth engine.”

However, competition from other AI giants such as Alibaba and ByteDance remains intense. Starting last Thursday, Alibaba Cloud cut prices on more than 100 products, including data storage and elastic computing services, by up to 55%, with an average discount of 20%.

Chinese State Media Unveils World’s Largest Chinese Corpus to Power Sora-like AI

What’s New: On February 20, People’s Data, a subsidiary of People’s Daily Online (人民网), announced a semantic corpus of nearly 300 million data entries, including news and Q&A content, intended for Sora-like AI applications. People’s Daily Online’s stock soared 10%.

How It Works: According to the press release, the corpus is designed for AI models, AGI, and smart internet applications. It covers more than 10,000 key issues, targeting critical, sensitive, difficult, and complex questions that AI models could struggle with. The press release also says the corpus will make information retrieval more accessible, lowering the barrier for everyday users to find comprehensive information.

The dataset was first unveiled in October 2023. Since it consists of text such as news and Q&A, it is unclear how it could help train text-to-video models like Sora.

Why It Matters: The initiative mirrors practice in the US, where media outlets sell training data to AI companies such as OpenAI. As a state-backed media outlet, People’s Daily Online also has an advantage in ensuring content compliance and safety.

Weekly News Roundup

An Alibaba researcher’s X post provides insight into the development of LLMs at the company, highlighting efforts to compete with ChatGPT. (TechCrunch)
Honor launched its Magic 6 Pro smartphone globally, with an experimental AI eye-tracking function that lets users control a car by looking at the phone screen. (Reuters)
Oppo’s Air Glass 3, which looks like an ordinary pair of glasses, connects to a smartphone to access AndesGPT and offers music playback, information display, and voice calls. (The Verge)
A Chinese court ruled that AI-generated images infringed the copyright on a Japanese superhero character, Ultraman, setting a precedent for copyright cases involving generative AI. (Semafor)
China’s state broadcaster CCTV debuted Qianqiu Shisong, billed as China’s first AI-generated animated series, amid growing excitement about text-to-video technology. (SCMP)

Trending Research

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Open-Sora Plan, a project aiming to reproduce OpenAI’s Sora video generation model

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
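The “1.58 bits” in the first title comes from restricting every weight to one of three values, {-1, 0, +1}, and log2(3) ≈ 1.58. The paper describes an absmean quantizer: divide the weight matrix by the mean of its absolute values, then round each entry and clip it to [-1, 1]. Below is a minimal pure-Python sketch of that idea on a flat list of weights; it is not the authors’ code, and the function name `absmean_ternary` and the epsilon value are illustrative assumptions.

```python
import math


def absmean_ternary(weights: list[float], eps: float = 1e-5) -> tuple[list[int], float]:
    """Quantize weights to {-1, 0, +1}, using the mean absolute value as the scale.

    Returns the ternary weights and the scale gamma, so that
    weights are approximated by [gamma * q for q in ternary].
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    # Round each scaled weight to the nearest integer, then clip to [-1, 1].
    ternary = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return ternary, gamma


if __name__ == "__main__":
    print(f"Bits per ternary weight: {math.log2(3):.2f}")  # prints 1.58
    q, gamma = absmean_ternary([0.9, -0.05, -1.3, 0.4])
    print(q)  # prints [1, 0, -1, 1]
```

Small weights collapse to 0 and large ones saturate at ±1, which is why such models can replace most multiplications with additions and subtractions.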

Recode China AI
