Stanford Students Plagiarize Chinese AI Projects, Kuaishou's Response to Sora, and Alibaba Claims Qwen2 Surpasses Llama 3
Weekly China AI News from June 3, 2024 to June 9, 2024
Hi, this is Tony! A friendly heads-up: this issue contains many videos, thanks to Kuaishou's video generator Kling.
Three things to know
A Stanford University team apologized for plagiarizing from an open-source project created by Chinese researchers.
ByteDance's rival Kuaishou released its new video generator Kling, which can create up to 2-minute videos in 1080p quality.
Alibaba Cloud announced its latest LLM, Qwen2, last Friday and said the model topped the rankings for open-source LLMs.
Stanford Team Admits to Plagiarizing Chinese AI Model and Apologizes
What's New: A Stanford University team admitted that their latest AI model plagiarized an open-source project created by Tsinghua University researchers, and issued an apology.
How it Works:
On May 29, a team of three Stanford University computer science students introduced Llama3-V, claiming it to be a powerful multimodal model comparable to GPT-4-V, yet costing only $500 for training. It quickly gained traction and became one of the top 5 projects on the model hosting platform Hugging Face.
However, in the days following its release, Llama3-V faced accusations of copying substantial parts of its model from MiniCPM-Llama3-V 2.5, an open-source multimodal model developed by Tsinghua University's NLP lab and Chinese AI firm ModelBest (面壁智能). Released on May 20, MiniCPM-Llama3-V 2.5 is tailored for edge devices, with only 8 billion parameters. The model performs remarkably well at understanding images and converting images of text into machine-readable text.
A developer quickly noticed that both models had "exactly the same model structures and code."
The authors of MiniCPM-Llama3-V 2.5 launched an investigation and confirmed the plagiarism. Ironically, one piece of evidence showed that Llama3-V behaved similarly in recognizing rare ancient Chinese characters from the Tsinghua Bamboo Slips, an unreleased experimental feature of MiniCPM-Llama3-V 2.5.
In response to the controversy, one of the Llama3-V authors deleted their GitHub and HuggingFace repositories and set the model to private, citing compatibility issues and the need for fixes. The other two authors issued an apology for their insufficient due diligence in verifying the model's originality.
On June 3, ModelBest CEO Li Dahai and co-founder Liu Zhiyuan issued statements: "We deeply regret this situation. On one hand, it is a form of recognition from international teams, but on the other hand, we call for an open, collaborative, and trustworthy community environment. We hope the team's good work is acknowledged and recognized, but not in this manner."
Why It Matters: The incident itself is not unusual. Plagiarism, a serious form of academic misconduct, remains common even at elite U.S. universities.
The story garnered huge attention in China. The headline "Stanford University Stealing from Chinese AI Projects" ignited a sense of national pride.
I wrote this in part because there is a lack of attention and recognition for open-source AI projects from China-based companies and institutions. ModelBest is a prime example. MiniCPM is an outstanding open-source project not only because of its performance but also because it releases all model weights, checkpoints (the parameters that have been updated during the training), and most public training data, which makes it more open than many other LLMs including Llama 2. Other open models, such as Alibaba's Qwen series and 01.AI's Yi series, also significantly contribute to the development of LLMs.
Kuaishou Introduces Sora-Like Video Generator Kling to Overshadow Douyin
What's New: Kuaishou, China's second-largest short-video platform, released its new video generator Kling, which can create videos up to 2 minutes long in 1080p quality. Kling is now available for invited testing with a Chinese cell phone number through Kuaishou's Kwai app, which makes it more production-ready than OpenAI's Sora.
How it Works: According to Kuaishou, Kling uses a so-called "3D spatiotemporal joint attention" mechanism to model complex movements, producing realistic and fluid video sequences. The model can simulate real-world physics, such as light reflections and fluid dynamics. It generates high-resolution videos up to 1080p, with lengths up to 2 minutes at 30fps, supporting various aspect ratios. It also showcases strong concept combination and imagination, turning user inputs into detailed visual stories.
Kling's architecture is based on the Diffusion Transformer, similar to Sora, but the model uses different inputs. While Sora's Transformer is fed with "spacetime patches of video and image latent codes," Kling is trained on text-to-video understanding, likely using text captions paired with video content.
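Kuaishou has not published implementation details, but the general idea behind joint spatiotemporal attention, as opposed to factorized spatial-then-temporal attention, can be sketched as attention over a single flattened space-time token axis. Everything below (the shapes, the identity Q/K/V projections) is a hypothetical illustration, not Kling's actual code:

```python
import numpy as np

def joint_spatiotemporal_attention(x):
    """Toy single-head self-attention over all space-time tokens at once.

    x: video patch embeddings of shape (T, H, W, d).
    Flattening T*H*W into one token axis lets every token attend across
    both space and time in a single pass, instead of running separate
    spatial and temporal attention stages.
    """
    T, H, W, d = x.shape
    tokens = x.reshape(T * H * W, d)             # joint space-time token axis
    scores = tokens @ tokens.T / np.sqrt(d)      # identity projections for brevity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ tokens).reshape(T, H, W, d)

video = np.random.default_rng(0).normal(size=(4, 2, 2, 8))  # 4 frames of 2x2 patches
out = joint_spatiotemporal_attention(video)
print(out.shape)  # (4, 2, 2, 8)
```

The trade-off is cost: joint attention scales with the square of T·H·W, which is likely part of why 2-minute 1080p generation is a notable engineering claim.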
Early testers also noted that Kling may be trained on Kuaishou's video data, as the app has been the video platform of choice for China's rural population.
Below are direct comparisons between Kling and Sora using similar prompts (credit to @Guizang). While the Kling-generated videos are of lower quality than Sora's, it's important to note that Kling's outputs are not cherry-picked. As average user generations, they look impressive to me.
Why It Matters: Kuaishou's investment in AI video generation opens up extensive creative possibilities for content creators, as Kling also offers new AI-generated video features like "AI Dance King" and "AI Singing and Dancing." This move can position Kuaishou at the forefront of AI innovation and challenge its biggest rival, ByteDance's Douyin.
Alibaba Says New Qwen2 Beats Llama 3 on All Benchmarks
What's New: Another open LLM from a Chinese company made a splash: Alibaba Cloud announced its latest LLM, Qwen2, last Friday and said the model topped the rankings for open-source LLMs.
The Qwen2 series includes base and instruction-tuned models, ranging from 0.5 to 72 billion parameters, and an MoE model. The Qwen2 models are available for both commercial and research purposes.
How it Works: The Qwen2-72B model can handle context lengths up to 128K tokens. Training data covered 27 languages in addition to Chinese and English. The models also use Group Query Attention (GQA) to optimize computational efficiency and performance, leading to increased speed and reduced memory usage.
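Group Query Attention shares a small number of key/value heads across groups of query heads, shrinking the KV cache that dominates memory at long context lengths like 128K. A minimal sketch of the mechanism (the shapes and per-head loop are illustrative, not Qwen2's code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def grouped_query_attention(q, k, v):
    """q: (Hq, S, d); k, v: (Hkv, S, d), with Hq a multiple of Hkv.

    Each group of Hq // Hkv query heads shares one key/value head, so the
    KV cache is only Hkv / Hq the size of standard multi-head attention.
    """
    Hq, S, d = q.shape
    Hkv = k.shape[0]
    group = Hq // Hkv
    out = np.empty_like(q)
    for h in range(Hq):
        kv = h // group  # query head h reads its group's shared KV head
        out[h] = softmax(q[h] @ k[kv].T / np.sqrt(d)) @ v[kv]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16, 32))   # 8 query heads
k = rng.normal(size=(2, 16, 32))   # only 2 shared KV heads
v = rng.normal(size=(2, 16, 32))
print(grouped_query_attention(q, k, v).shape)  # (8, 16, 32)
```

With 8 query heads and 2 KV heads, the KV cache is a quarter the size of full multi-head attention, which is where the speed and memory gains come from.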
Alibaba said Qwen2-72B outperforms other open-source models, including Llama 3, on 15 benchmarks covering language understanding, generation, multilingual capability, coding, mathematics, and reasoning.
However, on the LMSYS Chatbot Arena leaderboard, a well-recognized human-judged LLM evaluation, Qwen2-72B ranked only 15th. Nine of the top LLMs are closed-source models.
Why It Matters: Qwen2 models also demonstrate strong alignment with human values. They scored highly on benchmarks like MT-Bench, which evaluates multi-turn conversational and instruction-following ability, a capability crucial for matching human preferences.
By incorporating human feedback, Qwen2 models achieve good performance in safety and responsibility, safely handling multilingual unsafe queries, such as those related to illegal activities.
Weekly News Roundup
Despite U.S. government restrictions on selling advanced AI chips to China, Chinese firms like ByteDance and China Telecom are exploiting loopholes by renting Nvidia's chips from U.S. cloud providers such as Oracle, and others like Alibaba and Tencent are exploring similar arrangements. (The Information)
Alibaba-backed AI startup Zhipu AI slashed the cost of its newly released GLM-4-Air model to 0.1 yuan per 1 million tokens, significantly undercutting the industry average. (SCMP)
In response to U.S. chip restrictions, some Chinese AI chip companies are designing less powerful processors to maintain access to TSMC production, according to sources. Two top firms, MetaX and Enflame, submitted downgraded chip designs to TSMC in late 2023. (Reuters)
Feel the AGI
One of this year's Gaokao (China's college entrance exam) essay topics: With the widespread adoption of the internet and the application of AI, more and more questions can be answered quickly. So, will we have fewer and fewer questions?
Trending Research
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Researchers from USTC, CUHK, PKU, and Shanghai AI Lab introduced ShareGPT4Video to improve video understanding and generation by creating high-quality video captions and models. ShareGPT4Video includes a dataset of 40,000 high-quality video captions generated using GPT-4V, a video captioning model, and a new large video-language model (LVLM) that achieves SOTA performance on multiple video benchmarks. The key innovation is the Differential Sliding-Window Captioning (DiffSW) strategy, which generates detailed captions by focusing on changes between consecutive frames. This method ensures precise temporal descriptions and scalability for videos of varying lengths.
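Based on the paper's description, DiffSW captions a video incrementally by asking the captioner what changed between consecutive frames, then assembling those differential captions. The sketch below is a hypothetical reconstruction; `describe_change` stands in for the actual GPT-4V call:

```python
def diffsw_captions(frames, describe_change):
    """Differential Sliding-Window captioning, sketched.

    frames: list of video frames (any representation).
    describe_change(prev, cur, history): stand-in for a GPT-4V call that
    describes what changed from `prev` to `cur`, given prior captions.
    Returns one differential caption per frame; a final summarization
    step (omitted here) would merge them into a full video caption.
    """
    # First frame has no predecessor: ask for a full description.
    captions = [describe_change(None, frames[0], [])]
    # Slide a two-frame window forward, captioning only the changes,
    # so cost stays linear in video length and timing stays precise.
    for prev, cur in zip(frames, frames[1:]):
        captions.append(describe_change(prev, cur, list(captions)))
    return captions

# Usage with a toy captioner:
frames = ["frame0", "frame1", "frame2"]
fake = lambda prev, cur, hist: f"{prev}->{cur}"
print(diffsw_captions(frames, fake))
# ['None->frame0', 'frame0->frame1', 'frame1->frame2']
```

Because each call only compares two adjacent frames, the method scales to videos of arbitrary length without exceeding the captioner's context window, which is the scalability claim in the paper.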
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Researchers from Tencent Hunyuan, Peking University, and the University of Macau investigated the relationship between optimal learning rates and batch sizes for Adam-style optimizers. The authors introduced a novel scaling law for Adam-style optimizers that captures a "surge phenomenon" in the optimal learning rate as batch size increases. Unlike the monotonically increasing relationship seen with SGD optimizers, the optimal learning rate for Adam optimizers initially rises and then falls, forming a peak (surge) before stabilizing.