Kai-Fu Lee's LLM: A LLaMA Lookalike, China's LLM Overload, and US-China AI Risk Talk Begins
Weekly China AI News from November 13 to November 19
Hi readers, what a wild weekend it's been in the AI world… The latest from OpenAI could really shake things up in the industry (I have great admiration for Sam Altman and wish him the best in his future endeavors).
In this issue, I delve into Kai-Fu Lee's newly unveiled Yi-34B model, which mirrors the renowned LLaMA. Baidu CEO Robin Li expressed concerns about whether LLMs are becoming oversaturated in China. Big news on the international front as the US and China kick off discussions on AI risks and safety. And there's more in research: a watermelon talking head, an agent for Minecraft, and text-to-hyperrealistic human images.
01.AI's New LLM Sparks Debate Over LLaMA Use and Open-Source Norms
What's New: Chinese developers and media recently revealed that Yi-34B, an LLM from Kai-Fu Lee's new AI startup 01.AI, closely adopts the architecture of Meta's LLaMA 2 model, with only minor modifications such as renaming two tensors. The discovery sparked wide discussion within China's AI community about the model's originality and its adherence to open-source licensing norms.
Tell Me More: Yi-34B, the larger of the two open-sourced Yi models alongside Yi-6B, was developed by 01.AI, which is backed by Alibaba Cloud and valued at $1 billion. The model impresses with a 200K-token context window, enabling it to process up to 400,000 Chinese characters. Upon release, Yi-34B topped the Hugging Face open-source model leaderboard and the C-Eval benchmark, surpassing other open-source models such as LLaMA 2 and Falcon.
However, a post on Hugging Face by a developer two weeks ago revealed that Yi-34B uses exactly the LLaMA architecture, with just two tensors renamed. The finding drew little attention until Jia Yangqing, a notable AI scientist and founder of Lepton AI, hinted at the issue on social media last week. Chinese media then began reporting on it, prompting a response from 01.AI. The company's open-source director also replied on Hugging Face that they would rename these tensors from Yi back to LLaMA. In a statement, 01.AI said:
GPT is a well-recognized and mature architecture in the industry, and LLaMA is a summary built upon GPT. The structural design of the large model developed by 01.AI is based on the mature GPT structure, drawing from the top-level public achievements in the field. Meanwhile, it incorporates extensive work based on the 01.AI team's understanding of modeling and training, forming one of the foundations for our first release that achieved excellent results. Concurrently, 01.AI is also continuously exploring fundamental breakthroughs at the structural level of the model.
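For readers curious what "renaming two tensors" means in practice: the fix 01.AI described amounts to rewriting checkpoint keys so they match LLaMA's naming convention, which lets standard LLaMA loaders consume the weights. Below is a minimal, hypothetical sketch of such a key-renaming pass; the specific Yi-style names (ln1, ln2) and their LLaMA counterparts are assumptions for illustration, not taken from the released checkpoint.

```python
def rename_keys(state_dict, mapping):
    """Return a new state dict with key substrings renamed per `mapping`.

    Keys that match no entry in the mapping are kept unchanged, and
    tensor values are carried over as-is.
    """
    renamed = {}
    for key, tensor in state_dict.items():
        new_key = key
        for old, new in mapping.items():
            if old in new_key:
                new_key = new_key.replace(old, new)
        renamed[new_key] = tensor
    return renamed


# Hypothetical mapping from Yi-style layer-norm names back to the
# names LLaMA-compatible loaders expect.
YI_TO_LLAMA = {
    "ln1": "input_layernorm",
    "ln2": "post_attention_layernorm",
}

checkpoint = {
    "model.layers.0.ln1.weight": "tensor_a",   # placeholder for a real tensor
    "model.layers.0.mlp.weight": "tensor_b",
}
fixed = rename_keys(checkpoint, YI_TO_LLAMA)
# fixed now contains "model.layers.0.input_layernorm.weight"
```

In a real conversion the values would be actual weight tensors loaded from disk, but the renaming logic itself is just this key rewrite.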
Is 01.AI at Fault? Building new LLMs on the LLaMA architecture is common practice. Many high-profile Chinese LLMs, such as Baichuan and Qwen, are also based on LLaMA, which is now regarded as the "Android" of LLMs. The developer who originally pointed out the similarities acknowledged that there is nothing wrong with using LLaMA as a foundation.
However, concerns have nonetheless arisen in the AI community and media regarding 01.AI's approach:
01.AI should have acknowledged LLaMA in their work to follow open-source norms and adhere to LLaMA's licensing agreement.
Multiple developers discovered that Yi models could be overly optimized for benchmark datasets to achieve higher scores.
Doubts linger about the reproducibility of Yiās long context window capabilities.
China Has 238 LLMs. Time to Develop AI Applications, Baidu CEO Says
What's New: China faces a unique challenge in the AI sector: the country has developed an impressive 238 LLMs, a threefold increase in the past four months, yet few AI-native applications have emerged from these models. Baidu CEO Robin Li highlighted the issue at a Shenzhen event last Wednesday, expressing concern over the excessive focus on model development.
Why It Matters: The situation points to a potential misallocation of resources, with energy and investment in AI not yet yielding tangible, user-centric applications. Li emphasized the need for a shift in the AI sector, stating, "In the AI-native era, we require millions of AI-native applications, not just a hundred foundation models."
Li further noted that the mobile era gave rise to "mobile-native" applications such as TikTok, WeChat, Uber, and Didi. "In the current AI era, we have yet to see similarly groundbreaking AI-native applications."
According to Li, most LLMs, particularly specialized ones, lack emergent intelligence: the ability to learn effectively and generate insights from data they were not explicitly trained on. Only the best and most powerful foundation models have the potential to drive the development of AI-native applications.
U.S. and China Initiate Dialogue on AI Risks at APEC 2023
What's New: In a significant development at the Asia-Pacific Economic Cooperation (APEC) Summit 2023 last week, the United States and China agreed to establish a bilateral dialogue focusing on the risks and safety concerns associated with AI. The move marks a critical step toward international cooperation in managing the evolving challenges posed by AI.
How It Works: According to the White House, "the leaders affirmed the need to address the risks of advanced AI systems and improve AI safety through U.S.-China government talks." Chinese Foreign Minister Wang Yi also confirmed this following the summit.
Business leaders expressed optimism over U.S.-China AI cooperation.
Alphabet CEO Sundar Pichai said, "I saw encouraging announcements even yesterday for the U.S. and China to start having a dialogue on AI. There is no way you make progress over the long term without China and the U.S. deeply talking to each other on something like AI."
OpenAI CEO Sam Altman said in an exclusive interview with China Entrepreneur that he believes China will excel in AI and will be an important part of the entire human journey of AI exploration.
To close, I'd like to quote Matt Sheehan, a fellow at the Carnegie Endowment for International Peace:
The world is at a key moment in AI governance, a time when scientists and policymakers in both countries are still trying to figure out how best to approach regulating this technology. It's during these times of policy plasticity that dialogue, done strategically and with realistic goals, can have the greatest impact on shaping the safe and productive deployment of AI around the world.
Weekly News Roundup
☁️ Alibaba Group announced that it would not spin off its Cloud Intelligence Group due to various uncertainties, including recent US restrictions on advanced computing chip exports. Alibaba believes that the full spin-off may not enhance shareholder value as initially planned and will focus on establishing a sustainable growth model for the Cloud Intelligence Group amidst uncertain conditions.
🚗 Didi Chuxing has recently formed an LLM team led by Chai Hua, head of Didi Maps and Public Transport Division and rotating chair of the Algorithm Committee. The team aims to enhance travel and itinerary planning efficiency using AI technology.
📱 OPPO has launched its self-trained AndesGPT large model, featuring dialogue augmentation, personalized experience, and cloud-edge coordination. AndesGPT focuses on knowledge, memory, tools, and creativity.
🚗📱 Great Wall Motors and Douyin Group have entered into a strategic partnership, collaborating on big data, enterprise LLM applications, cloud infrastructure, digital marketing, intelligent cockpits, and autonomous driving. They aim to explore building enterprise knowledge bases, developing LLM-based office applications, and innovative business models with an internet mindset.
🗣️ NetEase Youdao introduced EmotiVoice, an open-source text-to-speech (TTS) engine available on GitHub. It supports emotional voice synthesis in Chinese and English with over 2,000 voice tones, enabling emotion-rich audio creation.
Trending Research
ChatAnything enables the creation of unique LLM-based characters with diverse appearances and personalities from text descriptions. It blends text-to-speech and image generation techniques, but faces challenges with face recognition in generated characters. Incorporating pixel-level guidance enhances face landmark detection, allowing for effective animation of these anthropomorphized personas. Read the paper ChatAnything: Facetime Chat with LLM-Enhanced Personas.
HyperHuman, a new framework for generating hyper-realistic human images, tackles existing model limitations by structuring images across multiple levels. It uses a comprehensive dataset, HumanVerse, and a Latent Structural Diffusion Model, which jointly learns image appearance, spatial relationships, and geometry. A Structure-Guided Refiner enhances resolution and detail, outperforming current text-to-image models. Read the paper HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion.
JARVIS-1 is an advanced agent for the Minecraft universe that blends multimodal inputs and language models for planning and control in an open-world setting. It utilizes a unique multimodal memory for learning and adaptation, significantly outperforming predecessors in various tasks and showing potential for self-improvement and more general intelligence. Read the paper JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models.