😯Microsoft Cuts Chinese Developer Access to OpenAI, Chinese LLM 'Beats' GPT-4o, and Toyota-Backed Robotaxi Targets Nasdaq
Weekly China AI News from October 14, 2024 to October 20, 2024
Hi, this is Tony! Welcome to this week’s issue of Recode China AI, a newsletter for China’s trending AI news and papers.
Three things to know
Microsoft Azure OpenAI services will end for individual developers in China starting October 21, 2024.
Kai-Fu Lee’s new flagship model surpasses OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet on LLM benchmarks. Is it really good?
Chinese robotaxi startup Pony.ai files for US listing.
Microsoft Azure OpenAI Services to Terminate for Individual Developers in China
What’s New: Microsoft Azure OpenAI service, where Microsoft provides access to OpenAI’s advanced AI models through public cloud, will terminate for individual developers in China starting October 21, 2024, according to a recent email announcement. This change is due to local regulatory requirements that restrict Azure OpenAI services in mainland China to corporate clients only.
Previously, Microsoft Azure OpenAI service was the only compliant way to access OpenAI’s models in China, although it required proper corporate qualifications. Despite these restrictions, some individual developers still managed to access OpenAI’s APIs, primarily to analyze datasets and compare outcomes.
No China: In June of this year, OpenAI warned developers in China via email to limit their API usage. OpenAI cited unauthorized traffic from unsupported regions and threatened to block API requests from these locations starting July 9. The ChatGPT maker has always listed mainland China and Hong Kong as restricted regions for its API services. (Read my previous post below.)
However, some developers reported that OpenAI took no follow-up action after the warning, and certain accounts remained active.
Why It Matters: The termination of Azure OpenAI service underscores the complexities of accessing western AI tools in China due to regulatory barriers. Individual developers and startups are caught in a difficult position where OpenAI is not providing access, and Chinese regulations are also restricting the use of OpenAI models. They must now consider switching to local LLMs, such as those offered by Baidu or Alibaba, which are becoming increasingly viable alternatives due to rapid advancements in their capabilities and better local support.
Is 01.AI’s New Model Really Better Than GPT-4o?
What’s New: Chinese AI startup 01.AI, founded by renowned computer scientist Kai-Fu Lee, released its new flagship model, Yi-Lightning, on October 16. On the LMSYS leaderboard, known as the global gold standard for evaluating LLMs, Yi-Lightning surprisingly surpassed OpenAI’s GPT-4o (the 2024-05-13 version) and Anthropic’s Claude 3.5 Sonnet, ranking sixth globally and first among Chinese models.
The training of Yi-Lightning was completed using only 2,000 GPUs over a period of one and a half months, costing just over $3 million, Lee bragged. A light version of Yi-Lightning, Yi-Lightning-Lite, ranked 10th globally and is priced at 0.99 yuan ($0.14) per million tokens.
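As a rough sanity check on those training figures, the reported budget can be turned into an implied cost per GPU-hour. The sketch below assumes a 30-day month and that all 2,000 GPUs ran around the clock for the full period; neither assumption comes from 01.AI, and the function name is purely illustrative.

```python
def implied_cost_per_gpu_hour(total_cost_usd, num_gpus, months, days_per_month=30):
    """Implied cost of one GPU-hour, assuming every GPU runs 24/7 (an assumption)."""
    gpu_hours = num_gpus * months * days_per_month * 24
    return total_cost_usd / gpu_hours

# Figures as reported: roughly $3 million, 2,000 GPUs, one and a half months.
cost = implied_cost_per_gpu_hour(3_000_000, 2_000, 1.5)
print(f"Implied cost: ${cost:.2f} per GPU-hour")
```

Because the reported cost was "just over" $3 million and real utilization is rarely 100%, this number is only a ballpark figure, not a statement about 01.AI's actual compute pricing.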
Focus on Enterprise: 01.AI has decided to pivot its focus within the Chinese market from consumer-facing products to enterprise solutions, as the domestic market for consumer applications has proven too competitive and difficult to monetize. For instance, Wanzhi, 01.AI’s ChatGPT-like chatbot, received only 100,000 visits in September, significantly lagging behind Baidu’s ERNIE Bot and Moonshot’s Kimi, which both surpassed 20 million visits, according to SimilarWeb.
On the same day as Yi-Lightning’s release, 01.AI launched a digital human solution aimed at retail and e-commerce sectors.
Despite this pivot, 01.AI continues to focus on consumer applications in the global market, including its PopAI chatbot, the overseas version of Wanzhi, and Monoland, a character-playing app. The company is also testing an AI search app, named BeaGo.
Is Yi-Lightning Better: That Yi-Lightning surpassed GPT-4o and ranked so highly on the LMSYS leaderboard demonstrates Chinese companies’ growing capability in developing competitive LLMs. Lower inference costs and improved generation speed also make Yi-Lightning a compelling choice for developers seeking efficient and affordable models.
However, this also raises concerns that companies might over-optimize their models to perform well in evaluation benchmarks, potentially compromising their general applicability and robustness. I haven’t tested Yi-Lightning yet, but I find it hard to believe that an LLM ranking so highly on the leaderboard is still relatively unknown in both the Chinese and U.S. markets.
Chinese Autonomous Driving Company Pony.ai Files For US Listing
What’s New: Chinese autonomous driving company Pony.ai filed for an initial public offering (IPO) on the Nasdaq on October 18, with the stock symbol “PONY.” The company plans to issue up to 98.15 million shares, aiming to raise over $300 million. Its IPO is expected to be one of the largest Chinese listings in the U.S. this year.
The IPO comes after Tesla’s recent Cybercab reveal, which has reignited Wall Street’s interest in autonomous ride-hailing services. However, Pony.ai might face challenges due to its lackluster revenue growth and recent U.S. regulations on Chinese connected vehicles, which target data privacy and national security concerns related to foreign ownership of autonomous driving systems. WeRide, another Chinese autonomous driving company, has postponed its IPO, which sought to raise around $110 million.
Update: Pony.ai has revised its minimum valuation to $4.0 billion, less than half of the $8.5 billion valuation estimated after its 2022 fundraising.
How It Works: Founded in Silicon Valley in 2016, Pony.ai has since emerged as a leading player in Level 4 autonomous driving, in which vehicles can operate autonomously without human supervision in a geofenced area.
To date, Pony.ai’s cumulative revenue has reached 1.2 billion RMB ($168 million). The company reported revenue of $68.39 million in 2022 and $71.9 million in 2023. For the first half of 2024, it logged $24.72 million in revenue, up over 100% from the same period in 2023.
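The revenue figures above can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in this section; the variable names are illustrative, and the exchange rate is the one implied by the RMB and USD totals rather than an official rate.

```python
# Revenue in millions of USD, as quoted above.
revenue_usd_m = {"2022": 68.39, "2023": 71.90, "H1 2024": 24.72}

# Sum of the periods listed, to compare against the $168 million cumulative figure.
total_listed = sum(revenue_usd_m.values())

# Year-over-year growth from 2022 to 2023.
growth_2023 = (revenue_usd_m["2023"] - revenue_usd_m["2022"]) / revenue_usd_m["2022"]

# Exchange rate implied by "1.2 billion RMB ($168 million)".
implied_rmb_per_usd = 1200 / 168

print(f"Listed periods total: ${total_listed:.2f}M")
print(f"2022 to 2023 growth: {growth_2023:.1%}")
print(f"Implied rate: {implied_rmb_per_usd:.2f} RMB per USD")
```

The listed periods add up to slightly less than the cumulative total, which is consistent with some revenue having been booked before 2022, and the single-digit growth between 2022 and 2023 illustrates the lackluster revenue growth mentioned earlier.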
Its core business spans three major areas: autonomous ride-hailing services (robotaxi), autonomous trucking services (robotruck), and technology licensing and applications. Among these, its autonomous freight service has been the largest revenue generator.
The company currently holds $473 million in cash and cash equivalents, which should be sufficient to sustain operations for the next five years. Globally, Pony.ai’s total autonomous driving mileage stands at nearly 40 million kilometers, including 4 million kilometers driven fully driverless.
Pony.ai’s founders, James Peng and Lou Tiancheng, were core team members of Baidu’s autonomous driving division. Both founders hold degrees from Tsinghua University and Stanford University.
Why It Matters: The timing of Pony.ai’s IPO aligns with a broader industry push towards commercialization in the autonomous driving sector. With a fleet of 190 robotrucks and 250 robotaxis, plus 220,000 registered users, the company has made substantial progress in providing driverless ride-hailing services in key Chinese cities, including Beijing, Guangzhou, and Shenzhen. Pony.ai also partners with Toyota China and GAC Toyota on a $1.5 billion joint venture that aims to bring fully autonomous taxis to market.
Pony.ai projects that it will reach break-even for single vehicle operation by 2025, with full-scale commercialization expected by 2026.
Weekly News Roundup
In an interview with Harvard Business Review, Baidu CEO Robin Li likened the current AI industry’s boom to the dot-com bubble, forecasting that only a select 1% of AI companies will thrive after the hype. Li also said that hallucinations produced by LLMs have been “pretty much” solved. (The Information)
ByteDance introduced its first wireless earbuds, Ola Friend, exclusively in China. Priced at approximately $170, these open-ear earbuds integrate with ByteDance’s generative AI assistant, Doubao, allowing users to interact via voice commands without needing to access their smartphones. (TechNode)
Moonshot AI has upgraded its Kimi chatbot to enhance its problem-solving capabilities, aiming to match the advancements seen in OpenAI’s latest o1 LLM. The updated Kimi Chat Explore is now touted as being able to “think and reflect” in response to user queries and also has an expanded online search capacity. (SCMP)
Lenovo announced collaborations with Meta and NVIDIA to enhance its AI offerings globally. The partnership with Meta includes integrating Llama 3.1 into Lenovo’s AI Now personal assistant for PCs. Lenovo’s collaboration with NVIDIA has led to the development of the Hybrid AI Advantage platform, designed to accelerate AI innovation across various industries. (SCMP)
Alibaba introduced an AI-powered translation tool, named Marco, that it claims surpasses the capabilities of Google and ChatGPT. This tool is designed to provide more accurate and contextually relevant translations. (CNBC)
Trending Research
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Recent advancements in latent diffusion models have enhanced portrait image animation. Building on this, Hallo2 introduces capabilities for generating 4K resolution videos up to an hour long. It employs techniques like patch-drop augmentation with Gaussian noise to maintain visual consistency over extended durations and incorporates semantic textual labels for improved expression control.
MixCon: A Hybrid Architecture for Efficient and Adaptive Sequence Modeling
MixCon is a hybrid architecture for efficient sequence modeling, integrating Transformers, Conba, and Mixture of Experts (MoE). It excels at capturing long-range dependencies with high throughput, low memory usage, and robust adaptability. On long sequences, it achieves 4.5x higher throughput than Mixtral and 1.5x higher than Jamba, and it performs strongly on natural language processing benchmarks.
Baichuan-Omni Technical Report
Baichuan-Omni is an open-source 7B Multimodal LLM that integrates text, images, video, and audio. It employs a two-stage training process with multimodal alignment and fine-tuning. Baichuan-Omni achieves SOTA results in language understanding (72.2% on CMMLU), visual comprehension (74.3% on TextVQA), video understanding (60.9% on MVBench), and audio processing (6.9% WER). It surpasses previous models like VITA by 4% on average, setting a new standard for open-source multimodal models.