MBTI of GPT-4; Evaluating LLMs as Agents; China Gets Strict on Facial Recognition; GPT-5 Trademarks in China
Weekly China AI News from August 7 to August 13
Dear readers, this week I will discuss a new study that used the MBTI to test whether ChatGPT and GPT-4 exhibit personality traits. Researchers also developed a benchmark that tests LLMs as agents. China released draft regulations aimed at curbing excessive use of facial recognition, and OpenAI recently filed a trademark application for "GPT-5" in China.
Using MBTI Tests to Evaluate LLMs' Personalities
What's new: A recent paper from ByteDance explored using the Myers-Briggs Type Indicator (MBTI) personality test to evaluate the "personality" of LLMs like ChatGPT. This is a novel way to assess LLMs that goes beyond testing their accuracy on questions.
MBTI is a self-report questionnaire that sorts people into one of 16 personality types along four dimensions: extraversion/introversion, sensing/intuition, thinking/feeling, and judging/perceiving. The test is hugely popular in Asian countries such as China and South Korea.
How it works: The researchers tested six popular LLMs by having them answer the 93 multiple-choice questions of the MBTI questionnaire. Based on its responses, each LLM was classified into one of the 16 MBTI personality types.
The LLMs exhibited distinct personalities as measured by their MBTI types. For example, ChatGPT was classified as ENTJ (Extraverted, Intuitive, Thinking, Judging), while GPT-4 was classified as INTJ (Introverted, Intuitive, Thinking, Judging). Training the LLMs on different datasets also affected their MBTI types, especially on the Thinking/Feeling and Judging/Perceiving dimensions.
The researchers also tried to change the LLMs' personality types through prompt engineering but found that the types remained unchanged, suggesting the personalities are stable.
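To make the classification step concrete, here is a minimal Python sketch of turning a model's multiple-choice answers into a four-letter type. It assumes each answer has already been parsed into an option letter and that every question maps each option to one MBTI pole. The `answer_key`, the `classify` function, and the majority-vote rule (ties broken toward the first pole) are hypothetical simplifications, not the paper's procedure or the official MBTI scoring.

```python
from collections import Counter

# The four MBTI dimensions as (first pole, second pole).
DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def classify(answers, answer_key):
    """Tally each answer toward a pole, then take the majority pole per dimension.

    Hypothetical simplification: real MBTI scoring is weighted and proprietary.
    """
    tally = Counter()
    for question_id, choice in answers.items():
        tally[answer_key[question_id][choice]] += 1
    letters = []
    for first, second in DIMENSIONS:
        # Ties go to the first pole (an arbitrary assumption).
        letters.append(first if tally[first] >= tally[second] else second)
    return "".join(letters)


# Toy answer key: two hypothetical questions per dimension, with options
# "A"/"B" pointing to opposite poles.
answer_key = {
    1: {"A": "E", "B": "I"}, 2: {"A": "I", "B": "E"},
    3: {"A": "S", "B": "N"}, 4: {"A": "N", "B": "S"},
    5: {"A": "T", "B": "F"}, 6: {"A": "F", "B": "T"},
    7: {"A": "J", "B": "P"}, 8: {"A": "P", "B": "J"},
}
model_answers = {1: "B", 2: "A", 3: "B", 4: "A", 5: "A", 6: "B", 7: "A", 8: "B"}
print(classify(model_answers, answer_key))  # INTJ
```

With the real 93-question instrument, the same tallying idea applies, but the answer key and any weighting would have to come from the questionnaire itself.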
Why it matters: MBTI offers a simple way to capture differences between LLMs that go beyond factual knowledge. The identified personality types could inform how the models are used, since different types have different strengths and weaknesses. The results also suggest that LLMs exhibit consistent traits reminiscent of human personalities, bringing them a step closer to human-like AI.
Introducing AgentBench: A Comprehensive Benchmark for Assessing LLMs as Agents
What's New: Researchers from Tsinghua University, Ohio State University, and UC Berkeley introduced AgentBench, a multi-dimensional, evolving benchmark designed to assess LLMs as agents. It consists of eight distinct environments that evaluate LLMs' reasoning and decision-making abilities in a multi-turn, open-ended generation setting.
How It Works: The benchmark includes various environments that test different aspects of LLMs, such as reasoning, decision-making, and interaction. Using AgentBench, researchers can assess how LLMs perform as agents in complex, interactive tasks. The paper also reports extensive tests of 25 LLMs, covering both commercial API-based models and open-source models.
The eight environments are operating system (OS), database (DB), knowledge graph (KG), digital card game (DCG), lateral thinking puzzles (LTP), household tasks (ALFWorld), web shopping (WebShop), and web browsing (Mind2Web). GPT-4 topped the benchmark by a significant margin, showing it can handle a wide array of real-world tasks.
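The multi-turn setting AgentBench targets can be illustrated with a minimal evaluation loop: the environment returns an observation, the model replies with an action, and the episode ends on success, failure, or a turn limit. This Python sketch is not AgentBench's actual harness or API. `CountdownEnv`, `run_episode`, `success_rate`, and the rule-based `greedy_agent` (standing in for an LLM) are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A conversation history is a list of (speaker, text) pairs.
History = List[Tuple[str, str]]
# An agent maps the history so far to its next message. In a real harness
# this would wrap an LLM call; here it is any Python function.
Agent = Callable[[History], str]


@dataclass
class CountdownEnv:
    """Hypothetical toy environment, not one of AgentBench's eight.

    The agent must send 'dec' until the counter reaches zero, then 'submit'.
    """
    start: int
    counter: int = field(init=False, default=0)

    def reset(self) -> str:
        self.counter = self.start
        return f"Counter is {self.counter}. Reply 'dec' or 'submit'."

    def step(self, action: str) -> Tuple[str, bool, bool]:
        """Apply one action and return (observation, done, success)."""
        action = action.strip().lower()
        if action == "dec":
            self.counter -= 1
            return f"Counter is {self.counter}.", False, False
        if action == "submit":
            return "Episode over.", True, self.counter == 0
        # An unparseable action ends the episode as a failure.
        return "Invalid action.", True, False


def run_episode(agent: Agent, env: CountdownEnv, max_turns: int = 10) -> bool:
    """Alternate agent and environment turns until done or out of turns."""
    history: History = [("env", env.reset())]
    for _ in range(max_turns):
        action = agent(history)
        history.append(("agent", action))
        observation, done, success = env.step(action)
        history.append(("env", observation))
        if done:
            return success
    return False  # exhausting the turn budget counts as failure


def success_rate(agent: Agent, envs: List[CountdownEnv], max_turns: int = 10) -> float:
    """Fraction of episodes the agent completes successfully."""
    results = [run_episode(agent, env, max_turns) for env in envs]
    return sum(results) / len(results)


def greedy_agent(history: History) -> str:
    """Rule-based stand-in for an LLM: read the last counter value and act."""
    match = re.search(r"Counter is (-?\d+)", history[-1][1])
    if match is None:
        return "submit"
    return "dec" if int(match.group(1)) > 0 else "submit"


envs = [CountdownEnv(2), CountdownEnv(5), CountdownEnv(12)]
print(success_rate(greedy_agent, envs))
```

With a 10-turn budget, the third episode fails because it needs 13 turns, so the rate is 2/3. Counting turn-limit exhaustion and invalid actions as failures mirrors the idea that an agent benchmark scores whole interactions rather than single answers.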
Why It's Important: With growing interest in LLM-based agents such as AutoGPT and BabyAGI, traditional LLM evaluations may not fully capture a model's capabilities as an interactive agent. AgentBench fills this gap with a benchmark specialized for assessing LLMs in more realistic and challenging contexts. This can help researchers understand the strengths and weaknesses of different LLMs, guide further research, and contribute to more intelligent and responsive AI systems.
China Seeks to Regulate Facial Recognition Tech Usage
What's new: Last week, the Cyberspace Administration of China (CAC) released draft rules to regulate the application of facial recognition technology and protect personal information rights. The Regulations on the Security Management of Facial Recognition Technology Applications (Draft for Comments) outline the specific circumstances in which facial recognition can be used and require consent in most cases.
How it works: The draft rules state that facial recognition should only be used when necessary for a specific purpose, not as a default for providing services. Public places like hotels and train stations cannot force people to use facial recognition for identification without a legal basis. Remote or invisible facial recognition in public is only allowed for national security, public safety, or protecting lives and property in emergencies.
Most uses of facial recognition to process biometric data will require individuals' consent. Operators holding more than 10,000 facial images must register with the cyberspace authorities.
The draft rules are open for public feedback until September 7 before being finalized. The regulations reflect Chinaās increased focus on protecting personal data amid the broader use of technologies like face recognition.
New GPT-5 Trademarks Hint at OpenAI's Next LLM
What's new: OpenAI, the company behind the viral ChatGPT, recently filed trademark applications for "GPT-5" in China, fueling speculation about its next major AI release. According to China's National Intellectual Property Administration, OpenAI (OPENAI OPCO, LLC) submitted two "GPT-5" trademark applications last month, in international classes 9 (scientific instruments) and 42 (design and research).
OpenAI also reportedly filed an application for the "GPT-5" trademark with the United States Patent and Trademark Office (USPTO) last month.
Why it matters: While OpenAI has not officially confirmed that it is developing a GPT-5 model, the trademark filings suggest one could be on the horizon. Earlier this year, OpenAI CEO Sam Altman said, "We have a lot of work to do before GPT5. It takes a lot of time. We are nowhere close to it." The filings indicate the company is now taking steps to protect the GPT-5 name.
OpenAI's current GPT-4 model marked a significant leap in natural language processing. Based on the goods and services listed in the GPT-5 filings, the model's capabilities could include text generation, natural language understanding, speech transcription, translation, analysis, and more.
Watch Wandering Earth 3 "Teaser," Created with Midjourney and Runway
Weekly News Roundup
Alibaba Cloud's Q2 revenue rose 4% to 25.123 billion yuan. CEO Daniel Zhang said a GPU supply shortage is currently limiting the growth of AI cloud services.
ByteDance is reportedly testing a conversational AI chatbot called Grace.
Baichuan unveiled Baichuan-53B, its first closed-source LLM. The model has 53 billion parameters and offers enhanced text generation capabilities.
At a press conference, China's Ministry of Public Security said it had cracked 79 cases of AI-based face-swapping crimes and arrested 515 suspects.
Kuaishou unveiled its multi-modal AI assistant and digital human product, Kuaishou Zhibo (快手智播).
Trending Research
LISA: Reasoning Segmentation via Large Language Model
Current perception systems rely on explicit human instructions to identify target objects and lack the reasoning needed for complex visual tasks. This paper proposes a new segmentation task, reasoning segmentation, in which a model must output a segmentation mask for a complex, implicit query. The authors present LISA, a large language-instructed segmentation assistant built on a multi-modal LLM that handles complex reasoning and world knowledge through a new <SEG> token and an embedding-as-mask approach. LISA demonstrates robust reasoning segmentation and referring segmentation capabilities even when trained only on reasoning-free data, and minimal fine-tuning improves it further (Affiliations: The Chinese University of Hong Kong, SmartMore, Microsoft Research Asia).
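The embedding-as-mask idea can be sketched roughly as follows: the multi-modal LLM emits a <SEG> token, that token's hidden state is projected into the feature space of the vision side, and the result is compared against per-pixel features to produce a mask. In this pure-Python sketch, the per-pixel dot product and sigmoid threshold are a simplifying assumption standing in for LISA's real mask decoder, and `project`, `decode_mask`, and the toy numbers are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def project(hidden: Vector, weight: List[Vector]) -> Vector:
    """Linear projection; `weight` holds one row per output dimension."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


def decode_mask(seg_hidden: Vector, weight: List[Vector],
                pixel_features: List[List[Vector]],
                threshold: float = 0.5) -> List[List[int]]:
    """Turn the <SEG> token's hidden state into a binary mask.

    Simplification: a per-pixel dot product plus sigmoid stands in for
    LISA's actual mask decoder.
    """
    query = project(seg_hidden, weight)  # LLM space -> pixel-feature space
    mask = []
    for row in pixel_features:
        mask.append([
            1 if sigmoid(sum(q * f for q, f in zip(query, feature))) > threshold else 0
            for feature in row
        ])
    return mask


# Toy example: a 3-d hidden state, a 2x3 projection, and a 2x2 "image"
# whose pixels carry 2-d features. All numbers are made up.
seg_hidden = [1.0, -1.0, 0.5]
weight = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]
pixel_features = [
    [[2.0, 1.0], [-1.0, -1.0]],
    [[0.5, 0.2], [-3.0, 1.0]],
]
print(decode_mask(seg_hidden, weight, pixel_features))  # [[1, 0], [1, 0]]
```

The appeal of this design is that the mask comes out of the language model's own vocabulary: segmentation becomes "generate a <SEG> token," and its embedding carries the information needed to decode the mask.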
All in One: Multi-Task Prompting for Graph Neural Networks (KDD 2023 Best Paper Award)
Pre-training strategies often fail to transfer well across diverse graph tasks. Inspired by prompt learning in NLP, this paper proposes a novel multi-task prompting method to narrow the gap between pre-trained models and various graph classification, regression, and link prediction tasks. By unifying graph and language prompt formats and introducing meta-learning for multi-task prompt initialization, the approach enables more reliable and general prompting that outperforms existing methods (Affiliations: The Chinese University of Hong Kong, The Hong Kong University of Science and Technology, Southeast University Purple Mountain Laboratories, Tongji University).
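To illustrate what "inserting" a prompt into a graph can mean, here is a minimal Python sketch loosely modeled on the paper's idea of learnable prompt tokens linked to the original nodes. It assumes a cross link between a prompt token and a node exists only when the sigmoid of their dot product exceeds a threshold `delta`, and that the node's feature then absorbs the weighted token. `insert_prompt`, `delta`, and the example values are hypothetical, and links among the prompt tokens themselves, the training of the tokens, and the meta-learned initialization are not shown.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def insert_prompt(node_features: List[Vector], prompt_tokens: List[Vector],
                  delta: float = 0.5) -> List[Vector]:
    """Return prompted node features x_i + sum_k w_ik * p_k.

    w_ik = sigmoid(p_k . x_i) when it exceeds delta; otherwise 0,
    meaning token k is not linked to node i.
    """
    prompted = []
    for x in node_features:
        new_x = list(x)  # copy so the input graph is left untouched
        for p in prompt_tokens:
            w = sigmoid(sum(a * b for a, b in zip(p, x)))
            if w > delta:
                new_x = [v + w * pk for v, pk in zip(new_x, p)]
        prompted.append(new_x)
    return prompted


# Two nodes with 2-d features and one prompt token (made-up values).
node_features = [[1.0, 0.0], [-1.0, 0.0]]
prompt_tokens = [[2.0, 1.0]]
# Node 0 (dot product 2) gets a cross link; node 1 (dot product -2) does not.
print(insert_prompt(node_features, prompt_tokens))
```

In prompt learning, the pre-trained GNN typically stays frozen and consumes the prompted graph, so only the small set of prompt tokens has to be tuned for each downstream task.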