🕺🏻 Alibaba's Chatbot Creates Dance Videos from Images, China Sets AI Rules in Scientific Research, and ByteDance Launches Its Own 'GPTs'
Weekly China AI News from January 1, 2024 to January 7, 2024
Hello readers! Kicking off 2024, Alibaba added a playful image-animation feature to its chatbot that makes Elon Musk dance the TikTok-viral Kemu San (Subject Three). In regulatory news, China issued new guidelines on the use of generative AI in research. And ByteDance launched Coze, a platform for building chatbots that can be deployed on third-party apps.
If you’re enjoying Recode China AI, chances are your friends and colleagues will, too! Feel free to share the link to this issue with others.
Dance with AI: Alibaba Brings Dance Animation to Chatbots
What’s New: Last week, Alibaba enhanced its ChatGPT-like chatbot, Tongyi Qianwen, with a feature that animates images into various dance styles. The feature lets users generate a 10-second dancing video, making themselves, or celebrities like Elon Musk and Tim Cook, dance an Irish jig.
How it Works:
First, users pick a dance style within the Tongyi Qianwen app from more than ten options, such as the Ghost Dance, Paddle Step, and Bunny Dance.
Then, they upload a full-body, front-facing photo with a minimum resolution of 500x500 pixels.
With a prompt of either “All-People’s Dance King” or “Tongyi Dance King,” and a tap on the “Generate Now” button, the animation springs into action.
This feature is powered by a model named “Animate Anyone.” To overcome the challenge of creating detailed, temporally consistent animations, Alibaba researchers proposed ReferenceNet, a component that preserves appearance details from the reference image, built on top of diffusion models, the popular image-generation architecture behind DALL·E 3 and Stable Diffusion.
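For intuition, here is a heavily simplified, illustrative sketch of how a pose-guided, reference-conditioned diffusion pipeline of this kind can be wired together. The module names, layer sizes, and the single denoising step below are assumptions made for illustration only, not Alibaba’s actual implementation.

```python
import torch
import torch.nn as nn

# Toy, illustrative sketch of pose-guided, reference-conditioned video diffusion.
# NOT Alibaba's code: all module names, sizes, and shapes are assumptions.

class TinyReferenceNet(nn.Module):
    """Encodes the reference image into appearance features."""
    def __init__(self, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
    def forward(self, ref_image):           # (B, 3, H, W)
        return self.encoder(ref_image)      # (B, C, H, W) appearance features

class TinyPoseGuider(nn.Module):
    """Encodes a pose map (e.g., a rendered skeleton) for one frame."""
    def __init__(self, channels=64):
        super().__init__()
        self.encoder = nn.Conv2d(3, channels, 3, padding=1)
    def forward(self, pose_map):            # (B, 3, H, W)
        return self.encoder(pose_map)

class TinyDenoiser(nn.Module):
    """Predicts noise for one frame, conditioned on appearance + pose."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 2 * channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )
    def forward(self, noisy_frame, appearance, pose):
        x = torch.cat([noisy_frame, appearance, pose], dim=1)
        return self.net(x)                  # predicted noise, (B, 3, H, W)

ref_net, pose_guider, denoiser = TinyReferenceNet(), TinyPoseGuider(), TinyDenoiser()
ref_image = torch.randn(1, 3, 64, 64)          # the uploaded full-body photo
pose_sequence = torch.randn(10, 1, 3, 64, 64)  # 10 pose maps from the chosen dance

appearance = ref_net(ref_image)                # computed once, reused for all frames
frames = []
for pose_map in pose_sequence:
    noisy = torch.randn(1, 3, 64, 64)          # in practice: iterative denoising
    noise_pred = denoiser(noisy, appearance, pose_guider(pose_map))
    frames.append(noisy - noise_pred)          # stand-in for a real sampler update
print(len(frames), frames[0].shape)            # 10 frames of the dancing video
```

In the real system, the denoising network is far larger, includes temporal layers for frame-to-frame consistency, and runs many iterative denoising steps per frame.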
Why it Matters: This new feature adds an entertaining dimension to Alibaba’s chatbot, aiming to attract new users amid fierce competition among Chinese chatbots. In 2023, more than 250 Chinese LLMs were developed, with over 20 of them gaining regulatory approval to offer services to the public.
It also showcases how quickly research innovations are now being folded into consumer products. Animate Anyone was published on arXiv in late November 2023, and it took Alibaba just over a month to integrate the work into its chatbot.
China Prohibits ‘Direct’ Use of Generative AI in Scientific Research Applications
What’s New: China’s Ministry of Science and Technology (MOST) issued the “Guidelines for Responsible Research Behavior (2023)” last week to regulate the use of generative AI in scientific research.
How it Works: The Guidelines specifically state that:
Generative AI should not be used to “directly” generate application materials.
Application materials are typically submitted to apply for research qualifications, funding, projects, permits, etc.
Generative AI should not be listed as a co-contributor of research results.
Unverified literature and references created by generative AI are not allowed.
The Guidelines also outline appropriate uses of generative AI, including its role in data processing throughout research and in peer-review assessments. Academic publications should require authors to disclose whether and how generative AI was used in their research, and AI-generated content should not be cited as a primary source.
The guidelines apply to research entities, universities, medical and health organizations, enterprises, and their scientific staff.
Why it Matters: As generative AI rapidly evolves and opens new opportunities for scientific research, it also poses challenges such as data fabrication, plagiarism, and intellectual-property disputes. The Guidelines respond to these issues in the research community. Several scientists welcomed the rules, SCMP reported.
One More Thing: The Guidelines specify that researchers should not disseminate research findings to the public unless they have been scientifically verified or peer-reviewed. This provision might conflict with the current practice among AI researchers of promoting non-peer-reviewed papers via news outlets and social media. Many researchers believe strong public engagement and discussion can increase a paper’s chances of acceptance at journals and conferences.
ByteDance Launches AI Bot Development Platform Coze
What’s New: ByteDance recently launched Coze, an AI chatbot development platform, globally. The platform is designed for creating AI bots without coding and deploying them on platforms like Discord and Telegram.
How it Works: Coze is similar to OpenAI’s GPTs but is platform-agnostic: bots built on Coze can be deployed to third-party apps. Users first create a bot by giving it a name, a description of its capabilities, and a profile picture. They then move to the dashboard, where they can set up the bot’s prompts and commands, extend its functionality by adding plugins, workflows, databases, and more, and test its performance before it goes live.
Coze currently provides over 60 plugin tools, including Google web search, DALL·E 3, Twitter, YouTube, and more. It also offers Knowledge, a database feature that lets custom bots draw on sources such as a PDF document or a website URL. Workflows is an advanced feature that allows bots to handle more complex, multi-step tasks.
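Conceptually, a bot definition bundles a persona prompt with the plugins, knowledge sources, and workflows it may call. The sketch below is a hypothetical illustration of that bundle; the field names are assumptions, not Coze’s actual configuration schema or API.

```python
# Hypothetical sketch of what a Coze bot definition conceptually bundles.
# Field names and values are illustrative assumptions, not Coze's real schema.
bot_definition = {
    "name": "After-Sales Helper",
    "description": "Answers return, refund, and shipping questions.",
    "prompt": "You are a polite after-sales support agent.",
    "plugins": ["web_search", "image_generation"],      # drawn from the plugin catalog
    "knowledge": ["return_policy.pdf", "https://example.com/faq"],
    "workflows": ["after_sales_routing"],                # see the walkthrough below
}
print(bot_definition["name"])
```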
Let’s say we build an after-sales service bot using Workflows (a rough code sketch follows this walkthrough). An LLM node first determines the type of user issue.
A Condition node then routes the result into the corresponding branch.
Suppose the query enters the return branch; the bot then consults the return-policy document previously uploaded to Knowledge to find relevant information.
The user’s question and the retrieved information are then fed back into the LLM to generate an appropriate response.
If the process involves courier tracking, a courier-tracking API can be integrated as another step.
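Expressed as plain Python rather than Coze’s visual editor, the routing logic of that walkthrough looks roughly like this. Every function here is a hypothetical stand-in for a Coze node (LLM, Condition, Knowledge, API), not Coze’s actual interfaces.

```python
# Minimal sketch of the after-sales workflow described above.
# All functions are hypothetical stand-ins for Coze nodes.

def classify_issue(question: str) -> str:
    """Stand-in for the first LLM node: decide the issue type."""
    return "return" if "return" in question.lower() else "tracking"

def search_knowledge(query: str) -> str:
    """Stand-in for the Knowledge node: look up the uploaded return policy."""
    return "Items can be returned within 14 days with the original receipt."

def track_courier(order_id: str) -> str:
    """Stand-in for an integrated courier-tracking API."""
    return f"Order {order_id} is out for delivery."

def generate_answer(question: str, context: str) -> str:
    """Stand-in for the final LLM node: compose the reply from the context."""
    return f"Regarding '{question}': {context}"

def after_sales_workflow(question: str, order_id: str = "12345") -> str:
    issue_type = classify_issue(question)      # LLM determines the issue type
    if issue_type == "return":                 # Condition node: return branch
        context = search_knowledge(question)   # consult the return policy in Knowledge
    else:                                      # Condition node: tracking branch
        context = track_courier(order_id)      # call the courier-tracking API
    return generate_answer(question, context)  # feed back into the LLM

print(after_sales_workflow("How do I return a jacket that doesn't fit?"))
```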
Weekly News Roundup
🔌 NVIDIA will reportedly begin mass production of its China-specific H20 AI GPU in the second quarter of 2024.
🎉 Xiaoice says its Xiaoice LLM has completed algorithm registration and has been officially released. Its AI clone platform, the Xiaoice X Eva app, has attracted over 800,000 creators to clone themselves.
📘 NetEase Youdao launches Ziyu Education LLM 2.0 and Hi Echo, a virtual spoken-language coach. It also introduces Youdao Speed Reading, the new AI family-tutor app Little P Teacher, and its latest AI learning machine, the X10.
🤖 SenseTime introduces a new Yuanluobo table lamp, priced at 1699 yuan.
📸 Meitu’s visual model MiracleVision has received regulatory approval and is set to be opened to the public.
🩺 Ant Group launches RJUA-QA, a Chinese medical-specialty question-answering and reasoning dataset that the company claims is the industry’s first clinical-specialty dataset.
Trending Research
LLaMA Pro: Progressive LLaMA with Block Expansion
Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on the corpus of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B, excelling in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance among various benchmarks, demonstrating superiority over existing open models in the LLaMA family and the immense potential of reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
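The core trick behind block expansion is straightforward to sketch: copies of existing Transformer blocks are interleaved with the frozen originals and initialized so they act as identity functions (the added blocks’ output projections start at zero), which means the expanded model initially behaves exactly like the base model and only the new blocks are tuned on the new corpus. The toy code below illustrates that idea under simplified assumptions; the block structure is a stand-in, not the authors’ implementation.

```python
import copy
import torch
import torch.nn as nn

# Illustrative sketch of block expansion (not the authors' code).
# A "block" here is a toy stand-in for a Transformer decoder layer.

class ToyBlock(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.ff = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
    def forward(self, x):
        return x + self.out_proj(torch.relu(self.ff(x)))  # residual block

def make_identity_block(template: ToyBlock) -> ToyBlock:
    """Copy a block and zero its output projection so it initially acts as identity."""
    new_block = copy.deepcopy(template)
    nn.init.zeros_(new_block.out_proj.weight)
    nn.init.zeros_(new_block.out_proj.bias)
    return new_block

def expand(blocks: nn.ModuleList, every: int = 4) -> nn.ModuleList:
    """Insert one identity-initialized block after every `every` original blocks."""
    expanded = []
    for i, block in enumerate(blocks, start=1):
        block.requires_grad_(False)                       # freeze original blocks
        expanded.append(block)
        if i % every == 0:
            expanded.append(make_identity_block(block))   # only these get trained
    return nn.ModuleList(expanded)

def run(stack, x):
    for block in stack:
        x = block(x)
    return x

original = nn.ModuleList(ToyBlock() for _ in range(8))
expanded = expand(original, every=4)
x = torch.randn(2, 32)
print(len(original), len(expanded))                       # 8 -> 10 blocks
# True: the new blocks start as identity maps, so the expanded model's output
# initially matches the original model's and old knowledge is preserved.
print(torch.allclose(run(original, x), run(expanded, x)))
```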
GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension
While Large Language Models (LLMs) like ChatGPT and GPT-4 have demonstrated exceptional proficiency in natural language processing, their efficacy in addressing complex, multifaceted tasks remains limited. A growing area of research focuses on LLM-based agents equipped with external tools capable of performing diverse tasks. However, existing LLM-based agents only support a limited set of tools which is unable to cover a diverse range of user queries, especially for those involving expertise domains. It remains a challenge for LLM-based agents to extend their tools autonomously when confronted with various user queries. As GitHub has hosted a multitude of repositories which can be seen as a good resource for tools, a promising solution is that LLM-based agents can autonomously integrate the repositories in GitHub according to the user queries to extend their tool set. In this paper, we introduce GitAgent, an agent capable of achieving the autonomous tool extension from GitHub. GitAgent follows a four-phase procedure to incorporate repositories and it can learn human experience by resorting to GitHub Issues/PRs to solve problems encountered during the procedure. Experimental evaluation involving 30 user queries demonstrates GitAgent's effectiveness, achieving a 69.4% success rate on average.
Understanding LLMs: A Comprehensive Overview from Training to Inference
The introduction of ChatGPT has led to a significant increase in the utilization of Large Language Models (LLMs) for addressing downstream tasks. There's an increasing focus on cost-efficient training and deployment within this context. Low-cost training and deployment of LLMs represent the future development trend. This paper reviews the evolution of large language model training techniques and inference deployment technologies aligned with this emerging trend. The discussion on training includes various aspects, including data preprocessing, training architecture, pre-training tasks, parallel training, and relevant content related to model fine-tuning. On the inference side, the paper covers topics such as model compression, parallel computation, memory scheduling, and structural optimization. It also explores LLMs' utilization and provides insights into their future development.
AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI
The burgeoning field of Artificial Intelligence Generated Content (AIGC) is witnessing rapid advancements, particularly in video generation. This paper introduces AIGCBench, a pioneering comprehensive and scalable benchmark designed to evaluate a variety of video generation tasks, with a primary focus on Image-to-Video (I2V) generation. AIGCBench tackles the limitations of existing benchmarks, which suffer from a lack of diverse datasets, by including a varied and open-domain image-text dataset that evaluates different state-of-the-art algorithms under equivalent conditions. We employ a novel text combiner and GPT-4 to create rich text prompts, which are then used to generate images via advanced Text-to-Image models. To establish a unified evaluation framework for video generation tasks, our benchmark includes 11 metrics spanning four dimensions to assess algorithm performance. These dimensions are control-video alignment, motion effects, temporal consistency, and video quality. These metrics are both reference video-dependent and video-free, ensuring a comprehensive evaluation strategy. The evaluation standard proposed correlates well with human judgment, providing insights into the strengths and weaknesses of current I2V algorithms. The findings from our extensive experiments aim to stimulate further research and development in the I2V field. AIGCBench represents a significant step toward creating standardized benchmarks for the broader AIGC landscape, proposing an adaptable and equitable framework for future assessments of video generation tasks.
Thanks for reading. If you’re enjoying this issue, feel free to subscribe to Recode China AI.