👀 Why More Users Don’t Necessarily Lead to Better AI Models
The CEO of MiniMax, the company behind Talkie and Hailuo Video, explains why applying mobile internet logic to AI development is a bad idea.
Hi, this is Tony! Welcome to this issue of Recode China AI (for the week of January 13, 2025), your go-to newsletter for the latest AI news and research in China.
MiniMax has emerged as one of the most aggressive Chinese startups in expanding its footprint in the global generative AI market.
Valued at $2.5 billion, the Shanghai-based, Alibaba- and Tencent-backed startup is the creator of Talkie, an AI companion app that is taking Character.AI’s throne. Talkie was one of the most downloaded AI apps in the U.S. last year, reportedly generating over $70 million in revenue. MiniMax’s text-to-video generator, Hailuo Video, has also posed a strong challenge to OpenAI’s Sora and has earned acclaim for its high-quality, realistic action sequences. MiniMax recently open-sourced a Mixture of Experts (MoE) LLM, MiniMax-01, which can process up to four million tokens of context thanks to a technique called lightning attention. MiniMax claims the model performs on par with GPT-4o and Claude-3.5-Sonnet.
The company’s CEO, Yan Junjie, is a former SenseTime executive who founded MiniMax in 2021. The usually low-profile CEO made several surprisingly bold statements about generative AI development in an interview with LatePost, often described as China’s The Information.
For example, Yan said that metrics such as user numbers do not necessarily help AI models improve; that in daily use, AI is already smarter than most users; that most companies are still developing LLM products with a mobile internet mindset, whose core is the recommendation system; and that it’s not difficult to build o1-like reasoning models.
I translated the three-hour interview because I found it quite interesting, even though I don’t agree with all of his arguments. Plus, MiniMax’s Hailuo chatbot, powered by the new MiniMax-01, made translating this interview (15,000 Chinese words) quite easy. Previously, I had to break such a long document into chunks of 5-10 paragraphs and feed them into ChatGPT, which occasionally altered the original meaning or oversimplified the content. With Hailuo, I only needed to prompt twice, and the output was decent, with only sporadic hallucinations.
The following article is translated from LatePost, written by Cheng Manqi, and edited by Song Wei. You can find the original link here.
Don't Use the Mobile Internet Logic to Build AI
After the accelerated pace of 2024, discussions surrounding Chinese LLM (Large Language Model) startups have shifted from "Who's raised funds?" to "Who's going to be the first to fall?"
At this critical juncture for the industry, we interviewed Yan Junjie, the founder and CEO of MiniMax, one of China's "Big Six" LLM companies, with a valuation exceeding $3 billion. Over a three-hour conversation, we discussed MiniMax's new technological goals, its latest models, the company's changes and personnel adjustments over the past year, and Yan Junjie's self-reflection as a first-time CEO with "three years of practice." We also posed the "faith question" to him.
Ten months ago, Yan Junjie also spoke with LatePost. Back then, he mentioned ByteDance 16 times, OpenAI 47 times, and Anthropic 8 times.
This time, he brought up ByteDance less often and Anthropic more often, a shift that runs subtly counter to the industry's direction.
In March 2024, Yan Junjie was more focused on ByteDance. At the time, Chinese LLM startups were thriving: in the preceding six months, various model companies had raised at least $2 billion.
Now, tech giants heavily investing in AI are overshadowing a number of star startups. MiniMax initially seemed relatively "safe": its AI community product, Xingye, had higher user numbers, usage duration, and retention rates than ByteDance's similar products.
However, Yan Junjie himself overturned these advantages. In his current understanding, user numbers and other metrics are not the core of AI competition:
Never use the product methodology of the last-generation mobile internet to think about new products.
Most Chinese companies, whether startups or big tech firms, are still using the method of building recommendation systems to develop LLM products.
The logic of recommendation is: the more users, the more feedback, and the smarter the recommendation engine becomes. However, Yan Junjie believes that the true relationship between LLMs and products is:
Better models can lead to better applications, but better applications and more users do not necessarily lead to better models.
ChatGPT's DAU is 50 to 100 times that of Claude, but their models are actually quite similar.
After realizing that improvements in intelligence do not depend much on having a large number of users, Yan Junjie said, he made a decision that ended more than half a year of anxiety. MiniMax's most important goal now, he said, is neither growth nor revenue, but faster technological iteration.
The release of MiniMax's first open-source model series, MiniMax-01, on January 15th, is also one of the results of this goal.
MiniMax-01 is the first model at the 400-billion-parameter scale to use a new architecture built on a linear attention mechanism (the standard Transformer uses non-linear softmax attention, whose cost grows quadratically with sequence length), which lets it efficiently process a 4-million-token context, the longest in the world.
Yan Junjie believes that long-context is an important capability for agents, as it can enhance AI's memory and improve the quality of single-agent interactions and the communication ability between multiple agents.
Yan Junjie reflected on some of the mistakes made when his understanding was not yet clear:
If I could choose again, I would open-source it from day one. Because open-sourcing can accelerate technological evolution.
Why wasn't Hailuo (chatbot) successful? Because we didn't stick to being technology-driven.
How do I reflect on failing to reach the growth goals set at the beginning of 2024? When we set goals at the beginning of the year, we were still using the business logic of the mobile internet. But these are actually two different industries.
Yan Junjie, who says he is no longer anxious, has made some unconventional technological choices: the linear attention mechanism used by MiniMax-01 is not a direction with strong industry consensus, and MiniMax was not among the first wave of companies to follow OpenAI's popular o-series models.
He also speaks more sharply than he did 10 months ago:
As for the people who talked about faith the most a year ago, has their faith been fulfilled? (Tony’s note: He’s probably referring to Yang Zhilin, CEO of Moonshot.)
It's actually not that difficult to make something that looks like o1... but we don't need to say we have an o1 and then issue a press release.
Born in 1989, Yan Junjie is young, but not a 90s kid; he holds a PhD in artificial intelligence, but is not an overseas returnee; he is neither an industry big shot nor a technical genius.
Most people in tech think they are great, that they are geniuses. But I don't see the world that way.
Yan Junjie said that what he reflected on most last year was why his understanding could not improve faster. His new approach is to set aside his ego and think more deeply.
If I Could Choose Again, I Would Open-Source It From Day One
LatePost: After the release of the MiniMax-01 series of new models, what kind of interesting feedback did you get?
Yan Junjie: Technical people were most interested in the fact that, for the first time, a very large model did not rely entirely on the traditional Transformer architecture, showing that innovation is possible at the architecture level.
Some partners and friends outside the algorithm side said we finally seemed to be getting the hang of it: we had begun to realize that we needed to build a tech brand, and that our collaborations had been more restricted because our models were not open source.
LatePost: So have you really gotten the hang of it?
Yan Junjie: This is our first open-source model series. Essentially, there are two reasons. First, we believe that what is truly valuable is not how well we are doing at the moment, but the speed of technological evolution, and open source accelerates technological evolution. What we do well will be recognized, what we do poorly will draw plenty of criticism, and people outside will also contribute. This is the biggest driving force behind our decision to open-source.
Second, over the past two or three years, we did a particularly bad job of understanding the tech brand. A tech brand matters because the biggest driving force in this industry is technological evolution, which requires computing power, data, money, and good enough people.
LatePost: DeepSeek-V3 has become a hit in the global tech community. Did that spur you on? At one point, a search for DeepSeek on Hacker News returned more than 470 posts, but very few for MiniMax.
Yan Junjie: When we realized that we needed to build a tech brand, DeepSeek-V3 had not yet been released.
After I got to know Liang Wenfeng at the beginning of 2023, two things inspired me. First, their brand is very well built; its reputation and word-of-mouth are among the best in the industry. Second, DeepSeek didn't have a product at first, so it was more focused.
LatePost: Why didn't you open-source earlier?
Yan Junjie: This is my first startup, and I didn't have a lot of experience. If I could choose again, I would open-source from day one.
If I were OpenAI, I would open-source today, because its core strength is no longer whether its models are better than Claude or Gemini, but the brand and mindshare of ChatGPT.
This time we won't hold back our best work. It doesn't make sense to, because every model will be outdated within a year. We will also keep open-sourcing our general models.
LatePost: MiniMax was established with the intention of doing both models and products at the same time. However, DeepSeek's Liang Wenfeng said that at this stage, they are not doing products, only models. What do you think of this strategy?
Yan Junjie: First of all, DeepSeek has recently launched an app too.
But on the other hand, I think there has been a huge misunderstanding in the Chinese AI industry over the past year or two: that the more users, the faster the model capabilities will improve. This logic is very wrong.
Look at ChatGPT: its DAU is 50 to 100 times that of Claude, but its model is not 50 times better; the two are actually quite similar. This shows that improvement in intelligence does not depend much on having a lot of users.
LatePost: The idea that more users do not mean faster improvement in model capabilities: last year, almost no one believed it.
Yan Junjie: This issue should be viewed on two levels:
First, the model is the driving force behind the emergence of products. For example, last year there were many video products because there were stronger video models.
But models are not iterated and improved based on user feedback and data. Claude 3.5 Sonnet's coding is very good, and the video models on the market are very strong, not because large programming or video AI products already existed, but because a technical benchmark was set first and then achieved.
Therefore, better models can lead to better applications, but better applications and more users do not necessarily lead to better models.
The underlying reason is that in daily use, the model is smarter than most users, and most user queries are actually less sophisticated than the model itself.
LatePost: What detours has this misunderstanding led the entire industry to take?
Yan Junjie: In order to have more users, a lot of money is spent to buy traffic. More importantly, most Chinese companies, whether startups or big tech firms, are still using the method of building recommendation systems to develop LLM products.
For example, with a content product, you can't know in advance what will be popular, so you run a lot of A/B tests, which is efficient. But when this logic is applied to models, it turns into different researchers trying different algorithms and experimenting with different features, and if something doesn't work, adding more. This is not the way to build AGI.
LatePost: What is a more appropriate way?
Yan Junjie: We should clearly define the model's capability levels, then figure out what algorithms, data, and reasoning processes each generation of improvement requires, and use technical means to reach those defined targets.
LatePost: When did you arrive at this understanding? How does it relate to your recent release?
Yan Junjie: March and April of last year. After we figured it out, we did a few things.
First, technology and product should be separated. The job of technology is to keep raising the ceiling, which requires defining the capabilities of the next generation. For example, we used a new architecture this time essentially because we believe long context is important.
Second, don't assume that having a product will make the model better. The purpose of a product is not to improve the model; it is a commercial product, and what really matters is how well it meets users' needs.
LatePost: So for you, what is more important, technology or product? Is MiniMax a technology-driven company or a product-driven company?
Yan Junjie: We are very clear that we are a technology-driven company. It's not just a slogan. The essence is, when there is a conflict, who has the final say?
LatePost: Can you give an example of technology having the final say?
Yan Junjie: Take Hailuo Video. By monthly visits, it is now the largest video generation product in the world, but its webpage is still very rough, and when we first launched, we had many overseas users but no English interface.
There will always be users who ask why Runway supports a certain feature that you don't have, or why Kling has an app when you don't. But once you start solving these simple problems, the model's progress slows down because your energy gets diverted. At that time, our choice was to listen to the algorithm and prioritize the features with higher algorithmic ceilings.
Another example: when a major algorithm change is launched that may affect user data, how do you choose? We still decide based on the algorithm's trajectory. In 2023, we would still agonize over such choices, but in 2024, we basically stopped agonizing.
LatePost: After last year, the discussion around LLM startups has changed from "Who's raised funds" to "Who's going to be the first to fall." Who do you think will fall first? Who will survive to the end?
Yan Junjie: I don't think we should treat startups as a separate category. Comparing startups with one another is not very meaningful; we should look at the industry as a whole.
And I want to say that I think DeepSeek and Zhipu are quite good. DeepSeek is very pure. Zhipu was the first to have an AI roadmap, which I admire.
The People Who Talked About Faith the Most a Year Ago: Has Their Faith Been Fulfilled?
LatePost: The tech logic you described is consistent, but an investor's observation of MiniMax is: when you raised funds in 2021, you talked about virtual humans, and then you did Glow, Xingye, and other products similar to Character.ai; when Kimi became popular, you restarted the productivity tool Hailuo; after Sora, you put more resources into video generation; and now it's open source.
It seems that you have been following the hot spots.
Yan Junjie: This is a misunderstanding. We never wanted to make a digital human. When we started the company three years ago, we said we wanted to build an intelligent agent that comes infinitely close to passing the Turing test, and some investors understood that as a digital human, since large models were not yet an investment theme at the time. When we started Glow, Character.ai didn't exist yet. Hailuo launched two years ago but didn't take off in its first year, and by the time everyone realized this direction was hot, Kimi's product experience was better than ours, so people may think we restarted it.
As for video, we started the project while building Xingye and Talkie because we wanted the characters to move. After Sora appeared, I realized this was bigger than I had thought, so we made it more general-purpose.
Why open source? As mentioned earlier, the most core reason is to accelerate technological evolution.
LatePost: What is your faith in AI? You seem to have done a lot of different things.
Yan Junjie: Essentially, no one can define what AGI is. The only thing that can be defined is that the level of intelligence will continue to improve.
It is a bit like the Long March, you don't know exactly where the final destination is, but you know that better intelligence is meaningful.
LatePost: So it's hard to step-by-step deduce based on a clear end point?
Yan Junjie: Entrepreneurship is not about being handed an opportunity for which you are the best fit, the chosen one.
First, the premise of entrepreneurship is that you have a unique understanding. Second, you probably won't have the most resources, which is actually a good thing, because it forces you to make real innovations.
Given that, what is the path? Can you get there? That is not something you can plan from the start; it has to be fought for step by step.
LatePost: Kai-Fu Lee told us last week that it took the entire industry only a year to shift from believing in the Scaling Law to doubting it.
Yan Junjie: As an entrepreneur, what I think about at this point is not giving up because the Scaling Law has hit a wall, but what I can do to keep it going.
Is it algorithm, organization, business-level innovation, or direction selection? At least while we still have the opportunity, we should strive to find a way.
LatePost: When discussing attitudes towards AI technology, do you think faith is an appropriate word?
Yan Junjie: As for the people who talked about faith the most a year ago, has their faith been fulfilled?
LatePost: Who do you mean?
Yan Junjie: Everyone in the industry who likes to talk about faith the most, whether in China or overseas.
LatePost: Can faith be fulfilled in a year?
Yan Junjie: But at least we have to work in that direction.
LatePost: Is the road to faith a straight line? Can't you take a detour?
Yan Junjie: But some actions point in the opposite direction, such as the heavy advertising spending I mentioned earlier. The problem is that more users will not bring faster improvement in model capabilities.
LatePost: If you don't use faith, what do you think is a more appropriate word to describe the attitude towards technology?
Yan Junjie: (Thinking) I think it's belief.
LatePost: What is the difference between belief and faith?
Yan Junjie: Faith is a bit like describing a very distant future; belief is about what you want to do and can stick with.
Creating Something That Looks Like o1 Isn’t That Hard, But We Don’t Need A Press Release.
LatePost: Why did the tech blog for the MiniMax-01 series frame the update as using a new architecture to start the Agent era? Why is Agent an important goal? How do you define Agent?
Yan Junjie: There are two paths of thinking: First, in which direction should AI become stronger? Second, after becoming stronger, what beneficial changes can it bring to human society?
Clearly, one important thing is the ability to handle complex tasks. One sign of a complex task may be that it is multi-step: the steps can happen within a single output, like o1; a single agent can split the task into multiple steps; or, as in the workflows Anthropic has defined, multiple agents can collaborate in more complex ways.
To define complex tasks further, my understanding is that they are tasks at the level of professionals in specialized fields.
LatePost: Last year you said that no one had made a successful Agent application because the large model capabilities were not strong enough. Now MiniMax-01 says it is starting the Agent era, what has changed?
Yan Junjie: There are two levels: one is architecture, the other is capability.
At the architecture level, we have actually achieved it, because the model can process very long contexts quickly and efficiently.
Long context is important because it’s hard for AI to feel the passage of time the way humans do, so it needs to process ever-longer memories. For a single agent, the key to improving interaction quality is remembering more. For multiple agents, it involves communication between them. For example, Anthropic has defined a communication protocol called MCP (Model Context Protocol), and the volume of communication is very large, which also requires the ability to process long context.
At the capability level, there are still many areas where we can improve, such as AI's tool use and planning, which our model has not yet polished. But there are many standard benchmarks for these capabilities, so they can be achieved gradually.
LatePost: You mentioned that this architecture is not entirely a Transformer. So what is it?
Yan Junjie: The standard Transformer has several modules, and we changed the most important one, the attention module, from the original quadratic-complexity attention to linear attention.
(Note: The attention module in the standard Transformer is non-linear (softmax) attention, whose computational cost grows quadratically with sequence length. Linear attention simplifies the calculation so that, for very long texts, computational cost grows linearly rather than quadratically and less computing power is required; however, linear attention may not capture complex dependencies as well as non-linear attention.)
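To make the quadratic-versus-linear distinction concrete, here is a minimal pure-Python sketch, assuming a single non-causal attention head and the generic kernel trick from the linear-attention literature with the feature map phi(x) = elu(x) + 1. This is an illustration of the general idea only; it is not MiniMax's lightning attention, which the interview does not describe in detail. The point is associativity: the pairwise kernel version builds an n-by-n score table, while the linear version summarizes all keys and values once into S and z and then answers every query from that summary.

```python
import math

def phi(vec):
    """Positive feature map phi(x) = elu(x) + 1, applied elementwise."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_average(weights, values):
    total = sum(weights)
    return [sum(w * v[c] for w, v in zip(weights, values)) / total
            for c in range(len(values[0]))]

def softmax_attention(Q, K, V):
    """Standard attention: each query scores every key, O(n^2 * d) work."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [dot(q, k) * scale for k in K]
        m = max(scores)  # subtract the max for numerical stability
        out.append(weighted_average([math.exp(s - m) for s in scores], V))
    return out

def kernel_attention_quadratic(Q, K, V):
    """Kernelized attention, still computed pairwise: O(n^2 * d) work."""
    phi_K = [phi(k) for k in K]
    return [weighted_average([dot(phi(q), fk) for fk in phi_K], V) for q in Q]

def linear_attention(Q, K, V):
    """Same kernel, reordered: summarize keys/values once, O(n * d * d_v) work."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # S = sum_j phi(k_j) v_j^T
    z = [0.0] * d_k                         # z = sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for c in range(d_v):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * S[a][c] for a in range(d_k)) / denom
                    for c in range(d_v)])
    return out

# Toy inputs: 3 tokens, head dimension 2 (each value row sums to 1).
Q = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
K = [[0.3, 0.1], [-0.1, 0.2], [0.2, -0.4]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

soft = softmax_attention(Q, K, V)
quad = kernel_attention_quadratic(Q, K, V)
lin = linear_attention(Q, K, V)
```

The kernelized results from the pairwise and linear versions agree up to floating-point rounding, but only the linear version avoids the n-by-n table. Its outputs differ from softmax attention because the similarity function is different, which is one way to see the trade-off in capturing complex dependencies that the note mentions. Autoregressive LLMs would need a causal variant that accumulates S and z token by token; that is left out here for brevity.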
LatePost: It's actually a major variant of the Transformer?
Yan Junjie: You can understand it that way.
LatePost: Google's Gemini previously used a linear attention mechanism. What are the similarities and differences between MiniMax-01's and Gemini's linear attention?
Yan Junjie: I think Google will be stronger this year because it controls the TPU (Google's self-developed AI chip), the training framework (such as JAX), and the algorithms, which can be optimized together. So it's relatively easy for Google to do this.
We can't build custom chips of our own; we can only work on standard hardware, which makes it more complicated.
LatePost: This is the difficulty of implementation, what about the method and effect?
Yan Junjie: Google is closed-source, so I don't know exactly how they do it, but they probably use sliding window attention. The memory at any one step may not be that long, but the input can be divided into many segments, with a window sliding across them.
We don't use sliding windows. We compute over everything; we just found some approximation algorithms to make the computation faster.
(Note: Sliding window attention is a technique based on local context: it computes attention within a fixed-size window that slides over the input sequence. This captures local dependencies effectively while reducing computational complexity.)
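As a rough illustration of the note above, the following pure-Python sketch implements generic causal sliding window attention: query i only attends to keys in positions max(0, i - window + 1) through i. It is not a description of Gemini, whose mechanism Yan himself says he can only guess at; the window size and toy inputs are arbitrary assumptions. Because each query sees at most `window` keys, total work is O(n * window * d) rather than O(n^2 * d).

```python
import math

def window_positions(n, window):
    """Key indices visible to each query under a causal sliding window."""
    return [list(range(max(0, i - window + 1), i + 1)) for i in range(n)]

def sliding_window_attention(Q, K, V, window):
    """Causal softmax attention where query i only sees the last `window` keys.

    Work per query is bounded by `window`, so total cost is O(n * window * d)
    instead of O(n^2 * d) for full causal attention.
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for i, positions in enumerate(window_positions(len(Q), window)):
        scores = [scale * sum(a * b for a, b in zip(Q[i], K[j])) for j in positions]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * V[j][c] for w, j in zip(weights, positions)) / total
                    for c in range(len(V[0]))])
    return out

# Toy inputs: 5 tokens, head dimension 2.
Q = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2], [0.3, 0.3], [0.0, 0.1]]
K = [[0.3, 0.1], [-0.1, 0.2], [0.2, -0.4], [0.1, 0.1], [-0.3, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]

local = sliding_window_attention(Q, K, V, window=2)
```

The contrast with the approach Yan describes is that a sliding window simply never looks at distant tokens (information from them can only propagate indirectly, for example across stacked layers), whereas linear attention still aggregates over the whole sequence, just with a cheaper similarity computation.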
LatePost: In addition to the long-context and memory capabilities brought by linear attention, what other capabilities does the Agent need to improve?
Yan Junjie: There are some benchmarks, most of them defined by the academic community. For example, an important benchmark for coding ability is SWE-bench.
A year ago, models resolved only a little over ten percent of the tasks on this benchmark; now the figure is more than 70%. There are also some benchmarks for multi-modality.
LatePost: Why didn't you test SWE-bench this time?
Yan Junjie: Coding is the capability we want to improve in the next version.
LatePost: In technology, are achieving benchmarks and optimizing the computational architecture two separate things?
Yan Junjie: They are one whole. You can think of the architecture as the shape of your computation, and capability as the specific parameters computed within that shape.
LatePost: How do you judge that the calculation pattern you choose can support a higher capability ceiling?
Yan Junjie: Cognition and experiments.
The first thing that determines a company's R&D efficiency is whether its understanding is correct. But two different understandings may both be correct, and then experimental design and efficiency become very important.
How do we know our R&D capabilities are stronger than 9 months or a year ago? The key is that, given the same framework and data, our experiments yield greater gains. This is a core capability, and it depends heavily on teamwork.
LatePost: Why is MiniMax-01 oriented towards Agent, but not a model in the o1 direction? The o-series is considered to be very helpful in improving Agent capabilities.
Yan Junjie: Because we need to take each step solidly. Actually, making something that looks like o1 is not that difficult: you just distill a few thousand samples of o1 data. We have run such experiments, and there have been many academic papers on this recently; it is an industry consensus.
But we don't need to say we have an o1 and then issue a press release, our current business does not depend on such models.
LatePost: Will your next model version use the o1 approach to improve programming capabilities?
Yan Junjie: Not only coding, but also planning. This also depends on how to use benchmarks to measure different tasks, and once the metrics are found, they can be optimized.
Even o3 scores very low on some multi-modal benchmarks.
LatePost: How do you prioritize? Tongyi Qwen, Kimi, DeepSeek, Zhipu have all released models similar to the o-series, while you seem to think that the priority of multi-modal capabilities is higher?
Yan Junjie: First, a company's capabilities are limited.
Second, we decide which benchmark to optimize first based on whether the field has converged sufficiently and how much unique value we can create in it. The o-series needs time to evolve before a clearer product form emerges from the model.
Over the past few years, the companies that ultimately did well in a field were not necessarily the first to enter it, but the ones that fully realized its potential. Being a month early or a month late doesn't matter.
LatePost: Programming is already a scenario where Agents are being deployed, and o1 significantly improves programming capabilities. Don't you think this is a time window you need to seize?
Yan Junjie: Cursor (an AI programming assistant) is built on Claude 3.5 Sonnet, but Claude 3.5 Sonnet is not an o1-style reasoning model.
Four months ago, GitHub Copilot (Microsoft's AI programming assistant) began integrating o1, and that didn't make it number one.
LatePost: One phenomenon is that after o1, Chinese companies followed up faster than Google, Anthropic and other American companies. What do you think this shows?
Yan Junjie: Because Chinese companies may consider distillation acceptable, while Anthropic or Google may not do it. But I don't think distillation is wrong.
LatePost: Is distillation a shortcut?
Yan Junjie: It is certainly a path. Whether it is a shortcut is a matter of opinion.
In fact, text models have always had an alignment tax: if you align your model to another model's outputs, such as GPT's, its capabilities will be limited in some ways.
LatePost: Beyond the gains in logical reasoning, mathematics, and programming that o1 brought, how do you view the new room it opened up for inference-time scaling? What is its technical significance?
Yan Junjie: This trend existed before. The simplest example is best-of-N: you sample ten times, choose the best result, and accuracy improves.
The progress of o1 is to turn this idea into an end-to-end model, so it can be optimized as a whole, and the effect has improved a lot.
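The best-of-N baseline Yan describes can be sketched in a few lines of Python. Everything here is a hypothetical toy: `make_noisy_model` stands in for an LLM that answers 17 * 24 correctly with an assumed probability of 0.3, and `verifier_score` stands in for a reward model or verifier (here it simply checks the answer). This shows the pre-o1 sampling idea only, not how o1 itself works.

```python
import random

def best_of_n(generate, score, n):
    """Best-of-N: draw n independent samples and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

# --- Hypothetical toy setup: a noisy "model" answering 17 * 24 ---
CORRECT = 17 * 24

def make_noisy_model(rng, p_correct=0.3):
    def generate():
        if rng.random() < p_correct:
            return CORRECT
        return CORRECT + rng.choice([-10, -1, 1, 10])  # plausible wrong answers
    return generate

def verifier_score(answer):
    # Stand-in for a reward model or verifier; here it simply checks the answer.
    return 1.0 if answer == CORRECT else 0.0

def accuracy(n, trials=2000, seed=0):
    rng = random.Random(seed)
    generate = make_noisy_model(rng)
    hits = sum(best_of_n(generate, verifier_score, n) == CORRECT for _ in range(trials))
    return hits / trials

acc_1 = accuracy(1)
acc_10 = accuracy(10)
```

With a perfect verifier, best-of-N succeeds whenever at least one of the N samples is correct, so expected accuracy rises from p to 1 - (1 - p)^N (about 0.3 versus about 0.97 in this toy). A learned reward model makes mistakes, so real-world gains are typically smaller, and every extra sample costs extra inference compute.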
Agent Will Soon See A New Type Of Application: Information Acquisition
LatePost: What do you think is the first scenario for Agent to be implemented?
Yan Junjie: Coding, definitely. And I think there will soon be a new type of application: information acquisition.
LatePost: I know that you are quietly testing a new information acquisition product. Can you talk about the idea of using Agent to do this?
Yan Junjie: Now information acquisition is mainly based on recommendation, and the content recommended is likely what you want to see, but it cannot guarantee that everything you want to see will be pushed to you.
For example, I want to see the ten best papers in this field every day, and the current content platforms cannot meet this need. So I think information acquisition will change.
LatePost: This sounds like Toutiao (ByteDance’s news app) rebuilt with new technology.
Yan Junjie: Don't use the methodology of last-generation mobile internet products to think about new products.
LatePost: What's the difference?
Yan Junjie: Mobile internet products have to think about supply and consumption. But AI products don't need (human) supply.
AI has both distribution and supply capabilities, and AI capabilities will continue to change.
When a mobile internet product's experience improves, it is most likely because the supply has changed. In AI products, improvement mainly depends on whether the model's capabilities, or the way supply is obtained, have changed.
Their (mobile internet products and AI products) cycles, certainty, and growth methods are different.
LatePost: What kind of signal would you need to see before investing more resources in Agent products?
Yan Junjie: This may not be the right question. If a product is particularly dependent on promotion, it is probably not right.
LatePost: But Xingye did a lot of promotion before.
Yan Junjie: Glow didn't promote, Xingye and Talkie had some promotions, and by the time Hailuo Video came out, we didn't spend money on promotion either domestically or overseas.
LatePost: Why did you go from no promotion, to promotion, and then back to no promotion? Is it because ByteDance entered the fight aggressively?
Yan Junjie: No, it's because our understanding evolved: from never having built a product, to building our first product; to learning the product methodology of big companies and realizing its benefits and limitations; and then to finding a more suitable approach.
LatePost: Last month I talked with Pony.ai CTO Lou Tiancheng about L4. He thinks that among large-model applications, MiniMax's Xingye is more like L4 in autonomous driving: the AI interacts with users directly and creates value through replacement, while ChatGPT and Copilot are more like L2, creating value by assisting. How do you see the similarities and differences between these two directions?
Yan Junjie: This summary is quite interesting, it is indeed very different.
Take ChatGPT and Claude: ChatGPT is more like an assistant that helps you complete tasks, while Claude is more emotional.
An interesting test: first have the model give a number between 1 and 100, say 50, and then tell it, "Then I won't chat with you for 50 days." Claude will ask, "Can I have another chance?" and then give a very small number. ChatGPT won't do this.
At its core, this is about how you approach alignment. Anthropic has a set of values, and based on them it has published a constitution. This gives its models certain characteristics and capabilities.
Clearly defining what your model is: that is something with a high ceiling.
I think the difference between Chinese and U.S. LLMs is that Chinese companies lack internally defined benchmarks and their own underlying thinking and design; they focus more on aligning with the outputs of models like o1.
LatePost: Does MiniMax's model have its own internal benchmarks and roadmap? For example, last year OpenAI proposed a five-level classification of AI capabilities, L1 to L5 (chatbots, reasoners, agents, innovators, organizations).
Yan Junjie: This is something we need to gradually strengthen.
Our initial goal was "Intelligence with Everyone," and our way of realizing it was to stay close to users, but we didn't precisely define what each step would be.
This may be the logic of fighting for survival: it has to be done step by step. I think for OpenAI, too, the most meaningful thing right now is L3 (agents); what L4 and L5 look like doesn't affect what they do today.
ByteDance Has The Highest Talent Density; Other Companies Are A Notch Lower.
LatePost: Which of your technical achievements from last year to today do you think you've truly done well?
Yan Junjie: Things related to infrastructure and computing power. The volume of dialogue, images, video, and audio we generate every day is huge, and handling it is very difficult: how to process that much compute, optimize and schedule it well, and keep costs reasonable. On this, we should be the best in the industry.
On algorithms, our multimodal work is relatively advanced. Our general text model is not the most advanced for now, but it is starting to develop its own characteristics.
LatePost: With the MiniMax-01 update, you implemented linear attention at large scale for the first time, and you described it as a "very bold innovation." How bold is it?
Yan Junjie: We are the first to do it in a model this large.
LatePost: Do others not do it because they don't think it's a good direction, or because it's difficult?
Yan Junjie: Both. There is no strong consensus on it.
LatePost: As you said, the progress that drew the most outside attention last year was in multimodality, especially the Hailuo video generation model, whose quality and traffic now rank in the world's first tier. How did you do it?
Yan Junjie: We had already built a large text model and done text-to-image, so we had some foundation.
But once we really started, we found that this infrastructure couldn't be fully reused for video. The algorithms and the way we ran experiments also changed a lot, and evaluation was even more different. You can think of it as a new company growing out of the old one.
LatePost: Having developed so many models over the years, what characterizes your technical team and its methodology?
Yan Junjie: We are relatively objective. That means we sometimes set the wrong goals, but once we find the right one, our efficiency and the depth we reach are greater.
We are also flat and flexible, and communication is simple and direct. We still have only three levels: me, my direct reports (my-1), and their direct reports (my-2).
LatePost: Is objectivity really a distinguishing trait? Are many companies in the industry not objective?
Yan Junjie: I think so. Non-objectivity means letting other considerations in when evaluating technical results, such as whether morale will be affected or how scope is divided among teams.
LatePost: Why didn't you mention talent density? For example, how many competition winners you have.
Yan Junjie: ByteDance has the highest talent density, and other companies are a notch lower. That's a fact. And we don't want to package ourselves as something we're not.
But take two equally excellent students: suppose one goes to ByteDance and the other goes to a startup that relies on technology and innovation to survive. After two or three years, the one at the startup is more likely to have become significantly better.
Most Tech People Think They Are Great, But I’m Not So Convinced About Their View Of The World.
LatePost: Given the many changes in the industry in 2024, including your own clearer realization that the relationship between models and applications is not a simple virtuous cycle, is it still necessary to do both at the same time? Why not focus on one?
Yan Junjie: First, there is no company that builds only models and no applications. Neither DeepSeek nor Anthropic does.
Then there are companies that build only applications and no models. There are obviously many of these, and some are doing very well, such as Perplexity and Cursor.
And there are companies that do both, and we are one of them. Every new product we've launched came from first building a model and achieving an improvement.
LatePost: When you founded the company at the end of 2021, the LLM ecosystem was immature, so you had to build models yourself. If you had started later, would you have focused on applications?
Yan Junjie: No. There are two options: build products on existing technology, or build products on future technology. I want to do the latter.
LatePost: Is that because you want to do something more valuable?
Yan Junjie: No. It's about how best to realize my own potential and the company's.
LatePost: How will lighter, more focused application companies and MiniMax, which does both models and applications, compete in 2025?
Yan Junjie: The market isn't a case where having A rules out B. In fact, both approaches are valid.
LatePost: How do you reflect on the fact that your productivity-oriented Hailuo AI (referring to the chat assistant product, not Hailuo Video) did not meet expectations last year?
Yan Junjie: I think it's because we didn't stay technology-driven. When you see a lot of user dissatisfaction, the solution is not to patch individual cases but to find real ways to improve.
And by May last year, I knew Doubao would win. Its experience at the time was better than that of similar products.
At the same time, I began to realize what I mentioned at the beginning: more users do not lead to better model capabilities. So we should treat Hailuo's text product as a product and a business in its own right, and our subsequent decision was not to invest in it.
LatePost: You said you figured this out in March and April, so why did you stop promoting the Hailuo chatbot only later?
Yan Junjie: It's all part of growing as an entrepreneur. Many realizations are actually very simple, but the execution wasn't firm enough.
LatePost: What held you back? Investors? Competitors?
Yan Junjie: I think it was people, mainly concern for the team's feelings.
LatePost: When did you start becoming more ruthless?
Yan Junjie: I'm not ruthless even now. The real change is that I tell everyone very clearly what I think is right. Some things can't be compromised on.
LatePost: On the other hand, your AI community product Xingye is the best-performing in China, better than similar products from big companies such as ByteDance and Meituan. Why is it ahead for now?
Yan Junjie: The most critical thing is choosing the right technical route. Second, we understand our users better when making business decisions.
LatePost: How did you come to understand them? You don't seem to fit Xingye's user profile.
Yan Janjie: The core is empathy.
LatePost: Do you think you are a person with strong empathy?
Yan Janjie: I think so.
The essence is that most people in tech think they are great, that they are geniuses. But I'm not so certain about how the world works.
Do Not Distinguish Between Startups and Big Companies. Do Not Copy the Mobile Internet
LatePost: Since we last talked in early 2024, what do you think is the biggest change in the competitive landscape of China's LLM industry?
Yan Junjie: In 2024, many people thought AI would follow the mobile internet playbook. Now at least some have begun to realize that mobile internet logic doesn't apply to AI; they are two different things.
LatePost: How does this cognitive change affect the competitive landscape?
Yan Junjie: The advantages big companies have accumulated still matter, but they aren't everything. Again, this is because more product users don't naturally make the model better. Better intelligence may lead to new things, and new things will bring new business models.
LatePost: Globally, Google spent $2.5 billion to acquire the team behind Character.AI, whose product is somewhat similar to yours. Would that be an option for MiniMax?
Yan Junjie: I have never considered selling the company at any particular price.
LatePost: I sent you the news last year, and you said it felt like a happy ending.
Yan Junjie: For them, it was. The founder didn't like the product that much, and after he went back, he contributed a lot to Gemini 2.0.
LatePost: Let me check a rumor: did ByteDance discuss acquiring MiniMax at a $4 billion valuation in early 2024?
Yan Junjie: That never happened.
LatePost: Have you talked with Zhang Yiming? What did you take away?
Yan Junjie: At the very least, it showed me what a truly top entrepreneur is like.
LatePost: What is that like?
Yan Junjie: He hopes to bring a lot of positive value to society.
LatePost: By the second half of 2024, large-model funding had already gone through state-capital rounds and Middle Eastern rounds. Who is left to keep investing? How will you keep securing enough funding?
Yan Junjie: We're not at that point. It still comes down to how well we build our own things.
LatePost: Last year you said you didn't believe Chinese LLM startups could rely purely on financing, and that the real turning point would come from improvements in technology, products, or business efficiency. But I've learned that your 2024 revenue fell short of the goals set at the start of the year. What do you make of that?
Yan Junjie: But we are the fastest-growing, and most likely also the one with the highest revenue.
LatePost: So the goal was set too high?
Yan Junjie: The core issue is that when we set those goals at the beginning of 2024, we were still using mobile internet business logic, and our thinking hadn't changed yet. These are actually two different industries.
LatePost: How do you set goals now? How were the goals for 2025 set?
Yan Junjie: I think at this stage, we should not set a revenue goal, but rather a technical R&D goal.
LatePost: By the second half of last year, everyone had seen the strength of big companies such as ByteDance and Alibaba, especially Doubao's product performance. Did that surprise you?
Yan Junjie: It was mostly within my expectations, and I even anticipated it would be more intense.
What I want to say is that if you look at Doubao with a mobile internet mindset, it is indeed very impressive. But if technology keeps developing over the long term, different stages will bring different products and business lines, so this may not necessarily be a good thing.
LatePost: Do you mean that Doubao's rapid user growth is not a good thing, or that looking at Doubao with the mindset of the mobile internet is a distraction for industry observers?
Yan Junjie: Neither is good. Compare OpenAI and Anthropic. OpenAI's user base is dozens of times Anthropic's, but its valuation, funding, and talent are only a little more than three times as large. To serve so many users, OpenAI has to shoulder a lot, which may slow its R&D pace.
LatePost: So having more users doesn't directly improve model capabilities, and to some extent may even hurt the speed and flexibility of model development.
Yan Junjie: At least based on the comparison between OpenAI and Anthropic over the past year or so, this is the case.
LatePost: You've repeatedly said not to apply mobile internet evaluation criteria. So what metrics should we look at for AI products?
Yan Junjie: For global products, an important metric is the number of subscribers and paying users, which is clearly different from past large mobile internet products that relied mainly on advertising.
For domestic products, I suspect there are metrics too, but I'd rather wait until we perform better before sharing them.
LatePost: Is avoiding head-on competition with giants, such as not investing heavily in Doubao-like products, one of your competitive principles?
Yan Junjie: Yes. But in essence, I believe that building AGI and creating ChatGPT-like products are two different things.
And at that time, I had also begun to realize that improving model capabilities doesn't depend much on having the most users, so giving up didn't cause much psychological pressure.
LatePost: A large-model investor compared the market strategies of MiniMax and Moonshot AI. He believes Moonshot AI is determined to focus on "productivity scenarios + the Chinese market," which is also where all the big companies are investing most heavily, while MiniMax seems to keep adapting to its environment and has found gaps outside the giants' main battlefield. How would you summarize your positioning strategy?
Yan Junjie: We hope to always be part of the wave. That has two meanings: first, to participate in the wave and help keep it going; second, to sustain the company's development.
Don’t Assume That a Company Will Not Have Changes and Turnover; It Is Normal
LatePost: What do you make of some middle and senior managers leaving MiniMax last year?
Yan Junjie: Essentially, this business should be technology-driven, and not everyone is a fit.
LatePost: Did you persuade some people to leave? Did that weigh on you?
Yan Junjie: It was something I had to overcome.
LatePost: Did you procrastinate in doing these things?
Yan Junjie: I did procrastinate, which again came down to insufficient understanding.
You shouldn't assume a company won't have changes and turnover. It's normal to have them.
LatePost: In practice, how has your team changed since the beginning of 2024?
Yan Junjie: The main change is not in the organizational structure, but in the requirements for people.
First, I want the leader of each area to be the one who comes up with proposals, rather than waiting for others to do so. They should be more hands-on.
Second, I realized we want people who find rational solutions based on objective data analysis, rather than those who simply copy what worked at their previous company.
LatePost: You said your most important goal for 2025 is technological iteration. How are collaboration and division of labor organized around this? How are resources allocated?
Yan Junjie: On one hand, we have relatively sufficient resources to get things done. On the other, as a startup we have to make trade-offs, and you shouldn't assume every trade-off is correct. The key is recognizing when you're wrong and correcting it in time.
LatePost: What mechanism helps you realize your judgment is wrong?
Yan Junjie: No ego. No self.
LatePost: Do you think you are a person who is easily persuaded?
Yan Junjie: No.
LatePost: Doesn't that contradict having no ego?
Yan Junjie: Many things are contradictory. The name MiniMax itself is contradictory (MiniMax as a technical term refers to the "minimax algorithm").
There are still ways to balance it: think as deeply as possible and don't be fooled by superficial, temporary things.
LatePost: Which important decisions in MiniMax are made by you, and which are delegated to others?
Yan Junjie: In the first year of the business, I thought this was very important, but later I found it wasn't.
What matters more is that everyone shares a common foundation for thinking. Because everyone works differently, if responsibilities are divided too sharply, different modules end up following completely different logic, and even if everyone is excellent, the company can't run well.
The right approach is for everyone to share the company's most fundamental understanding, so that whoever makes a decision, it comes out similar, and the organization runs smoothly.
LatePost: What if everyone is wrong together?
Yan Junjie: Diversity is also a very important indicator. But I feel that what really holds the company together is that common foundation.
LatePost: Will a division of labor that doesn't emphasize clear divisions make employees feel that management is chaotic?
Yan Junjie: Making employees feel that the company's management is very good is actually not a goal.
LatePost: Some MiniMax employees said that for a period in 2024, management decisions felt very inconsistent: one month the core goal was revenue, the next it was growth, and the month after that it was revenue again.
Yan Junjie: We later aligned on this: neither of those is the goal. The goal is technological iteration.
LatePost: What is the biggest challenge in managing a new kind of AI company that can't be run with the old logic?
Yan Junjie: Continuously recruiting better people. The fact is that ByteDance currently has the strongest pull for talent. But the share of people who truly unleash their potential after joining ByteDance is lower than at startups, because ByteDance has too many people.
The Greatest Pain Is Not Knowing How to Make Trade-Offs.
LatePost: What is the biggest change in yourself in 2024?
Yan Junjie: More than six months ago, many people thought I was a bit anxious, but over the past six months I've stopped being anxious. The core reason is that I realized I needed to make trade-offs.
LatePost: What was your biggest pain this past year?
Yan Junjie: The biggest pain was not knowing how to make trade-offs. Once I figured it out, the pain was gone. Now technological iteration is our most important goal.
LatePost: What did you reflect on the most last year?
Yan Junjie: Why couldn't my cognitive ability improve faster?
LatePost: Have you found any new ways to improve?
Yan Junjie: I still have to let go of my ego and think more deeply.
LatePost: You mentioned that an important way for you to learn is to communicate with people who are better than you. Who did you meet last year, and what did you learn?
Yan Junjie: There were some, but I don't think that alone is enough. The essence is being able to think very deeply myself.
LatePost: An investor told me about sourcing compute with you last year. He said you were extremely persistent about getting cheap rent and shorter lease terms. Some suppliers offered MiniMax local B2B AI contracts in exchange for accepting higher rent, but you said you didn't need the orders. Is this your tough, uncompromising side?
Yan Junjie: We don't want those orders because we can't deliver them. If we commit, it will drain our energy and also let others down.
LatePost: At MiniMax, you've always gone by IO, the name of a Dota 2 hero. According to MiniMax's Hailuo AI, IO is a support hero that mainly provides buffs and protection for teammates, usually playing position 4 or 5. Why did you pick IO in Dota 2, and why have you kept the name?
Yan Junjie: Actually, IO isn't always played at position 4 or 5. At TI9 (The International 2019, Dota 2's world championship), Ana (an e-sports player) played the "God Wisp," which turned IO into a position 1 carry. It was very strong, and his team won the championship that year. I thought the name was pretty cool at the time.
LatePost: So IO has nothing to do with your own characteristics?
Yan Junjie: There's a lot of randomness in entrepreneurship.
LatePost: Support or damage dealer: which type of hero do you think you are more like?
Yan Junjie: Actually, the two shouldn't be split apart. If I had to say, I'm someone who strongly believes in teamwork.
LatePost: What foreseeable changes do you think there will be in 2025?
Yan Junjie: AI will reach the level of professionals in specialized fields. That will be a substantial improvement. It may not be fully realized in 2025, but some of it will be.
LatePost: Do you have any new requirements for yourself in the new year?
Yan Junjie: I hope to raise my technical level. When I first started the company, I also focused on management, but later I found it wasn't that important. What really matters is whether my technical understanding keeps improving.
LatePost: Sometimes you seem very adaptable, to the point that some might think you're wavering; at other times you seem very determined. Which is closer to the real you?
Yan Junjie: It's a process of progress. As you get stronger at a given stage, you become more determined.
LatePost: When you started your business, you were neither an industry big shot nor considered a "technical genius." What type of founder do you think you are?
Yan Junjie: I think I'm simple. Simple means knowing there is one thing that is very difficult but hugely valuable if done well, and sticking with it.