🤯Kimi K2 Thinking: The $4.6M Model Shifting AI Narratives
Moonshot AI's new model challenges prevailing narratives about open versus closed models, US-China AI competition, training costs, and perhaps OpenAI's trillion-dollar infra deals.
Last week I was planning to write about MiniMax M2, the latest reasoning model from MiniMax, an Alibaba- and Tencent-backed AI startup, which had climbed to the top of the Artificial Analysis Intelligence Index as the best open LLM. But just one week after M2's release, another Chinese AI startup, Moonshot AI, released the thinking variant of its latest LLM Kimi K2, named Kimi K2 Thinking, which outshone M2.
K2 Thinking is a reasoning model that thinks while using tools. It’s the best open LLM available right now, with specific agentic capabilities that actually beat GPT-5 and Claude Sonnet 4.5 in certain tasks. It’s cheap to train, though it’s not necessarily cheap to use.
The reaction across the AI community was explosive. People are calling it “the turning point of AI,” “China saved open-source LLMs,” and saying it will “make OpenAI bleed.”
The timing couldn’t be more interesting. As Kimi K2 went viral, OpenAI hit another wave of bad news that’s adding fuel to concerns about an AI bubble. Over the past five days alone, Nvidia and Oracle both plunged 10%.
This article breaks down the details of K2 Thinking and reveals why the model represents shifting narratives in the AI world.
For anyone who wants to know more about Moonshot AI and Kimi K2, I highly recommend this interview between Benita Zhang Xiaojun and Moonshot AI CEO Yang Zhilin.
K2 Thinking Breakdown
What is K2 Thinking: K2 Thinking is a reasoning model post-trained on top of Kimi K2, Moonshot AI’s foundation model released in July and September 2025. The relationship is similar to DeepSeek R1 and DeepSeek V3—same foundation, enhanced reasoning capabilities.
Size: Built on Kimi K2’s architecture, the model has 1 trillion total parameters, 32 billion active parameters, and a 256K token context window.
Training Cost: According to CNBC, training Kimi K2 Thinking cost $4.6 million, a figure likely leaked by a Moonshot AI insider hoping to replicate the PR success DeepSeek created earlier this year. The model was trained on the H800, a downgraded version of the H100 GPU that was marketed exclusively in China until October 2023.
(Update: in Moonshot AI’s AMA, their researchers said this is not an official number. “It is hard to quantify the training cost because a major part is research and experiments.”)
Architecture: Kimi K2 and K2 Thinking share nearly identical architecture with DeepSeek-V3, but with minor differences. K2 is more sparse and omits DeepSeek’s double-head mechanism.
One real innovation is MuonClip, Moonshot’s proprietary Muon optimizer. This allowed them to train on a massive 15.5 trillion token dataset without a single training instability or loss spike.
Interleaved Thinking: K2 Thinking doesn’t just use tools. It thinks between every tool call. This “interleaved thinking” (also found in MiniMax M2) mimics how humans actually work: you take an action, reflect on the results, assess whether you’re on the right track, then decide your next move.
Agentic capabilities: This is K2 Thinking’s selling feature. The model can execute 200-300 sequential tool calls without human intervention, reasoning coherently across hundreds of steps. No other open model has demonstrated this level of sustained autonomous operation.
That means Kimi K2 Thinking can orchestrate a long chain of actions: think, select tool, call tool, get result, reason, call next tool, and repeat. For complex tasks that need many intermediate steps and decisions like deep research, multi-step coding, or combined web browsing and analysis, this is a major capability.
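The think–act loop described above can be sketched in a few lines. This is a minimal illustration under assumptions, not Moonshot’s actual API: `call_model`, the step dictionary shape, and the tool registry are all hypothetical stand-ins.

```python
# Minimal sketch of an interleaved-thinking agent loop (hypothetical API).
def run_agent(task, tools, call_model, max_steps=300):
    """Think -> pick a tool -> call it -> read the result -> think again."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = call_model(history)  # returns reasoning plus an optional tool call
        history.append({"role": "assistant", "content": step["thinking"]})
        if step.get("tool") is None:  # model decided it is finished
            return step["thinking"]
        result = tools[step["tool"]](**step["args"])  # execute the chosen tool
        history.append({"role": "tool", "content": str(result)})
    return None  # step budget exhausted
```

The key property is that every tool result is fed back into the context before the next decision, which is what lets the model stay coherent across hundreds of steps.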
Moonshot calls this their latest advance in test-time scaling by scaling both thinking tokens and tool-calling steps. They first observed these continuous performance improvements back in Kimi K1.5 as context length increased.
INT4 Precision: Unlike previous Kimi K2 Instruct releases (which used FP8 precision), K2 Thinking ships natively in INT4. They achieved this through quantization-native training during post-training. As a result, K2 Thinking is only ~594GB versus over 1TB for K2 Instruct. This enables roughly 2 times generation speed while maintaining state-of-the-art performance. INT4 can also better support non-Blackwell architecture hardware.
One Moonshot AI engineer explained that INT4 quantization isn’t a trade-off. Done right during training, it reduces latency while maintaining quality, minimizes precision loss, enhances long-context reasoning stability, and even accelerates RL training by addressing “long-tail” inefficiencies.
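The size reduction follows directly from bits per weight. A rough back-of-the-envelope check (raw weights only; real checkpoints mix precisions for some layers, which is why the actual figures differ somewhat):

```python
def weight_gb(n_params, bits_per_param):
    """Raw weight storage in GB (decimal), ignoring mixed-precision layers."""
    return n_params * bits_per_param / 8 / 1e9

PARAMS = 1_000_000_000_000        # 1 trillion parameters
int4_gb = weight_gb(PARAMS, 4)    # 500 GB -- in the ballpark of the ~594GB INT4 checkpoint
fp8_gb = weight_gb(PARAMS, 8)     # 1000 GB -- roughly the >1TB FP8 release
```

Halving bits per weight also halves the memory bandwidth needed per generated token, which is where the roughly 2x generation speedup comes from.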
Open Weights License: K2 Thinking surprisingly uses a modified MIT license with one requirement: if you’re using it for commercial products or services with over 100 million monthly active users OR over $20 million in monthly revenue, you must display the Kimi K2 branding in your user interface.
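The branding clause reduces to a simple threshold test. A hypothetical helper encoding the two conditions stated above:

```python
def must_display_branding(monthly_active_users, monthly_revenue_usd):
    """Modified-MIT clause: show Kimi K2 branding past either threshold."""
    return monthly_active_users > 100_000_000 or monthly_revenue_usd > 20_000_000
```

In practice the clause only touches the largest deployments; smaller commercial users get plain MIT terms.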
Benchmark Performance: K2 Thinking’s hype comes from its record-breaking performance on benchmarks that test real agentic capabilities:
44.9% on HLE with tools: Humanity’s Last Exam is a benchmark from the Center for AI Safety and Scale AI with thousands of difficult, closed-ended questions across academic disciplines. There’s even a “heavy mode” where K2 Thinking runs eight parallel trajectories, then aggregates them for a final answer. In heavy mode, it hits 51.0%.
60.2% on BrowseComp: K2 Thinking beats GPT-5 at browsing the internet for hard-to-find information.
71.3% on SWE-Bench Verified: Still behind GPT-5 and Claude Sonnet 4.5 on coding.
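Moonshot hasn’t documented exactly how heavy mode aggregates its eight trajectories; majority voting is one plausible scheme, sketched here with a hypothetical single-trajectory call `solve_once`:

```python
from collections import Counter

def heavy_mode(question, solve_once, k=8):
    """Run k independent trajectories and return the most common final answer."""
    answers = [solve_once(question) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```

The idea is simple test-time scaling: independent samples make different mistakes, so agreement among them is a useful signal even without a verifier.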
For general tasks, K2 Thinking trails Claude Sonnet 4.5 and GPT-5 but leads other open models.
On the day following K2 Thinking’s release, the Artificial Analysis Intelligence Index showed K2 Thinking placing second only to GPT-5, beating most closed-source models and all open models.
But here’s where things get interesting: K2 Thinking used 140M tokens across the evals, the highest ever. But with Moonshot’s base endpoint pricing ($0.6/$2.5 per million input/output tokens), the total cost to run the index was only $356, which is cheaper than leading frontier models. There’s also a turbo endpoint ($1.15/$8 per million tokens) that costs $1,172 to run the index, making it the second most expensive model after Grok 4. The base endpoint runs at ~8 output tokens/s, while turbo runs at ~50 output tokens/s.
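Those cost figures are easy to sanity-check. The input/output split of the 140M tokens isn’t published, so the sketch below assumes output-heavy usage, which lands close to the reported totals:

```python
def api_cost(input_tokens, output_tokens, in_price, out_price):
    """Total USD cost; prices are USD per million tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

TOKENS = 140_000_000                      # tokens used across the index evals
base = api_cost(0, TOKENS, 0.6, 2.5)      # $350 -- close to the reported $356
turbo = api_cost(0, TOKENS, 1.15, 8.0)    # $1120 -- close to the reported $1,172
```

The gap between the two endpoints is a classic throughput/price trade: the turbo endpoint is over 6x faster per output token but costs more than 3x as much.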
I’m also including some unofficial third-party evaluations for reference. Zhihu contributor toyama nao published a detailed analysis concluding that K2 Thinking delivers genuine performance gains across reasoning benchmarks: solidly first-tier intelligence, but at a steep token cost.
Meanwhile, a Chinese engineer and influencer ran a poll on X asking if K2 Thinking surpassed Claude Sonnet 4.5. 75.9% said it doesn’t.
Open Source Wins and Shifting AI Narratives
K2 Thinking is undeniably a major win for open-source LLMs. As Hugging Face CEO Clément Delangue put it: “The first time in a while that open-source gets ahead of proprietary APIs on their big area of focus (agents).”
The beauty of K2 Thinking is that it builds on previous open-source work of DeepSeek V3. And DeepSeek V3’s success traces back to LLaMA 2, which laid the foundation for the first generation of DeepSeek. This is collective intelligence in action: the open-source community has not only closed the gap with proprietary LLMs but has now surpassed them in certain tasks. Who says LLMs are a game only the deep-pocketed labs can play?
For Moonshot AI, MiniMax, and Z.AI (Zhipu), the past year has been tough. None have announced significant funding in months. Their commercial prospects looked shaky as domestic Chinese chatbots are all free to the public, making the U.S. subscription model impossible to replicate. The DeepSeek moment earlier this year nearly shattered public confidence in these startups. As a result, two of their peers (01.AI and Baichuan) have already abandoned LLM training altogether. Meanwhile, Chinese tech giants are attacking them on every front, from chatbots to video generation.
So their comeback over the past six months is genuinely remarkable. Despite the DeepSeek phenomenon, these AI labs kept their heads down, focused on research innovations, prioritized training LLMs, and eventually delivered. Their hope is that low-cost, high-performing open models that excel at coding and agentic tasks can help them crack international markets, win chatbot users, and gain adoption in enterprise and developer tools.
It’s starting to work. Z.AI now has around 100,000 monthly API users and 3 million free chatbot users overseas following their GLM-4.6 launch, according to CEO Li Zixuan in an interview with SCMP.
While these Chinese LLMs haven’t dethroned frontier U.S. models yet, the narrative and sentiment have shifted. Chinese LLMs have transformed from overlooked underdogs to AI frontrunners in just twelve months. As I mentioned in my previous post, a growing number of Silicon Valley’s hottest startups are now building their models and applications on these Chinese open LLMs.
The dramatic training cost reduction of K2 Thinking also hurts OpenAI’s justification for needing massive financial commitments and potential government support. If a Chinese startup can build a competitive model for under $5 million, why does OpenAI need $1.4 trillion in infrastructure spending?
As OpenAI’s GPT-5 release underwhelmed this year, CEO Sam Altman is now facing mounting criticism over his overhyped AGI promises and the trillion-dollar infrastructure deals that seem increasingly unrealistic given the massive capex requirements.
Last week on the BG2 podcast, host Brad Gerstner pressed Altman directly: “How can a company with $13 billion in revenues make $1.4 trillion of spend commitments?”
Altman’s response was defensive and terse. He claimed OpenAI’s revenue is “well more than that.” Then he cut Gerstner off: “If you want to sell your shares, I’ll find you a buyer. Enough.”
Meanwhile, at a Wall Street Journal Tech Live event on Wednesday, CFO Sarah Friar suggested OpenAI was seeking an “ecosystem” of banks, private equity, and even government support to finance AI infrastructure investments. Specifically, she floated the idea that the U.S. government should “backstop” or provide a “guarantee” to enable the financing.
The comment ignited immediate backlash. Critics interpreted it as OpenAI asking U.S. taxpayers to absorb the risk if the company can’t pay for all the chips it’s committed to buying. David Sacks, the White House’s AI and crypto advisor, tweeted: “The U.S. has at least 5 major frontier model companies. If one fails, others will take its place.”
Friar quickly clarified on LinkedIn late Wednesday that OpenAI is not seeking a government backstop, admitting she “used the word ‘backstop’ and it muddied the point.”
I don’t doubt that scaling AI infrastructure, with more data centers and more power generation, is essential for next-generation models. But there’s a troubling trend. As OpenAI’s CEO becomes increasingly absorbed in headline-grabbing trillion-dollar deals rather than the heads-down research that produces real AGI breakthroughs, the company has lost something fundamental. This isn’t the OpenAI I once knew.
K2 Thinking is great, and so are MiniMax M2 and GLM 4.6. While these models haven’t yet matched the leading closed-source models, they’re closing the gap rapidly. Given the current pace of innovation from Chinese AI labs, a new open-source SOTA, whether from Qwen-3-Max-Thinking, DeepSeek-V4, or others, could emerge within weeks rather than months.