03:22
Kimi K2 Thinking sets new records on benchmarks for reasoning, coding, and agent capabilities.
Jin10 Data, November 8: According to Moonshot AI's official website, Kimi K2 Thinking has set new records on benchmarks evaluating reasoning, coding, and agent capabilities. K2 Thinking achieved a state-of-the-art score of 44.9% on Humanity's Last Exam (HLE), reached 60.2% on BrowseComp, and in SWE-Bench

