Deepseek API Python - Search News

17d

Qwen3-Max Thinking beats Gemini 3 Pro and GPT-5.2 on Humanity's Last Exam (with search)

On HMMT Feb 25, a rigorous reasoning benchmark, Qwen3-Max-Thinking scored 98.0, edging out Gemini 3 Pro (97.5) and significantly leading DeepSeek V3.2 (92.5).

Some results have been hidden because they may be inaccessible to you

Show inaccessible results

Qwen3-Max Thinking beats Gemini 3 Pro and GPT-5.2 on Humanity's Last Exam (with search)

Trending now