This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...
Researchers at the University of Science and Technology of China have developed a new reinforcement learning (RL) framework that helps train large language models (LLMs) for complex agentic tasks ...
Harmonic AI Inc., a startup building artificial intelligence for formal mathematical reasoning, announced today that it has raised $120 million in new funding at a $1.45 billion valuation. The funding is intended ...
Ribbit Capital Leads Round at $1.45B Valuation of Math-Based AI Venture; Emerson Collective Joins Existing Backers Including Sequoia & Kleiner Perkins. PALO ALTO, Calif.--(BUSINESS WIRE)--Harmonic, the ...
Gemini 3 is Google’s latest AI model, offering improvements in reasoning, coding, and multimodal analysis. New features include the Gemini Agent tool and generative interfaces, such as visual layout ...
A few months before the 2025 International Mathematical Olympiad (IMO) in July, a three-person team at OpenAI made a long bet that they could use the competition’s brutally tough problems to train an ...
A new research paper from Apple details a technique that speeds up large language model responses, while preserving output quality. Here are the details. Traditionally, LLMs generate text one token at ...
Details about OpenAI’s upcoming GPT-5 model have leaked. GitHub accidentally published details of the upcoming model and its four variants in a blog post, which was later withdrawn. The leak points to ...
Grok 4 and its reasoning-focused counterpart, Grok 4 Heavy, arrived with an immediate sense of ambition, offering multimodal AI designed to handle coding, logic, and perception tasks. In the initial ...
o3 is OpenAI's best ChatGPT model to date thanks to its reasoning capabilities, and it might get even better in the next update. As spotted on X, OpenAI is testing a new "Alpha" variant of the o3 model, ...
Grok 4 is a huge leap from Grok 3, but how good is it compared to other models on the market, such as Gemini 2.5 Pro? We now have answers, thanks to new independent benchmarks. LMArena.ai, which is an ...
Two Ohio State Marion professors have secured a $20,000 grant to modernize high school math education by integrating data science. According to an announcement, physics Associate Professor Chris Orban ...