One might note that MCTS uses more inference compute per sample than GRPO, so of course it performs better. However, the goal here is not an apples-to-apples compute comparison. Yes, MCTS uses more inference-time compute, but it also gives us additional levers for applying and scaling that compute, and for raising the reward ceiling. By contrast, it's not obvious to me that throwing 100x more compute at GRPO would have turned the plateau into a hockey stick.