We scale Cascade RL to train general-purpose reasoning LLMs spanning RLHF, instruction following, math, code, and SWE.
Our 14B Thinking model achieves SOTA on LiveCodeBench, outperforming Gemini-2.5-Pro, o4-mini, Qwen3-235B, and DeepSeek-R1-671B, and reaches silver-medal performance at IOI 2025.
We developed math reasoning and reward models, releasing AceMath-1.5B, 7B, and 72B,
along with AceMath-RM-7B and 72B, which surpassed GPT-4o and Qwen2.5-Math at the time of release.