We scale Cascade RL and multi-domain on-policy distillation to train Nemotron-Cascade-2-30B-A3B, which achieves gold-medal-level results on IMO, IOI, and ICPC World Finals, along with strong performance on ArenaHard and instruction following.
Research Scientist @ NVIDIA
I am a Research Scientist at NVIDIA (ADLR Group). I received my Ph.D. from Georgia Tech in August 2024.
I currently work on scaling reinforcement learning for reasoning LLMs.
We scale Cascade RL to train general-purpose reasoning LLMs spanning RLHF, instruction following, math, code, and SWE. Our 14B Thinking model achieves SOTA on LiveCodeBench, outperforming Gemini-2.5-Pro, o4-mini, Qwen3-235B, and DeepSeek-R1-671B, and reaches silver-medal performance at IOI 2025.
We further scaled SFT, studied its interplay with RL, and released the SOTA 7B model, AceReason-Nemotron-1.1-7B.
We extended the previous work to both the math and code domains and released the SOTA medium-sized model, AceReason-Nemotron-14B.
Our pilot study on scaling RL for competitive math reasoning LLMs. We released the AceMath-RL-Nemotron-7B model.
We developed math reasoning and reward models, releasing AceMath-1.5B, 7B, and 72B along with AceMath-RM-7B and 72B, which surpassed GPT-4o and Qwen2.5-Math at the time.
Research · 2022–2024
Multimodal LLM
Building AI that understands the visual world
Research · 2019–2024
Multilingual LLM
Bridging representation across global languages