I am an AI researcher working on reliable long-horizon AI agents, agentic reinforcement learning, and calibrated post-training.
My research asks a simple question:
How can AI agents know what they don’t know, act under uncertainty, and improve from their own prediction–reality gaps?
I build methods, environments, and evaluation frameworks that turn uncertainty, confidence, and consistency into first-class training signals for reliable and self-improving AI systems.
Homepage · Google Scholar · LinkedIn · X/Twitter · Email
- Agentic RL & Post-training
Calibration-aware on-policy distillation, GRPO/RL training, self-evolving environments, synthetic feedback, and reward/evaluator design for long-horizon agents.
- Alignment, Calibration & Honesty
Uncertainty-aware supervision, confidence calibration, hallucination detection, factuality, scalable oversight, and reliable model behavior.
- Long-horizon Agents & Evaluation
Tool use, planning, trajectory-level evaluation, deep research agents, evidence grounding, failure attribution, and enterprise-scale agent benchmarks.
- Prospective Hindsight
Self-calibrating reinforcement learning via prediction–reality gaps, aligning an agent’s action-time self-belief with verifier outcomes.
- CaOPD: Calibration-aware On-policy Distillation
Decouples capability learning from honest confidence calibration in LLM post-training.
- Agentic Uncertainty Quantification
Turns verbalized uncertainty into active control signals for memory, reflection, and long-horizon execution.
- [ICML2026] Agentic Confidence Calibration
A trajectory-level calibration framework for diagnosing and improving the reliability of long-horizon agents.
- [ACL2026] The Evolving Role of Uncertainty Quantification in Large Language Models
The evolution of uncertainty from a passive diagnostic metric to an active control signal guiding real-time model behavior.
For the full list of publications, please see my Google Scholar or homepage.
I am interested in reliable AI agents, agentic RL, post-training, calibration, uncertainty, scalable evaluation, and self-improving AI systems. Feel free to reach out via email or visit my homepage.
Pinned
- SURGroup/UQpy
UQpy (Uncertainty Quantification with python) is a general purpose Python toolbox for modeling uncertainty in physical and mathematical systems.
- Awesome-LLM-Uncertainty-Reliability-Robustness
Awesome-LLM-Robustness: a curated list of Uncertainty, Reliability and Robustness in Large Language Models
- intuit/sac3
Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
- Awesome-LLM-RAG
Awesome-LLM-RAG: a curated list of advanced retrieval augmented generation (RAG) in Large Language Models
- Awesome-LLM-Prompt-Optimization
Awesome-LLM-Prompt-Optimization: a curated list of advanced prompt optimization and tuning methods in Large Language Models
- SalesforceAIResearch/CaOPD
CaOPD: Calibration-Aware On-Policy Distillation



