|
Research
My prior research centered on Interpretability in Computer Vision Models (CNNs and ViTs)
and the reliability of LLM-as-a-Judge frameworks.
Building on this foundation, I have pivoted to Large-scale Post-training,
where I design algorithms aimed at eliciting and enhancing the complex reasoning capabilities of LLMs.
|
News
[Jan. 28, 2026] Two papers accepted to ICLR 2026. Thanks to my excellent collaborators. See you in Rio!
[Sept. 2025] Two papers accepted to EMNLP 2025 and one paper accepted to NeurIPS 2025. See you in Suzhou and SD!
[Jun. 2025] Started as a research intern at Tongyi Lab Qwen Pilot Team, Hangzhou. Excited to meet friends in Hangzhou!
|
|
Selected Publications
Selected papers from my research on LLM Post-training and Reinforcement Learning.
|
|
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
Kexin Huang, Haoming Meng, Junkang Wu, Jinda Lu, Chiyu Ma, Ziqian Chen, Xue Wang, Bolin Ding, Jiancan Wu, Xiang Wang, Xiangnan He, Guoyin Wang, Jingren Zhou
ICLR 2026
Paper
This paper proposes that log-probability difference is a more promising metric than entropy for evaluating how LLMs evolve during training, a claim supported by both empirical and theoretical analysis.
|
|
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs
Haoming Meng, Kexin Huang, Shaohang Wei, Chiyu Ma, Shuo Yang, Xue Wang, Guoyin Wang, Bolin Ding, Jingren Zhou
ICLR 2026
Paper
This paper sheds light on the distributional changes induced by RLVR and provides a granular, token-level lens for understanding and improving RL fine-tuning in LLMs.
|
|
My earlier work on the robustness of LLMs and LLM-as-a-Judge.
|
|
Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplifications and Resistance in Multi-Agent Based LLM-as-Judge
Chiyu Ma*, Enpei Zhang*, Yilun Zhao, Wenjun Liu, Yaning Jia, Peijun Qing, Lin Shi, Arman Cohan, Yujun Yan, Soroush Vosoughi
Findings of EMNLP 2025
Paper
This paper presents the counter-intuitive finding that multi-agent LLM-as-Judge frameworks do not always deliver the reliable judgments people previously assumed. We explore bias amplification and resistance in both Multi-Agent Debate and LLM-as-Meta-Judge settings.
|
|
Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge
Lin Shi, Chiyu Ma, Wenhua Liang, Xingjian Diao, Weicheng Ma, Soroush Vosoughi
AACL 2025 (Oral)
Paper
This paper provides a thorough analysis of how position bias spreads across pairwise and listwise comparisons in state-of-the-art LLMs such as Gemini, GPT, and Claude.
|
|
Achieving Domain-Independent Certified Robustness via Knowledge Continuity
Alan Sun, Chiyu Ma, Kenneth Ge, Soroush Vosoughi
NeurIPS 2024
Paper
This paper proposes knowledge continuity, a novel definition inspired by Lipschitz continuity that aims to certify the robustness of neural networks across input domains (such as the continuous and discrete domains of vision and language, respectively).
|
My earlier work on Interpretability. Although I no longer work in this area, these projects represent a cherished part of my research journey.
|
|
ProtoPairNet: Interpretable Regression through Prototypical Pair Reasoning
Rose Gurung, Ronilo Ragodos, Chiyu Ma, Tong Wang, Chaofan Chen
NeurIPS 2025
Paper
This paper proposes an interpretable algorithm that reasons over prototypical pairs for tasks with continuous labels. The algorithm is further evaluated on reward prediction (an RL setting) and age prediction (a classical regression task).
|
|
Interpretable Image Classification with Adaptive Prototype-based Vision Transformers
Chiyu Ma, Jon Donnelly, Wenjun Liu, Soroush Vosoughi, Cynthia Rudin, Chaofan Chen
NeurIPS 2024
Paper
This paper proposes an interpretable algorithm for ViTs based on prototypical parts. Using a greedy matching algorithm, we decompose each prototypical part into small sub-patches that can freely learn to represent more local features.
|
|
This Looks Like Those: Illuminating Prototypical Concepts Using Multiple Visualizations
Chiyu Ma*, Brandon Zhao*, Chaofan Chen, Cynthia Rudin
NeurIPS 2023
Paper
This paper proposes an interpretable algorithm for CNNs that provides geometrically equivalent visualizations of prototypical concepts. It is the first prototype-based work to extract concepts from the training set.
|
|
Academic Service
Conference Reviewer:
ICLR (2025, 2026), ICML (2024–2026), NeurIPS (2024, 2025), AAAI (AISI Track 2023)
Workshop Reviewer:
NeurIPS IAI Workshop (2024), ICLR LLM Reasoning and Planning Workshop (2025)
Journal Reviewer:
Transactions on Machine Learning Research (TMLR)
|
Education
- Ph.D. in Computer Science, Dartmouth College, 2028 (Expected)
- M.S. in Statistical Science, Duke University, 2023
- B.S. in Statistics (with Honors), Carnegie Mellon University, 2021
|
|