EXP-Bench: Can AI Conduct AI Research Experiments? arXiv 2025.

EXP-Bench is the first benchmark to evaluate AI agents on research experiment tasks that are semi-autonomously constructed from top-tier ML research papers.

June 2025 · Patrick Tser Jern Kon, Jiachen Liu, Xinyi Zhu, Qiuyi Ding, Jingjia Peng, Jiarong Xing, Yibo Huang, Yiming Qiu, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, Matei Zaharia and Ang Chen

Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents. arXiv 2025.

Curie is the first AI-agent framework designed for automated and rigorous scientific experimentation. Curie helps answer your curiosity by automating experimentation end to end, so that every step, from hypothesis formulation to result interpretation, is conducted with precision, reliability, and reproducibility.

January 2025 · Patrick Tser Jern Kon, Jiachen Liu, Qiuyi Ding, Yiming Qiu, Zhenning Yang, Yibo Huang, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury and Ang Chen