Scientific Experimentation

EXP-Bench: Can AI Conduct AI Research Experiments? ICLR 2026.

EXP-Bench is the first benchmark to evaluate AI agents on research experiment tasks that are semi-autonomously constructed from top-tier ML research papers.

Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents. arXiv 2025.

Curie is the first AI-agent framework designed for automated and rigorous scientific experimentation. Curie helps answer your curiosity by automating experimentation end to end, so that every step, from hypothesis formulation to result interpretation, is conducted with precision, reliability, and reproducibility.