Four programs, one mandate.
We organize the lab around research programs rather than outputs. Each program is a long-term line of inquiry with its own methods, projects, and impact.
AI Evaluation
We build the science of evaluation for high-stakes settings: benchmarks, LLM-as-judge protocols, and reliability metrics that expose failure modes standard leaderboards miss.
AI Governance
Governance is not a document; it is a system. We study how public institutions can adopt AI while preserving legitimacy, oversight, and the procedural rights of the people they serve.
AI Assurance
When an AI system informs a consequential decision, its reasoning must be verifiable after the fact. We develop provenance, evidence, and fidelity methods for end-to-end assurance.
AI Systems
We engineer the systems layer: multi-agent architectures, generative simulation, and human-in-the-loop workflows built to run in real operational settings.