Research Programs

Four programs, one mandate.

We organize the lab around research programs rather than outputs. Each program is a long-term line of inquiry with its own methods, projects, and impact.

Program 01

AI Evaluation

PoliceBench, LLM-as-Judge, Benchmarking, Reliability

We build evaluation science for high-stakes settings — benchmarks, LLM-as-judge protocols, and reliability metrics that expose failure modes standard leaderboards miss.

Related projects

PoliceBench

Program 02

AI Governance

Procedural Justice, Accountability, Responsible AI, Policy

Governance is not a document — it is a system. We study how public institutions can adopt AI while preserving legitimacy, oversight, and the procedural rights of the people they serve.

Related projects

Public Safety AI

Program 03

AI Assurance

Verification, Provenance, Evidence, Fidelity

When an AI system informs a consequential decision, its reasoning must be verifiable after the fact. We develop provenance, evidence, and fidelity methods for end-to-end assurance.

Program 04

AI Systems

Agent Systems, Simulation, Workflow, Deployment

We engineer the systems layer — multi-agent architectures, generative simulation, and human-in-the-loop workflows built to run in real operational settings.

Related projects

GABM

모두의경찰관