LegalChain is the public benchmark site for Legal-10 and the primary showcase for AGChain. AGChain, the benchmark authoring platform, is still under development; LegalChain is where the benchmark, methodology, leaderboard, and pitch are published today.
"The transition from atomic prompts to stress-tests that evaluate multi-step, complex reasoning in chained, stateful conditions requires new standards."
AGChain provides the evaluation infrastructure; LegalChain is the public benchmark surface running on top of it. Unlike traditional benchmarks that test isolated questions, this stack evaluates 10-step chained reasoning where errors propagate realistically, verifying citations against a sealed universe of 27,733 Supreme Court opinions and 378,938 extracted citation occurrences. With structural no-leak architecture and deterministic synthetic traps, it distinguishes grounded legal reasoning from hallucination in high-stakes workflows.
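The sealed-universe check described above can be sketched as membership in a closed set: a citation counts as grounded only if it occurs in the fixed corpus, and deterministic traps are plausible-looking cites guaranteed to be absent. The function name, set contents, and cite format below are illustrative assumptions, not AGChain's actual data model or API.

```python
# Hypothetical sketch of sealed-universe citation verification.
# The citation strings here are illustrative; the real universe
# would index 378,938 citation occurrences from 27,733 opinions.
SEALED_CITATIONS = {
    "410 U.S. 113",
    "347 U.S. 483",
}

def verify_citation(cite: str) -> bool:
    """A citation is grounded only if it appears verbatim in the
    sealed universe; anything outside it is flagged as hallucination."""
    return cite.strip() in SEALED_CITATIONS

# A deterministic synthetic trap: well-formed but absent from the corpus,
# so any model that cites it is provably hallucinating.
assert verify_citation("410 U.S. 113")
assert not verify_citation("999 U.S. 999")
```

Because the universe is sealed before evaluation, this check is exact rather than probabilistic, which is what makes the no-leak claim structural rather than statistical.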
Top Performance
| Model | Composite | S8 Integrity | Latency | Cost |
|---|---|---|---|---|
Leaderboard preview
Open the full leaderboard to view results.