RustMizan
A compilable, contamination-aware benchmarking framework for Rust vulnerability analysis.
Get started · GitHub · Leaderboard · Trajectories
RustMizan (Mizan is Arabic for "scale" or "balance") evaluates both traditional and LLM-based vulnerability analysis techniques in Rust. It pairs a curated dataset of real-world vulnerabilities with the infrastructure to evaluate analysis techniques against them.
The dataset is a curated set of real-world memory-safety CVEs, each packaged as compilable variants at the crate, file, and function levels. Every variant ships with ground-truth annotations for four tasks: Crate Vulnerability Classification (CVC), CWE classification, function localization, and line localization.
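As a rough illustration of what these annotations cover, the sketch below models one variant's ground truth as a Rust struct. All type names, field names, and the `line_hit` helper are hypothetical and chosen for this example; they are not RustMizan's actual schema or scoring logic (see Dataset and Evaluation for those).

```rust
/// Context granularity at which a variant is packaged.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContextLevel {
    Crate,
    File,
    Function,
}

/// Hypothetical ground-truth record for one variant.
/// Assumption: this layout is illustrative, not the dataset's real format.
#[derive(Debug, Clone, PartialEq, Eq)]
struct GroundTruth {
    /// Context level of the variant this record belongs to.
    level: ContextLevel,
    /// CVC label: does the code contain the vulnerability?
    vulnerable: bool,
    /// CWE classification, e.g. "CWE-416".
    cwe: String,
    /// Function localization: name of the vulnerable function.
    function: String,
    /// Line localization: 1-based line numbers of the vulnerable code.
    lines: Vec<u32>,
}

impl GroundTruth {
    /// Illustrative check only: true if any predicted line is annotated
    /// as vulnerable. The real metric may differ.
    fn line_hit(&self, predicted: &[u32]) -> bool {
        predicted.iter().any(|line| self.lines.contains(line))
    }
}
```

Keeping the four task labels in one record per variant makes it easy to see that a single prediction can be right on one task (say, CWE classification) and wrong on another (say, line localization).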

Design principles
- Fully compilable. Every variant compiles, so it can be analyzed by traditional tools (static analyzers, formal verification) and explored by agents that build and run the code. See Dataset.
- Multi-level context. Each vulnerability is available at crate, file, and function levels, so you can study how context granularity affects analysis.
- Contamination-aware. A pluggable mutation framework applies semantic-preserving transformations that change syntax while preserving the vulnerability, so you can probe memorization versus reasoning.
- Extensible. Adding a vulnerability or a mutation is a small, well-defined task. See Contributing.
- Transparent. Every evaluation run is published as a complete agent trajectory (prompts, reasoning, tool calls, and scoring), browsable in an Inspect log viewer and linked from each result on the Leaderboard.
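To make the contamination-aware principle concrete, here is a minimal sketch of one kind of semantic-preserving transformation: renaming an identifier. The function `rename_identifier` is a hypothetical helper written for this example and is not part of RustMizan's mutation framework. It also works on raw text, so it would rename matches inside string literals and comments; a real mutation would more plausibly operate on a parsed syntax tree.

```rust
/// Rename every whole-word occurrence of `from` to `to` in `source`.
/// Naive text-level sketch: it does not parse Rust, so it ignores scoping
/// and will also touch string literals and comments.
fn rename_identifier(source: &str, from: &str, to: &str) -> String {
    let is_ident_char = |c: char| c.is_alphanumeric() || c == '_';
    let mut out = String::with_capacity(source.len());
    let mut token = String::new();

    for c in source.chars() {
        if is_ident_char(c) {
            // Keep accumulating the current identifier-like token.
            token.push(c);
        } else {
            // Token ended: emit it, renamed if it matches exactly.
            if !token.is_empty() {
                out.push_str(if token == from { to } else { token.as_str() });
                token.clear();
            }
            out.push(c);
        }
    }
    // Flush a trailing token at end of input.
    if !token.is_empty() {
        out.push_str(if token == from { to } else { token.as_str() });
    }
    out
}
```

Applied to a function that calls `get_unchecked(idx)`, renaming `idx` changes the surface syntax while the unchecked access stays in place. Because the rename adds or removes no lines, line-level annotations would still point at the same code.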
How it compares
Most vulnerability benchmarks use non-compilable snippets, fix a single context level, focus on binary detection, and rarely handle contamination or target Rust. RustMizan addresses each of these gaps in one benchmark: compilable variants, the same vulnerability at multiple context levels, the full analysis pipeline (CVC, CWE classification, and function- and line-level localization), built-in contamination and robustness testing, and a focus on Rust.
Where to go next
| If you want to... | Read |
|---|---|
| Install and run the full pipeline | Getting Started |
| Understand the dataset and its layout | Dataset |
| Use the mizan command-line tool | The mizan CLI |
| Learn the mutations and how they preserve ground truth | Mutations |
| See how models are scored | Evaluation |
| Read or submit results | Leaderboard |
| Add a vulnerability, a mutation, or results | Contributing |
Acknowledgements
This work is conducted at the Reliable Systems Lab at Simon Fraser University, led by Dr. Steven Ko.
Licensed under the Apache License, Version 2.0.