RustMizan

A compilable, contamination-aware benchmarking framework for Rust vulnerability analysis.

RustMizan (Mizan is Arabic for "scale" or "balance") evaluates both traditional and LLM-based vulnerability analysis techniques on Rust code. It pairs a curated dataset of real-world vulnerabilities with the infrastructure needed to run and score analysis techniques against that dataset.

The dataset is a curated set of real-world memory-safety CVEs, each packaged as compilable variants at the crate, file, and function levels. Every variant ships with ground-truth annotations for four tasks: Crate Vulnerability Classification (CVC), CWE classification, function localization, and line localization.
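
As a rough illustration of what one variant and its annotations might carry, the sketch below models the three context levels and the four tasks as plain Rust types. All type names, field names, and values here are hypothetical and are not taken from the RustMizan dataset format; in particular, treating CVC as a boolean label is an assumption. See the Dataset page for the actual layout.

```rust
/// Context granularity at which a variant is packaged.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContextLevel {
    Crate,
    File,
    Function,
}

/// Hypothetical ground-truth record, one field per task.
#[derive(Debug, Clone, PartialEq)]
pub struct GroundTruth {
    /// CVC: assumed here to be a binary vulnerable / not-vulnerable label.
    pub is_vulnerable: bool,
    /// CWE classification, e.g. 416 for CWE-416 (Use After Free).
    pub cwe_id: Option<u32>,
    /// Function localization: names of the vulnerable functions.
    pub vulnerable_functions: Vec<String>,
    /// Line localization: 1-based line numbers within this variant.
    pub vulnerable_lines: Vec<u32>,
}

/// Hypothetical description of one compilable variant.
#[derive(Debug, Clone, PartialEq)]
pub struct Variant {
    pub cve_id: String,
    pub level: ContextLevel,
    pub truth: GroundTruth,
}

/// Builds an illustrative function-level variant with made-up values.
pub fn example_variant() -> Variant {
    Variant {
        cve_id: "CVE-0000-00000".to_string(), // placeholder, not a real entry
        level: ContextLevel::Function,
        truth: GroundTruth {
            is_vulnerable: true,
            cwe_id: Some(416),
            vulnerable_functions: vec!["drain".to_string()],
            vulnerable_lines: vec![12, 13],
        },
    }
}
```

Keeping the line numbers relative to the variant, rather than to the original crate, is one way to let the same vulnerability carry valid line-level annotations at each context level.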

[Figure: RustMizan overview]

Design principles

  • Fully compilable. Every variant compiles, so it can be analyzed by traditional tools (static analyzers, formal verifiers) and explored by agents that build and run the code. See Dataset.
  • Multi-level context. Each vulnerability is available at crate, file, and function levels, so you can study how context granularity affects analysis.
  • Contamination-aware. A pluggable mutation framework applies semantics-preserving transformations that change the syntax of the code while leaving the vulnerability intact, so you can probe whether a model is memorizing or reasoning.
  • Extensible. Adding a vulnerability or a mutation is a small, well-defined task. See Contributing.
  • Transparent. Every evaluation run is published as a complete agent trajectory (prompts, reasoning, tool calls, and scoring), browsable in an Inspect log viewer and linked from each result on the Leaderboard.
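
To make the contamination-aware principle concrete, here is a minimal sketch of what a pluggable, semantics-preserving mutation could look like. The `Mutation` trait and the `RenameIdentifier` type are hypothetical names invented for this example and are not RustMizan's actual API. The rename works on raw tokens and ignores string literals, comments, and scoping, so it is an illustration rather than a safe general-purpose transformation.

```rust
/// Hypothetical interface for a source-to-source mutation.
pub trait Mutation {
    /// Short identifier for the mutation.
    fn name(&self) -> &str;
    /// Returns the mutated source text.
    fn apply(&self, source: &str) -> String;
}

/// Renames every whole-word occurrence of one identifier.
pub struct RenameIdentifier {
    pub from: String,
    pub to: String,
}

impl Mutation for RenameIdentifier {
    fn name(&self) -> &str {
        "rename-identifier"
    }

    fn apply(&self, source: &str) -> String {
        let mut out = String::with_capacity(source.len());
        let mut token = String::new();
        for ch in source.chars() {
            if ch.is_alphanumeric() || ch == '_' {
                // Still inside an identifier-like token: keep collecting.
                token.push(ch);
            } else {
                // Token ended: emit it (renamed if it matches), then the separator.
                flush(&mut out, &mut token, &self.from, &self.to);
                out.push(ch);
            }
        }
        flush(&mut out, &mut token, &self.from, &self.to);
        out
    }
}

/// Appends the pending token to `out`, substituting `to` on an exact match.
fn flush(out: &mut String, token: &mut String, from: &str, to: &str) {
    if token.as_str() == from {
        out.push_str(to);
    } else {
        out.push_str(token.as_str());
    }
    token.clear();
}
```

Because this rename never adds or removes newlines, line-level annotations for the original source would still point at the same lines in the mutated source. A mutation that does shift lines would need to remap the ground truth alongside the code.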

How it compares

Most vulnerability benchmarks use non-compilable snippets, fix a single context level, focus on binary detection, and rarely handle contamination or target Rust. RustMizan addresses each of these gaps in one benchmark: compilable variants, the same vulnerability at multiple context levels, a full analysis pipeline (CVC, CWE classification, and function- and line-level localization), built-in contamination and robustness testing, and a focus on Rust.

Where to go next

If you want to...                                       Read
Install and run the full pipeline                       Getting Started
Understand the dataset and its layout                   Dataset
Use the mizan command-line tool                         The mizan CLI
Learn the mutations and how they preserve ground truth  Mutations
See how models are scored                               Evaluation
Read or submit results                                  Leaderboard
Add a vulnerability, a mutation, or results             Contributing

Acknowledgements

This work is carried out at the Reliable Systems Lab at Simon Fraser University, led by Dr. Steven Ko.

Licensed under the Apache License, Version 2.0.