RustMizan

A compilable, contamination-aware benchmarking framework for Rust vulnerability analysis.

RustMizan (Mizan is Arabic for "scale" or "balance") evaluates both traditional and LLM-based vulnerability analysis techniques for Rust. It pairs a curated dataset of real-world vulnerabilities with the infrastructure needed to run and score analyses on them.

The dataset is a curated set of real-world memory-safety CVEs, each packaged as compilable variants at the crate, file, and function levels. Every variant ships with ground-truth annotations for four tasks: Crate Vulnerability Classification (CVC), CWE classification, function localization, and line localization.
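To make the four annotation types concrete, here is a minimal sketch of how one variant and its ground truth could be modeled. All type names, field names, and the placeholder CVE identifier are hypothetical and chosen for illustration; they are not RustMizan's actual schema (see the Dataset page for the real layout). The `line_recall` helper is likewise only an illustrative measure, not the benchmark's scoring rule.

```rust
// Hypothetical types: names and fields are illustrative, not RustMizan's schema.

/// Context granularity at which a variant is packaged.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContextLevel {
    Crate,
    File,
    Function,
}

/// Ground truth for the four tasks attached to one variant.
#[derive(Debug, Clone, PartialEq, Eq)]
struct GroundTruth {
    /// Crate Vulnerability Classification (CVC): is the code vulnerable?
    vulnerable: bool,
    /// CWE classification label, e.g. "CWE-416".
    cwe: String,
    /// Function localization: names of the vulnerable functions.
    functions: Vec<String>,
    /// Line localization: 1-based numbers of the vulnerable lines.
    lines: Vec<u32>,
}

/// One packaged variant of a vulnerability.
#[derive(Debug, Clone)]
struct Variant {
    cve_id: String,
    level: ContextLevel,
    truth: GroundTruth,
}

/// Illustrative measure (an assumption, not the benchmark's metric):
/// the fraction of ground-truth lines that a prediction recovers.
fn line_recall(truth: &GroundTruth, predicted: &[u32]) -> f64 {
    if truth.lines.is_empty() {
        return 1.0;
    }
    let hits = truth
        .lines
        .iter()
        .filter(|line| predicted.contains(line))
        .count();
    hits as f64 / truth.lines.len() as f64
}

fn main() {
    let variant = Variant {
        cve_id: "CVE-0000-00000".to_string(), // placeholder, not a real CVE
        level: ContextLevel::Function,
        truth: GroundTruth {
            vulnerable: true,
            cwe: "CWE-416".to_string(),
            functions: vec!["release".to_string()],
            lines: vec![10, 11],
        },
    };
    let recall = line_recall(&variant.truth, &[10]);
    println!("{} at {:?} level: line recall {recall}", variant.cve_id, variant.level);
}
```

Keeping the four annotations in one record means a single variant can be scored on any of the tasks without re-reading the dataset.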

[Figure: RustMizan overview]

Design principles

Fully compilable. Every variant compiles, so it can be analyzed by traditional tools (static analyzers, formal verification) and explored by agents that build and run the code. See the Dataset.
Multi-level context. Each vulnerability is available at crate, file, and function levels, so you can study how context granularity affects analysis.
Contamination-aware. A pluggable mutation framework applies semantics-preserving transformations that change the syntax of the code while preserving the vulnerability, so you can probe whether a model is memorizing or reasoning.
Extensible. Adding a vulnerability or a mutation is a small, well-defined task. See Contributing.
Transparent. Every evaluation run is published as a complete agent trajectory (prompts, reasoning, tool calls, and scoring), browsable in an Inspect log viewer and linked from each result on the Leaderboard. Every run is also analyzed automatically with Docent for contamination signals.
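The contamination-aware principle above can be illustrated with a toy plug-in interface. This is a minimal sketch under stated assumptions: the `Mutation` trait, the `RenameIdent` mutation, and `apply_all` are hypothetical names invented here and are not RustMizan's API (see the Mutations page for the real transformations). The renamer works on raw tokens, so it would also rewrite matches inside string literals and comments; a realistic implementation would more likely operate on a parsed syntax tree.

```rust
/// Hypothetical plug-in interface: a mutation rewrites source text
/// without changing what the program does.
trait Mutation {
    fn name(&self) -> &'static str;
    fn apply(&self, source: &str) -> String;
}

/// Toy mutation: renames one identifier wherever it appears as a whole token.
struct RenameIdent {
    from: &'static str,
    to: &'static str,
}

impl RenameIdent {
    /// Emit the pending token, swapping it if it matches `from`.
    fn flush(&self, token: &mut String, out: &mut String) {
        if token.as_str() == self.from {
            out.push_str(self.to);
        } else {
            out.push_str(token.as_str());
        }
        token.clear();
    }
}

impl Mutation for RenameIdent {
    fn name(&self) -> &'static str {
        "rename-ident"
    }

    fn apply(&self, source: &str) -> String {
        let mut out = String::with_capacity(source.len());
        let mut token = String::new();
        for ch in source.chars() {
            if ch.is_alphanumeric() || ch == '_' {
                // Still inside an identifier-like token.
                token.push(ch);
            } else {
                self.flush(&mut token, &mut out);
                out.push(ch);
            }
        }
        self.flush(&mut token, &mut out);
        out
    }
}

/// Apply mutations in order, feeding each one the previous output.
fn apply_all(mutations: &[Box<dyn Mutation>], source: &str) -> String {
    mutations
        .iter()
        .fold(source.to_string(), |src, m| m.apply(&src))
}

fn main() {
    // The unchecked pointer read survives the renaming untouched.
    let original = "unsafe fn read_at(buf: *const u8, idx: usize) -> u8 { *buf.add(idx) }";
    let mutations: Vec<Box<dyn Mutation>> = vec![
        Box::new(RenameIdent { from: "idx", to: "offset" }),
        Box::new(RenameIdent { from: "buf", to: "data" }),
    ];
    for m in &mutations {
        println!("applying {}", m.name());
    }
    println!("{}", apply_all(&mutations, original));
}
```

Because each mutation only maps source text to source text, new transformations can be added without touching the pipeline that applies them. The mutated function looks different on the surface but still performs the same unchecked read, which is the property a contamination probe relies on.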

How it compares

Most vulnerability benchmarks use non-compilable snippets, fix a single context level, focus on binary detection, and rarely handle contamination or target Rust. RustMizan addresses all of these gaps in one benchmark: it provides compilable variants, the same vulnerability at multiple context levels, the full analysis pipeline (CVC, CWE classification, and function- and line-level localization), built-in contamination and robustness testing, and a focus on Rust.

Where to go next

If you want to...	Read
Install and run the full pipeline	Getting Started
Understand the dataset and its layout	Dataset
Use the `mizan` command-line tool	The mizan CLI
Learn the mutations and how they preserve ground truth	Mutations
See how models are scored	Evaluation
Read or submit results	Leaderboard
See how runs are analyzed for contamination	Trajectory analysis
Add a vulnerability, a mutation, or results	Contributing

Citation

@misc{elsayed2026rustmizancompilablecontaminationawarebenchmarking,
title={RustMizan: A Compilable, Contamination-Aware Benchmarking Framework for Rust Vulnerabilities},
author={Tarek Elsayed and Shiping Yang and Eunsong Koh and Sanika Goyal and Vincent Huang and Paul Ngo and Nathan Young and Mohammad Omidvar Tehrani and Alvyn Kang and Arnell Kang and Zeyu Chen and Angélica Moreira and Xuan Feng and Angel X. Chang and Nick Sumner and Steven Y. Ko},
year={2026},
eprint={2607.04729},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2607.04729},
}

Acknowledgements

This work was carried out at the Reliable Systems Lab at Simon Fraser University, led by Dr. Steven Ko.

Licensed under the Apache License, Version 2.0.