The mizan CLI

mizan is the Python CLI for working with the dataset. It selects samples, applies mutations, and prepares datasets for evaluation.

All commands must be run from a directory that contains mizan.json (the dataset root).

Installation

cd mizan-cli
poetry install
export PATH="$(poetry env info --path)/bin:$PATH"

Configuration

Optional configuration lives at ~/.config/mizan/config.json:

| Option | Description | Default |
|---|---|---|
| log_level | DEBUG, INFO, WARNING, or ERROR | INFO |
| log_file | Path to a log file | none |
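
As an illustration, the sketch below writes a configuration file with both options set. It assumes the file is a single flat JSON object keyed by the option names in the table; that layout, the write_config helper, and the example log path are assumptions for illustration, not part of mizan.

```python
import json
from pathlib import Path

# Log levels listed in the configuration table.
VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def write_config(path, log_level="INFO", log_file=None):
    """Write a config file (assumed layout: one flat JSON object)."""
    if log_level not in VALID_LEVELS:
        raise ValueError(f"unsupported log_level: {log_level!r}")
    config = {"log_level": log_level}
    # Leave log_file out entirely when no log file is wanted.
    if log_file is not None:
        config["log_file"] = str(log_file)
    path = Path(path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (hypothetical log path):
# write_config("~/.config/mizan/config.json", "DEBUG", "/tmp/mizan.log")
```

Validating the level before writing keeps a typo such as TRACE from reaching the file.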

checkout

Select and export samples from the dataset into an output directory.

mizan checkout [OPTIONS]

| Option | Short | Description | Default |
|---|---|---|---|
| --output | -o | Output directory | ./output |
| --level | -l | function, file, crate, or all | all |
| --vuln-ids | -v | Specific vulnerability IDs (repeatable) | none |
| --year | -y | Filter by year | none |
| --cwe-types | -c | Filter by CWE type (repeatable) | none |
| --include-fixed | | Include fixed samples too | false |

# All function-level samples
mizan checkout --level function

# Two specific vulnerabilities
mizan checkout -v vuln-0001 -v vuln-0002

# Combine filters
mizan checkout --level function --year 2019 --cwe-types CWE-416 -o ./my-samples

checkout copies the selected samples and any dependencies they need, writes a workspace Cargo.toml, and emits a filtered mizan.json into the output directory.
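
A quick sanity check after a checkout is to confirm that the two files named above are present. The helper below is a minimal sketch; missing_checkout_files is a hypothetical name, and it only tests for the files' existence without assuming anything about their contents.

```python
from pathlib import Path

# Files that checkout is documented to write into the output directory.
EXPECTED_FILES = ("Cargo.toml", "mizan.json")


def missing_checkout_files(output_dir):
    """Return the expected files that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Example call against the default output directory:
# missing_checkout_files("./output")
```

An empty list means both files were found; anything else names what is missing.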

mutate

Apply semantics-preserving mutations to checked-out samples. Run it from inside the checkout output directory.

cd output
mizan mutate [OPTIONS]

| Option | Short | Description | Default |
|---|---|---|---|
| --mutations | -m | Mutations to apply (repeatable) | all |
| --seed | -s | Random seed for reproducibility | 42 |

# A single mutation
mizan mutate -m remove-comments

# Several, applied in order
mizan mutate -m format-compact -m benign-comments

The full list of mutations, their categories, and ordering caveats are on the Mutations page. mutate updates mizan.json with corrected line numbers and writes a mizan_mutations.json log.

evaluate prepare-dataset

Convert checked-out samples into a parquet file for evaluation. Run it from the output directory.

mizan evaluate prepare-dataset [OPTIONS]

| Option | Short | Description | Default |
|---|---|---|---|
| --output | -o | Output parquet file | dataset.parquet |
| --tag | -t | Optional tag to identify the dataset | none |

The parquet file bundles each sample's files and ground truth, plus dataset metadata (Rust version, tag, applied mutations). It is the only artifact the evaluation harness consumes. See Evaluation.
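
The three commands documented so far chain into one pipeline: checkout from the dataset root, then mutate and prepare-dataset from the output directory. The sketch below scripts that sequence using only the flags listed in the tables. The plan_pipeline and run_pipeline helpers are hypothetical, and the sketch assumes a relative --output path is resolved against the directory checkout is run from.

```python
import subprocess
from pathlib import Path


def plan_pipeline(dataset_root, output_dir="output", mutations=(), seed=42,
                  parquet="dataset.parquet"):
    """Return (argv, working directory) pairs for the three steps."""
    root = Path(dataset_root)
    # Assumption: a relative output path is resolved against the dataset root.
    out = root / output_dir
    mutate = ["mizan", "mutate", "--seed", str(seed)]
    # Repeat -m once per mutation; with none given, the documented
    # default (all mutations) applies.
    for name in mutations:
        mutate += ["-m", name]
    return [
        (["mizan", "checkout", "-o", str(output_dir)], root),
        (mutate, out),
        (["mizan", "evaluate", "prepare-dataset", "-o", parquet], out),
    ]


def run_pipeline(steps):
    """Run each step in its directory, stopping at the first failure."""
    for argv, cwd in steps:
        subprocess.run(argv, cwd=cwd, check=True)


# Example call (requires mizan on PATH and a real dataset root):
# run_pipeline(plan_pipeline(".", mutations=["remove-comments"]))
```

Separating planning from execution makes it possible to print or inspect the commands before anything runs.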

Running evaluations

Use the run_eval.py script for full control over models, limits, and the agent scaffold:

cd mizan-cli
# Edit run_eval.py: dataset path, models, message/time limits
python run_eval.py

The script exposes the full evaluation configuration, including the agent, which can be replaced with a custom implementation. See Evaluation.