Submit leaderboard results
The leaderboard is a separate repository (the Hugging Face Space). Adding results means contributing the processed output of an Inspect-AI run to that repo.
First, run an evaluation and produce an Inspect-AI .eval file (see Evaluation). Then, in the leaderboard repo:
- Add the .eval file to data/eval_files/:
    cp your_experiment.eval data/eval_files/
- Register it in data/leaderboard_config.json by adding an entry to the experiments array:
    { "name": "Agent + Model", "eval_path": "data/eval_files/your_experiment.eval" }
- Add the variant (if new). If your eval uses a new tag, map it to a display name in data/dataset_info.json:
    { "your_tag": "Display Name" }
- Run preprocessing:
    python preprocess_evals.py
  This reads each .eval file, extracts the per-sample scores into data/experiments/<name>_<tag>.json, and regenerates data/processed_config.json, which the app loads at startup.
- Open a pull request against the Space with your changes. You can browse and create pull requests from the Space's Community tab: open pull requests.
The committed JSON files in data/experiments/ (not the large .eval files) are what the app serves. See the leaderboard repo's CONTRIBUTING.md for the canonical version of these steps.
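The registration step can also be scripted rather than edited by hand. The following is a minimal sketch, not part of the leaderboard repo: register_experiment is a hypothetical helper, and it assumes data/leaderboard_config.json is a JSON object whose top-level experiments key holds a list of { "name", "eval_path" } entries, as described in the steps above.

```python
import json
from pathlib import Path


def register_experiment(config_path, name, eval_path):
    """Append an experiment entry unless one with the same eval_path exists.

    Hypothetical helper. Assumes the config is a JSON object with a
    top-level "experiments" list. Returns True if an entry was added,
    False if the eval_path was already registered.
    """
    config_file = Path(config_path)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    experiments = config.setdefault("experiments", [])

    # Skip the write if this .eval file is already registered.
    if any(entry.get("eval_path") == eval_path for entry in experiments):
        return False

    experiments.append({"name": name, "eval_path": eval_path})
    config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

For example, register_experiment("data/leaderboard_config.json", "Agent + Model", "data/eval_files/your_experiment.eval") adds the entry shown in the steps above. Note that the file is rewritten with two-space indentation, which may differ from the repo's existing formatting.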
Publish the trajectories
The Sample-wise Comparison tab links each result to its full trajectory in the rust-mizan-logs Inspect log viewer. That viewer is regenerated from the raw .eval files (which are not stored in the repo), so refresh it after adding runs:
export HF_TOKEN=hf_... # write access to sfu-rsl
python publish_logs.py # defaults to ../agentic_evals/logs
This bundles the .eval files into a static Inspect viewer and uploads it to the Space, replacing the previous contents. Pass --logs-dir / --space to override the defaults.
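Because the upload replaces the previous contents of the Space, it may be worth checking the inputs before running publish_logs.py. The sketch below is a hypothetical pre-flight check, not part of the repo: the preflight function is an illustrative name, and searching the logs directory recursively for .eval files is an assumption about how publish_logs.py collects them.

```python
import os
from pathlib import Path


def preflight(logs_dir="../agentic_evals/logs", env=None):
    """Return a list of problems that would likely make publishing fail.

    Hypothetical helper. An empty list means the token is set and at
    least one .eval file was found under logs_dir.
    """
    env = os.environ if env is None else env
    problems = []

    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")

    logs = Path(logs_dir)
    if not logs.is_dir():
        problems.append(f"logs directory not found: {logs}")
    elif not any(logs.rglob("*.eval")):  # assumption: recursive search
        problems.append(f"no .eval files under {logs}")

    return problems
```

Calling preflight() with no arguments checks the default logs location mentioned above; pass a different path if you plan to use --logs-dir.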