Eval runner

Paste prompts, pick models, fan out in parallel. Results stream live into the matrix below.
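As a rough illustration of the fan-out described above, here is a minimal sketch using Python's asyncio. It assumes every prompt is paired with every selected model, which the "total cells" counter suggests. The names `call_model`, `run_eval`, and `max_concurrency` are hypothetical and not taken from the tool itself; the model call is a stand-in that just echoes its input.

```python
import asyncio
from itertools import product


async def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real model call; echoes its input."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{model}: {prompt}"


async def run_eval(prompts, models, max_concurrency=8):
    """Run every (prompt, model) cell concurrently and collect a matrix."""
    sem = asyncio.Semaphore(max_concurrency)  # cap the number of in-flight calls
    matrix = {}

    async def run_cell(prompt, model):
        async with sem:
            output = await call_model(model, prompt)
        matrix[(prompt, model)] = output

    # One task per cell: len(prompts) * len(models) cells in total.
    tasks = [
        asyncio.create_task(run_cell(p, m)) for p, m in product(prompts, models)
    ]
    for finished in asyncio.as_completed(tasks):
        await finished  # a UI could render each cell here as it completes
    return matrix


matrix = asyncio.run(run_eval(["p1", "p2"], ["model-a", "model-b", "model-c"]))
```

With two prompts and three models, the sketch fills six cells. The semaphore is one possible way to bound parallelism; the actual runner may use a different strategy.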

Dataset

0 prompts
0 prompts · 0 models · 0 total cells

Ready when you are.

Add prompts + pick at least one model above, then click Run eval.

Past eval runs

Click a row to load it back into the matrix + synthesis.
When | Name | Cells | Passed | Credits | Status

Drift

Top 6 models from the latest run shown by default. Click a row in the table to toggle that model on/off.
Compare vs
Model | Latest pass | Δ vs prev | Runs

Run diff