Eval runner

Paste prompts, pick models, fan out in parallel. Results stream live into the matrix below.
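As a rough illustration of what "fan out in parallel" could look like, the sketch below runs every prompt against every model concurrently and reports each cell as soon as it finishes. It assumes one cell per (prompt, model) pair; `call_model`, `run_eval`, `on_cell`, and the model names are hypothetical placeholders, not this tool's actual API.

```python
import asyncio
from itertools import product


async def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real model request."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"{model} answered: {prompt}"


async def run_eval(prompts, models, on_cell):
    """Run all (prompt, model) pairs concurrently; call on_cell as each one finishes."""

    async def one_cell(prompt, model):
        output = await call_model(model, prompt)
        return prompt, model, output

    # Assumption: the matrix has one cell per prompt/model combination.
    tasks = [asyncio.create_task(one_cell(p, m)) for p, m in product(prompts, models)]

    matrix = {}
    for finished in asyncio.as_completed(tasks):
        prompt, model, output = await finished
        matrix[(prompt, model)] = output
        on_cell(prompt, model, output)  # "stream" the cell to the caller
    return matrix


def demo():
    seen = []
    matrix = asyncio.run(
        run_eval(["p1", "p2"], ["model-a", "model-b"],
                 lambda p, m, o: seen.append((p, m)))
    )
    return matrix, seen
```

Calling `demo()` fills a four-cell matrix from two prompts and two models. The order in which cells arrive in `seen` depends on completion order, so it should not be relied on.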

Dataset

Saved dataset

Prompts (one per line, blank lines ignored)
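The "one per line, blank lines ignored" rule can be sketched as below. The function name `parse_prompts` is hypothetical, and trimming surrounding whitespace from each prompt is an assumption; the page only states that blank lines are skipped.

```python
def parse_prompts(raw: str) -> list[str]:
    """Split pasted text into prompts: one per line, blank lines ignored."""
    # Assumption: whitespace-only lines count as blank, and each prompt is trimmed.
    return [line.strip() for line in raw.splitlines() if line.strip()]
```

For example, `parse_prompts("first\n\n  \nsecond \n")` yields two prompts, `"first"` and `"second"`.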

0 prompts

Run name (optional)

Data hosting

Privacy mode

Models


0 prompts, 0 models, 0 total cells

Add prompts and pick at least one model above, then click Run eval.

Click a row to load that run back into the matrix and synthesis.

When	Name	Cells	Passed	Credits	Status

—

The top 6 models from the latest run are shown by default. Click a row in the table to toggle that model on or off.

Compare vs

	Model	Latest pass	Δ vs prev	Runs