About Us

Why this website exists at all

OpenSandwich.ai is a deliberately small benchmark for a very real alignment problem: recovering a fuzzy human category from noisy human votes, inconsistent comments, and edge cases that make both humans and models look less stable than they would prefer.

The public version of the project combines three things: a benchmark of twenty sandwich-adjacent photos, repeated model runs over that same photo set, and a live public survey that keeps the human baseline from turning into a static artifact.

What we measure

Not sandwich truth. Sandwich alignment.

The goal is not to prove a universal definition of sandwichhood. The goal is to measure how closely models track the human crowd when the category gets slippery, funny, annoying, or structurally ambiguous.
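One way to make "tracking the human crowd" concrete is to score each photo by the chance the model's verdict matches a randomly sampled human voter. The sketch below is illustrative only: the field names and data are hypothetical, not the actual OpenSandwich.ai schema or scoring rule.

```python
# Hypothetical sketch: alignment as the probability the model's verdict
# agrees with a randomly sampled human voter. Names and numbers are
# illustrative, not the real OpenSandwich.ai schema.

def crowd_alignment(human_yes_fraction, model_says_yes):
    """Score one photo: chance the model agrees with a random voter."""
    p = human_yes_fraction
    return p if model_says_yes else 1.0 - p

def benchmark_alignment(photos):
    """Average per-photo alignment across the whole benchmark."""
    scores = [crowd_alignment(p["human_yes"], p["model_yes"]) for p in photos]
    return sum(scores) / len(scores)

# Toy data: one contested photo and one easy one.
photos = [
    {"human_yes": 0.55, "model_yes": True},   # contested: crowd splits 55/45
    {"human_yes": 0.95, "model_yes": True},   # easy: near-unanimous "yes"
]
print(benchmark_alignment(photos))  # 0.75
```

Under this scoring, a model can never beat the crowd's own self-agreement on a contested photo, which is exactly the point: the ceiling is set by human consensus, not by any ground truth.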

Why it matters

Small tasks expose larger habits.

If a model becomes overconfident on a low-stakes category boundary that humans themselves do not agree on, that tells you something useful about calibration, reasoning style, and how the system handles messy human concepts elsewhere.
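Overconfidence on a contested boundary can be quantified by comparing a model's stated confidence against the human vote split, for example with a Brier-style squared gap. This is a minimal sketch under assumed field names, not the project's actual metric.

```python
# Hypothetical sketch: a Brier-style calibration check against the human
# vote split. Field names ("model_p_yes", "human_yes") are illustrative.

def brier_vs_crowd(items):
    """Mean squared gap between model confidence and human yes-fraction.

    0.0 means the model's confidence mirrors the crowd exactly; an
    overconfident model that outputs 1.0 on a 55/45 photo pays
    (1.0 - 0.55)**2 for that photo.
    """
    gaps = [(it["model_p_yes"] - it["human_yes"]) ** 2 for it in items]
    return sum(gaps) / len(gaps)

items = [
    {"human_yes": 0.55, "model_p_yes": 1.00},  # overconfident on a contested photo
    {"human_yes": 0.95, "model_p_yes": 0.90},  # roughly calibrated on an easy one
]
print(round(brier_vs_crowd(items), 4))  # 0.1025
```

The contested photo dominates the score here, which illustrates the claim above: low-stakes boundary cases are where calibration habits show.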

Project shape

What lives inside OpenSandwich.ai

  • A benchmark index with photo-by-photo splits, model gaps, and comments.
  • A model-centric analytics page covering cost, retries, token usage, and each model's worst-performing images.
  • A public survey flow that lets the human baseline keep growing over time.
  • A canonical datastore that preserves raw benchmark outputs and import provenance.