Not sandwich truth. Sandwich alignment.
The goal is not to prove a universal definition of sandwichhood. The goal is to measure how closely models track the human crowd when the category gets slippery, funny, annoying, or structurally ambiguous.
OpenSandwich.ai is a deliberately small benchmark for a very real alignment problem: recover a fuzzy human category from noisy human votes, inconsistent comments, and edge cases that make both humans and models look less stable than they would prefer.
The public version of the project combines three things: a benchmark of twenty sandwich-adjacent photos, repeated model runs over that same photo set, and a live public survey that keeps the human baseline from turning into a static artifact.
If a model becomes overconfident on a low-stakes category boundary that humans themselves do not agree on, that tells you something useful about calibration, reasoning style, and how the system handles messy human concepts elsewhere.
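One way to make that check concrete is to compare a model's stated probability against the crowd's vote share per photo. The sketch below is illustrative only: the data values, function names, and the 0.35–0.65 "ambiguous" band are assumptions, not anything published by OpenSandwich.ai.

```python
# Hypothetical data: human "yes, it's a sandwich" vote fractions paired with
# one model's stated probability for the same four photos.
human_yes_fraction = [0.95, 0.52, 0.48, 0.10]  # clear yes, split, split, clear no
model_probability  = [0.99, 0.97, 0.03, 0.02]  # a model that refuses to be unsure

def crowd_alignment_error(human, model):
    """Mean absolute gap between model confidence and crowd vote share."""
    return sum(abs(h - m) for h, m in zip(human, model)) / len(human)

def overconfidence_on_ambiguous(human, model, band=(0.35, 0.65)):
    """Average distance from 0.5 the model reports on items where humans
    themselves are split (crowd vote share inside `band`)."""
    ambiguous = [m for h, m in zip(human, model) if band[0] <= h <= band[1]]
    if not ambiguous:
        return 0.0
    return sum(abs(m - 0.5) for m in ambiguous) / len(ambiguous)

print(round(crowd_alignment_error(human_yes_fraction, model_probability), 3))
print(round(overconfidence_on_ambiguous(human_yes_fraction, model_probability), 3))
```

On this toy data the overall gap is modest, but on the two genuinely split photos the model sits almost at certainty, which is exactly the calibration failure the benchmark is designed to surface.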