Human: 39.8% yes, 60.2% no (653 explicit votes)
Model average: 45.4% yes, 54.6% no (across the current model set)
Closest current model: 40.0% yes
Least aligned models: 60.2 point gap
Legacy GPT-4o baseline: 82.0% yes, a 42.2 point gap against humans
Biggest model gap: 60.2 percentage points on this image
Current classification: Split concept
Models compared: 67 current runs
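The "point gap" figures above appear to be the absolute difference between a model's yes-rate and the human yes-rate, in percentage points. The page does not define the metric, so this is an assumption, but it reproduces the reported 42.2 point gap for the legacy GPT-4o baseline. A minimal sketch, with `gap_points` and the dictionary labels as hypothetical names chosen for illustration:

```python
# Human yes-rate in percent, as reported on this page.
HUMAN_YES = 39.8


def gap_points(model_yes: float, human_yes: float = HUMAN_YES) -> float:
    """Absolute difference between two yes-rates, in percentage points.

    Assumed definition: the page reports gaps but does not state the formula.
    """
    return round(abs(model_yes - human_yes), 1)


# Yes-rates reported on the page (labels are illustrative, not model IDs).
reported = {
    "legacy GPT-4o baseline": 82.0,
    "closest current model": 40.0,
    "model average": 45.4,
}

gaps = {name: gap_points(rate) for name, rate in reported.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap} point gap")
```

Under the same assumed definition, a 60.2 point gap would correspond to a model answering yes 100% of the time, though the page does not state which yes-rate produced that figure.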

DOG, Split concept
Benchmark image 10
Hot Dog
Hot dog "Sandwich"
A hot dog sits in its split bun, the most litigated piece of street food in American semantics. One continuous bread artifact, one sausage, infinite discourse from people who should probably log off.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
google/gemini-2.5-flash
13-way tie
Benchmark context
Model spread
How Models Align with Human Responses
This compares each model against human responses to show how closely it aligns with people.
Vote card
Generated summary for this photo



Selected human comments
google/gemini-2.5-flash comments
google/gemini-3-flash-preview comments