Human: 46.6% yes, 53.4% no over 652 explicit votes.
Model average: 83.4% yes, 16.6% no across the current model set (67 current runs compared).
Closest current model: 46.8% yes.
Least aligned models: 53.4-point gap, the biggest model gap on this image.
Legacy GPT-4o baseline: 84.0% yes, a 37.4-point gap against humans.
Current classification: Human knife-edge.

WTF · Human knife-edge
Benchmark image 20
Bagel PB&J
Perpendicular peanut butter and jelly bagel "Sandwich"
A bagel hacked perpendicularly into a peanut-butter-and-jelly arrangement turns a children's lunch into topology discourse. The filling is real, the bread surfaces are opposing, and the geometry is actively trying to get cited.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
minimax/minimax-01
30-way tie
Benchmark context
Model spread
How Models Align with Human Responses
This compares each model's yes rate with the human yes rate (shown as a marker) to show how closely each model aligns with people.
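The gaps quoted above are percentage-point differences between a yes rate and the human yes rate (for example, 84.0% for the legacy GPT-4o baseline minus 46.6% for humans gives the 37.4-point gap). The sketch below reproduces that arithmetic and ranks entries from closest to least aligned. It assumes the gap is the absolute difference of the two yes rates; the page does not state its exact alignment metric. The function names and the `closest-model` label are hypothetical placeholders, and the rates are the figures reported for this image.

```python
# Minimal sketch: per-model gap, in percentage points, between a model's
# "yes" rate and the human "yes" rate on one benchmark image.
# Assumption: the gap is the absolute difference of the two yes rates;
# the page does not state its exact alignment metric.


def gap_points(model_yes_pct: float, human_yes_pct: float) -> float:
    """Absolute difference between two yes rates, rounded to 0.1 point."""
    return round(abs(model_yes_pct - human_yes_pct), 1)


def rank_by_alignment(
    model_rates: dict[str, float], human_yes_pct: float
) -> list[tuple[str, float]]:
    """Return (name, gap) pairs sorted from closest to least aligned."""
    gaps = [(name, gap_points(rate, human_yes_pct)) for name, rate in model_rates.items()]
    return sorted(gaps, key=lambda pair: pair[1])


if __name__ == "__main__":
    HUMAN_YES = 46.6  # human yes rate over 652 explicit votes on this image
    # Yes rates reported on this page; "closest-model" is a hypothetical label,
    # and "model-average" is the mean across models, not a single model.
    rates = {
        "legacy-gpt-4o": 84.0,
        "model-average": 83.4,
        "closest-model": 46.8,
    }
    for name, gap in rank_by_alignment(rates, HUMAN_YES):
        print(f"{name}: {gap} point gap")
```

Run as a script, this prints `closest-model: 0.2 point gap`, then `model-average: 36.8 point gap`, then `legacy-gpt-4o: 37.4 point gap`. Rounding to one decimal keeps floating-point noise (for instance `46.8 - 46.6` is not exactly `0.2` in binary) out of the reported gaps.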
Generated summary for this photo


Selected human comments
minimax/minimax-01 comments
google/gemini-3-flash-preview comments