Human distribution: 22.6% yes, 77.4% no over 655 explicit votes.
Model average distribution: 47.7% yes, 52.3% no across the current model set.
Closest current model: 16.0% yes.
Least aligned models: 77.4 point gap.
Legacy GPT-4o baseline: 0.0% yes, a 22.6 point gap against humans.
Biggest model gap: 77.4 percentage points on this image.
Current classification: Split concept.
Models compared: 67 current runs.

WRP: Split concept
Benchmark image 15
Chicken Wrap
Chicken, Caesar, lettuce and tomato "Sandwich"
A chicken Caesar wrap bundles meat, lettuce, and sauce into a tortilla tube that lives permanently in sandwich-adjacent limbo. It is the kind of object that makes taxonomies collapse into a Slack thread.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
bytedance-seed/seed-2.0-lite
12-way tie
Benchmark context
Model spread
How Models Align with Human Responses
This compares each model against human responses to show how closely it aligns with people.
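The gap figures quoted for this image can be reproduced with simple arithmetic. The sketch below assumes a gap is the absolute difference between a model's yes-rate and the human yes-rate, in percentage points; that reading matches the legacy GPT-4o figure (0.0% yes, 22.6 point gap) but is not confirmed by the page. The function name `gap_points` is hypothetical, and the 100.0% yes-rate in the last call is an inferred value that would produce the reported 77.4 point gap, not a number stated on the page.

```python
# Human yes-rate for this image, in percent (22.6% of 655 explicit votes).
HUMAN_YES = 22.6


def gap_points(model_yes: float, human_yes: float = HUMAN_YES) -> float:
    """Absolute difference between two yes-rates, in percentage points.

    Assumption: this is how the page defines a "gap"; it is consistent
    with the quoted numbers but not documented in the source.
    """
    return round(abs(model_yes - human_yes), 1)


# Legacy GPT-4o baseline: 0.0% yes.
print(gap_points(0.0))    # 22.6

# Closest current model: 16.0% yes.
print(gap_points(16.0))   # 6.6

# Assumed 100.0% yes, which would give the biggest reported gap.
print(gap_points(100.0))  # 77.4
```

Under this reading, the closest current model sits 6.6 points from the human rate, while the model average (47.7% yes) is 25.1 points away.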