Human distribution: 54.2% yes, 45.8% no over 655 explicit votes.
Model average distribution: 8.7% yes, 91.3% no across the current model set.
Models compared: 67 current runs.
Closest current model: 52.0% yes.
Least aligned models: 54.2-point gap.
Legacy GPT-4o baseline: 0.0% yes, a 54.2-point gap against humans.
Biggest model gap: 54.2 percentage points on this image.
Current classification: Human knife-edge.

Human knife-edge
Benchmark image 07
Kitten in Bread
Cat "Sandwich"
A kitten has been placed between two slices of bread, producing a meme that is structurally sandwich-shaped and operationally a felony against common sense. This is where ontology leaves the lab and starts posting.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
openai/gpt-4.1-nano
40-way tie
Benchmark context
Model spread
How Models Align with Human Responses
This chart compares each model against human responses to show how closely each model aligns with people.
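The page does not state how a model's gap is computed. The quoted figures are consistent with a plain absolute difference in yes-rates, measured in percentage points: the legacy GPT-4o baseline's 0.0% yes against the human 54.2% yes gives exactly the reported 54.2-point gap. Below is a minimal Python sketch under that assumption; `alignment_gap` is a hypothetical helper name, not the benchmark's own code.

```python
def alignment_gap(human_yes_pct: float, model_yes_pct: float) -> float:
    """Assumed metric: absolute difference between yes-rates, in percentage points."""
    # Round to one decimal to match the page's precision and hide float noise.
    return round(abs(model_yes_pct - human_yes_pct), 1)


HUMAN_YES = 54.2  # human yes-rate for this image (655 explicit votes)

# Yes-rates quoted on this page; the keys are descriptive labels, not model IDs.
quoted = {
    "legacy GPT-4o baseline": 0.0,
    "closest current model": 52.0,
    "model average": 8.7,
}

for label, yes_pct in quoted.items():
    print(f"{label}: {alignment_gap(HUMAN_YES, yes_pct)}-point gap")
```

Running it prints gaps of 54.2, 2.2, and 45.5 points. The last figure is the gap between the model average and humans, which is not the same thing as the average of the per-model gaps.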
Vote card
Generated summary for this photo



Selected human comments
openai/gpt-4.1-nano comments
google/gemini-3-flash-preview comments