KTY

Human knife-edge

Benchmark image 07

Kitten in Bread

Cat "Sandwich"

A kitten has been placed between two slices of bread, producing a meme that is structurally sandwich-shaped and operationally a felony against common sense. This is where ontology leaves the lab and starts posting.

Under development: this benchmark and its published results are provisional, not final.

Human

54.2% yes, 45.8% no

Model average

7.9% yes, 92.1% no

Most aligned model

2.2 point gap from humans

openai/gpt-4.1-nano

Least aligned models

45-way tie

openai/o1-pro

openai/o1

openai/gpt-5.1-chat, plus 42 more

At a glance

How this photo split the room

Human distribution

54.2% yes, 45.8% no over 655 explicit votes.

Model average distribution

7.9% yes, 92.1% no across the current model set.
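
The 7.9% figure is consistent with an unweighted mean of the per-model yes rates listed on this page, assuming the legacy GPT-4o baseline is left out and each of the 45 tied models contributes 0.0%. The sketch below checks that arithmetic. The function and variable names are hypothetical, and the averaging rule is an assumption rather than the benchmark's documented method.

```python
from statistics import mean

# Yes rates (percent) copied from this page for the 29 current models
# whose yes rate is above zero.
NONZERO_YES = [
    1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 3.9, 4.0, 5.0, 5.9,
    6.0, 6.0, 6.0, 8.0, 8.0, 8.0, 13.0, 13.6, 14.0, 14.0,
    16.0, 18.0, 27.0, 40.0, 52.0, 71.0, 77.0, 79.0, 80.5,
]

# Assumption: the 45-way tie at 0.0% yes counts current models only,
# so the legacy GPT-4o baseline is not part of the average.
ZERO_YES_MODELS = 45


def model_average_yes(nonzero_rates, zero_count):
    """Unweighted mean yes rate across all models, rounded to one decimal."""
    rates = list(nonzero_rates) + [0.0] * zero_count
    return round(mean(rates), 1)


average_yes = model_average_yes(NONZERO_YES, ZERO_YES_MODELS)
average_no = round(100.0 - average_yes, 1)
print(f"{average_yes}% yes, {average_no}% no")
```

Under these assumptions the list covers 74 models, which matches the "74 current runs" reported for this image.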

Closest current model

52.0% yes.

openai/gpt-4.1-nano

Least aligned models

54.2 point gap.

45-way tie

Legacy GPT-4o baseline

0.0% yes with a 54.2 point gap against humans.

Biggest model gap

54.2 percentage points on this image.

Current classification

Human knife-edge

Benchmark context

Models compared

74 current runs

Model spread

How Models Align with Human Responses

This compares each model against human responses to show how closely it aligns with people.
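
The gap figures reported for each model are consistent with the absolute difference between the model's yes rate and the human yes rate of 54.2%, expressed in percentage points. A minimal sketch of that calculation follows. The helper name `human_gap` and the `samples` dictionary are hypothetical, and the formula is inferred from the published numbers rather than taken from the benchmark's code.

```python
HUMAN_YES = 54.2  # human yes rate for this image, in percent


def human_gap(model_yes: float, human_yes: float = HUMAN_YES) -> float:
    """Absolute difference between yes rates, in percentage points."""
    return round(abs(model_yes - human_yes), 1)


# Yes rates (percent) for three models, copied from this page.
samples = {
    "openai/gpt-4.1-nano": 52.0,
    "x-ai/grok-4-fast": 71.0,
    "openai/o1-pro": 0.0,
}

gaps = {name: human_gap(yes) for name, yes in samples.items()}
closest = min(gaps, key=gaps.get)  # model with the smallest gap

for name, gap in gaps.items():
    print(f"{name}: {gap} point gap")
```

Because the gap is an absolute value, a model that says yes more often than humans (such as x-ai/grok-4-fast at 71.0%) is penalized the same way as one that says yes less often.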

anthropic/claude-haiku-4.5: 100.0% no, 0.0% yes; human gap 54.2 points; rank #52
anthropic/claude-opus-4.5: 100.0% no, 0.0% yes; human gap 54.2 points; rank #56
anthropic/claude-opus-4.6: 100.0% no, 0.0% yes; human gap 54.2 points; rank #63
anthropic/claude-opus-4.7: 100.0% no, 0.0% yes; human gap 54.2 points; rank #38
anthropic/claude-opus-4.8: 100.0% no, 0.0% yes; human gap 54.2 points; rank #40
anthropic/claude-sonnet-4.6: 100.0% no, 0.0% yes; human gap 54.2 points; rank #62
bytedance-seed/seed-1.6: 100.0% no, 0.0% yes; human gap 54.2 points; rank #41
bytedance-seed/seed-1.6-flash: 100.0% no, 0.0% yes; human gap 54.2 points; rank #20
google/gemini-2.5-flash: 100.0% no, 0.0% yes; human gap 54.2 points; rank #21
google/gemini-2.5-pro: 100.0% no, 0.0% yes; human gap 54.2 points; rank #25
google/gemini-3-flash-preview: 100.0% no, 0.0% yes; human gap 54.2 points; rank #75
google/gemini-3-pro-image-preview: 100.0% no, 0.0% yes; human gap 54.2 points; rank #42
google/gemini-3.1-flash-image-preview: 100.0% no, 0.0% yes; human gap 54.2 points; rank #24
google/gemini-3.1-flash-lite-preview: 100.0% no, 0.0% yes; human gap 54.2 points; rank #55
google/gemini-3.1-pro-preview: 100.0% no, 0.0% yes; human gap 54.2 points; rank #45
google/gemma-3-12b-it: 100.0% no, 0.0% yes; human gap 54.2 points; rank #26
google/gemma-3-27b-it: 100.0% no, 0.0% yes; human gap 54.2 points; rank #48
GPT-4o (Spring 2024): 100.0% no, 0.0% yes; human gap 54.2 points; rank #4
minimax/minimax-01: 100.0% no, 0.0% yes; human gap 54.2 points; rank #72
mistralai/mistral-large-2512: 100.0% no, 0.0% yes; human gap 54.2 points; rank #71
mistralai/pixtral-large-2411: 100.0% no, 0.0% yes; human gap 54.2 points; rank #50
moonshotai/kimi-k2.5: 100.0% no, 0.0% yes; human gap 54.2 points; rank #13
openai/gpt-4.1: 100.0% no, 0.0% yes; human gap 54.2 points; rank #74
openai/gpt-4.1-mini: 100.0% no, 0.0% yes; human gap 54.2 points; rank #57
openai/gpt-4o: 100.0% no, 0.0% yes; human gap 54.2 points; rank #15
openai/gpt-4o-2024-11-20: 100.0% no, 0.0% yes; human gap 54.2 points; rank #67
openai/gpt-4o-mini: 100.0% no, 0.0% yes; human gap 54.2 points; rank #61
openai/gpt-5.1: 100.0% no, 0.0% yes; human gap 54.2 points; rank #49
openai/gpt-5.1-chat: 100.0% no, 0.0% yes; human gap 54.2 points; rank #8
openai/gpt-5.1-codex: 100.0% no, 0.0% yes; human gap 54.2 points; rank #37
openai/gpt-5.3-chat: 100.0% no, 0.0% yes; human gap 54.2 points; rank #30
openai/gpt-5.4: 100.0% no, 0.0% yes; human gap 54.2 points; rank #59
openai/gpt-5.4-nano: 100.0% no, 0.0% yes; human gap 54.2 points; rank #31
openai/gpt-5.4-pro: 100.0% no, 0.0% yes; human gap 54.2 points; rank #65
openai/gpt-5.5: 100.0% no, 0.0% yes; human gap 54.2 points; rank #46
openai/o1: 100.0% no, 0.0% yes; human gap 54.2 points; rank #2
openai/o1-pro: 100.0% no, 0.0% yes; human gap 54.2 points; rank #1
openai/o3-pro: 100.0% no, 0.0% yes; human gap 54.2 points; rank #53
perplexity/sonar-pro-search: 100.0% no, 0.0% yes; human gap 54.2 points; rank #32
qwen/qwen-2-vl-72b-instruct: 100.0% no, 0.0% yes; human gap 54.2 points; rank #29
qwen/qwen2.5-vl-32b-instruct: 100.0% no, 0.0% yes; human gap 54.2 points; rank #39
qwen/qwen2.5-vl-72b-instruct: 100.0% no, 0.0% yes; human gap 54.2 points; rank #70
qwen/qwen3-vl-235b-a22b-instruct: 100.0% no, 0.0% yes; human gap 54.2 points; rank #47
qwen/qwen3-vl-30b-a3b-instruct: 100.0% no, 0.0% yes; human gap 54.2 points; rank #66
qwen/qwen3-vl-30b-a3b-thinking: 100.0% no, 0.0% yes; human gap 54.2 points; rank #22
x-ai/grok-4.20-beta: 100.0% no, 0.0% yes; human gap 54.2 points; rank #17
baidu/ernie-4.5-vl-28b-a3b: 99.0% no, 1.0% yes; human gap 53.2 points; rank #69
bytedance-seed/seed-2.0-mini: 99.0% no, 1.0% yes; human gap 53.2 points; rank #19
openai/gpt-5.4-mini: 99.0% no, 1.0% yes; human gap 53.2 points; rank #28
openrouter/healer-alpha: 99.0% no, 1.0% yes; human gap 53.2 points; rank #10
bytedance-seed/seed-2.0-lite: 98.0% no, 2.0% yes; human gap 52.2 points; rank #14
openai/gpt-5.3-codex: 97.0% no, 3.0% yes; human gap 51.2 points; rank #44
amazon/nova-pro-v1: 96.1% no, 3.9% yes; human gap 50.3 points; rank #73
qwen/qwen3.5-plus-02-15: 96.0% no, 4.0% yes; human gap 50.2 points; rank #35
qwen/qwen3.5-397b-a17b: 95.0% no, 5.0% yes; human gap 49.2 points; rank #34
z-ai/glm-4.6v: 94.1% no, 5.9% yes; human gap 48.3 points; rank #58
allenai/molmo-2-8b: 94.0% no, 6.0% yes; human gap 48.2 points; rank #6
amazon/nova-2-lite-v1: 94.0% no, 6.0% yes; human gap 48.2 points; rank #60
openai/gpt-5.2: 94.0% no, 6.0% yes; human gap 48.2 points; rank #43
google/gemini-2.5-flash-lite: 92.0% no, 8.0% yes; human gap 46.2 points; rank #54
nvidia/nemotron-nano-12b-v2-vl: 92.0% no, 8.0% yes; human gap 46.2 points; rank #7
qwen/qwen3.5-35b-a3b: 92.0% no, 8.0% yes; human gap 46.2 points; rank #23
qwen/qwen3.5-flash-02-23: 87.0% no, 13.0% yes; human gap 41.2 points; rank #9
openai/o3: 86.4% no, 13.6% yes; human gap 40.6 points; rank #64
qwen/qwen3.5-9b: 86.0% no, 14.0% yes; human gap 40.2 points; rank #27
x-ai/grok-4: 86.0% no, 14.0% yes; human gap 40.2 points; rank #12
qwen/qwen3.5-27b: 84.0% no, 16.0% yes; human gap 38.2 points; rank #18
amazon/nova-lite-v1: 82.0% no, 18.0% yes; human gap 36.2 points; rank #51
meta-llama/llama-3.2-11b-vision-instruct: 73.0% no, 27.0% yes; human gap 27.2 points; rank #3
qwen/qwen3.5-122b-a10b: 60.0% no, 40.0% yes; human gap 14.2 points; rank #11
openai/gpt-4.1-nano: 48.0% no, 52.0% yes; human gap 2.2 points; rank #36
x-ai/grok-4-fast: 29.0% no, 71.0% yes; human gap 16.8 points; rank #5
x-ai/grok-4.1-fast: 23.0% no, 77.0% yes; human gap 22.8 points; rank #16
meta-llama/llama-4-scout: 21.0% no, 79.0% yes; human gap 24.8 points; rank #33
meta-llama/llama-4-maverick: 19.5% no, 80.5% yes; human gap 26.3 points; rank #68

Selected human comments

Yes only if someone is gonna eat it, otherwise its art.
No, it is a photo of a cat sandwich.
a cute one!

openai/gpt-4.1-nano comments

This image depicts a cat with two slices of bread on its back, resembling a sandwich. It plays with the concept of what constitutes a sandwich, emphasizing the importance of bread as a defining feature.
This is an image of a cat with slices of bread on its back, not a sandwich. A sandwich typically involves bread with fillings in between, not an animal with bread.

google/gemini-3-flash-preview comments

While the structure follows the 'filling between two slices of bread' rule, a sandwich must be edible and intended for consumption. This is a kitten, which makes it a 'purr-ito' at best, but definitely not a sandwich.
While the structure follows the 'filling between two slices of bread' rule, a sandwich must be edible. This is a living creature, making it a 'purr-ito' at best, but definitely not a food item.
