PNI

People mostly said yes

Benchmark image 13

Panini

Panini "Sandwich"

A pressed panini with greens and filling compressed into sharp grill lines shows up like a normal sandwich after a product manager discovered heat. It is structurally boring in the best possible way and still somehow controversial to a few models.

Under development: this benchmark and its published results are provisional, not final.

Human

92.4% yes, 7.6% no

Model average

99.7% yes, 0.3% no

Most aligned model

6.6 point gap from humans

nvidia/nemotron-nano-12b-v2-vl

Least aligned model

14.4 point gap from humans

meta-llama/llama-3.2-11b-vision-instruct

At a glance

How this photo split the room

Human distribution

92.4% yes, 7.6% no over 655 explicit votes.

Model average distribution

99.7% yes, 0.3% no across the current model set.

Closest current model

99.0% yes.

nvidia/nemotron-nano-12b-v2-vl

Least aligned model

14.4 point gap.

meta-llama/llama-3.2-11b-vision-instruct

Legacy GPT-4o baseline

100.0% yes with a 7.6 point gap against humans.

Biggest model gap

14.4 percentage points on this image.

Current classification

People mostly said yes

Benchmark context

Models compared

74 current runs

Closest model output

99.0% yes.

Model spread

How Models Align with Human Responses

This compares each model against human responses to show how closely it aligns with people.

meta-llama/llama-3.2-11b-vision-instruct: 22.0% no, 78.0% yes; human gap 14.4 points; rank #3
nvidia/nemotron-nano-12b-v2-vl: 1.0% no, 99.0% yes; human gap 6.6 points; rank #7
allenai/molmo-2-8b: 0.0% no, 100.0% yes; human gap 7.6 points; rank #6
amazon/nova-2-lite-v1: 0.0% no, 100.0% yes; human gap 7.6 points; rank #60
amazon/nova-lite-v1: 0.0% no, 100.0% yes; human gap 7.6 points; rank #51
amazon/nova-pro-v1: 0.0% no, 100.0% yes; human gap 7.6 points; rank #73
anthropic/claude-haiku-4.5: 0.0% no, 100.0% yes; human gap 7.6 points; rank #52
anthropic/claude-opus-4.5: 0.0% no, 100.0% yes; human gap 7.6 points; rank #56
anthropic/claude-opus-4.6: 0.0% no, 100.0% yes; human gap 7.6 points; rank #63
anthropic/claude-opus-4.7: 0.0% no, 100.0% yes; human gap 7.6 points; rank #38
anthropic/claude-opus-4.8: 0.0% no, 100.0% yes; human gap 7.6 points; rank #40
anthropic/claude-sonnet-4.6: 0.0% no, 100.0% yes; human gap 7.6 points; rank #62
baidu/ernie-4.5-vl-28b-a3b: 0.0% no, 100.0% yes; human gap 7.6 points; rank #69
bytedance-seed/seed-1.6: 0.0% no, 100.0% yes; human gap 7.6 points; rank #41
bytedance-seed/seed-1.6-flash: 0.0% no, 100.0% yes; human gap 7.6 points; rank #20
bytedance-seed/seed-2.0-lite: 0.0% no, 100.0% yes; human gap 7.6 points; rank #14
bytedance-seed/seed-2.0-mini: 0.0% no, 100.0% yes; human gap 7.6 points; rank #19
google/gemini-2.5-flash: 0.0% no, 100.0% yes; human gap 7.6 points; rank #21
google/gemini-2.5-flash-lite: 0.0% no, 100.0% yes; human gap 7.6 points; rank #54
google/gemini-2.5-pro: 0.0% no, 100.0% yes; human gap 7.6 points; rank #25
google/gemini-3-flash-preview: 0.0% no, 100.0% yes; human gap 7.6 points; rank #75
google/gemini-3-pro-image-preview: 0.0% no, 100.0% yes; human gap 7.6 points; rank #42
google/gemini-3.1-flash-image-preview: 0.0% no, 100.0% yes; human gap 7.6 points; rank #24
google/gemini-3.1-flash-lite-preview: 0.0% no, 100.0% yes; human gap 7.6 points; rank #55
google/gemini-3.1-pro-preview: 0.0% no, 100.0% yes; human gap 7.6 points; rank #45
google/gemma-3-12b-it: 0.0% no, 100.0% yes; human gap 7.6 points; rank #26
google/gemma-3-27b-it: 0.0% no, 100.0% yes; human gap 7.6 points; rank #48
GPT-4o (Spring 2024): 0.0% no, 100.0% yes; human gap 7.6 points; rank #4
meta-llama/llama-4-maverick: 0.0% no, 100.0% yes; human gap 7.6 points; rank #68
meta-llama/llama-4-scout: 0.0% no, 100.0% yes; human gap 7.6 points; rank #33
minimax/minimax-01: 0.0% no, 100.0% yes; human gap 7.6 points; rank #72
mistralai/mistral-large-2512: 0.0% no, 100.0% yes; human gap 7.6 points; rank #71
mistralai/pixtral-large-2411: 0.0% no, 100.0% yes; human gap 7.6 points; rank #50
moonshotai/kimi-k2.5: 0.0% no, 100.0% yes; human gap 7.6 points; rank #13
openai/gpt-4.1: 0.0% no, 100.0% yes; human gap 7.6 points; rank #74
openai/gpt-4.1-mini: 0.0% no, 100.0% yes; human gap 7.6 points; rank #57
openai/gpt-4.1-nano: 0.0% no, 100.0% yes; human gap 7.6 points; rank #36
openai/gpt-4o: 0.0% no, 100.0% yes; human gap 7.6 points; rank #15
openai/gpt-4o-2024-11-20: 0.0% no, 100.0% yes; human gap 7.6 points; rank #67
openai/gpt-4o-mini: 0.0% no, 100.0% yes; human gap 7.6 points; rank #61
openai/gpt-5.1: 0.0% no, 100.0% yes; human gap 7.6 points; rank #49
openai/gpt-5.1-chat: 0.0% no, 100.0% yes; human gap 7.6 points; rank #8
openai/gpt-5.1-codex: 0.0% no, 100.0% yes; human gap 7.6 points; rank #37
openai/gpt-5.2: 0.0% no, 100.0% yes; human gap 7.6 points; rank #43
openai/gpt-5.3-chat: 0.0% no, 100.0% yes; human gap 7.6 points; rank #30
openai/gpt-5.3-codex: 0.0% no, 100.0% yes; human gap 7.6 points; rank #44
openai/gpt-5.4: 0.0% no, 100.0% yes; human gap 7.6 points; rank #59
openai/gpt-5.4-mini: 0.0% no, 100.0% yes; human gap 7.6 points; rank #28
openai/gpt-5.4-nano: 0.0% no, 100.0% yes; human gap 7.6 points; rank #31
openai/gpt-5.4-pro: 0.0% no, 100.0% yes; human gap 7.6 points; rank #65
openai/gpt-5.5: 0.0% no, 100.0% yes; human gap 7.6 points; rank #46
openai/o1: 0.0% no, 100.0% yes; human gap 7.6 points; rank #2
openai/o1-pro: 0.0% no, 100.0% yes; human gap 7.6 points; rank #1
openai/o3: 0.0% no, 100.0% yes; human gap 7.6 points; rank #64
openai/o3-pro: 0.0% no, 100.0% yes; human gap 7.6 points; rank #53
openrouter/healer-alpha: 0.0% no, 100.0% yes; human gap 7.6 points; rank #10
perplexity/sonar-pro-search: 0.0% no, 100.0% yes; human gap 7.6 points; rank #32
qwen/qwen-2-vl-72b-instruct: 0.0% no, 100.0% yes; human gap 7.6 points; rank #29
qwen/qwen2.5-vl-32b-instruct: 0.0% no, 100.0% yes; human gap 7.6 points; rank #39
qwen/qwen2.5-vl-72b-instruct: 0.0% no, 100.0% yes; human gap 7.6 points; rank #70
qwen/qwen3-vl-235b-a22b-instruct: 0.0% no, 100.0% yes; human gap 7.6 points; rank #47
qwen/qwen3-vl-30b-a3b-instruct: 0.0% no, 100.0% yes; human gap 7.6 points; rank #66
qwen/qwen3-vl-30b-a3b-thinking: 0.0% no, 100.0% yes; human gap 7.6 points; rank #22
qwen/qwen3.5-122b-a10b: 0.0% no, 100.0% yes; human gap 7.6 points; rank #11
qwen/qwen3.5-27b: 0.0% no, 100.0% yes; human gap 7.6 points; rank #18
qwen/qwen3.5-35b-a3b: 0.0% no, 100.0% yes; human gap 7.6 points; rank #23
qwen/qwen3.5-397b-a17b: 0.0% no, 100.0% yes; human gap 7.6 points; rank #34
qwen/qwen3.5-9b: 0.0% no, 100.0% yes; human gap 7.6 points; rank #27
qwen/qwen3.5-flash-02-23: 0.0% no, 100.0% yes; human gap 7.6 points; rank #9
qwen/qwen3.5-plus-02-15: 0.0% no, 100.0% yes; human gap 7.6 points; rank #35
x-ai/grok-4: 0.0% no, 100.0% yes; human gap 7.6 points; rank #12
x-ai/grok-4-fast: 0.0% no, 100.0% yes; human gap 7.6 points; rank #5
x-ai/grok-4.1-fast: 0.0% no, 100.0% yes; human gap 7.6 points; rank #16
x-ai/grok-4.20-beta: 0.0% no, 100.0% yes; human gap 7.6 points; rank #17
z-ai/glm-4.6v: 0.0% no, 100.0% yes; human gap 7.6 points; rank #58
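
The gap figures listed for each model are consistent with a simple absolute difference between the human yes-rate and the model yes-rate, in percentage points. The page does not state its exact formula, so the Python sketch below is an assumption that happens to reproduce the published numbers. The function name `human_gap`, the variable names, and the choice of an unweighted mean over 74 runs (two models below 100% plus 72 at 100%, leaving out the legacy GPT-4o baseline) are all illustrative, not taken from the benchmark's code.

```python
# Sketch only: the formula is inferred from the published figures, not confirmed.

HUMAN_YES = 92.4  # human yes-rate for this image, in percent


def human_gap(human_yes: float, model_yes: float) -> float:
    """Absolute difference between two yes-rates, in percentage points."""
    return round(abs(human_yes - model_yes), 1)


# Three representative rows from the model list (yes-rates in percent).
model_yes = {
    "meta-llama/llama-3.2-11b-vision-instruct": 78.0,
    "nvidia/nemotron-nano-12b-v2-vl": 99.0,
    "openai/o1-pro": 100.0,
}

gaps = {name: human_gap(HUMAN_YES, yes) for name, yes in model_yes.items()}
closest = min(gaps, key=gaps.get)   # smallest gap from humans
farthest = max(gaps, key=gaps.get)  # largest gap from humans

# Assumed unweighted mean over 74 current runs: 78.0, 99.0, and 72 runs at 100.0.
current_runs = [78.0, 99.0] + [100.0] * 72
model_average_yes = round(sum(current_runs) / len(current_runs), 1)

for name, gap in gaps.items():
    print(f"{name}: {gap} point gap")
print("closest:", closest)
print("least aligned:", farthest)
print("model average yes:", model_average_yes)
```

Under these assumptions the sketch matches the 14.4, 6.6, and 7.6 point gaps and the 99.7% model average shown on this page. Including the legacy baseline as a 75th run at 100% would round to the same average.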

Selected human comments

No, it is a photo of an Avocado and spinach panini "sandwich"
That's a gum-shredder. That shit will tear up your face.
Salad and breadsticks

nvidia/nemotron-nano-12b-v2-vl comments

The item is a wrap, not a traditional sandwich. Sandwiches typically use sliced bread, while wraps use folded flatbread or tortillas.
Grilled panini with bread, avocado, lettuce, and onions fits the sandwich definition: fillings between bread slices.

meta-llama/llama-3.2-11b-vision-instruct comments

A sandwich typically consists of two slices of bread filled with various ingredients, such as meats, cheeses, vegetables, and condiments. This photo shows two halves of a ciabatta bread filled with avocado, spinach, and sun-dried tomatoes, which is more akin to a panini or a wrap. While it may be considered a sandwich by some, it doesn't fit the traditional definition of a sandwich.
A sandwich is a food item typically consisting of two or more slices of bread, often with fillings such as meats, cheeses, vegetables, and condiments. The key elements of a sandwich are the bread and the filling, and it can be served in various ways, including grilled, toasted, or as a wrap. In this case, the image depicts a panini, which is a type of sandwich made with bread that has been grilled or pressed. It typically consists of a filling, such as meats, cheeses, and vegetables, placed between two slices of bread that have been grilled or pressed together. Therefore, based on the definition of a sandwich, the image is indeed a sandwich, specifically a panini.
