Benchmark image 03

Sub Sandwich

Salami, cheddar, lettuce & tomato sub "Sandwich"

A long sub packed with salami, cheddar, lettuce, and tomato sprawls across the frame like a benchmark overfit to obvious wins. It is unquestionably a sandwich, unless you are the kind of engineer who opens a ticket about submarine semantics.

Under development: this benchmark and its published results are provisional, not final.

Human

94.5% yes, 5.5% no

Model average

99.8% yes, 0.2% no

Most aligned model

2.5 point gap from humans

allenai/molmo-2-8b

Least aligned model

9.5 point gap from humans

meta-llama/llama-3.2-11b-vision-instruct

At a glance

How this photo split the room

Human distribution

94.5% yes, 5.5% no over 656 explicit votes.

Model average distribution

99.8% yes, 0.2% no across the current model set.

Closest current model

97.0% yes.

allenai/molmo-2-8b

Legacy GPT-4o baseline

100.0% yes with a 5.5 point gap against humans.

Biggest model gap

9.5 percentage points on this image.

Current classification

People mostly said yes
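
The model average quoted above can be reproduced from the per-model yes rates listed on this page. The sketch below is an illustration, not the benchmark's own code. It assumes the average is an unweighted mean over the 74 current runs and that the legacy GPT-4o (Spring 2024) baseline is left out. The names yes_rates and mean_yes are hypothetical.

```python
# Yes rates (percent) as listed on this page.
# Assumption: the 74 current runs exclude the legacy "GPT-4o (Spring 2024)" entry.
# Two models answered below 100% yes; the other 72 answered 100% yes.
yes_rates = [97.0, 85.0] + [100.0] * 72


def mean_yes(rates):
    """Unweighted mean of yes rates, in percent (assumed averaging rule)."""
    return sum(rates) / len(rates)


avg_yes = mean_yes(yes_rates)
avg_no = 100.0 - avg_yes

print(f"{avg_yes:.1f}% yes, {avg_no:.1f}% no")  # prints "99.8% yes, 0.2% no"
```

Including the legacy baseline as a 75th entry rounds to the same one-decimal figure, so the published number alone does not show which set was averaged.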

Benchmark context

Models compared

74 current runs

Model spread

How Models Align with Human Responses

This compares each model against human responses to show how closely it aligns with people.

meta-llama/llama-3.2-11b-vision-instruct: 15.0% no, 85.0% yes; human gap 9.5 points; rank #3
allenai/molmo-2-8b: 3.0% no, 97.0% yes; human gap 2.5 points; rank #6
amazon/nova-2-lite-v1: 0.0% no, 100.0% yes; human gap 5.5 points; rank #60
amazon/nova-lite-v1: 0.0% no, 100.0% yes; human gap 5.5 points; rank #51
amazon/nova-pro-v1: 0.0% no, 100.0% yes; human gap 5.5 points; rank #73
anthropic/claude-haiku-4.5: 0.0% no, 100.0% yes; human gap 5.5 points; rank #52
anthropic/claude-opus-4.5: 0.0% no, 100.0% yes; human gap 5.5 points; rank #56
anthropic/claude-opus-4.6: 0.0% no, 100.0% yes; human gap 5.5 points; rank #63
anthropic/claude-opus-4.7: 0.0% no, 100.0% yes; human gap 5.5 points; rank #38
anthropic/claude-opus-4.8: 0.0% no, 100.0% yes; human gap 5.5 points; rank #40
anthropic/claude-sonnet-4.6: 0.0% no, 100.0% yes; human gap 5.5 points; rank #62
baidu/ernie-4.5-vl-28b-a3b: 0.0% no, 100.0% yes; human gap 5.5 points; rank #69
bytedance-seed/seed-1.6: 0.0% no, 100.0% yes; human gap 5.5 points; rank #41
bytedance-seed/seed-1.6-flash: 0.0% no, 100.0% yes; human gap 5.5 points; rank #20
bytedance-seed/seed-2.0-lite: 0.0% no, 100.0% yes; human gap 5.5 points; rank #14
bytedance-seed/seed-2.0-mini: 0.0% no, 100.0% yes; human gap 5.5 points; rank #19
google/gemini-2.5-flash: 0.0% no, 100.0% yes; human gap 5.5 points; rank #21
google/gemini-2.5-flash-lite: 0.0% no, 100.0% yes; human gap 5.5 points; rank #54
google/gemini-2.5-pro: 0.0% no, 100.0% yes; human gap 5.5 points; rank #25
google/gemini-3-flash-preview: 0.0% no, 100.0% yes; human gap 5.5 points; rank #75
google/gemini-3-pro-image-preview: 0.0% no, 100.0% yes; human gap 5.5 points; rank #42
google/gemini-3.1-flash-image-preview: 0.0% no, 100.0% yes; human gap 5.5 points; rank #24
google/gemini-3.1-flash-lite-preview: 0.0% no, 100.0% yes; human gap 5.5 points; rank #55
google/gemini-3.1-pro-preview: 0.0% no, 100.0% yes; human gap 5.5 points; rank #45
google/gemma-3-12b-it: 0.0% no, 100.0% yes; human gap 5.5 points; rank #26
google/gemma-3-27b-it: 0.0% no, 100.0% yes; human gap 5.5 points; rank #48
GPT-4o (Spring 2024): 0.0% no, 100.0% yes; human gap 5.5 points; rank #4
meta-llama/llama-4-maverick: 0.0% no, 100.0% yes; human gap 5.5 points; rank #68
meta-llama/llama-4-scout: 0.0% no, 100.0% yes; human gap 5.5 points; rank #33
minimax/minimax-01: 0.0% no, 100.0% yes; human gap 5.5 points; rank #72
mistralai/mistral-large-2512: 0.0% no, 100.0% yes; human gap 5.5 points; rank #71
mistralai/pixtral-large-2411: 0.0% no, 100.0% yes; human gap 5.5 points; rank #50
moonshotai/kimi-k2.5: 0.0% no, 100.0% yes; human gap 5.5 points; rank #13
nvidia/nemotron-nano-12b-v2-vl: 0.0% no, 100.0% yes; human gap 5.5 points; rank #7
openai/gpt-4.1: 0.0% no, 100.0% yes; human gap 5.5 points; rank #74
openai/gpt-4.1-mini: 0.0% no, 100.0% yes; human gap 5.5 points; rank #57
openai/gpt-4.1-nano: 0.0% no, 100.0% yes; human gap 5.5 points; rank #36
openai/gpt-4o: 0.0% no, 100.0% yes; human gap 5.5 points; rank #15
openai/gpt-4o-2024-11-20: 0.0% no, 100.0% yes; human gap 5.5 points; rank #67
openai/gpt-4o-mini: 0.0% no, 100.0% yes; human gap 5.5 points; rank #61
openai/gpt-5.1: 0.0% no, 100.0% yes; human gap 5.5 points; rank #49
openai/gpt-5.1-chat: 0.0% no, 100.0% yes; human gap 5.5 points; rank #8
openai/gpt-5.1-codex: 0.0% no, 100.0% yes; human gap 5.5 points; rank #37
openai/gpt-5.2: 0.0% no, 100.0% yes; human gap 5.5 points; rank #43
openai/gpt-5.3-chat: 0.0% no, 100.0% yes; human gap 5.5 points; rank #30
openai/gpt-5.3-codex: 0.0% no, 100.0% yes; human gap 5.5 points; rank #44
openai/gpt-5.4: 0.0% no, 100.0% yes; human gap 5.5 points; rank #59
openai/gpt-5.4-mini: 0.0% no, 100.0% yes; human gap 5.5 points; rank #28
openai/gpt-5.4-nano: 0.0% no, 100.0% yes; human gap 5.5 points; rank #31
openai/gpt-5.4-pro: 0.0% no, 100.0% yes; human gap 5.5 points; rank #65
openai/gpt-5.5: 0.0% no, 100.0% yes; human gap 5.5 points; rank #46
openai/o1: 0.0% no, 100.0% yes; human gap 5.5 points; rank #2
openai/o1-pro: 0.0% no, 100.0% yes; human gap 5.5 points; rank #1
openai/o3: 0.0% no, 100.0% yes; human gap 5.5 points; rank #64
openai/o3-pro: 0.0% no, 100.0% yes; human gap 5.5 points; rank #53
openrouter/healer-alpha: 0.0% no, 100.0% yes; human gap 5.5 points; rank #10
perplexity/sonar-pro-search: 0.0% no, 100.0% yes; human gap 5.5 points; rank #32
qwen/qwen-2-vl-72b-instruct: 0.0% no, 100.0% yes; human gap 5.5 points; rank #29
qwen/qwen2.5-vl-32b-instruct: 0.0% no, 100.0% yes; human gap 5.5 points; rank #39
qwen/qwen2.5-vl-72b-instruct: 0.0% no, 100.0% yes; human gap 5.5 points; rank #70
qwen/qwen3-vl-235b-a22b-instruct: 0.0% no, 100.0% yes; human gap 5.5 points; rank #47
qwen/qwen3-vl-30b-a3b-instruct: 0.0% no, 100.0% yes; human gap 5.5 points; rank #66
qwen/qwen3-vl-30b-a3b-thinking: 0.0% no, 100.0% yes; human gap 5.5 points; rank #22
qwen/qwen3.5-122b-a10b: 0.0% no, 100.0% yes; human gap 5.5 points; rank #11
qwen/qwen3.5-27b: 0.0% no, 100.0% yes; human gap 5.5 points; rank #18
qwen/qwen3.5-35b-a3b: 0.0% no, 100.0% yes; human gap 5.5 points; rank #23
qwen/qwen3.5-397b-a17b: 0.0% no, 100.0% yes; human gap 5.5 points; rank #34
qwen/qwen3.5-9b: 0.0% no, 100.0% yes; human gap 5.5 points; rank #27
qwen/qwen3.5-flash-02-23: 0.0% no, 100.0% yes; human gap 5.5 points; rank #9
qwen/qwen3.5-plus-02-15: 0.0% no, 100.0% yes; human gap 5.5 points; rank #35
x-ai/grok-4: 0.0% no, 100.0% yes; human gap 5.5 points; rank #12
x-ai/grok-4-fast: 0.0% no, 100.0% yes; human gap 5.5 points; rank #5
x-ai/grok-4.1-fast: 0.0% no, 100.0% yes; human gap 5.5 points; rank #16
x-ai/grok-4.20-beta: 0.0% no, 100.0% yes; human gap 5.5 points; rank #17
z-ai/glm-4.6v: 0.0% no, 100.0% yes; human gap 5.5 points; rank #58
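
The human gap figures in this section are consistent with a simple absolute difference between a model's yes rate and the human yes rate, measured in percentage points. The sketch below checks that reading against three of the entries above. The definition is an assumption inferred from the published numbers, not a documented formula, and the names HUMAN_YES, model_yes, and human_gap are hypothetical.

```python
HUMAN_YES = 94.5  # percent of human votes that said yes, from this page

# Yes rates (percent) reported on this page for three of the listed models.
model_yes = {
    "allenai/molmo-2-8b": 97.0,
    "meta-llama/llama-3.2-11b-vision-instruct": 85.0,
    "openai/gpt-4o": 100.0,
}


def human_gap(model_yes_pct, human_yes_pct=HUMAN_YES):
    """Absolute yes-rate difference in percentage points (assumed definition)."""
    return round(abs(model_yes_pct - human_yes_pct), 1)


gaps = {name: human_gap(rate) for name, rate in model_yes.items()}
closest = min(gaps, key=gaps.get)   # smallest gap, i.e. most aligned with humans
farthest = max(gaps, key=gaps.get)  # largest gap, i.e. least aligned with humans

for name, gap in gaps.items():
    print(f"{name}: {gap} point gap")
```

Under this reading, a model that always answers yes lands 5.5 points from the human rate, which matches the gap shown for every 100.0% yes entry.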

Selected human comments

No, it is a photo of a sandwich.
This is a vessel that conveys men underwater, while cramming too many of them together.
Not my favorite

allenai/molmo-2-8b comments

While it contains elements typically found in sandwiches (bread, meat, cheese), the long, narrow shape and presentation suggest it's more of a sub or hero sandwich, which is often considered a separate category from traditional sandwiches.
A submarine sandwich meets all traditional criteria: bread, fillings, and a contained structure. The elongated shape doesn't negate its classification.

meta-llama/llama-3.2-11b-vision-instruct comments

A sandwich is generally considered to be a food item consisting of two or more slices of bread, with fillings in between. In this case, the image depicts a sausage and cheese hoagie, which meets this definition.
A sandwich is a food item typically consisting of bread, often sliced, stacked with various fillings such as meats, cheeses, vegetables, condiments, or spreads.
