Human: 96.3% yes, 3.7% no (over 656 explicit votes)
Model average: 100.0% yes, 0.0% no (across the current model set)
Most aligned models: 31-way tie (meta-llama/llama-3.2-11b-vision-instruct, openai/o3-pro, moonshotai/kimi-k2.5, +28 more), 100.0% yes
Least aligned models: 31-way tie (meta-llama/llama-3.2-11b-vision-instruct, openai/o3-pro, moonshotai/kimi-k2.5, +28 more), 3.7 point gap
Legacy GPT-4o baseline: 100.0% yes, with a 3.7 point gap against humans
Biggest model gap: 3.7 percentage points on this image
Current classification: People mostly said yes
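The 3.7 point gap and the 31-way tie follow from simple arithmetic, sketched below. This is a minimal illustration, not the benchmark's own code: the function name `yes_rate_gap` and the gap formula (absolute difference in yes rate, in percentage points) are assumptions that happen to match the figures reported here, and the per-model 100.0% values are inferred from the reported model average and closest-model rate. Only three of the 31 models are included.

```python
# Human yes rate in percent, as reported on this page.
HUMAN_YES_PCT = 96.3

# Three of the 31 listed models. The 100.0% values are inferred from the
# reported model average (100.0% yes), not read from per-model data.
MODEL_YES_PCT = {
    "meta-llama/llama-3.2-11b-vision-instruct": 100.0,
    "openai/o3-pro": 100.0,
    "moonshotai/kimi-k2.5": 100.0,
}


def yes_rate_gap(human_yes_pct: float, model_yes_pct: float) -> float:
    """Assumed formula: absolute yes-rate difference in percentage points."""
    return round(abs(model_yes_pct - human_yes_pct), 1)


gaps = {name: yes_rate_gap(HUMAN_YES_PCT, pct) for name, pct in MODEL_YES_PCT.items()}

smallest = min(gaps.values())
largest = max(gaps.values())

# When every model has the same gap, the "most aligned" and "least aligned"
# groups are the same set of models, which is why both cards show a tie.
most_aligned = sorted(name for name, gap in gaps.items() if gap == smallest)
least_aligned = sorted(name for name, gap in gaps.items() if gap == largest)

print(f"gap per model: {smallest} points")
print(f"tie size in this sample: {len(most_aligned)}")
```

Because every model answers yes, the smallest and largest gaps coincide, so the closest and furthest groups contain the same models.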

BLT
People mostly said yes
Benchmark image 01
Bacon Lettuce Tomato
BLT "Sandwich"
A perfectly legible BLT sits on toasted bread, the kind of canonical positive example that makes even the worst eval look solved. If your model misses this one, it does not need fine-tuning; it needs adult supervision.
Under development: this benchmark and its published results are provisional, not final.
At a glance
How this photo split the room
Most aligned models: 31-way tie
Least aligned models: 31-way tie
Why this page matters
This is the compact read on where humans, models, and comments start disagreeing about the same image.
Model spread
How models line up against the crowd
Bars run from more no on the left to more yes on the right. The marker shows the human yes rate.
amazon/nova-lite-v1
amazon/nova-pro-v1
anthropic/claude-opus-4.6
anthropic/claude-sonnet-4.6
baidu/ernie-4.5-vl-28b-a3b
google/gemini-2.5-pro
google/gemini-3-flash-preview
google/gemini-3.1-pro-preview
google/gemma-3-12b-it
google/gemma-3-27b-it
meta-llama/llama-3.2-11b-vision-instruct
meta-llama/llama-4-maverick
meta-llama/llama-4-scout
minimax/minimax-01
mistralai/pixtral-large-2411
moonshotai/kimi-k2.5
openai/gpt-4.1
openai/gpt-4.1-mini
openai/gpt-4o
openai/gpt-4o-2024-11-20
openai/gpt-4o-mini
openai/gpt-5.4
openai/gpt-5.4-pro
openai/o3
openai/o3-pro
qwen/qwen-2-vl-72b-instruct
qwen/qwen2.5-vl-32b-instruct
qwen/qwen2.5-vl-72b-instruct
qwen/qwen3.5-397b-a17b
x-ai/grok-4-fast
z-ai/glm-4.6v