Red Square Blue Square …
We asked three leading AIs a difficult logic problem. Each found an obvious answer but missed the deeper solution.
To be fair, this problem is tricky: once explained, some people have an ‘aha’ moment while others feel disappointed.
Stated simply:
Which doesn’t belong: red square, blue square, blue circle?
The answer is the blue square.
- The blue circle differs by shape.
- The red square differs by color.
- The blue square differs by not being uniquely different at all.
If you don’t understand, the reasoning is explained further below.
Once you get it, you might be surprised that leading AIs — ChatGPT, Claude, and Gemini — haven’t solved this puzzle. Yet.
This is how each responded to the following prompt:
Text Prompt
“Briefly, which doesn’t belong: red square, blue square, blue circle?”
ChatGPT (OpenAI)
The “blue circle” doesn’t belong, because it differs in shape—it’s a circle, while the others are squares.
Claude (Anthropic)
The red square doesn’t belong — the other two are blue. (You could also argue the blue circle doesn’t belong, since the other two are squares — it depends on whether you’re grouping by color or shape!)
Gemini (Google)
It’s a classic “odd one out” puzzle where the answer depends on your perspective, but most people land on the blue circle.
Here is the breakdown:
- Blue Circle: Doesn’t belong if you are categorizing by shape. It is the only round object in a group of quadrilaterals.
- Red Square: Doesn’t belong if you are categorizing by color. It is the only red object in a group of blue ones.
The Most Common Answer
Usually, the blue circle is considered the “more” correct answer in cognitive tests because humans tend to prioritize shape recognition over color when identifying objects.
Would you like to try a more complex logic puzzle, or perhaps a visual one?
Here’s a quick analysis: the AIs give incorrect answers because, after identifying two candidates as not belonging, they stop short of evaluating the remaining object — the step that would reveal it alone lacks any comparable claim to being different.
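That missing step can be sketched mechanically: score each object by how many attributes it is uniquely different on, and the odd one out is the object with no unique attribute at all. This is a minimal illustration of the puzzle's logic, not a claim about how any of these models reason internally.

```python
# Each object is described by (color, shape).
items = [("red", "square"), ("blue", "square"), ("blue", "circle")]

def unique_attr_counts(items):
    """For each item, count the attributes on which it is the sole outlier."""
    scores = []
    for item in items:
        n = 0
        for attr in range(len(item)):
            values = [other[attr] for other in items]
            if values.count(item[attr]) == 1:  # no other item shares this value
                n += 1
        scores.append(n)
    return scores

scores = unique_attr_counts(items)
# red square: unique color -> 1; blue square: nothing unique -> 0;
# blue circle: unique shape -> 1
odd_one_out = items[scores.index(min(scores))]
print(odd_one_out)  # -> ('blue', 'square')
```

The blue square scores zero because its color is shared with the circle and its shape is shared with the red square — exactly the "not uniquely different" property the puzzle turns on.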
We continue to monitor how well AIs handle reasoning that goes beyond trivia and into more abstract thinking.