What do we actually compare?
Published:
A while back I argued that humans learn by comparing — by putting two things side by side and finding what’s the same and what’s different. I still believe that’s the engine. But it leaves an obvious gap: to compare two things, you first have to decide what about them to compare. So: what do we actually compare?
When I put an ARC input and output grid side by side, I’m clearly not comparing raw pixels. I’m comparing features — size, the set of colors, the area each color covers, symmetry — and for each feature I note whether it’s the same (COMM) or changed (DIFF). In my solver I literally print this out as a little “comparison receipt”:
task 220 - pair 0
========================================
size
          rel   input  output  delta
height    COMM      4       4      0
width     DIFF      9      12      3
area      DIFF     36      48     12
----------------------------------------
color
count     DIFF      4       5      1
4         COMM      4       4      .
...
----------------------------------------
symmetry
hori      COMM  False   False
...
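A minimal Python sketch of how a receipt like this could be produced. It is not the solver's actual code: the function names `grid_features` and `comparison_receipt`, the dotted feature keys, and the reading of "hori" as left-right mirror symmetry are all assumptions made for illustration, and grids are assumed to be lists of rows of color indices.

```python
from collections import Counter

Grid = list[list[int]]  # rows of color indices, as in the ARC JSON format


def grid_features(grid: Grid) -> dict:
    """Flatten one grid into named features (hypothetical feature set)."""
    height, width = len(grid), len(grid[0])
    colors = Counter(cell for row in grid for cell in row)
    feats = {
        "size.height": height,
        "size.width": width,
        "size.area": height * width,
        "color.count": len(colors),
        # Assumption: "hori" means each row equals its own mirror image.
        "symmetry.hori": all(row == row[::-1] for row in grid),
    }
    # Area covered by each color, keyed by color index.
    for color, cells in sorted(colors.items()):
        feats[f"color.{color}"] = cells
    return feats


def comparison_receipt(inp: Grid, out: Grid) -> list[tuple]:
    """Return (feature, rel, input, output, delta) rows for one pair."""
    a, b = grid_features(inp), grid_features(out)
    names = list(a) + [name for name in b if name not in a]
    rows = []
    for name in names:
        x, y = a.get(name, 0), b.get(name, 0)  # an absent color covers 0 cells
        rel = "COMM" if x == y else "DIFF"
        # Booleans have no meaningful numeric delta.
        delta = None if isinstance(x, bool) or isinstance(y, bool) else y - x
        rows.append((name, rel, x, y, delta))
    return rows


if __name__ == "__main__":
    inp = [[1, 1, 0], [0, 4, 0]]
    out = [[1, 1, 0, 0], [0, 4, 0, 2]]
    for row in comparison_receipt(inp, out):
        print(*row)
```

Keeping the rows as plain tuples separates the comparison from the printing, so the same rows can feed a formatter or a later scoring step.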
For a long time I treated this as just an engineering choice. Then I realized a cognitive scientist had written down almost exactly this, nearly fifty years ago.
In “Features of Similarity” (1977), Amos Tversky rejected the then-standard view that similarity is distance in some geometric space. Instead he modeled each object as a set of features, and similarity as a contrast of common and distinctive features:
\[S(A, B) = \theta\, f(A \cap B) \;-\; \alpha\, f(A - B) \;-\; \beta\, f(B - A)\]
Read that slowly and it’s almost unsettling how close it is to the receipt above. The common features A ∩ B are my COMM rows: they push similarity up (the θ term). The distinctive features A − B and B − A are my DIFF rows: they push it down (the α, β terms). My comparison receipt is, more or less, a Tversky contrast model computed over grid features.
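To make the correspondence concrete, here is a small sketch of the contrast model applied to the visible rows of the receipt. The encoding of each grid as a set of (feature, value) pairs, the name `tversky_contrast`, the default weights, and the choice of plain set size for f are all assumptions; Tversky's f is a general salience measure, and the elided receipt rows are left out.

```python
def tversky_contrast(a: set, b: set, theta: float = 1.0,
                     alpha: float = 0.5, beta: float = 0.5, f=len) -> float:
    """S(A, B) = theta*f(A & B) - alpha*f(A - B) - beta*f(B - A)."""
    return theta * f(a & b) - alpha * f(a - b) - beta * f(b - a)


# Visible rows of the receipt, encoded as (feature, value) pairs.
grid_in = {("height", 4), ("width", 9), ("area", 36),
           ("color_count", 4), ("color_4", 4), ("hori", False)}
grid_out = {("height", 4), ("width", 12), ("area", 48),
            ("color_count", 5), ("color_4", 4), ("hori", False)}

common = grid_in & grid_out      # the COMM rows
only_in = grid_in - grid_out     # DIFF rows, input side
only_out = grid_out - grid_in    # DIFF rows, output side

score = tversky_contrast(grid_in, grid_out)
print(len(common), len(only_in), len(only_out), score)
```

With these six rows there are three common features and three distinctive ones on each side, so the default weights give 1·3 − 0.5·3 − 0.5·3 = 0. The number itself matters less than the fact that every term is read straight off the COMM and DIFF rows.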
What I find most useful is what Tversky’s model predicts beyond the formula. Because similarity is feature-matching with weights, it can be asymmetric (when α ≠ β, S(A, B) need not equal S(B, A)) and it is context-dependent (which features you weigh shifts with what else is present). For ARC that isn’t a footnote; it’s the whole game. The same two grids, compared on a different set of features, point to a different rule. So choosing the features to compare is not preparation for the reasoning; it is the reasoning.
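To see where the asymmetry comes from, swap the arguments in the contrast model and subtract. The common-feature term cancels, leaving only the distinctive terms:
\[S(A, B) - S(B, A) = (\alpha - \beta)\,\bigl[f(B - A) - f(A - B)\bigr]\]
So the two directions differ only when α ≠ β and the two distinctive sets carry different weight under f. This follows from the formula alone; how the weights should be set for ARC grids is a separate question that the derivation does not answer.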
This also tells me where comparison has to go next. Tversky compares objects on their features. But a lot of ARC isn’t about an object’s features — it’s about the relations between objects, and how those relations change. Comparing relational structure rather than flat features is precisely the step from Tversky toward Gentner’s structure-mapping — and toward comparing whole solution trees instead of single grids. Same move, one level up.

