Objects are not enough

If you want to reason about an ARC grid in terms of objects, you first have to decide what an object is. A common way — the one I’ve been using — is Hodel’s objects() detector, which takes three booleans:

  • single-valued — must an object be one color, or can it be many?
  • diagonal — do diagonally-touching cells count as connected?
  • without-bg — is the most common color “background”, or just another color?

Three switches, so eight ways to carve a grid into objects. And honestly, it works — most of the time it returns the things a human would point at and call “objects”.
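To make the three switches concrete, here is a minimal sketch of a detector with the same three booleans. It is not Hodel's code: `detect_objects`, the `(color, (row, col))` cell representation, and the toy grid are assumptions made for illustration, and the real objects() may differ in its details.

```python
from collections import Counter
from itertools import product

def detect_objects(grid, single_valued, diagonal, without_bg):
    """Return a list of objects; each is a frozenset of (color, (row, col)) cells."""
    h, w = len(grid), len(grid[0])
    # Background is the most common color, but only when without_bg is set.
    bg = Counter(c for row in grid for c in row).most_common(1)[0][0] if without_bg else None
    # 4-neighbourhood, plus the four diagonals when diagonal is set.
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    seen, objects = set(), []
    for start in product(range(h), range(w)):
        seed_color = grid[start[0]][start[1]]
        if start in seen or seed_color == bg:
            continue
        stack, cells = [start], set()
        seen.add(start)
        while stack:  # flood fill from the seed cell
            r, c = stack.pop()
            cells.add((grid[r][c], (r, c)))
            for dr, dc in steps:
                nr, nc = r + dr, c + dc
                if not (0 <= nr < h and 0 <= nc < w) or (nr, nc) in seen:
                    continue
                color = grid[nr][nc]
                # Skip background, and skip other colors in single-valued mode.
                if color == bg or (single_valued and color != seed_color):
                    continue
                seen.add((nr, nc))
                stack.append((nr, nc))
        objects.append(frozenset(cells))
    return objects

# A toy grid (made up for this example): 0 is the most common color.
grid = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

# Number of objects under each of the eight settings,
# keyed by (single_valued, diagonal, without_bg).
counts = {
    flags: len(detect_objects(grid, *flags))
    for flags in product([True, False], repeat=3)
}
```

Even on a grid this small the settings disagree about how many objects there are, which is the point: each setting is one fixed opinion about what belongs together.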

But it’s not enough. The moment I sit down and actually try to solve tasks with these objects, I keep hitting the same wall: I need to select things more freely than any fixed setting allows.

[Figure: an ARC task where a black 'clamp' must be split into a head and an arm.]

A task that looks like a clamp grabbing something and dragging it. To solve it you have to split the single black clamp into a “head” and an “arm”, a cut no fixed detector gives you for free.

Two kinds of trouble come up again and again:

  • I need to merge things the detector splits. Two same-colored marks far apart on the grid — a frame, a border, a scattered outline — should sometimes be treated as one object. The justification is simple: they’re the same color.
  • I need to split things the detector merges. In the clamp task above, the black shape is really a head plus an arm, and you can only tell because the output grid shows a separate piece with the same color and shape as one part of it. So the grounds for the cut come from elsewhere in the task, not from the input shape alone.

What I actually want isn’t a better fixed detector. I want the ability to decompose a detected object and freely recombine pieces into a new whole, with an explicit reason each time (same color, same shape, same size…).
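A rough sketch of what "decompose and recombine, with a reason" could look like. Everything here is hypothetical: `merge_by`, `split_off`, the color codes, and the clamp coordinates are invented for illustration, and pieces are assumed to be frozensets of `(color, (row, col))` cells. Splitting only handles the simplest case, a translated copy of a template piece.

```python
from collections import defaultdict

def normalized(piece):
    """Map each position, shifted so the bounding box starts at (0, 0), to its color."""
    r0 = min(r for _, (r, c) in piece)
    c0 = min(c for _, (r, c) in piece)
    return {(r - r0, c - c0): color for color, (r, c) in piece}

def merge_by(pieces, key, reason):
    """Union all pieces that share a key; each new whole carries the stated reason."""
    groups = defaultdict(set)
    for piece in pieces:
        groups[key(piece)] |= piece
    return [(frozenset(cells), reason) for cells in groups.values()]

def split_off(piece, template, reason):
    """Cut a translated copy of `template` out of `piece`, or return None."""
    target = normalized(template)
    by_pos = {pos: color for color, pos in piece}
    ar, ac = min(target)  # anchor cell of the template
    for r, c in sorted(by_pos):
        # Place the template so that its anchor lands on (r, c).
        placed = {(r - ar + tr, c - ac + tc): color for (tr, tc), color in target.items()}
        if all(by_pos.get(pos) == color for pos, color in placed.items()):
            part = frozenset((color, pos) for pos, color in placed.items())
            return (part, reason), (piece - part, "remainder")
    return None

BLACK, RED = 0, 2  # hypothetical color codes for this example

# A clamp: a 4-cell arm on row 2 plus a C-shaped head, detected as one black object.
arm_cells = {(2, 0), (2, 1), (2, 2), (2, 3)}
head_cells = {(1, 4), (2, 4), (3, 4), (1, 5), (3, 5)}
clamp = frozenset((BLACK, pos) for pos in arm_cells | head_cells)

# The head as it might appear on its own in the output grid, at a different position.
head_in_output = frozenset(
    (BLACK, (r + 5, c)) for r, c in [(0, 0), (1, 0), (2, 0), (0, 1), (2, 1)]
)

(head, why_head), (arm, why_arm) = split_off(
    clamp, head_in_output, reason="same color and shape as an output piece"
)

# Two distant red marks that a connectivity-based detector returns separately.
marks = [frozenset({(RED, (0, 0))}), frozenset({(RED, (9, 9))})]
merged = merge_by(
    marks,
    key=lambda piece: frozenset(color for color, _ in piece),
    reason="same color",
)
```

The design choice worth noticing is that every new object comes back paired with its reason, so a later reasoning step can ask why two things were grouped or cut instead of just taking the grouping as given.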

And once you allow that, something nicer falls out. The ARCKG hierarchy I use — GRID, OBJECT, PIXEL — starts to look like nothing more than object sizes: a GRID is just the largest multi-colored object, a PIXEL is just the smallest single-colored one, and there’s a whole continuum of objects in between. The layers shouldn’t be three rigid shelves. They should be a fluid scale, where “what counts as one object” is decided per task, by the reasoning, not fixed in advance.
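The "fluid scale" reading can be sketched in a few lines. This is an illustration only, not how ARCKG is implemented: `cells_of` and the toy grid are assumptions, and objects are again taken to be frozensets of `(color, (row, col))` cells.

```python
def cells_of(grid):
    """Every cell of the grid as a (color, (row, col)) pair."""
    return frozenset(
        (color, (r, c)) for r, row in enumerate(grid) for c, color in enumerate(row)
    )

grid = [
    [0, 0, 3],
    [0, 3, 3],
]

whole_grid = cells_of(grid)                         # GRID: the largest object, any colors
pixels = [frozenset({cell}) for cell in whole_grid]  # PIXEL: the smallest, one cell each

# Anything in between is just some subset of the grid's cells, chosen for a reason.
color_3 = frozenset(cell for cell in whole_grid if cell[0] == 3)

# All three "layers" live on one scale, ordered by size.
scale = sorted([whole_grid, color_3, *pixels], key=len)
```

Under this view the three layers are not different types, only different sizes of the same kind of thing.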

That realization — objects are not a given, they’re a decision — reshaped how I think about the whole representation.