Why stacking approximations makes me uneasy
Published:
Sooner or later, every “neuro-symbolic” conversation collapses to a single fork: symbolic representation vs. vector representation. I want to be honest about which side I lean toward, and why, because the why is more interesting than the slogan.
Let me start by conceding the strong case for vectors. Their generalizability is real, and it’s a large part of why neural networks and transformers have developed at such an insane pace. A continuous, distributed representation is flexible and close to universal; it handles big, noisy, multimodal data beautifully. I’m not arguing against any of that.
What bothers me is narrower: the habit of building everything out of accumulated approximations.
Approximation is essential for finding an optimal decision. It is not obviously the right tool for everything. And if we commit to representing all knowledge as approximations stacked on approximations, the bill comes due as ever more compute and ever-better optimization — forever.
Here’s the picture I keep coming back to. Think about a data-scarce situation, which is the normal situation for a new ARC task: you get a handful of examples and nothing else. There, you can’t average over a big corpus; you have to learn something that works only here, by trial and error. That’s what a scientist does: set up a hypothesis with defined variables (independent or manipulated, dependent, controlled), test it, and repeat. We learn part by part, and crucially, each thing we learn is anchored to a variable. What we learn is specific, not general.
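A minimal sketch of that loop, for illustration only; this is not code from my system. The hypothesis is a hypothetical recolor rule with two named variables, `src` and `dst`, and the "experiment" is checking every binding of those variables against the handful of training pairs. Representing grids as lists of integer rows using ARC's colors 0-9 is an assumption, as are the names `recolor` and `consistent_bindings`.

```python
from itertools import product

Grid = list[list[int]]  # rows of ARC colors 0-9 (assumed representation)


def recolor(grid: Grid, src: int, dst: int) -> Grid:
    """Hypothetical rule: every pixel of color `src` becomes `dst`."""
    return [[dst if c == src else c for c in row] for row in grid]


def consistent_bindings(examples: list[tuple[Grid, Grid]]) -> list[tuple[int, int]]:
    """Test every binding of the variables (src, dst) against all examples
    and keep only the bindings that reproduce every output exactly."""
    survivors = []
    for src, dst in product(range(10), repeat=2):
        if src == dst:
            continue  # skip the no-op binding
        if all(recolor(inp, src, dst) == out for inp, out in examples):
            survivors.append((src, dst))
    return survivors


# A handful of (input, output) pairs, as in a new ARC task.
examples = [
    ([[0, 1], [1, 0]], [[0, 2], [2, 0]]),
    ([[1, 1, 3]], [[2, 2, 3]]),
]
print(consistent_bindings(examples))  # [(1, 2)]
```

What survives is a binding of named variables (src=1, dst=2): specific, and anchored. Brute force over 100 bindings only works here because the hypothesis has two small variables.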
Now compare that to a network updating all of its parameters from a single observation. There’s no anchor for the new information to stick to, so it gets distributed — smeared thin, scattered across weights. My claim is that this happens because there was no hypothesis, and therefore no anchor.
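A toy illustration of the contrast, assuming a plain linear model with squared-error loss rather than a transformer (same principle, very different scale): one gradient step from one dense observation moves every weight, while a hypothesis-style update touches only the variable it names. `sgd_step`, `anchored_update`, and the fact names are hypothetical.

```python
def sgd_step(w: list[float], x: list[float], y: float, lr: float = 0.1) -> list[float]:
    """One gradient step on 0.5 * (w.x - y)**2 for a linear model.
    The gradient w.r.t. w[i] is (w.x - y) * x[i], so every weight whose
    input feature is non-zero moves."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]


def anchored_update(facts: dict, variable: str, value) -> dict:
    """Hypothesis-style update: only the named variable changes."""
    updated = dict(facts)
    updated[variable] = value
    return updated


w = [0.1, -0.2, 0.3, 0.05]
x = [0.5, 1.0, -0.3, 0.8]  # one dense observation: every feature non-zero
new_w = sgd_step(w, x, y=1.0)
print(sum(a != b for a, b in zip(w, new_w)))  # 4: every weight moved

facts = {"src_color": 1, "dst_color": None, "background": 0}
new_facts = anchored_update(facts, "dst_color", 2)
print([k for k in facts if facts[k] != new_facts[k]])  # ['dst_color']
```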
The cleanest way I can show the gap is with one tiny example. In my knowledge graph, a single PIXEL is just:
{ color: 0, coordinate: { row: 3, col: 7 } }
That is already a symbolic fact. Now imagine that same pixel living inside a transformer as some vector or tensor rather than the legible thing above. Ask for it back by reconstructing it from the vector, and you might get:
{ color: -0.0001388839, coordinate: { x: 2.999998793, y: 6.788208334 } }
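To make "exactly the same" versus "very similar" concrete, here is a small Python sketch using the numbers above. Flattening the pixel to (color, row, col) and reading x as row and y as col are my assumptions; `exactly_same` and `very_similar` are hypothetical helpers, and the tolerances are arbitrary.

```python
import math

# The symbolic pixel, flattened to (color, row, col).
original = (0, 3, 7)
# The reconstruction above, flattened the same way (x -> row, y -> col).
reconstructed = (-0.0001388839, 2.999998793, 6.788208334)


def exactly_same(a: tuple, b: tuple) -> bool:
    """Symbolic identity: every field is equal."""
    return a == b


def very_similar(a: tuple, b: tuple, tol: float) -> bool:
    """Approximate match: every field lies within an absolute tolerance."""
    return all(math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b))


print(exactly_same(original, reconstructed))        # False
print(very_similar(original, reconstructed, 0.5))   # True
print(very_similar(original, reconstructed, 0.01))  # False: col drifted by ~0.21
```

Equality is a yes/no fact. Similarity only exists relative to a tolerance someone had to choose, and at 0.01 the drifted column already fails.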
Very close to what I started with. But here’s the thing humans do effortlessly and approximation alone does not: we know the difference between “exactly the same” and “very similar.” To recover information that is exactly the same, you need something that snaps to one side, like a magnet: something that makes the neural ↔ symbolic translation a lossless round trip instead of a lossy one.
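One minimal way to build that magnet, sketched under stated assumptions: decode each field by snapping it to the nearest legal symbol. Here that means rounding and clamping to ARC's ten colors (0-9) and its maximum 30×30 grid; `Pixel`, `snap`, `encode`, and `decode` are hypothetical names, and `encode` is a trivial stand-in for a learned encoder.

```python
from dataclasses import dataclass

NUM_COLORS = 10  # ARC palette: colors 0-9
MAX_SIDE = 30    # ARC grids are at most 30x30


@dataclass(frozen=True)
class Pixel:
    color: int
    row: int
    col: int


def snap(value: float, lo: int, hi: int) -> int:
    """The 'magnet': pull a continuous value onto the nearest legal integer.
    Python's round() breaks exact .5 ties toward the even integer."""
    return min(max(round(value), lo), hi)


def encode(p: Pixel) -> tuple[float, float, float]:
    """Symbol -> vector. A trivial stand-in for a learned encoder."""
    return (float(p.color), float(p.row), float(p.col))


def decode(vec: tuple[float, float, float]) -> Pixel:
    """Vector -> symbol: snap every field so the result is a legal Pixel."""
    color, row, col = vec
    return Pixel(
        color=snap(color, 0, NUM_COLORS - 1),
        row=snap(row, 0, MAX_SIDE - 1),
        col=snap(col, 0, MAX_SIDE - 1),
    )


original = Pixel(color=0, row=3, col=7)
noisy = (-0.0001388839, 2.999998793, 6.788208334)  # the reconstruction above
print(decode(noisy) == original)             # True: snapped back exactly
print(decode(encode(original)) == original)  # True: lossless round trip
```

Snapping gives an exact round trip only while each field's error stays under 0.5; past that it lands confidently on the wrong symbol (decode((0.0, 3.0, 7.6)) gives col=8). The magnet makes “the same” decidable; it does not recover information the vector has already lost.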
This is the gap I think the serious neuro-symbolic work is circling. Garcez and Lamb’s Neurosymbolic AI: The 3rd Wave (2020) frames the trade-off precisely: neural representations are distributed and continuous and ground concepts well on feature vectors; symbolic representations are localist and discrete and reason well and explainably. The open problem is grounding logical concepts onto vectors. My unease is specifically about the return trip — vector → symbol — where “very close” quietly replaces “the same.”
And I don’t think this is a fussy philosophical worry. It’s close to the reason pure deep learning struggles on ARC at all: the benchmark rewards skill-acquisition efficiency in the data-scarce regime (Chollet, On the Measure of Intelligence, 2019), which is exactly the regime where “learn part by part, anchored to a variable” should beat “approximate everything.” So when I bet on a symbolic graph as the place where knowledge accumulates, this is the intuition underneath it: I want the anchors back.


