node_004 · technical note

Sparsity is a substrate property.

A short technical note on something we keep observing across workloads, and what we think it means.

Sparsity is usually framed as a property of the model – a cleverly induced structure: top-k routing, mixture-of-experts, magnitude pruning, structured zero-blocks. We have come to believe this framing is upside down.

Sparsity is a property of the substrate: of the memory hierarchy, the interconnect topology, and the access pattern of the workload running on them. The model trick only matters insofar as it surfaces sparsity that the substrate can already serve cheaply. Models that introduce sparsity the substrate cannot serve do not run faster; they run roughly the same, with extra book-keeping.

This note documents what we have observed and the corollary we draw from it. It is open work – no firm claims, no benchmarks suitable for citation. A longer write-up will follow.

I · What we see

Across a fairly wide set of dense and “sparse” workloads – transformer inference at long context, MoE training at moderate scale, two retrieval workloads – we tracked the speedup actually realised from each declared sparsity ratio on contemporary accelerators.

DECLARED SPARSITY · 87% – median across the set we measured.
REALISED SPEEDUP · 1.4× – median, against a tuned dense baseline on the same hardware.
EXPLAINABLE BY MEMORY · ~80% – of the variance, regressing speedup on access-pattern features.

If sparsity were primarily a model property, the realised speedup would track the declared ratio more or less linearly, with a roofline-shaped ceiling. It does not. Speedup tracks something else: how well the sparsity pattern aligns with the substrate’s natural granularity of access.
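
For concreteness, a minimal Python sketch of that arithmetic, under the naive assumption that every declared zero converts directly into skipped work and saved time:

# naive model: every declared zero is skipped work
declared      = 0.87
ideal_speedup = 1 / (1 - declared)      # ≈ 7.7x

# working backwards from the measured median instead
measured      = 1.4
implied_skip  = 1 - 1 / measured        # ≈ 0.29 – under the same naive model,
                                        # only ~29% of the work was avoided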

II · The reframe

The substrate has its own native sparsity pattern, set by three things:

– The cache-line and shared-memory bank structure, which determines what a “free” zero looks like.

– The interconnect topology between tiles or devices, which determines what a “free” gather looks like.

– The dispatch granularity of the executor, which determines the smallest unit of work the runtime can choose to skip.

A sparsity pattern declared by the model is only realised as performance to the extent that it lines up with all three. Misaligned sparsity is, in practice, indistinguishable from dense work plus overhead.

FIG · 02 – DECLARED VS. REALISED (simplified)
# declared by the model
sparsity_declared     = 0.87      # 87% zeros

# what the substrate actually sees
aligned_with_cache    = 0.31      # fraction of zeros on free cache lines
aligned_with_dispatch = 0.42      # fraction skippable by the executor
aligned_with_xfer     = 0.55      # fraction not crossing the fabric

# realised speedup ≈ f(min of the three, not max)
realised_speedup      = 1.4       # measured; not the 7.7x the ratio implies
“A pattern the substrate cannot serve cheaply is not sparse. It is dense work, with extra book-keeping.”
– NOTE · §II
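
One back-of-envelope reading of the figure – an assumption on our part, not a fitted law – is that only the declared zeros surviving the worst-aligned granularity convert into skipped work:

# assumption: the binding constraint is the worst-aligned granularity
declared  = 0.87
alignment = min(0.31, 0.42, 0.55)             # cache alignment binds here

effective_skip    = declared * alignment      # ≈ 0.27 of total work avoided
predicted_speedup = 1 / (1 - effective_skip)  # ≈ 1.37x – close to the 1.4x
                                              # median we actually measure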

III · The corollary

If sparsity is a substrate property, then the question “how do we make the model sparser?” is the wrong question to ask first. The right question is: what sparsity is this substrate already prepared to serve, for free or near-free, and can we induce that pattern in the model?

This is a small inversion, but it changes the order of operations. Sparsity research becomes a substrate-conditioned activity. The model trick is downstream of a measurement of the silicon.
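
As a minimal sketch of that order of operations, in NumPy: measure the substrate first, then induce only sparsity it can skip. The 16×16 granule below is a hypothetical measured dispatch granule, purely illustrative.

import numpy as np

def prune_to_granule(w, granule=16, target_sparsity=0.87):
    """Zero whole granule-sized blocks, weakest first by L2 norm,
    so every zero lands on a unit the executor can actually skip."""
    rows, cols = w.shape
    assert rows % granule == 0 and cols % granule == 0
    # view the matrix as a grid of (granule x granule) blocks
    blocks = w.reshape(rows // granule, granule, cols // granule, granule)
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))   # one norm per block
    k = int(target_sparsity * norms.size)             # number of blocks to drop
    cutoff = np.partition(norms.ravel(), k)[k]        # k-th smallest block norm
    blocks *= (norms >= cutoff)[:, None, :, None]     # zero the weak blocks
    return blocks.reshape(rows, cols)

w = prune_to_granule(np.random.randn(256, 256))

The model side only chooses which blocks to drop; the block shape itself comes from the measurement.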

It also explains why so much published sparsity work fails to reproduce in production. The pattern was tuned for one substrate’s natural granularity and silently mismatches another’s.

IV · What we are doing about it

Internally, we have started to characterise each accelerator we work with by a small fingerprint: cache-line size, dispatch granule, interconnect granule, and the realised cost of each. The fingerprint is what we hand to the modelling team. The sparsity targets follow from it.
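
The format is not yet stable, so the sketch below is illustrative only – the field names and values are made up to show the shape of the record, not the record itself:

from dataclasses import dataclass

@dataclass
class SubstrateFingerprint:
    # illustrative fields, not a published format
    device: str                   # e.g. "accelerator-x" (hypothetical)
    cache_line_bytes: int         # smallest "free zero" unit
    dispatch_granule_elems: int   # smallest unit of work the runtime can skip
    xfer_granule_bytes: int       # smallest unit the fabric moves at full rate
    cost_per_miss_ns: float       # realised cost of violating each granule
    cost_per_dispatch_ns: float
    cost_per_hop_ns: float

fp = SubstrateFingerprint(
    device="accelerator-x", cache_line_bytes=128,
    dispatch_granule_elems=256, xfer_granule_bytes=4096,
    cost_per_miss_ns=180.0, cost_per_dispatch_ns=900.0, cost_per_hop_ns=350.0,
)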

We will publish the fingerprint format, and a few example fingerprints, when the methodology is stable enough to be useful to others. This is open work; replies welcome.

V · What this is not

This is not a claim that model-level sparsity research is unimportant. It is a claim about ordering. The substrate is what determines whether a sparsity pattern earns its weight. Treat the substrate as a measurable thing first, and the model trick second.

Sparsity, like everything else at this layer, is a property of the seam.

S. Iyer & A. Kolla, Compute.
New Delhi · 06 May 2026.

Open note. Replies welcome at hello@vyanacompute.com.