Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data

(thonking.ai)

65 points | by tosh 4 days ago

9 comments

dan_sbl 16 minutes ago
> For example, when the GPU is fully idle, nvidia-smi tells me that it’s only pulling 88W of power.
I haven't used a non-laptop GPU in some time, but that is a crazy amount of "idle" power consumption. Is this normal for cards like this?
- Aurornis 8 minutes ago
  Server cards are not optimized for idle power usage. They’re expected to be fully utilized.
  For server gear it’s more common to have less dynamic power and voltage switching because it produces more predictable performance and latency.
jayd16 34 minutes ago
I can't tell from the blog: is this actually verified, or is it a theory plus numbers that show plausibility?
I could certainly come up with alternative theories about memory compression and prefetching if we were talking about texture reads.
amelius 13 minutes ago
Sounds like a side channel attack waiting to happen.
nzach 1 hour ago
I went in expecting to find 'branch prediction'[0] as the answer, but apparently things are even more complex nowadays.
[0] - https://stackoverflow.com/questions/11227809/why-is-conditio...
- kangalioo 40 minutes ago
  To be fair, the culprit in the article is _less complex_ than branch prediction: "with random data, bits are flipped often, and bit flips in transistors inherently draw power" takes less mental gymnastics than "with random data, the CPU fails to predict the future, causing redundant speculative execution".
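The bit-flip explanation quoted above can be illustrated with a toy measurement. This is only a sketch, not a model of GPU power draw: it counts how many bits differ between consecutive float32 values, using Hamming distance as a rough stand-in for switching activity. The helper names (`float32_bits`, `toggles_per_step`) and the choice of Gaussian versus constant inputs are assumptions made for illustration.

```python
import random
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def toggles_per_step(values) -> float:
    """Average number of bits that differ between consecutive values."""
    bits = [float32_bits(v) for v in values]
    flips = sum(bin(a ^ b).count("1") for a, b in zip(bits, bits[1:]))
    return flips / (len(bits) - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000
random_data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # "unpredictable" input
constant_data = [1.0] * n                              # "predictable" input

random_toggles = toggles_per_step(random_data)
constant_toggles = toggles_per_step(constant_data)

print(f"random:   {random_toggles:.1f} bit flips per step")
print(f"constant: {constant_toggles:.1f} bit flips per step")
```

The constant stream never changes a bit, while the random stream flips a large share of the 32 bits at every step. Whether that translates into measurable power differences on a real GPU depends on hardware details this sketch does not capture.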
bitwize 10 minutes ago
It wouldn't surprise me to see some ML algorithm in silico somewhere to select faster matmul paths on favorable data. Yo dawg, I heard you like AI, so we put some AI in your AI so you can infer while you're inferring.
gdevenyi 2 hours ago
People have been noticing the effects of this in local LLM inference. Power limiting seems to improve overall performance!
- Aurornis 46 minutes ago
  This is not observable from LLM inference, where you would not encounter uniform matrices.
  Power limiting does not improve performance but it does improve efficiency. You might be able to get 90% of the performance for only 70% of the power usage, for example. It does not make the card go faster though.
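The efficiency claim above is simple arithmetic, sketched here with the comment's own example numbers (90% of the performance at 70% of the power). The function name `perf_per_watt_gain` is hypothetical, and the figures are the illustrative ones from the comment, not measurements.

```python
def perf_per_watt_gain(perf_fraction: float, power_fraction: float) -> float:
    """Relative change in performance per watt versus the unlimited baseline."""
    return perf_fraction / power_fraction


# Example figures from the comment: 90% performance at 70% power.
perf_fraction = 0.90
power_fraction = 0.70
gain = perf_per_watt_gain(perf_fraction, power_fraction)

print(f"throughput vs. baseline: {perf_fraction:.2f}x")
print(f"perf per watt vs. baseline: {gain:.2f}x")
```

Throughput stays below 1.0x, so the card is slower in absolute terms, while performance per watt rises to roughly 1.29x.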
- gchamonlive 2 hours ago
  In general, constraints require optimizations and rearchitectures. I'd also expect the RAM shortage, for instance, to have a big impact on the software industry as a whole, especially in games. They will need to make do with what people have: a PS5/Pro or similar in PC power.
  - aNoob7000 1 hour ago
    I actually think it is a good thing to introduce constraints to AI and the overall tech industry. Hopefully everyone will have to look at improving performance without having to add RAM or increase CPU/GPU performance.
evanjrowley 44 minutes ago
Designing for predictable execution flow is one of the advantages of Tenstorrent hardware.
https://clehaxze.tw/gemlog/2025/04-21-programming-tensotrren...
https://clehaxze.tw/gemlog/2026/01-22-the-real-tenstorrent-t...
https://arxiv.org/html/2604.03279