GPU cost teardown · 2026

What AI inference actually costs — and where the money goes.

If you run production inference on a hyperscaler's on-demand pricing, you're very likely paying 2–5× the open-market rate for the exact same GPUs. Here's the plain-English breakdown — and how to check your own bill in ten seconds.

The same H100, three very different prices

A GPU-hour is a GPU-hour — an NVIDIA H100 is the same chip whether you rent it from a hyperscaler or a marketplace. What changes wildly is the markup. Rough per-GPU-hour ranges for H100-class hardware:

| Where you buy | ~$ / GPU-hour | Trade-off |
| --- | --- | --- |
| Hyperscaler on-demand (AWS / GCP / Azure) | $8 – $13 | Convenient, trusted, and the most expensive way to buy |
| Neocloud / committed (Lambda, CoreWeave, Crusoe…) | $2.5 – $4 | Much cheaper, but you commit and you run it |
| Marketplace / spot (RunPod, Vast, TensorDock…) | $1.8 – $3 | Cheapest, but variable supply and you own the uptime |
| Heliode (managed, open-market sourced) | marketplace-class | Marketplace pricing, run for you: one bill, real support |

Illustrative ranges; GPU pricing moves fast, so always check current rates. Hyperscaler figures are derived from list prices for 8×H100 instances, divided by eight to give a per-GPU rate.
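The per-GPU derivation mentioned in the note above is a single division. Here is a minimal sketch, assuming a hypothetical list price of $80/hour for an 8-GPU instance. That figure is a placeholder chosen to land inside the table's hyperscaler range, not a quoted rate, and the function name is illustrative.

```python
def per_gpu_hour(instance_price_per_hour: float, gpus_per_instance: int = 8) -> float:
    """Convert an instance's hourly list price into a per-GPU hourly rate."""
    if gpus_per_instance <= 0:
        raise ValueError("gpus_per_instance must be positive")
    return instance_price_per_hour / gpus_per_instance


# Hypothetical 8xH100 instance at $80/hour (placeholder, not a real quote).
rate = per_gpu_hour(80.0)
print(f"${rate:.2f} per GPU-hour")  # prints "$10.00 per GPU-hour"
```

Swap in the current list price from your provider's pricing page to get a number you can compare against the other rows.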

The gap is the story: the chip costs the same. Going from ~$10/GPU-hr on-demand to ~$2.5–3 on the open market is a 70–75% cut on the single biggest line in most inference budgets.
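To make the arithmetic explicit, the sketch below computes the cut as a share of the on-demand rate. It uses the illustrative ~$10 on-demand figure against $2.5, $3, and $4 per GPU-hour, the last being the top of the neocloud range in the table. The function name is an assumption for this example.

```python
def percent_cut(current_rate: float, new_rate: float) -> float:
    """Return the saving from moving to new_rate, as a percentage of current_rate."""
    if current_rate <= 0:
        raise ValueError("current_rate must be positive")
    return (1 - new_rate / current_rate) * 100


ON_DEMAND = 10.0  # illustrative $/GPU-hour from the text, not a live price

for open_market in (2.5, 3.0, 4.0):
    cut = percent_cut(ON_DEMAND, open_market)
    print(f"${ON_DEMAND:.2f} -> ${open_market:.2f}: {cut:.0f}% cut")
# prints:
# $10.00 -> $2.50: 75% cut
# $10.00 -> $3.00: 70% cut
# $10.00 -> $4.00: 60% cut
```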

Where a typical inference bill leaks

Beyond the raw rate, most bills bleed in predictable places:

How to cut it — in order of impact

See what you'd save in 10 seconds

Enter your monthly spend and where you buy today — our calculator ballparks what the same workload runs on Heliode. Then send your bill for an exact quote.
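For a rough sense of the kind of estimate involved, a back-of-the-envelope version can be sketched as follows. This is not Heliode's calculator. It assumes the whole bill is GPU-hours and that cost scales linearly with the hourly rate, and the spend and rates shown are placeholders.

```python
def estimate_monthly_spend(monthly_spend: float, current_rate: float, target_rate: float) -> float:
    """Rescale a monthly GPU bill from current_rate to target_rate ($/GPU-hour).

    Assumes the bill is purely GPU-hours and usage stays the same.
    """
    if current_rate <= 0:
        raise ValueError("current_rate must be positive")
    gpu_hours = monthly_spend / current_rate  # implied usage
    return gpu_hours * target_rate


# Hypothetical inputs: $50,000/month at ~$10/GPU-hour, moved to ~$3/GPU-hour.
spend = 50_000.0
new_spend = estimate_monthly_spend(spend, current_rate=10.0, target_rate=3.0)
print(f"Estimated: ${new_spend:,.0f}/month, saving ${spend - new_spend:,.0f}")
# prints "Estimated: $15,000/month, saving $35,000"
```

Real bills also include storage, networking, and idle capacity, so treat the result as an upper bound on the GPU-rate saving alone.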
