The same H100, three very different prices
A GPU-hour is a GPU-hour — an NVIDIA H100 is the same chip whether you rent it from a hyperscaler or a marketplace. What changes wildly is the markup. Rough per-GPU-hour ranges for H100-class hardware:
| Where you buy | Approx. $ / GPU-hour | Trade-off |
|---|---|---|
| Hyperscaler on-demand (AWS / GCP / Azure) | $8 – $13 | Convenient and trusted, but the most expensive way to buy |
| Neocloud / committed (Lambda, CoreWeave, Crusoe…) | $2.5 – $4 | Much cheaper, but you commit and you run it |
| Marketplace / spot (RunPod, Vast, TensorDock…) | $1.8 – $3 | Cheapest, but supply varies and you own the uptime |
| Heliode (managed, open-market sourced) | marketplace-class | Marketplace pricing, run for you: one bill, real support |
Illustrative ranges; GPU pricing moves fast, so always check current rates. Hyperscaler figures are derived from list prices for 8×H100 instances, divided per GPU.
The gap is the story: the chip costs the same. Going from ~$10/GPU-hr on-demand to ~$2.5–3 on the open market is a 70–75% cut on the single biggest line in most inference budgets.
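To check the size of the cut against your own rates, the arithmetic is a one-liner. The sketch below uses the illustrative figures from the table above; the function name `percent_cut` is ours, and the rates are examples rather than quotes.

```python
def percent_cut(old_rate: float, new_rate: float) -> float:
    """Percentage reduction when moving from old_rate to new_rate ($/GPU-hour)."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (old_rate - new_rate) / old_rate * 100.0


# Illustrative rates from the table, in $ per GPU-hour (not live prices).
scenarios = [
    ("on-demand ~$10 -> open market $2.5", 10.0, 2.5),
    ("on-demand ~$10 -> open market $3", 10.0, 3.0),
    ("on-demand low end $8 -> open market $3", 8.0, 3.0),
]

for label, old, new in scenarios:
    print(f"{label}: {percent_cut(old, new):.1f}% cheaper")
```

Swap in the rate on your last invoice and a current open-market quote to see where your workload lands in that range.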
Where a typical inference bill leaks
Beyond the raw rate, most bills bleed in predictable places:
- On-demand instead of committed. Paying the most flexible (most expensive) rate for workloads that actually run 24/7.
- Over-specced GPUs. Serving models on H100s that an A100, L40S, or A10G would handle within the same latency budget for a fraction of the price.
- Low utilization. Reserved capacity sitting idle between traffic peaks — you pay for the hour whether it's busy or not.
- No batching. Skipping continuous batching (vLLM/TGI) leaves 2–4× throughput per GPU on the table.
- Egress & surprise fees. Data-transfer and ancillary charges that don't show up until the invoice.
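The low-utilization leak is easy to underestimate, so here is a minimal sketch of it. It assumes a simple model in which you pay for every reserved hour and only a fraction of those hours do useful work; `effective_rate` and the utilization figures are hypothetical illustrations, not measurements.

```python
def effective_rate(hourly_rate: float, utilization: float) -> float:
    """Cost per *busy* GPU-hour when only `utilization` of paid hours do work."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_rate / utilization


committed_rate = 3.0  # $/GPU-hour, illustrative committed price from the table

# Hypothetical utilization levels; measure your own from GPU metrics.
for utilization in (1.0, 0.6, 0.3):
    rate = effective_rate(committed_rate, utilization)
    print(f"{utilization:.0%} busy -> ${rate:.2f} per useful GPU-hour")
```

Under this model, a $3 commitment that is busy only 30% of the time costs about as much per useful hour as hyperscaler on-demand pricing.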
How to cut it — in order of impact
- Move off hyperscaler on-demand for steady workloads. Biggest single lever.
- Right-size the GPU to the model and your real latency target.
- Batch and quantize (vLLM/TGI, fp8/int8) to get more out of each card.
- Keep utilization high — or buy managed capacity so someone else carries that risk.
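To see how the first and third levers stack, here is a rough sketch of a monthly bill. Every number in it is a hypothetical assumption chosen for illustration: the 8,000 tokens/s target, the 1,000 tokens/s per-GPU baseline, and the 2× batching gain (the low end of the 2–4× range mentioned earlier). The helper `monthly_cost` is ours, not part of any tool.

```python
import math

HOURS_PER_MONTH = 730  # average month: 8760 hours / 12


def monthly_cost(required_tps: float, per_gpu_tps: float, hourly_rate: float) -> float:
    """Monthly cost of enough always-on GPUs to serve required_tps tokens/s."""
    gpus = math.ceil(required_tps / per_gpu_tps)  # cannot rent a fraction of a GPU
    return gpus * hourly_rate * HOURS_PER_MONTH


required_tps = 8000.0  # hypothetical steady throughput target

# Baseline: hyperscaler on-demand, no continuous batching (assumed 1,000 tok/s/GPU).
baseline = monthly_cost(required_tps, per_gpu_tps=1000.0, hourly_rate=10.0)

# Lever 1: same GPUs, open-market rate.
cheaper_rate = monthly_cost(required_tps, per_gpu_tps=1000.0, hourly_rate=3.0)

# Lever 3 on top: batching assumed to double per-GPU throughput.
batched = monthly_cost(required_tps, per_gpu_tps=2000.0, hourly_rate=3.0)

for label, cost in [("baseline", baseline), ("cheaper rate", cheaper_rate),
                    ("cheaper rate + batching", batched)]:
    print(f"{label}: ${cost:,.0f}/month")
```

The sketch ignores right-sizing and idle time; it only shows that a rate change and a throughput gain multiply rather than add.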
See what you'd save in 10 seconds
Enter your monthly spend and where you buy today — our calculator ballparks what the same workload runs on Heliode. Then send your bill for an exact quote.