Reserved GPU contracts at wholesale providers save 30–55% versus on-demand for the same silicon. The break-even is 64–68% utilisation — if your cluster runs above that threshold, reserved almost always wins on total cost of ownership.
- Reserved 1-year contracts run 30–55% below on-demand for the same GPU at wholesale providers [1]
- Break-even utilisation sits at 64–68% — production inference clusters run at 85–95%, well into reserved territory
- 8 H200 GPUs reserved vs on-demand for 6 months saves ~$43,000 at wholesale rates ($2.70 vs $4.00/hr)
- Spot pricing runs 20–40% below on-demand but has no availability guarantee — suited for fault-tolerant batch jobs only [2]
- Development environments and research sandboxes typically run at 30–50% utilisation — those belong on on-demand
This is a decision framework for GPU pricing models as of May 2026. It covers wholesale rates, break-even points, and real-world workload examples across H100, H200, and B200.
What Reserved and On-Demand GPU Pricing Actually Mean in 2026
On-demand means you pay per hour, with no commitment. Spin up a cluster this afternoon and shut it down tonight. You are billed for exactly the hours you used, at a rate that reflects the provider's flexibility premium: typically 1.4–2.2x the equivalent reserved rate, which is what a 30–55% reserved discount implies.
Reserved means you commit to a fixed number of GPUs for a defined term, typically 3 months, 6 months, or 1 year. In return, the provider holds that capacity for you and discounts the rate. You pay whether you use the capacity or not. A reserved contract for 8 H100 GPUs means those GPUs are yours for the term. If your training run finishes early, you're paying for idle metal.
Spot is a third option many providers offer: deeply discounted capacity that can be preempted with short notice when demand spikes. Spot pricing at wholesale providers sits 20–40% below on-demand, but with no availability guarantee. It works for fault-tolerant batch jobs. It doesn't work for production inference serving.
The Price Gap: What GPU Reserved Pricing Actually Saves in 2026
May 2026 wholesale rates across GPUaaS.com providers. [1] Hyperscale rates are included for comparison — the same silicon, different margin structure.
At 8 H200 GPUs running continuously for 6 months, switching from wholesale on-demand ($4/hr) to wholesale reserved ($2.70/hr) saves approximately $43,000. The cluster is the same. The commitment is what changed.
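The arithmetic behind that figure can be sketched in a few lines. This is a minimal illustration, not a quoting tool: the function name `reservation_savings` and the 730-hours-per-month convention are assumptions, and the rates are the rounded ones quoted above.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def reservation_savings(gpus, months, on_demand_rate, reserved_rate,
                        hours_per_month=HOURS_PER_MONTH):
    """Dollar savings from reserving a cluster that runs continuously.

    Rates are per GPU per hour. Continuous use (100% utilisation) is assumed,
    so every reserved hour would otherwise have been bought on-demand.
    """
    hours = months * hours_per_month
    return gpus * hours * (on_demand_rate - reserved_rate)


savings = reservation_savings(gpus=8, months=6,
                              on_demand_rate=4.00, reserved_rate=2.70)
print(f"Estimated savings: ${savings:,.0f}")
```

With these rounded inputs the estimate comes out at roughly $45,500, a little above the ~$43,000 quoted. The difference presumably comes from unrounded provider quotes or a slightly different hour count, so treat both numbers as order-of-magnitude guides.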
When On-Demand GPU Is the Right Call for AI Workloads
On-demand wins when flexibility is worth more than the premium. There are four clear scenarios where that's true:
- Proof-of-concept runs, hyperparameter sweeps, or architecture evaluations that will complete in under 4 weeks. The total GPU-hours are low enough that the on-demand premium costs less than the minimum reserved commitment.
- Research teams that run intensively for a week, then go quiet for two, and bursty inference loads driven by product launches or marketing campaigns. Anywhere utilisation is likely to average below 50%, reserved's fixed cost structure becomes a liability.
- Teams still evaluating whether H200 or B200 is the right architecture for their workload. On-demand lets you benchmark both before locking in a reserved contract. Committing to the wrong GPU for a year is more expensive than a few weeks of on-demand testing.
- Spinning up a replica cluster to validate deployment configs, test failover, or run load tests before a launch. This capacity is needed once, briefly. On-demand is the obvious fit.
When Reserved GPU Contracts Make More Financial Sense
Reserved wins when you have predictable, sustained demand and the commitment risk is lower than the cost risk of staying on-demand.
Once a model is live, it runs 24/7. A production inference cluster that serves real users typically runs at 85–95% utilisation. Reserved pricing drops the effective rate by 30–55% versus on-demand, with no change to the operational setup.
Pre-training a foundation model or fine-tuning a large model can take months of GPU time. Utilisation will be near-continuous, and the GPU type is known. Reserved locks in a lower rate for a workload with predictable runtime.
On-demand capacity is not guaranteed. During periods of high demand — major model releases, end-of-quarter GPU rushes — on-demand availability drops. Reserved contracts hold your allocation. If you cannot afford a failed capacity request, reserved is insurance as much as it is a discount.
Finance teams, compliance environments, and enterprise procurement processes often require fixed monthly GPU spend. Reserved contracts deliver exactly that. On-demand introduces variance that makes cost forecasting difficult above modest cluster sizes.
GPU Reserved vs On-Demand Break-Even Calculation by Utilisation Rate
The crossover point between reserved and on-demand is straightforward. Reserved costs Cr per hour whether used or not. On-demand costs Cod only when running. If your utilisation rate is U, your effective on-demand cost per reserved-hour is Cod × U.
Reserved wins when: Cr < Cod × U, or equivalently when utilisation exceeds: U > Cr / Cod. At the H200 rates above ($2.70 reserved, $4.00 on-demand), that threshold is 2.70 / 4.00 = 67.5%.
If you expect to use a cluster more than about 65% of available hours over the contract term, reserved wins. Production inference clusters run at 85–95% and long training runs at 90%+, so both are deeply in reserved territory. Research sandboxes and dev environments typically run at 30–50%, so those belong on on-demand.
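The break-even rule above (reserved rate divided by on-demand rate) is easy to check against your own quotes. The sketch below is illustrative only: the function names are hypothetical, and it ignores anything beyond the two hourly rates, such as setup fees or early-termination terms.

```python
def break_even_utilisation(reserved_rate, on_demand_rate):
    """Utilisation (0 to 1) above which reserved is cheaper than on-demand."""
    if on_demand_rate <= 0:
        raise ValueError("on_demand_rate must be positive")
    return reserved_rate / on_demand_rate


def cheaper_option(utilisation, reserved_rate, on_demand_rate):
    """Return 'reserved' or 'on-demand' for an expected utilisation (0 to 1)."""
    threshold = break_even_utilisation(reserved_rate, on_demand_rate)
    return "reserved" if utilisation > threshold else "on-demand"


# H200 rates quoted in this article: $2.70 reserved vs $4.00 on-demand.
threshold = break_even_utilisation(2.70, 4.00)
print(f"Break-even utilisation: {threshold:.1%}")
print("Inference cluster at 90%:", cheaper_option(0.90, 2.70, 4.00))
print("Dev sandbox at 40%:", cheaper_option(0.40, 2.70, 4.00))
```

Feeding in the utilisation you actually measured, rather than the utilisation you hope for, is what makes the comparison meaningful.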
The Hybrid GPU Pricing Approach: Combining Reserved and On-Demand
Most teams at scale end up running a hybrid. Reserve capacity for the baseline — the minimum GPU-hours you'll need regardless of what happens. Use on-demand for burst above that baseline.
A practical structure for a production inference team: reserve 8 H200 GPUs for the steady-state serving load. Use on-demand or spot when a new model evaluation, batch re-ranking job, or traffic spike requires additional capacity. The reserved base is always available and priced efficiently. The on-demand overhead is occasional and bounded.
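One way to size the reserved base is to price a demand profile at several baseline sizes and keep the cheapest. The sketch below assumes you have an hourly record of how many GPUs were needed; the function names, the sample day, and the use of the H200 rates from this article are all illustrative assumptions, and spot pricing is left out for simplicity.

```python
def hybrid_cost(demand, reserved_gpus, reserved_rate, on_demand_rate):
    """Total cost of serving an hourly demand profile with a reserved base.

    demand: list of GPUs needed in each hour.
    Reserved GPUs are billed every hour; demand above the base is billed
    at the on-demand rate only for the hours it occurs.
    """
    reserved_cost = reserved_gpus * reserved_rate * len(demand)
    burst_gpu_hours = sum(max(0, d - reserved_gpus) for d in demand)
    return reserved_cost + burst_gpu_hours * on_demand_rate


def best_baseline(demand, reserved_rate, on_demand_rate):
    """Reserved GPU count that minimises total cost for this profile."""
    if not demand:
        raise ValueError("demand profile is empty")
    candidates = range(0, max(demand) + 1)
    return min(candidates,
               key=lambda n: hybrid_cost(demand, n, reserved_rate, on_demand_rate))


# Hypothetical day: 8 GPUs of steady serving, plus a 4-hour spike to 12.
day = [8] * 20 + [12] * 4
base = best_baseline(day, reserved_rate=2.70, on_demand_rate=4.00)
print("Reserve", base, "GPUs; daily cost:",
      round(hybrid_cost(day, base, 2.70, 4.00), 2))
```

In this example the extra four GPUs are busy only 4 of 24 hours, far below the break-even utilisation, so the search leaves them on on-demand and reserves only the steady-state eight.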
For teams beginning GPU procurement, a common entry sequence: start on-demand to validate the workload and measure actual utilisation, then switch to reserved once the pattern is stable. The on-demand phase typically lasts 4–8 weeks. After that, most production workloads have enough data to size a reserved contract accurately. GPUaaS.com can quote both reserved and on-demand options for the same cluster so you can compare total cost before committing. See also: why wholesale GPU pricing beats hyperscale and current B200 availability and pricing.
Last reviewed: May 19, 2026. Pricing data from [1] getdeploying.com (22+ providers tracked) and [2] Spheron GPU pricing (May 14, 2026). Direct wholesale provider quotes via GPUaaS.com. Compare reserved and on-demand GPU options at GPUaaS.com.



