Reserved vs On-Demand GPU: When Each Makes Sense

GPU Infrastructure

On-demand GPU feels safer. Reserved saves 30–60%. The right choice comes down to one number: how many hours will this cluster actually run?

GPUaaS.com Team
GPU Procurement
May 17, 2026

Reserved GPU contracts at wholesale providers save 30–55% versus on-demand for the same silicon. The break-even is 64–68% utilisation — if your cluster runs above that threshold, reserved almost always wins on total cost of ownership.

Key takeaways
  • Reserved 1-year contracts run 30–55% below on-demand for the same GPU at wholesale providers [1]
  • Break-even utilisation sits at 64–68% — production inference clusters run at 85–95%, well into reserved territory
  • 8 H200 GPUs reserved vs on-demand for 6 months saves ~$43,000 at wholesale rates ($2.70 vs $4.00/hr)
  • Spot pricing runs 20–40% below on-demand but has no availability guarantee — suited for fault-tolerant batch jobs only [2]
  • Development environments and research sandboxes typically run at 30–50% utilisation — those belong on on-demand

This is a decision framework for GPU pricing models as of May 2026. Wholesale rates, break-even points, and real-world workload examples across H100, H200, and B200.

What Reserved and On-Demand GPU Pricing Actually Mean in 2026

  • 30–60%: reserved vs on-demand savings
  • ~65%: utilisation break-even point
  • 3 mo: typical reserved min-term
  • $0/hr: reserved idle cost (you pay regardless)
◆ DEFINITIONS
What reserved and on-demand actually mean in GPU cloud

On-demand means you pay per hour, with no commitment. Spin up a cluster this afternoon and shut it down tonight. You are billed for exactly the hours you used, at a rate that reflects the provider's flexibility premium — typically 1.4–2.5x the equivalent reserved rate.

Reserved means you commit to a fixed number of GPU-hours per day for a defined term — typically 3 months, 6 months, or 1 year. In return, the provider holds that capacity for you and discounts the rate. You pay whether you use the capacity or not. A reserved contract for 8 H100 GPUs means those GPUs are yours for the term. If your training run finishes early, you're paying for idle metal.

Spot is a third option many providers offer: deeply discounted capacity that can be preempted with short notice when demand spikes. Spot pricing at wholesale providers sits 20–40% below on-demand, but with no availability guarantee. It works for fault-tolerant batch jobs. It doesn't work for production inference serving.

The Price Gap: What GPU Reserved Pricing Actually Saves in 2026

◆ PRICING DATA
The price gap: what reserved actually saves you

May 2026 wholesale rates across GPUaaS.com providers. [1] Hyperscale rates are included for comparison — the same silicon, different margin structure.

GPU | On-demand (wholesale) | Reserved 1-yr (wholesale) | Saving | Hyperscale on-demand
H100 SXM 80 GB | $0.81–$2.49/hr | $0.65–$1.69/hr | ~30–35% | $4.59–$8.90/hr
H200 SXM 141 GB | $3.50–$4.54/hr | $2.25–$3.20/hr | ~35–40% | $8.00–$13.78/hr
B200 SXM 192 GB | $4.99–$6.19/hr | $2.25–$4.50/hr | ~35–55% | $10.00–$14.24/hr

At 8 H200 GPUs running continuously for 6 months, switching from wholesale on-demand ($4/hr) to wholesale reserved ($2.70/hr) saves approximately $43,000. The cluster is the same. The commitment is what changed.
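As a rough check on that arithmetic, the sketch below multiplies the rate gap by cluster size and running hours. The function name `commitment_savings` and the 180-day approximation of six months are assumptions made for illustration; the rates are the ones quoted above.

```python
HOURS_PER_DAY = 24


def commitment_savings(gpus, on_demand_rate, reserved_rate, days):
    """Dollar saving from reserved vs on-demand for a cluster that runs continuously.

    Assumes every GPU runs every hour of the period, so both models bill
    the same number of GPU-hours and only the hourly rate differs.
    """
    hours = days * HOURS_PER_DAY
    return gpus * (on_demand_rate - reserved_rate) * hours


# 8 H200 GPUs, $4.00/hr on-demand vs $2.70/hr reserved.
# 180 days is an assumed stand-in for "6 months".
savings = commitment_savings(gpus=8, on_demand_rate=4.00, reserved_rate=2.70, days=180)
print(f"Estimated saving: ${savings:,.0f}")
```

With 180 days of continuous running this comes to about $44,900. The ~$43,000 figure above corresponds to roughly 4,130 cluster-hours, a little under six full months, so the exact saving depends on how many hours you count.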

When On-Demand GPU Is the Right Call for AI Workloads

◆ ON-DEMAND USE CASES
When on-demand is the right call

On-demand wins when flexibility is worth more than the premium. There are four clear scenarios where that's true:

Short, bounded experiments

Proof-of-concept runs, hyperparameter sweeps, or architecture evaluations that will complete in under 4 weeks. The total GPU-hours are low enough that the on-demand premium costs less than the minimum reserved commitment.
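To make that comparison concrete, here is a minimal sketch that prices a four-week experiment both ways. The numbers are illustrative assumptions: 8 H100 GPUs at $2.00/hr on-demand, a 3-month minimum term counted as 90 days, and a 20% short-term discount picked from the 15–25% range typical of 3-month contracts. The function names are hypothetical.

```python
HOURS_PER_DAY = 24


def on_demand_cost(gpus, rate, days_used):
    """On-demand bills only for the days the experiment actually runs."""
    return gpus * rate * days_used * HOURS_PER_DAY


def reserved_cost(gpus, rate, term_days):
    """Reserved bills for the whole term, whether or not the GPUs are busy."""
    return gpus * rate * term_days * HOURS_PER_DAY


GPUS = 8
ON_DEMAND_RATE = 2.00                          # $/GPU-hr, assumed
RESERVED_RATE = ON_DEMAND_RATE * (1 - 0.20)    # assumed 20% 3-month discount

experiment = on_demand_cost(GPUS, ON_DEMAND_RATE, days_used=28)
commitment = reserved_cost(GPUS, RESERVED_RATE, term_days=90)

print(f"On-demand for 4 weeks:   ${experiment:,.0f}")
print(f"Reserved 3-month minimum: ${commitment:,.0f}")
```

Under these assumptions the experiment costs well under half of the minimum reserved commitment, even though the on-demand hourly rate is higher.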

Unpredictable usage patterns

Research teams that run intensively for a week, then go quiet for two. Bursty inference loads driven by product launches or marketing campaigns. Anywhere utilisation is likely to average below 50%, reserved's fixed cost structure becomes a liability.

GPU type uncertainty

If you're still evaluating whether H200 or B200 is the right architecture for your workload, on-demand lets you benchmark both before locking in a reserved contract. Committing to the wrong GPU for a year is more expensive than a few weeks of on-demand testing.

Pre-production staging

Spinning up a replica cluster to validate deployment configs, test failover, or run load tests before a launch. This capacity is needed once, briefly. On-demand is the obvious fit.

◆ RULE OF THUMB
If your projected utilisation is below 60% of capacity over the contract term, on-demand almost always wins on total cost.

When Reserved GPU Contracts Make More Financial Sense

◆ RESERVED USE CASES
When reserved makes more sense

Reserved wins when you have predictable, sustained demand and the commitment risk is lower than the cost risk of staying on-demand.

Production inference serving

Once a model is live, it runs 24/7. A production inference cluster that serves real users is provisioned around the clock and typically runs at 85–95% utilisation. Reserved pricing drops the effective rate by 35–55% versus on-demand, with no change to the operational setup.

Long training runs (3+ months)

Pre-training a foundation model or fine-tuning a large model across months of GPU time. The utilisation will be near-continuous, and the GPU type is known. Reserved locks in a lower rate for a workload with predictable runtime.

Guaranteed capacity needs

On-demand capacity is not guaranteed. During periods of high demand — major model releases, end-of-quarter GPU rushes — on-demand availability drops. Reserved contracts hold your allocation. If you cannot afford a failed capacity request, reserved is insurance as much as it is a discount.

Budget predictability requirements

Finance teams, compliance environments, and enterprise procurement processes often require fixed monthly GPU spend. Reserved contracts deliver exactly that. On-demand introduces variance that makes cost forecasting difficult above modest cluster sizes.

GPU Reserved vs On-Demand Break-Even Calculation by Utilisation Rate

◆ BREAK-EVEN MATH
The break-even calculation

The crossover point between reserved and on-demand is straightforward. Reserved costs C_r per hour whether used or not. On-demand costs C_od per hour, but only while running. If your utilisation rate is U (the fraction of available hours the cluster actually runs), your effective on-demand cost per reserved-hour is C_od × U.

Reserved wins when C_r < C_od × U, or equivalently when utilisation exceeds the ratio of the two rates: U > C_r / C_od.

GPU | On-demand rate | Reserved rate | Break-even utilisation
H100 SXM (wholesale) | $2.00/hr | $1.30/hr | 65%
H200 SXM (wholesale) | $4.00/hr | $2.70/hr | 68%
B200 SXM (wholesale) | $5.50/hr | $3.50/hr | 64%
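The break-even column is simply the reserved rate divided by the on-demand rate. The sketch below recomputes it from the rates in the table and adds a small helper that picks the cheaper model for a projected utilisation. The names `RATES`, `break_even_utilisation`, and `cheaper_model` are illustrative, not part of any provider tooling.

```python
# (on-demand $/hr, reserved $/hr) taken from the table above.
RATES = {
    "H100 SXM": (2.00, 1.30),
    "H200 SXM": (4.00, 2.70),
    "B200 SXM": (5.50, 3.50),
}


def break_even_utilisation(on_demand_rate, reserved_rate):
    """Utilisation above which reserved is cheaper: U = reserved / on-demand."""
    return reserved_rate / on_demand_rate


def cheaper_model(utilisation, on_demand_rate, reserved_rate):
    """Return the lower-cost pricing model for a projected utilisation (0 to 1)."""
    if utilisation > break_even_utilisation(on_demand_rate, reserved_rate):
        return "reserved"
    return "on-demand"


for gpu, (on_demand, reserved) in RATES.items():
    threshold = break_even_utilisation(on_demand, reserved)
    print(f"{gpu}: break-even at {threshold:.1%} utilisation")

# A production inference cluster at 90% vs a dev sandbox at 40%, both on H200.
print(cheaper_model(0.90, *RATES["H200 SXM"]))
print(cheaper_model(0.40, *RATES["H200 SXM"]))
```

This comparison looks only at hourly cost. It does not price in capacity guarantees or early-termination risk, which can tip the decision either way near the threshold.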

If you expect to use a cluster more than 65% of available hours over the contract term, reserved wins. Production inference clusters run at 85–95%. Long training runs run at 90%+. Both are deeply in reserved territory. Research sandboxes and dev environments often run below 40%. Those belong on on-demand.

The Hybrid GPU Pricing Approach: Combining Reserved and On-Demand

◆ HYBRID APPROACH
The hybrid approach: combining both

Most teams at scale end up running a hybrid. Reserve capacity for the baseline — the minimum GPU-hours you'll need regardless of what happens. Use on-demand for burst above that baseline.

A practical structure for a production inference team: reserve 8 H200 GPUs for the steady-state serving load. Use on-demand or spot when a new model evaluation, batch re-ranking job, or traffic spike requires additional capacity. The reserved base is always available and priced efficiently. The on-demand overhead is occasional and bounded.
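One way to size the reserved baseline is to price every candidate size against an hourly demand profile and keep the cheapest. The sketch below does that by brute force. The demand profile (8 GPUs for 20 hours, 12 GPUs for a 4-hour daily peak) is a made-up example, the rates are the H200 wholesale figures used in the break-even table, and both function names are hypothetical.

```python
def hybrid_cost(demand, reserved_gpus, on_demand_rate, reserved_rate):
    """Total cost of a reserved base plus on-demand burst.

    demand: list of GPUs needed in each hour of the period.
    The reserved base is billed every hour; anything above it is
    billed at the on-demand rate only in the hours it is needed.
    """
    base = reserved_gpus * reserved_rate * len(demand)
    burst_gpu_hours = sum(max(0, needed - reserved_gpus) for needed in demand)
    return base + burst_gpu_hours * on_demand_rate


def best_reserved_size(demand, on_demand_rate, reserved_rate):
    """Try every base size from 0 to peak demand and return the cheapest."""
    return min(
        range(max(demand) + 1),
        key=lambda size: hybrid_cost(demand, size, on_demand_rate, reserved_rate),
    )


# Hypothetical day: steady load of 8 GPUs, peaking at 12 GPUs for 4 hours.
daily_demand = [8] * 20 + [12] * 4

size = best_reserved_size(daily_demand, on_demand_rate=4.00, reserved_rate=2.70)
cost = hybrid_cost(daily_demand, size, on_demand_rate=4.00, reserved_rate=2.70)
print(f"Reserve {size} GPUs, daily cost ${cost:,.2f}")
```

The result follows the break-even rule applied per GPU: the first 8 GPUs are busy every hour and are worth reserving, while the extra 4 are needed only about 17% of the time and are cheaper on-demand.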

For teams beginning GPU procurement, a common entry sequence: start on-demand to validate the workload and measure actual utilisation, then switch to reserved once the pattern is stable. The on-demand phase typically lasts 4–8 weeks. After that, most production workloads have enough data to size a reserved contract accurately. GPUaaS.com can quote both reserved and on-demand options for the same cluster so you can compare total cost before committing. See also: why wholesale GPU pricing beats hyperscale and current B200 availability and pricing.
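The measure-then-commit sequence can be expressed as a simple readiness check over weekly usage from the on-demand phase. This is a sketch under stated assumptions: "stable" is taken to mean that every observed week sits above the break-even utilisation, at least four weeks of data are required, and the weekly hours are invented sample values. None of this reflects a specific provider's tooling.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_utilisation(hours_run_per_week):
    """Convert hours the cluster ran each week into a utilisation fraction."""
    return [hours / HOURS_PER_WEEK for hours in hours_run_per_week]


def ready_to_reserve(hours_run_per_week, on_demand_rate, reserved_rate, min_weeks=4):
    """True when there is enough data and every week clears the break-even.

    Assumption: a stable pattern means no observed week falls at or
    below the break-even utilisation (reserved rate / on-demand rate).
    """
    if len(hours_run_per_week) < min_weeks:
        return False
    threshold = reserved_rate / on_demand_rate
    return all(u > threshold for u in weekly_utilisation(hours_run_per_week))


# Hypothetical on-demand phase on H200 at $4.00/hr vs $2.70/hr reserved.
steady = [150, 155, 160, 158]   # hours run in each of four weeks
bursty = [150, 60, 160, 158]    # one quiet week

print(ready_to_reserve(steady, 4.00, 2.70))
print(ready_to_reserve(bursty, 4.00, 2.70))
```

A stricter or looser stability test (for example, an average over the period) would be an equally reasonable choice. The point is to base the reserved contract size on measured hours rather than a forecast.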

◆ FAQ
Frequently asked questions

How much cheaper is reserved GPU pricing than on-demand?

At wholesale providers, reserved 1-year rates run 30–55% below on-demand for the same GPU. H200 SXM reserved starts at approximately $2.25/hr versus $3.50/hr on-demand at the same provider tier. For B200, reserved rates start near $2.25/hr while on-demand runs $4.99–$6.19/hr at wholesale. Hyperscale on-demand rates are 2–5x higher than wholesale on-demand for identical hardware.

What contract terms are available for reserved GPUs?

Most wholesale providers offer 3-month, 6-month, and 1-year terms. Shorter terms carry less discount: typically 15–25% below on-demand for 3 months versus 35–55% for 1 year. Some providers offer monthly reserved with a smaller discount (10–15%) for teams that need flexibility but still want budget predictability.
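Because the break-even utilisation is the reserved rate divided by the on-demand rate, a discount of d puts the break-even at 1 − d. The sketch below applies that to the discount ranges quoted for each term. The dictionary and function names are illustrative, and no 6-month range is included because none is quoted here.

```python
# (smallest discount, largest discount) vs on-demand, per contract term.
TERM_DISCOUNTS = {
    "monthly": (0.10, 0.15),
    "3-month": (0.15, 0.25),
    "1-year": (0.35, 0.55),
}


def break_even_range(min_discount, max_discount):
    """Break-even utilisation range for a discount range.

    A larger discount lowers the break-even, so the largest discount
    gives the low end of the range and the smallest gives the high end.
    """
    return (1 - max_discount, 1 - min_discount)


for term, (low, high) in TERM_DISCOUNTS.items():
    best_case, worst_case = break_even_range(low, high)
    print(f"{term}: reserved wins above {best_case:.0%} to {worst_case:.0%} utilisation")
```

Shorter terms therefore need noticeably higher utilisation to pay off than a 1-year contract does, which is the trade you make for the reduced commitment.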

Can I cancel a reserved GPU contract early?

It depends on the provider. Most wholesale reserved contracts have early termination fees or are non-cancellable: you've committed to the full term. Some providers allow partial scale-down after 60 days, or offer contract transfer if your situation changes. Confirm the cancellation terms before signing. This is the primary risk of reserved: if your workload ends early or your GPU requirements change, you continue paying for allocated capacity.

What utilisation rate makes reserved worth it?

The break-even sits at roughly 64–68% depending on GPU type, based on current wholesale pricing. If you expect to use your reserved cluster more than 65% of available hours over the contract term, reserved almost always costs less in total. Production inference clusters typically run at 85–95% utilisation. Long training runs run at 90%+. Both are well into reserved territory. Development environments and research sandboxes often run at 30–50%, and those belong on on-demand.

Does a reserved contract guarantee GPU availability?

Yes. Reserved contracts hold capacity specifically for you. The provider cannot allocate your reserved GPUs to another customer, regardless of demand spikes. On-demand capacity can disappear when demand is high, as happened across multiple providers during the B200 rush in Q1 2026. If continuous availability is a hard requirement for your workload, reserved is the only model that guarantees it. Get a reserved GPU quote through GPUaaS.com.

Last reviewed: May 19, 2026. Pricing data from [1] getdeploying.com (22+ providers tracked) and [2] Spheron GPU pricing (May 14, 2026). Direct wholesale provider quotes via GPUaaS.com. Compare reserved and on-demand GPU options at GPUaaS.com.
