How much cheaper is reserved GPU compared to on-demand?

At wholesale providers, reserved 1-year rates run 30–55% below on-demand for the same GPU. H200 SXM reserved starts at approximately $2.25/hr versus $3.50/hr on-demand. For B200, reserved rates start near $2.25/hr while on-demand runs $4.99–$6.19/hr at wholesale.

What is the minimum term for a GPU reserved contract?

Most wholesale providers offer 3-month, 6-month, and 1-year terms. Shorter terms carry a smaller discount: typically 15–25% below on-demand for 3 months versus 30–55% for 1 year.

Can I cancel or resize a reserved GPU contract early?

Depends on the provider. Most wholesale reserved contracts have early termination fees or are non-cancellable. Some providers allow partial scale-down after 60 days, or offer contract transfer if your situation changes.

Does reserved GPU guarantee availability during high-demand periods?

Yes. Reserved contracts hold capacity specifically for you. The provider cannot allocate your reserved GPUs to another customer, regardless of demand spikes. On-demand capacity can disappear when demand is high.

Reserved vs On-Demand GPU: When Each Makes Sense (2026)

What utilisation rate makes reserved GPU worth it?

The break-even sits at roughly 64–68% depending on GPU type. If you expect to use your reserved cluster more than 65% of available hours over the contract term, reserved almost always costs less in total. Production inference clusters typically run at 85–95% utilisation.

Reserved GPU contracts at wholesale providers save 30–55% versus on-demand for the same silicon. The break-even is 64–68% utilisation — if your cluster runs above that threshold, reserved almost always wins on total cost of ownership.

Key takeaways

Reserved 1-year contracts run 30–55% below on-demand for the same GPU at wholesale providers ^[1]
Break-even utilisation sits at 64–68% — production inference clusters run at 85–95%, well into reserved territory
8 H200 GPUs reserved rather than on-demand for 6 months saves ~$45,000 at wholesale rates ($2.70 vs $4.00/hr)
Spot pricing runs 20–40% below on-demand but has no availability guarantee — suited for fault-tolerant batch jobs only ^[2]
Development environments and research sandboxes typically run at 30–50% utilisation — those belong on on-demand

This is a decision framework for GPU pricing models as of May 2026. It covers wholesale rates, break-even points, and real-world workload examples across H100, H200, and B200.

In this article

01 What Reserved and On-Demand Actually Mean
02 The Price Gap: What Reserved Actually Saves
03 When On-Demand Is the Right Call
04 When Reserved Makes More Sense
05 The Break-Even Calculation
06 The Hybrid Approach: Combining Both

What Reserved and On-Demand GPU Pricing Actually Mean in 2026

30–55%

Reserved vs on-demand savings

~65%

Utilisation break-even point

3 mo

Typical reserved min-term

$0/hr

On-demand idle cost (reserved bills whether you use it or not)

◆ DEFINITIONS

What reserved and on-demand actually mean in GPU cloud

On-demand means you pay per hour, with no commitment. Spin up a cluster this afternoon and shut it down tonight. You are billed for exactly the hours you used, at a rate that includes the provider's flexibility premium: typically 1.4–2.2x the equivalent reserved rate, which is what a 30–55% reserved discount implies.

Reserved means you commit to a fixed number of GPU-hours per day for a defined term — typically 3 months, 6 months, or 1 year. In return, the provider holds that capacity for you and discounts the rate. You pay whether you use the capacity or not. A reserved contract for 8 H100 GPUs means those GPUs are yours for the term. If your training run finishes early, you're paying for idle metal.

Spot is a third option many providers offer: deeply discounted capacity that can be preempted with short notice when demand spikes. Spot pricing at wholesale providers sits 20–40% below on-demand, but with no availability guarantee. It works for fault-tolerant batch jobs. It doesn't work for production inference serving.

The Price Gap: What GPU Reserved Pricing Actually Saves in 2026

◆ PRICING DATA

The price gap: what reserved actually saves you

May 2026 wholesale rates across GPUaaS.com providers. ^[1] Hyperscale rates are included for comparison — the same silicon, different margin structure.

GPU	On-demand (wholesale)	Reserved 1-yr (wholesale)	Saving	Hyperscale on-demand
H100 SXM 80 GB	$0.81–$2.49/hr	$0.65–$1.69/hr	~30–35%	$4.59–$8.90/hr
H200 SXM 141 GB	$3.50–$4.54/hr	$2.25–$3.20/hr	~35–40%	$8.00–$13.78/hr
B200 SXM 192 GB	$4.99–$6.19/hr	$2.25–$4.50/hr	~35–55%	$10.00–$14.24/hr

At 8 H200 GPUs running continuously for 6 months (about 4,380 hours), switching from wholesale on-demand ($4.00/hr) to wholesale reserved ($2.70/hr) saves approximately $45,000: 8 GPUs × 4,380 hours × $1.30/hr. The cluster is the same. The commitment is what changed.
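The arithmetic behind a figure like this is easy to rerun with your own quote. A minimal Python sketch, assuming a 6-month term is 4,380 hours (half of 8,760) and the cluster is billed for every one of them. The function name `term_saving` is illustrative and not part of any provider API.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def term_saving(gpus, months, on_demand_rate, reserved_rate):
    """Dollar saving from reserved vs on-demand for a cluster that runs every hour of the term."""
    hours = HOURS_PER_YEAR * months / 12
    return gpus * hours * (on_demand_rate - reserved_rate)


# Rates from the H200 example: $4.00/hr on-demand, $2.70/hr reserved.
saving = term_saving(gpus=8, months=6, on_demand_rate=4.00, reserved_rate=2.70)
print(f"6-month saving: ${saving:,.0f}")
```

Under these assumptions the saving comes out at about $45,500. Counting fewer hours in the term, or any hours the on-demand cluster would have been switched off, pulls the number down.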

When On-Demand GPU Is the Right Call for AI Workloads

◆ ON-DEMAND USE CASES

When on-demand is the right call

On-demand wins when flexibility is worth more than the premium. There are four clear scenarios where that's true:

Short, bounded experiments

Proof-of-concept runs, hyperparameter sweeps, or architecture evaluations that will complete in under 4 weeks. The total GPU-hours are low enough that the on-demand premium costs less than the minimum reserved commitment.

Unpredictable usage patterns

Research teams that run intensively for a week, then go quiet for two. Bursty inference loads driven by product launches or marketing campaigns. Anywhere utilisation is likely to average below 50%, reserved's fixed cost structure becomes a liability.

GPU type uncertainty

If you're still evaluating whether H200 or B200 is the right architecture for your workload, on-demand lets you benchmark both before locking in a reserved contract. Committing to the wrong GPU for a year is more expensive than a few weeks of on-demand testing.

Pre-production staging

Spinning up a replica cluster to validate deployment configs, test failover, or run load tests before a launch. This capacity is needed once, briefly. On-demand is the obvious fit.

◆ RULE OF THUMB

If your projected utilisation is below 60% of capacity over the contract term, on-demand almost always wins on total cost.
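The short-experiment case can be checked with a few lines of Python. This is a sketch under stated assumptions: a 4-week run on 8 GPUs at $4.00/hr on-demand, compared with a 3-month minimum commitment at $3.20/hr (20% below on-demand, the midpoint of the 15–25% range typical for 3-month terms). The cluster size, rates, and helper names are illustrative.

```python
HOURS_PER_WEEK = 168
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def on_demand_cost(gpus, weeks, rate):
    """Pay only for the weeks the experiment actually runs."""
    return gpus * weeks * HOURS_PER_WEEK * rate


def reserved_commitment(gpus, months, rate):
    """Pay for the whole term, whether the GPUs are busy or idle."""
    return gpus * months * HOURS_PER_MONTH * rate


experiment = on_demand_cost(gpus=8, weeks=4, rate=4.00)
commitment = reserved_commitment(gpus=8, months=3, rate=3.20)

print(f"4 weeks on-demand:  ${experiment:,.0f}")
print(f"3 months reserved:  ${commitment:,.0f}")
```

With these inputs the 4-week on-demand bill is well under half the cost of the shortest reserved term, even though the hourly rate is higher. The gap narrows as the experiment gets longer.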

When Reserved GPU Contracts Make More Financial Sense

◆ RESERVED USE CASES

When reserved makes more sense

Reserved wins when you have predictable, sustained demand and the commitment risk is lower than the cost risk of staying on-demand.

Production inference serving

Once a model is live, it runs 24/7. A production inference cluster that serves real users is provisioned around the clock, so its utilisation typically sits at 85–95%. Reserved pricing drops the effective rate by 30–55% versus on-demand, with no change to the operational setup.

Long training runs (3+ months)

Pre-training a foundation model or fine-tuning a large model across months of GPU time. The utilisation will be near-continuous, and the GPU type is known. Reserved locks in a lower rate for a workload with predictable runtime.

Guaranteed capacity needs

On-demand capacity is not guaranteed. During periods of high demand — major model releases, end-of-quarter GPU rushes — on-demand availability drops. Reserved contracts hold your allocation. If you cannot afford a failed capacity request, reserved is insurance as much as it is a discount.

Budget predictability requirements

Finance teams, compliance environments, and enterprise procurement processes often require fixed monthly GPU spend. Reserved contracts deliver exactly that. On-demand introduces variance that makes cost forecasting difficult above modest cluster sizes.

GPU Reserved vs On-Demand Break-Even Calculation by Utilisation Rate

◆ BREAK-EVEN MATH

The break-even calculation

The crossover point between reserved and on-demand is straightforward. Reserved costs C_r per hour whether used or not. On-demand costs C_od only when running. If your utilisation rate is U, your effective on-demand cost per reserved-hour is C_od × U.

Reserved wins when: C_r < C_od × U, or equivalently when utilisation exceeds: U > C_r / C_od

GPU	On-demand rate	Reserved rate	Break-even utilisation
H100 SXM (wholesale)	$2.00/hr	$1.30/hr	65%
H200 SXM (wholesale)	$4.00/hr	$2.70/hr	68%
B200 SXM (wholesale)	$5.50/hr	$3.50/hr	64%

If you expect to use a cluster more than 65% of available hours over the contract term, reserved wins. Production inference clusters run at 85–95%. Long training runs run at 90%+. Both are deeply in reserved territory. Research sandboxes and dev environments often run below 40%. Those belong on on-demand.
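The break-even rule U > C_r / C_od translates directly into code. A minimal Python sketch using the illustrative wholesale rates from the table; the function names are hypothetical, and the rates should be replaced with the quotes you actually receive.

```python
def break_even_utilisation(reserved_rate, on_demand_rate):
    """Fraction of contract hours above which reserved costs less than on-demand."""
    return reserved_rate / on_demand_rate


def cheaper_option(utilisation, reserved_rate, on_demand_rate):
    """Return which pricing model wins at a given expected utilisation (0.0 to 1.0)."""
    if utilisation > break_even_utilisation(reserved_rate, on_demand_rate):
        return "reserved"
    return "on-demand"


# (on-demand $/hr, reserved $/hr) per GPU, taken from the break-even table.
rates = {
    "H100 SXM": (2.00, 1.30),
    "H200 SXM": (4.00, 2.70),
    "B200 SXM": (5.50, 3.50),
}

for gpu, (on_demand, reserved) in rates.items():
    threshold = break_even_utilisation(reserved, on_demand)
    print(f"{gpu}: break-even at {threshold:.3f}")

print(cheaper_option(0.90, reserved_rate=2.70, on_demand_rate=4.00))  # production inference
print(cheaper_option(0.40, reserved_rate=2.70, on_demand_rate=4.00))  # dev sandbox
```

The three ratios are 0.65, 0.675, and roughly 0.636, which the table rounds to 65%, 68%, and 64%. A cluster at 90% utilisation lands on reserved, and one at 40% lands on on-demand.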

The Hybrid GPU Pricing Approach: Combining Reserved and On-Demand

◆ HYBRID APPROACH

The hybrid approach: combining both

Most teams at scale end up running a hybrid. Reserve capacity for the baseline — the minimum GPU-hours you'll need regardless of what happens. Use on-demand for burst above that baseline.

A practical structure for a production inference team: reserve 8 H200 GPUs for the steady-state serving load. Use on-demand or spot when a new model evaluation, batch re-ranking job, or traffic spike requires additional capacity. The reserved base is always available and priced efficiently. The on-demand overhead is occasional and bounded.
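To see why the hybrid usually wins, it helps to price the three options side by side. The sketch below assumes the H200 rates used elsewhere in this article ($2.70/hr reserved, $4.00/hr on-demand), a 730-hour month, and a hypothetical burst of 4 extra GPUs for 100 hours a month. The burst profile and the function name `monthly_cost` are made up for illustration.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12

RESERVED_RATE = 2.70   # $/GPU-hr, wholesale reserved H200
ON_DEMAND_RATE = 4.00  # $/GPU-hr, wholesale on-demand H200


def monthly_cost(reserved_gpus, burst_gpu_hours):
    """Reserved GPUs bill every hour of the month; burst GPU-hours bill at the on-demand rate."""
    reserved = reserved_gpus * HOURS_PER_MONTH * RESERVED_RATE
    burst = burst_gpu_hours * ON_DEMAND_RATE
    return reserved + burst


# Hypothetical load: 8 GPUs steady, plus 4 extra GPUs for 100 hours a month.
BURST_GPU_HOURS = 4 * 100

hybrid = monthly_cost(reserved_gpus=8, burst_gpu_hours=BURST_GPU_HOURS)
reserve_peak = monthly_cost(reserved_gpus=12, burst_gpu_hours=0)
all_on_demand = monthly_cost(
    reserved_gpus=0,
    burst_gpu_hours=8 * HOURS_PER_MONTH + BURST_GPU_HOURS,
)

for label, cost in [
    ("hybrid (8 reserved + burst)", hybrid),
    ("reserve for peak (12)", reserve_peak),
    ("all on-demand", all_on_demand),
]:
    print(f"{label}: ${cost:,.0f}/month")
```

With this load the hybrid is the cheapest of the three. Reserving for the peak pays for four GPUs that sit idle most of the month, and running everything on-demand pays the premium on the steady-state load.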

For teams beginning GPU procurement, a common entry sequence: start on-demand to validate the workload and measure actual utilisation, then switch to reserved once the pattern is stable. The on-demand phase typically lasts 4–8 weeks. After that, most production workloads have enough data to size a reserved contract accurately. GPUaaS.com can quote both reserved and on-demand options for the same cluster so you can compare total cost before committing. See also: why wholesale GPU pricing beats hyperscale and current B200 availability and pricing.
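Once the on-demand phase has produced a usage record, sizing the reserved base is a small search problem. The Python sketch below assumes you have an hourly trace of GPUs in use (the 24-hour trace here is invented), then tries every reserved size and keeps the cheapest. The function names and rates are illustrative, and a real trace should cover weeks rather than a day.

```python
def hybrid_cost(demand, reserved_gpus, reserved_rate, on_demand_rate):
    """Total cost over the trace: reserved GPUs bill every hour, overflow bills on-demand."""
    reserved = reserved_gpus * reserved_rate * len(demand)
    overflow_gpu_hours = sum(max(0, d - reserved_gpus) for d in demand)
    return reserved + overflow_gpu_hours * on_demand_rate


def best_reserved_size(demand, reserved_rate, on_demand_rate):
    """Reserved GPU count (0 up to peak demand) that minimises total cost over the trace."""
    candidates = range(0, max(demand) + 1)
    return min(
        candidates,
        key=lambda n: hybrid_cost(demand, n, reserved_rate, on_demand_rate),
    )


# Hypothetical hourly trace: 8 GPUs busy for 20 hours, 12 for 3 hours, 16 for 1 hour.
demand = [8] * 20 + [12] * 3 + [16] * 1

best = best_reserved_size(demand, reserved_rate=2.70, on_demand_rate=4.00)
print(f"Reserve {best} GPUs, burst the rest on-demand")
```

The search reproduces the break-even rule one GPU at a time: each additional GPU is worth reserving only if it would be busy for more than C_r / C_od of the hours in the trace.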

◆ FAQ

Frequently asked questions

Last reviewed: May 19, 2026. Pricing data from [1] getdeploying.com (22+ providers tracked) and [2] Spheron GPU pricing (May 14, 2026). Direct wholesale provider quotes via GPUaaS.com. Compare reserved and on-demand GPU options at GPUaaS.com.
