GPU billing has three shapes: pay-as-you-go (PAYG), reserved/committed contracts, and spot/preemptible. Each has a different cash flow profile, a different risk profile, and a different right answer depending on your utilisation. Stop hunting for GPU compute. GPUaaS.com gets you enterprise NVIDIA infrastructure at rates hyperscalers won't offer you, on whichever billing model your finance team can actually live with.
- Three billing models dominate GPU procurement in 2026: PAYG (highest unit cost, full flexibility), reserved (20-40% lower rate, multi-month to multi-year commit), and spot (60-80% discount, interruptible)
- The break-even utilisation between PAYG and reserved sits around 65-70%. Above that, reserved wins. Below it, PAYG is cheaper despite the higher headline rate
- Spot pricing is only viable for fault-tolerant workloads with checkpointing. For production inference with SLAs, spot is rarely a net win once interruption costs are modelled
- Hyperscalers require 1 to 3-year non-cancellable commits to access meaningful reserved discounts. GPUaaS.com offers both short-term and long-term contracts without that lock-in
- For finance teams, the maths comes down to three numbers: forecast utilisation, contract length, and the cost of being wrong. Get those right and the model picks itself
Most GPU procurement conversations start with the GPU model. They should start with the billing model. The hardware decision is well-understood by engineering. The contract structure decision determines your cash flow, your downside risk, and whether your budget holds for the year. This guide is for the finance and procurement leads making that call, with the maths in dollar terms. For the underlying rate structures across GPUs, see the GPU pricing guide.
Every GPU provider, hyperscaler or otherwise, offers some variation on three models. The names differ. The mechanics don't.
| Model | Rate | Commit | Cancellable | Best for |
|---|---|---|---|---|
| PAYG / On-demand | Baseline (highest) | None to monthly | Yes, anytime | Pre-production, bursty workloads, unknown utilisation |
| Reserved / Committed | 20-40% below PAYG | 3 months to 3 years | Usually no | Production inference, stable training pipelines |
| Spot / Preemptible | 60-80% below PAYG | None (interruptible) | N/A — provider can reclaim | Fault-tolerant batch jobs, research checkpoints |
The headline rate is the easy part. The real differences are in the contract terms and the implicit cost of each model. PAYG gives you flexibility but you pay for it on every hour. Reserved gives you a better rate but locks up working capital and creates downside risk if your workload evolves. Spot gives you the deepest discount but offloads continuity risk to your engineering team.
GPUaaS.com offers both short-term and long-term contracts at rates hyperscalers won't offer you, without the multi-year lock-in that hyperscaler reserved pricing typically requires. Get a quote and see what each model looks like for your workload.
Pay-as-you-go is the cleanest model from a finance perspective. You pay only for the GPU-hours you consume. No commitment, no contract term, no early-termination fee. The downside: you pay the highest unit rate, and providers typically reserve the right to reprovision your capacity if demand spikes elsewhere.
PAYG makes sense in three specific situations. First, when you don't yet know your utilisation profile, typically the first 30 to 90 days of a new workload. Second, when usage is genuinely bursty: training runs that fire monthly, batch processing windows, or research workloads that don't justify persistent capacity. Third, when you're evaluating providers and don't want to lock in before you know whether the infrastructure performs.
The hidden cost of PAYG isn't the rate; it's the operational overhead. Teams running pure PAYG often forget to turn clusters off. An H100 cluster left running at 5% utilisation costs the same per hour as one running at 80%, and that's where most PAYG budgets quietly evaporate. The GPU bill spike post covers the utilisation maths in detail.
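To put a number on that idle capacity, here is a minimal sketch of the effective price per useful GPU-hour on a cluster that is left running. The $3.00/GPU/hr rate is a hypothetical placeholder, not a quoted price, and the function name is illustrative.

```python
def cost_per_useful_gpu_hour(rate_per_gpu_hour: float, utilisation: float) -> float:
    """Effective price of one GPU-hour of useful work on an always-on cluster."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    # The bill is the same every hour; only the share doing useful work changes.
    return rate_per_gpu_hour / utilisation


RATE = 3.00  # hypothetical PAYG rate in $/GPU/hr (assumption, not a quote)

for utilisation in (0.05, 0.80):
    effective = cost_per_useful_gpu_hour(RATE, utilisation)
    print(f"{utilisation:.0%} utilised: ${effective:.2f} per useful GPU-hour")
```

Under these assumptions the same hourly bill buys sixteen times less useful work at 5% utilisation than at 80%.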
Finance perspective
PAYG is OpEx with no balance-sheet impact. Cash flow tracks usage. If usage doubles next quarter, so does the bill. If it halves, so does the bill. Predictable in pattern, unpredictable in amount.
Reserved (also called committed-use, capacity reservations, or savings plans depending on the provider) trades flexibility for a lower rate. You commit to a fixed amount of GPU capacity over a defined term. In exchange, you typically save 20 to 40% off PAYG rates. On hyperscalers, deeper discounts require longer commits, with 1-year savings plans giving roughly 30% off and 3-year reserved instances reaching 50% or more.
$553,000 saved on a 256-GPU H100 cluster over 6 months at a $0.50/GPU/hr rate difference (256 GPUs × $0.50 × 24 hrs × 180 days ≈ $552,960). GPUaaS.com offers up to ~30% less than hyperscaler reserved rates.
The saving scales linearly with cluster size, which is what makes reserved compelling at scale. A $0.50/GPU/hr rate difference between PAYG and reserved on an H100 cluster over six months:
| Cluster size | Rate gap | 6-month saving | In human terms |
|---|---|---|---|
| 8-GPU H100 | $0.50/GPU/hr | ~$17,000 | A month of senior eng time |
| 32-GPU H100 | $0.50/GPU/hr | ~$69,000 | A senior ML hire |
| 80-GPU H100 | $0.50/GPU/hr | ~$173,000 | Two senior engineers |
| 256-GPU H100 | $0.50/GPU/hr | ~$553,000 | Your next model training run |
Based on 24/7 operation over 180 days. GPUaaS.com offers up to ~30% less than hyperscaler reserved rates.
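The table figures can be reproduced with a few lines of Python. This is a sketch that assumes 24/7 operation for 180 days, as the table does; the function name is illustrative.

```python
HOURS_PER_DAY = 24


def rate_gap_saving(gpus: int, rate_gap: float, days: int) -> float:
    """Saving from a per-GPU-hour rate gap on a cluster running 24/7."""
    return gpus * rate_gap * HOURS_PER_DAY * days


for gpus in (8, 32, 80, 256):
    saving = rate_gap_saving(gpus, rate_gap=0.50, days=180)
    print(f"{gpus:>3}-GPU cluster: ${saving:,.0f}")
```

Swap in your own cluster size, rate gap, and term to see how the saving scales.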
The catch on hyperscaler reserved contracts: they're typically non-cancellable. If your workload evolves, your model architecture changes, or you find a better rate mid-term, you continue paying. AWS Savings Plans, Azure Reserved Instances, and GCP Committed Use Discounts all carry this risk. Finance teams need to model the commit as a fixed liability, not a flexible spend line.
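One way to model that fixed liability is to compute what is still owed if the workload stops needing the capacity partway through the term. The sketch below assumes a non-cancellable commit billed evenly at 730 hours per month; the cluster size, rate, and function names are illustrative, not contract terms.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours / 12 months


def total_commit(gpus: int, reserved_rate: float, term_months: int) -> float:
    """Full value of a non-cancellable commit over its term."""
    return gpus * reserved_rate * HOURS_PER_MONTH * term_months


def stranded_commit(gpus: int, reserved_rate: float,
                    term_months: int, months_used: int) -> float:
    """Spend still owed after the workload stops using the capacity."""
    if not 0 <= months_used <= term_months:
        raise ValueError("months_used must be between 0 and term_months")
    return total_commit(gpus, reserved_rate, term_months - months_used)


# Hypothetical scenario: 8 GPUs at $2.45/GPU/hr on a 12-month commit,
# with the workload retired after month 5.
print(f"Total commit:    ${total_commit(8, 2.45, 12):,.0f}")
print(f"Stranded spend:  ${stranded_commit(8, 2.45, 12, 5):,.0f}")
```

The stranded figure is the "cost of being wrong" for that scenario, which is the number worth comparing against the discount.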
GPUaaS.com's contract terms are shorter and more flexible than hyperscaler 1 to 3-year commitments. You can start with a shorter contract and extend as your workload matures, without locking in multi-year spend upfront. For the deep-dive framework on when reserved makes economic sense, see the reserved vs on-demand GPU guide.
Spot pricing offers the deepest discount, typically 60 to 80% off PAYG. The catch: the provider can reclaim your capacity with minutes of notice when demand for on-demand capacity spikes. For finance teams modelling spot into a budget, the headline rate is half the picture. The other half is the operational cost of building, running, and recovering from interruptions.
Spot makes sense for a narrow set of workloads. Batch training runs with frequent checkpointing. Embedding generation. Research workloads where 24-hour delays are acceptable. Anything fault-tolerant where the job can resume from a checkpoint when capacity returns.
Spot does not make sense for production inference with SLAs, real-time serving, customer-facing APIs, or anything where interruption translates to revenue loss. The discount looks attractive in the spreadsheet. Once you model the engineering cost of building checkpointing infrastructure, the cost of interruptions during peak demand (exactly when you need capacity most), and the SLA risk, the maths often reverses.
⚠ Watch out
Spot capacity tends to disappear at the worst possible moments, during demand spikes when reserved customers are also burning through their capacity. Your spot capacity gets reclaimed exactly when you need GPUs most. Build that risk into the model before committing the architecture.
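A rough way to build interruption risk into the model is to add the work lost on each reclaim and the one-off engineering cost to the spot bill. Every input in this sketch is a hypothetical assumption (interruption count, GPU-hours lost per interruption, engineering cost), and it leaves out SLA penalties and lost revenue, which would push the comparison further against spot.

```python
def payg_job_cost(useful_gpu_hours: float, payg_rate: float) -> float:
    """Cost of a job on uninterrupted PAYG capacity."""
    return useful_gpu_hours * payg_rate


def spot_job_cost(useful_gpu_hours: float, payg_rate: float, spot_discount: float,
                  interruptions: int, gpu_hours_lost_each: float,
                  engineering_cost: float) -> float:
    """Cost of the same job on spot, including redone work and tooling."""
    spot_rate = payg_rate * (1 - spot_discount)
    # Work since the last checkpoint is redone after each interruption.
    billed_gpu_hours = useful_gpu_hours + interruptions * gpu_hours_lost_each
    return spot_rate * billed_gpu_hours + engineering_cost


# Hypothetical inputs, not measured data.
ASSUMPTIONS = dict(
    payg_rate=3.50,           # $/GPU/hr
    spot_discount=0.70,       # 70% below PAYG
    interruptions=20,         # reclaims over the life of the job
    gpu_hours_lost_each=40,   # GPU-hours redone per reclaim
    engineering_cost=15_000,  # one-off checkpointing and recovery tooling
)

for useful_gpu_hours in (5_000, 10_000):
    payg = payg_job_cost(useful_gpu_hours, ASSUMPTIONS["payg_rate"])
    spot = spot_job_cost(useful_gpu_hours, **ASSUMPTIONS)
    cheaper = "spot" if spot < payg else "PAYG"
    print(f"{useful_gpu_hours:>6,} GPU-hours: PAYG ${payg:,.0f} vs spot ${spot:,.0f} -> {cheaper}")
```

With these made-up inputs the smaller job is cheaper on PAYG and the larger one on spot, which shows how sensitive the answer is to the overhead you assume.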
For most enterprise teams, the cleanest finance model is reserved for the baseline (predictable, stable workloads) plus PAYG for the burst (training runs, experimentation). Spot fits as a tactical layer for specific batch workloads, not as a strategic billing model.
The PAYG-vs-reserved break-even point depends on your utilisation. The maths are simple. A reserved contract priced 30% below PAYG costs 70% of what round-the-clock PAYG would, whether you use the capacity or not, so PAYG only matches it once you run for 70% of the hours. Above 70%, reserved is cheaper. Below 70%, PAYG is cheaper, despite the higher headline rate.
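The break-even rule generalises to any discount. A minimal sketch, assuming PAYG is billed only for hours used and reserved is billed for every hour of the term:

```python
def break_even_utilisation(reserved_discount: float) -> float:
    """Utilisation at which PAYG and reserved cost the same.

    PAYG cost     = utilisation * payg_rate * hours
    Reserved cost = (1 - discount) * payg_rate * hours
    Setting the two equal gives utilisation = 1 - discount.
    """
    if not 0 <= reserved_discount < 1:
        raise ValueError("reserved_discount must be in [0, 1)")
    return 1 - reserved_discount


for discount in (0.20, 0.30, 0.40):
    threshold = break_even_utilisation(discount)
    print(f"{discount:.0%} discount -> break-even at {threshold:.0%} utilisation")
```

The deeper the discount, the lower the utilisation you need before reserved pays off.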
Worked example. You're considering an 8-GPU H200 cluster over 12 months. PAYG rate: $3.50/GPU/hr. Reserved rate: $2.45/GPU/hr (30% discount). The maths:
| Utilisation | PAYG (12 mo) | Reserved (12 mo) | Better choice |
|---|---|---|---|
| 30% | ~$73,500 | ~$171,500 | PAYG |
| 50% | ~$122,500 | ~$171,500 | PAYG |
| 70% (break-even) | ~$171,500 | ~$171,500 | Either |
| 85% | ~$208,300 | ~$171,500 | Reserved |
| 95% | ~$232,800 | ~$171,500 | Reserved |
PAYG is billed only for hours used. Reserved is billed for the full 12-month commit regardless of utilisation. Figures are rounded. Indicative GPUaaS.com H200 rates.
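The worked example can be recomputed with the sketch below. It assumes 8,760 hours in the year (365 days × 24 hours), so the results come out slightly above the table's rounded figures; the function names are illustrative.

```python
HOURS_PER_YEAR = 8_760  # assumption: 365 days x 24 hours

GPUS = 8
PAYG_RATE = 3.50      # $/GPU/hr, from the worked example
RESERVED_RATE = 2.45  # $/GPU/hr, 30% below PAYG


def payg_cost(utilisation: float) -> float:
    """PAYG is billed only for the hours actually used."""
    return GPUS * PAYG_RATE * HOURS_PER_YEAR * utilisation


def reserved_cost() -> float:
    """Reserved is billed for every hour of the term."""
    return GPUS * RESERVED_RATE * HOURS_PER_YEAR


for utilisation in (0.30, 0.50, 0.70, 0.85, 0.95):
    payg, reserved = payg_cost(utilisation), reserved_cost()
    if abs(payg - reserved) < 1:  # treat sub-dollar gaps as a tie
        better = "Either"
    else:
        better = "PAYG" if payg < reserved else "Reserved"
    print(f"{utilisation:.0%}: PAYG ${payg:,.0f} vs reserved ${reserved:,.0f} -> {better}")
```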
Two important caveats. First, average GPU utilisation across enterprise Kubernetes clusters in 2026 sits at 5% (Cast AI, April 2026, 23,000 clusters measured). Most teams aren't running anywhere near 70% utilisation on day one. Second, utilisation isn't fixed. A workload that runs at 40% utilisation in month 1 might hit 80% by month 6 as adoption grows. Reserved makes more sense after you've validated the utilisation, not before.
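The second caveat can be sketched as a ramp. The example below assumes utilisation climbs linearly from 40% in month 1 to 80% in month 6 and then stays flat, and compares committing on day one with starting on PAYG and switching to reserved later. It also assumes the same reserved rate is available for the shorter remaining term, which is an assumption rather than a quoted offer; the function names are illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours / 12 months
GPUS, PAYG_RATE, RESERVED_RATE = 8, 3.50, 2.45

MONTHLY_RESERVED = GPUS * RESERVED_RATE * HOURS_PER_MONTH


def ramp(start: float, end: float, ramp_months: int, total_months: int) -> list[float]:
    """Linear ramp from start to end utilisation, then flat at end."""
    if ramp_months < 2:
        raise ValueError("ramp_months must be at least 2")
    step = (end - start) / (ramp_months - 1)
    return [min(end, start + step * month) for month in range(total_months)]


def plan_cost(utilisation: list[float], switch_month: int) -> float:
    """PAYG for the first switch_month months, reserved for the rest."""
    payg = sum(GPUS * PAYG_RATE * HOURS_PER_MONTH * u
               for u in utilisation[:switch_month])
    reserved = MONTHLY_RESERVED * (len(utilisation) - switch_month)
    return payg + reserved


utilisation = ramp(0.40, 0.80, ramp_months=6, total_months=12)

for switch_month, label in ((0, "reserved from day one"),
                            (4, "PAYG for 4 months, then reserved"),
                            (12, "PAYG all year")):
    print(f"{label}: ${plan_cost(utilisation, switch_month):,.0f}")
```

Under these assumptions the ramp averages out at the 70% break-even, so a full-year commit and full-year PAYG cost about the same, while validating on PAYG first and then committing comes out cheaper.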
The break-even between PAYG and a 30%-discounted reserved contract sits at 70% utilisation. Above that, reserved wins. Below it, the discount on the reserved rate is wiped out by paying for capacity you don't use.
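Three questions decide the billing model: what utilisation do you forecast, how long a contract can you commit to, and what does it cost if you're wrong? Get all three answered before signing anything.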
Last reviewed: June 4, 2026. GPU utilisation data from Cast AI 2026 State of Kubernetes Optimisation Report (April 2026, 23,000 clusters). Hyperscaler reserved discount ranges from published AWS, Azure, and GCP pricing pages. GPUaaS.com rates are indicative, contract-based, and quote-dependent on cluster size and contract length.



