What an H100 Really Costs Per Hour in 2026

GPU Infrastructure

An H100 SXM5 starts from ~$2.50/GPU/hr through GPUaaS.com on a short-term or long-term contract. The equivalent AWS p5.48xlarge on a 1-year Savings Plan runs ~$3.78/GPU/hr. Here's the full breakdown including SXM5 vs PCIe, hidden costs, and cost per token.


GPUaaS.com Team
Infrastructure Research
May 31, 2026

An H100 SXM5 starts from ~$2.50/GPU/hr through GPUaaS.com on a short-term or long-term contract. The same GPU on AWS p5.48xlarge with a 1-year Savings Plan runs ~$3.78/GPU/hr. That's a real gap for identical silicon, and the headline rate is only part of the story. Egress fees, storage, support tiers, and utilisation rates all change what an H100 actually costs your team per useful output.

Key takeaways
  • H100 SXM5 starts from ~$2.50/GPU/hr through GPUaaS.com (contract-based, short-term and long-term). The AWS p5.48xlarge 1-year Savings Plan runs ~$3.78/GPU/hr. The AWS on-demand list price is $6.88/GPU/hr.
  • H100 PCIe costs 25 to 35% less than H100 SXM5 per hour but delivers roughly 40% lower memory bandwidth (2.0 TB/s vs 3.35 TB/s). For memory-bound inference workloads, the price difference doesn't close the throughput gap.
  • GPUaaS.com's commit terms are shorter and more flexible than the 1 to 3-year commitments hyperscalers typically require to access meaningful discounts.
  • Some costs don't show up in the headline rate. These include egress fees ($0.08 to $0.15/GB on hyperscalers) and storage ($0.08 to $0.23/GB/month). They also include the idle capacity you pay for at 5% average GPU utilisation (Cast AI, 2026).
  • Consider Llama 3 70B at FP8 running at 80% utilisation (~21,000 tokens/s). On that workload, an H100 SXM5 at ~$2.50/hr works out to roughly $0.033 per million tokens. The same workload at the AWS 1-year Savings Plan rate (~$3.78/GPU/hr) costs ~$0.050 per million tokens.
◆ CONTRACT RATES
H100 contract rates: GPUaaS.com vs hyperscaler reserved

GPUaaS.com is contract-based, with both short-term and long-term commits available. Since GPUaaS.com doesn't sell on-demand access, the right comparison is contract vs contract: GPUaaS.com's rate against the AWS 1-year Savings Plan, which is the closest hyperscaler equivalent. For a full breakdown of how hyperscaler cost structures drive the gap, see the GPU pricing guide.

| Provider / billing mode | Instance / cluster | Per GPU/hr | vs GPUaaS.com |
|---|---|---|---|
| GPUaaS.com (contract) | 8x H100 SXM5 | ~$2.50 | Baseline |
| AWS (1-yr Savings Plan) | p5.48xlarge (8x H100 SXM5) | ~$3.78 | +51% |
| AWS (on-demand, for reference) | p5.48xlarge (8x H100 SXM5) | $6.88 | +175% |

AWS p5.48xlarge on-demand rate from Vantage.sh, June 1, 2026 ($55.04/hr total, 8 GPUs). 1-year EC2 Instance Savings Plan rate is 45% off on-demand per official AWS pricing announcement, June 2025 ($30.27/hr total, ~$3.78/GPU/hr). GPUaaS.com rate is indicative, contract-based, and quote-dependent on cluster size, contract length, and region.
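The per-GPU figures in the table follow from the instance price. The sketch below reproduces them from the cited inputs: the $55.04/hr on-demand instance rate, 8 GPUs per p5.48xlarge, and the 45% Savings Plan discount. It compares both against the indicative ~$2.50 GPUaaS.com rate. The constant and function names are illustrative only.

```python
# Reproduce the per-GPU rates in the table from the cited inputs.
GPUS_PER_P5 = 8               # p5.48xlarge carries 8x H100 SXM5
P5_ON_DEMAND_PER_HR = 55.04   # Vantage.sh, June 1, 2026
SAVINGS_PLAN_DISCOUNT = 0.45  # 1-year EC2 Instance Savings Plan, as cited above
GPUAAS_PER_GPU_HR = 2.50      # indicative, quote-dependent

on_demand_per_gpu = P5_ON_DEMAND_PER_HR / GPUS_PER_P5
savings_plan_per_gpu = on_demand_per_gpu * (1 - SAVINGS_PLAN_DISCOUNT)


def premium_over(rate: float, baseline: float) -> float:
    """Percentage by which `rate` exceeds `baseline`."""
    return (rate / baseline - 1) * 100


print(f"On-demand:    ${on_demand_per_gpu:.2f}/GPU/hr "
      f"(+{premium_over(on_demand_per_gpu, GPUAAS_PER_GPU_HR):.0f}%)")
print(f"Savings Plan: ${savings_plan_per_gpu:.2f}/GPU/hr "
      f"(+{premium_over(savings_plan_per_gpu, GPUAAS_PER_GPU_HR):.0f}%)")
```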

The gap comes from cost structure, not silicon. Hyperscalers layer platform fees, compliance infrastructure, support tiers, and egress pricing on top of the raw compute cost. The H100 inside every one of these instances is identical NVIDIA hardware. For the full breakdown, see the wholesale vs hyperscale GPU pricing guide.

◆ On flexibility

GPUaaS.com offers both short-term and long-term contracts without the 1 to 3-year lock-in that hyperscaler Savings Plans and Reserved Instances require. AWS's 1-year Savings Plan is the shortest commitment that unlocks a meaningful discount, and GPUaaS.com's commit terms start shorter than that.

According to Vantage.sh data updated June 1, 2026, the AWS p5.48xlarge on-demand rate in us-east-1 is $55.04/hr for 8x H100 SXM5 GPUs. AWS's 1-year EC2 Instance Savings Plan brings this to ~$30.27/hr (~$3.78/GPU/hr), compared to ~$2.50/GPU/hr through GPUaaS.com on a contract basis.

◆ SXM5 VS PCIe
H100 SXM5 vs PCIe: which one you're actually renting

Not all H100 GPUs are equal. NVIDIA ships two form factors: SXM5, which mounts directly to the HGX baseboard with NVLink 4.0 at 900 GB/s inter-GPU bandwidth, and PCIe, which slots into standard server motherboards. The specs differ in ways that matter for production workloads.

| Spec | H100 SXM5 | H100 PCIe |
|---|---|---|
| Memory bandwidth | 3.35 TB/s | 2.0 TB/s |
| GPU memory | 80 GB HBM3 | 80 GB HBM2e |
| FP8 TFLOPS (with sparsity) | 3,958 | 3,026 |
| Inter-GPU interconnect | NVLink 4.0 (900 GB/s) | PCIe Gen5 (~128 GB/s) |
| TDP | 700W | 350W |

The PCIe form factor is 25 to 35% cheaper per hour and has half the TDP. For single-GPU inference on sub-30B models, that's a reasonable trade. Some jobs need multiple H100 GPUs working together, such as tensor parallelism for a 70B model or pipeline-parallel training. For those, PCIe's ~128 GB/s inter-GPU bandwidth becomes the bottleneck. SXM5 with NVLink 4.0 at 900 GB/s stays out of the way.
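The takeaway above claims that PCIe's discount doesn't close the bandwidth gap for memory-bound inference. A first-order sketch can check that claim. It normalises the SXM5 hourly price to 1.0 and prices PCIe at the 25 to 35% discount quoted here. It also assumes that memory-bound decode throughput scales linearly with memory bandwidth. That is a simplification that ignores compute limits, batch size, and kernel efficiency.

```python
SXM5_BW_TBPS = 3.35
PCIE_BW_TBPS = 2.0


def bandwidth_per_dollar_ratio(pcie_discount: float) -> float:
    """PCIe memory bandwidth per dollar, relative to SXM5 (SXM5 = 1.0).

    Assumes memory-bound throughput scales linearly with bandwidth and that
    PCIe costs (1 - pcie_discount) times the SXM5 hourly rate.
    """
    sxm5_per_dollar = SXM5_BW_TBPS / 1.0
    pcie_per_dollar = PCIE_BW_TBPS / (1.0 - pcie_discount)
    return pcie_per_dollar / sxm5_per_dollar


for discount in (0.25, 0.30, 0.35):
    ratio = bandwidth_per_dollar_ratio(discount)
    print(f"PCIe {discount:.0%} cheaper -> {ratio:.2f}x SXM5 bandwidth per dollar")
```

Every ratio in that range stays below 1.0. Under this model, SXM5 still delivers more memory bandwidth per dollar. PCIe's case rests on power draw and on workloads that aren't bandwidth-bound.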

When you see a provider advertising "H100 GPUs" without specifying SXM5 or PCIe, ask. The price difference is substantial, and the performance difference matters for most production workloads. All H100 clusters on GPUaaS.com specify the form factor explicitly.

Which form factor do you need?

Single-GPU inference on models under 30B parameters, embedding generation, or development environments: H100 PCIe is fine and typically 25 to 35% cheaper. Multi-GPU tensor-parallel inference, training runs above 13B parameters, or any workload needing NVLink bandwidth: H100 SXM5 only. Paying for SXM5 and running single-GPU workloads is the most common form of GPU overprovisioning.

◆ HIDDEN COSTS
The hidden costs that inflate the real bill

The GPU hourly rate is what gets quoted. It's not always what gets billed. On hyperscalers especially, four cost categories consistently catch teams off guard the first time they run a production workload.

01

Egress fees

AWS charges $0.09/GB for data transferred out to the internet (first 10 TB/month). Azure runs about $0.08/GB. GCP charges $0.08 to $0.12/GB depending on destination. For a team moving model outputs, logs, or checkpoints at scale, egress can add $1,000 to $8,000/month to a mid-sized H100 deployment. It sits on a separate pricing page and is rarely factored into the initial budget. For the full breakdown, see the wholesale vs hyperscale pricing guide.

02

Attached storage

AWS EBS gp3 runs $0.08/GB/month. Azure Premium SSD runs $0.17/GB/month. GCP SSD Persistent Disk runs $0.17/GB/month. A team storing 10 TB of model weights, datasets, and checkpoints pays $800 to $1,700/month in storage before a single GPU hour is billed. Worth modelling before you sign a hyperscaler contract.

03

GPU utilisation waste

Cast AI's 2026 State of Kubernetes Optimisation Report measured average GPU utilisation across 23,000 production clusters at 5%. In other words, the average team pays for 100% of its H100 capacity and uses 5% of it. At ~$2.50/hr, that's an effective $50 for every GPU-hour of useful work ($2.50 ÷ 0.05). Continuous batching, a properly configured vLLM deployment, and clusters right-sized to the workload can raise utilisation to 70 to 85%.
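The effective cost of a GPU-hour of useful work is the paid rate divided by utilisation. The minimal sketch below uses the ~$2.50/hr rate quoted here. The same ratio explains why moving from 5% to 80% utilisation cuts effective cost 16-fold. The function name is illustrative.

```python
def effective_rate(list_rate_per_hr: float, utilisation: float) -> float:
    """Cost per fully utilised GPU-hour when only `utilisation` of paid time does work."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return list_rate_per_hr / utilisation


for u in (0.05, 0.50, 0.80):
    print(f"{u:.0%} utilisation -> ${effective_rate(2.50, u):.2f} per useful GPU-hour")
```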

04

Support tiers

AWS Business Support starts at 10% of the monthly bill (minimum $100/month), with the percentage stepping down as spend grows. Enterprise Support starts at 10% of the first $150K of monthly spend, with a $15,000/month minimum. Consider a team running $50K/month of H100 compute on AWS. It pays up to $5,000/month on Business Support, or the full $15,000 minimum on Enterprise Support, before opening a single support case. Worth checking the support tier terms before signing.

⚡ Model total cost, not just the GPU rate

Before committing to any GPU contract, build a total cost model that includes egress volume, storage requirements, support tier costs, and your realistic utilisation rate. The compute line item is visible. The rest isn't, until the bill arrives.
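The four line items above can be rolled into a single monthly estimate. The minimal sketch below treats every input as hypothetical, and the class and function names are illustrative. The egress and storage rates are examples from the figures quoted above. Support is modelled as a flat percentage with a monthly minimum, which simplifies how real support plans tier their pricing. Dividing the total by useful GPU-hours (paid hours times utilisation) folds the utilisation effect into one number.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # 8,760 hours / 12, a common monthly billing convention
GB_PER_TB = 1_000      # decimal terabytes, matching per-GB list prices


@dataclass
class MonthlyInputs:
    gpus: int
    rate_per_gpu_hr: float       # contracted or Savings Plan rate, $/GPU/hr
    egress_tb: float             # data transferred out per month
    egress_per_gb: float         # $/GB
    storage_tb: float            # attached storage provisioned
    storage_per_gb_month: float  # $/GB/month
    support_pct: float           # flat share of the bill (simplification)
    support_min: float           # monthly minimum for the support tier
    utilisation: float           # realistic average GPU utilisation, 0-1


def monthly_cost(i: MonthlyInputs) -> dict[str, float]:
    """Estimate one month's bill and the cost per GPU-hour of useful work."""
    compute = i.gpus * i.rate_per_gpu_hr * HOURS_PER_MONTH
    egress = i.egress_tb * GB_PER_TB * i.egress_per_gb
    storage = i.storage_tb * GB_PER_TB * i.storage_per_gb_month
    subtotal = compute + egress + storage
    support = max(i.support_min, i.support_pct * subtotal)
    total = subtotal + support
    useful_gpu_hours = i.gpus * HOURS_PER_MONTH * i.utilisation
    return {
        "compute": compute,
        "egress": egress,
        "storage": storage,
        "support": support,
        "total": total,
        "per_useful_gpu_hr": total / useful_gpu_hours,
    }


# Hypothetical 8x H100 deployment on AWS-style pricing; every input is illustrative.
example = MonthlyInputs(
    gpus=8, rate_per_gpu_hr=3.78,
    egress_tb=20, egress_per_gb=0.09,
    storage_tb=10, storage_per_gb_month=0.08,
    support_pct=0.10, support_min=100,
    utilisation=0.50,
)
for item, dollars in monthly_cost(example).items():
    print(f"{item:>18}: ${dollars:,.2f}")
```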

According to Cast AI's 2026 State of Kubernetes Optimisation Report, average GPU utilisation across 23,000 measured production clusters sits at 5%. On average, then, teams pay full H100 rates for hardware that sits idle 95% of the time.

◆ SHORT-TERM VS LONG-TERM
Short-term vs long-term contracts: when each makes sense

GPUaaS.com is contract-based, with both short-term and long-term contracts available. Longer-term commits unlock meaningfully better rates on H100 SXM5 clusters. The break-even utilisation rate is ~68%. If your cluster runs above 68% average utilisation, a longer commit saves money. Below that, shorter-term commits cost less in total. Most production inference clusters run at 70 to 90% utilisation once properly optimised. Most development and research clusters run at 20 to 50%.
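A simple model shows one way to arrive at a break-even figure like ~68%. It assumes that shorter commits let you pay only for the share of time you actually need GPUs, while a long commit bills every hour of the term. Break-even utilisation is then the long-term rate divided by the short-term rate. So ~68% corresponds to a long-term rate roughly 32% below the short-term one. This is an illustrative assumption, not GPUaaS.com's pricing model. The rates below are normalised placeholders, not quoted prices, and the function names are hypothetical.

```python
def break_even_utilisation(short_rate: float, long_rate: float) -> float:
    """Utilisation above which an always-billed long commit beats pay-as-needed short terms."""
    return long_rate / short_rate


def cheaper_option(short_rate: float, long_rate: float, utilisation: float) -> str:
    # Assumption: short-term contracts are only taken for the share of time
    # the GPUs are actually needed, so their cost scales with utilisation.
    short_cost_per_hour = short_rate * utilisation
    # The long-term contract bills every hour of the term, busy or idle.
    long_cost_per_hour = long_rate
    return "long-term" if long_cost_per_hour < short_cost_per_hour else "short-term"


# Normalised, hypothetical rates: short-term = 1.00, long-term = 0.68.
SHORT, LONG = 1.00, 0.68
print(f"break-even utilisation: {break_even_utilisation(SHORT, LONG):.0%}")
for u in (0.40, 0.85):
    print(f"{u:.0%} utilisation -> {cheaper_option(SHORT, LONG, u)}")
```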

One thing worth noting: GPUaaS.com's commit terms are shorter and more flexible than the 1 to 3-year commitments hyperscalers require to unlock their best reserved rates. You can start on a shorter contract and extend as your workload matures, without locking in multi-year spend upfront.

The short-term vs long-term decision is covered in full in the reserved vs on-demand GPU guide.

◆ COST PER TOKEN
What an H100 really costs per token

GPU-hour pricing is a procurement metric. For inference workloads, what you actually care about is cost per million tokens. The translation from GPU-hours to tokens depends on model size, quantisation, batch size, and GPU utilisation.

A single H100 SXM5 running Llama 3 70B at FP8 with continuous batching achieves roughly 21,000 tokens per second at 80% GPU utilisation. That is about 75.6 million tokens per hour. At ~$2.50/hr through GPUaaS.com, that works out to roughly $0.033 per million tokens ($2.50 ÷ 75.6M tokens). Running the same workload at the AWS 1-year Savings Plan rate (~$3.78/GPU/hr) gives you ~$0.050 per million tokens. That is about 51% more expensive per token for identical output.
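Converting an hourly rate to cost per token means dividing the rate by tokens produced per hour. The minimal sketch below uses the 21,000 tokens/s throughput and the two hourly rates quoted in this section. That throughput is the article's stated figure for this workload, not an independent benchmark, so substitute your own measured tokens/s. The function name is illustrative.

```python
def cost_per_million_tokens(rate_per_gpu_hr: float, tokens_per_sec: float) -> float:
    """Dollar cost per 1M tokens for one GPU at a sustained throughput."""
    tokens_per_hr = tokens_per_sec * 3600
    return rate_per_gpu_hr / tokens_per_hr * 1_000_000


# Llama 3 70B FP8, continuous batching, 80% utilisation (figure quoted above).
THROUGHPUT_TPS = 21_000

gpuaas = cost_per_million_tokens(2.50, THROUGHPUT_TPS)
aws = cost_per_million_tokens(3.78, THROUGHPUT_TPS)
print(f"GPUaaS.com: ${gpuaas:.3f}/M tokens | AWS 1-yr SP: ${aws:.3f}/M tokens | "
      f"AWS premium: {aws / gpuaas - 1:.0%}")
```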

GPUaaS.com (contract)

~$0.033
per million tokens
H100 SXM5 at ~$2.50/GPU/hr, 80% util

GPUaaS.com (long-term)

Below ~$0.033
per million tokens
H100 SXM5, long-term contract, 80% util

AWS (1-yr Savings Plan)

~$0.050
per million tokens
p5.48xlarge at ~$3.78/GPU/hr, 80% util

These numbers assume 80% GPU utilisation. At 5% utilisation (the industry average), your effective cost per token is 16x higher regardless of provider. Fixing utilisation through proper inference stack configuration is the highest-ROI cost reduction available to most teams. See the KV cache inference cost guide for the optimisation playbook.

The tokenmaxxing context

Teams running their own H100 inference clusters have a fixed cost structure that doesn't scale with token volume. Agentic workloads that burn through enterprise AI budgets at API prices can cost a fraction as much on dedicated H100 clusters, provided utilisation stays high. See the full analysis in the enterprise AI cost breakdown.

◆ H100 VS ALTERNATIVES
H100 vs H200 vs B200: which one to choose

The H100 sits in the middle of GPUaaS.com's current GPU lineup. The H200 keeps the same Hopper architecture but raises memory to 141 GB of HBM3e at 4.8 TB/s. That makes it the better choice for 70B+ models at long context. The B200 moves to the Blackwell architecture with 192 GB of HBM3e and is optimised for 405B+ inference and frontier training. The decision comes down to one question: does your workload actually saturate the H100's 80 GB and 3.35 TB/s, or are you paying for headroom you don't use?
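Whether a model "fits" depends on the KV cache as well as the weights. The rough sketch below covers Llama 3 70B at FP8, using its published architecture: 80 layers, 8 key/value heads, and head dimension 128. Several simplifying assumptions apply. It assumes an FP8 KV cache and a 4 GB overhead reserve. It also treats the advertised memory sizes as decimal GB. Real serving stacks reserve memory differently. The function names are illustrative.

```python
# Rough memory budget for Llama 3 70B served at FP8 on a single GPU.
PARAMS = 70e9          # nominal parameter count
BYTES_PER_PARAM = 1    # FP8 weights
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128  # Llama 3 70B (grouped-query attention)
KV_BYTES = 1           # assumes an FP8 KV cache; use 2 for FP16/BF16


def kv_bytes_per_token() -> int:
    # Factor of 2 covers both the key and the value tensors in every layer.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES


def kv_token_budget(gpu_mem_gb: float, overhead_gb: float = 4.0) -> int:
    """Tokens of KV cache that fit after weights and a reserved overhead.

    overhead_gb is a placeholder for activations, CUDA context and
    fragmentation; real values depend on the serving stack.
    """
    weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
    free_bytes = (gpu_mem_gb - weights_gb - overhead_gb) * 1e9
    return max(0, int(free_bytes // kv_bytes_per_token()))


for name, mem_gb in (("H100 SXM5", 80), ("H200 SXM", 141)):
    print(f"{name}: ~{kv_token_budget(mem_gb):,} tokens of KV cache")
```

Under these assumptions, the H100 holds roughly one 32K-token sequence's worth of KV cache, while the H200 holds about a dozen. That is why the H200 pulls ahead at long context and high concurrency.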

H100 SXM5

~$2.50/GPU/hr
GPUaaS.com contract
Best for 70B inference at standard context, multi-GPU training up to 30B, FP8 workloads. 80 GB HBM3, 3.35 TB/s bandwidth. The sweet spot for most production inference teams in 2026.

H200 SXM

~$3.00/GPU/hr
GPUaaS.com contract
Best for 70B+ inference at high concurrency, 32K+ context, 405B model serving (quantised). 141 GB HBM3e, 4.8 TB/s bandwidth. Worth it when your workload genuinely needs more memory than H100 offers.

B200 SXM

~$4.50/GPU/hr
GPUaaS.com contract
Best for 405B+ inference at scale, MoE serving, frontier training. 192 GB HBM3e, Blackwell architecture, FP4 support. Only worth it if you genuinely need Blackwell-class throughput.

For the full decision framework on when H200 beats H100, see the H200 vs H100 rental guide. For the full three-way comparison including B200, see the H100 vs H200 vs B200 comparison.

Get an H100 cluster quote from GPUaaS.com

H100 SXM5 clusters from ~$2.50/GPU/hr. Short-term and long-term contracts, without hyperscaler-style 1 to 3-year lock-in. Quote within 24 hours.

See how GPUaaS.com works →
◆ FAQ
Frequently asked questions

An H100 SXM5 starts from ~$2.50/GPU/hr through GPUaaS.com on a contract basis. On AWS, the p5.48xlarge with a 1-year Savings Plan runs ~$3.78/GPU/hr. AWS on-demand is $6.88/GPU/hr. GPUaaS.com's contract terms are shorter and more flexible than the commitments AWS requires to access Savings Plan rates. H100 PCIe runs 25 to 35% cheaper than SXM5 but delivers lower memory bandwidth and much less inter-GPU bandwidth.

The silicon is identical. Comparing contract to contract (GPUaaS.com vs AWS 1-year Savings Plan), GPUaaS.com runs ~34% cheaper per GPU/hr. Put the other way, the AWS rate is ~51% higher. The cost difference comes from what hyperscalers layer on top: platform fees, compliance infrastructure, support tier pricing, and egress charges. GPUaaS.com connects buyers directly to vetted GPU cloud compute providers.

H100 PCIe typically costs 25 to 35% less per hour than H100 SXM5. The tradeoff is memory bandwidth (2.0 TB/s vs 3.35 TB/s) and inter-GPU connectivity (PCIe Gen5 at ~128 GB/s vs NVLink 4.0 at 900 GB/s). For single-GPU inference on sub-30B models, H100 PCIe is sufficient and cheaper. For multi-GPU workloads needing tensor parallelism or training above 13B parameters, H100 SXM5 is the correct choice.

For sub-70B model inference at standard context lengths, H100 SXM5 at ~$2.50/hr delivers near-identical throughput to H200 at ~$3.00/hr for most workloads. The H200's advantage is its 141 GB HBM3e, which matters for 70B+ models at long context windows or high batch sizes. If your model fits comfortably in H100's 80 GB with headroom for KV cache, H100 is the better value. See the H200 vs H100 rental guide for the full framework.

Take an H100 SXM5 at ~$2.50/hr through GPUaaS.com, running Llama 3 70B at FP8 and 80% GPU utilisation (~21,000 tokens/s). Cost per token works out to roughly $0.033 per million tokens. At the AWS 1-year Savings Plan rate (~$3.78/GPU/hr), the same workload costs ~$0.050 per million tokens. Fixing GPU utilisation from 5% to 80% matters more than provider choice for most teams.

GPUaaS.com offers H100 SXM5 clusters from ~$2.50/GPU/hr on a contract basis, with both short-term and long-term contracts available. Submit a quote request at gpuaas.com/how-it-works and you'll hear back within 24 hours.

Last reviewed: June 2, 2026. GPUaaS.com pricing is indicative, contract-based, and quote-dependent on cluster size, contract length, and region. AWS p5.48xlarge on-demand rate from Vantage.sh (June 1, 2026, $55.04/hr total). AWS 1-year EC2 Instance Savings Plan rate is 45% off on-demand per official AWS pricing announcement (June 2025), giving ~$30.27/hr total (~$3.78/GPU/hr). Utilisation data from Cast AI 2026 State of Kubernetes Optimisation Report. Token throughput based on Llama 3 70B FP8 with continuous batching at 80% GPU utilisation.
