How much does an H100 GPU cost per hour in 2026?

An H100 SXM5 starts from ~$2.50/GPU/hr through GPUaaS.com on a contract basis. AWS p5.48xlarge with a 1-year Savings Plan runs ~$3.78/GPU/hr. AWS on-demand is $6.88/GPU/hr. GPUaaS.com's contract terms are shorter and more flexible than what AWS requires to access Savings Plan rates.

Why is H100 cheaper through GPUaaS.com than on AWS?

The silicon is identical. Comparing contract to contract, GPUaaS.com runs ~34% cheaper per GPU/hr than the AWS 1-year Savings Plan rate (put the other way, AWS costs ~51% more). The cost difference comes from hyperscaler platform fees, compliance infrastructure, support tier pricing, and egress charges stacked on top of raw compute cost.

What is the difference between H100 SXM5 and H100 PCIe pricing?

H100 PCIe costs 25 to 35% less per hour than H100 SXM5 but delivers lower memory bandwidth (2.0 TB/s vs 3.35 TB/s) and slower inter-GPU connectivity (PCIe Gen5 vs NVLink 4.0 at 900 GB/s). For single-GPU inference on sub-30B models, H100 PCIe is fine. For multi-GPU workloads, H100 SXM5 is required.

What does an H100 cost per million tokens?

At ~$2.50/hr through GPUaaS.com running Llama 3 70B at FP8 with 80% GPU utilisation, an H100 SXM5 delivers roughly $0.021 per 1,000 tokens (~$21 per million tokens). The AWS 1-year Savings Plan rate of ~$3.78/hr works out to ~$0.032 per 1,000 tokens.

Is H100 or H200 better value for inference in 2026?

For sub-70B model inference at standard context lengths, H100 SXM5 at ~$2.50/hr delivers near-identical throughput to H200 at ~$3.00/hr for most workloads. H200 wins when your model exceeds H100's 80 GB VRAM, typically at 70B+ parameters with long context windows or large batch sizes.

What an H100 Really Costs Per Hour in 2026

An H100 SXM5 starts from ~$2.50/GPU/hr through GPUaaS.com on a short-term or long-term contract. The same GPU on AWS p5.48xlarge with a 1-year Savings Plan runs ~$3.78/GPU/hr. That's a real gap for identical silicon, and the headline rate is only part of the story. Egress fees, storage, support tiers, and utilisation rates all change what an H100 actually costs your team per useful output.

Key takeaways

H100 SXM5 starts from ~$2.50/GPU/hr through GPUaaS.com (contract-based, short-term and long-term). The AWS p5.48xlarge 1-year Savings Plan runs ~$3.78/GPU/hr. AWS on-demand list is $6.88/GPU/hr
H100 PCIe costs 25 to 35% less than H100 SXM5 per hour but delivers roughly 40% lower memory bandwidth (2.0 TB/s vs 3.35 TB/s). For memory-bound inference workloads, the price difference doesn't close the throughput gap
GPUaaS.com's commit terms are shorter and more flexible than the 1 to 3-year commitments hyperscalers typically require to access meaningful discounts
Hidden costs that don't show up in the headline rate: egress fees ($0.08 to $0.15/GB on hyperscalers), storage ($0.08 to $0.23/GB/month), and overprovisioning at 5% average GPU utilisation (Cast AI, 2026)
At 80% utilisation running Llama 3 70B at FP8, an H100 SXM5 at ~$2.50/hr delivers roughly $0.021 per 1,000 tokens. The same workload on AWS at the 1-year Savings Plan rate (~$3.78/hr) costs ~$0.032 per 1,000 tokens

In this article

01 H100 contract rates: GPUaaS.com vs hyperscaler reserved
02 H100 SXM5 vs PCIe: which one you're actually renting
03 The hidden costs that inflate the real bill
04 Short-term vs long-term contracts: when each makes sense
05 What an H100 really costs per token
06 H100 vs H200 vs B200: which one to choose
07 Frequently asked questions

◆ CONTRACT RATES

H100 contract rates: GPUaaS.com vs hyperscaler reserved

GPUaaS.com is contract-based, with both short-term and long-term commits available. Since GPUaaS.com doesn't sell on-demand access, the right comparison is contract vs contract: GPUaaS.com's rate against the AWS 1-year Savings Plan, which is the closest hyperscaler equivalent. For a full breakdown of how hyperscaler cost structures drive the gap, see the GPU pricing guide.

Provider / billing mode	Instance / cluster	Per GPU/hr	vs GPUaaS.com
GPUaaS.com (contract)	8x H100 SXM5	~$2.50/GPU/hr	Baseline
AWS (1-yr Savings Plan)	p5.48xlarge (8x H100 SXM5)	~$3.78/GPU/hr	+51%
AWS (on-demand, for reference)	p5.48xlarge (8x H100 SXM5)	$6.88/GPU/hr	+175%

AWS p5.48xlarge on-demand rate from Vantage.sh, June 1, 2026 ($55.04/hr total, 8 GPUs). 1-year EC2 Instance Savings Plan rate is 45% off on-demand per official AWS pricing announcement, June 2025 ($30.27/hr total, ~$3.78/GPU/hr). GPUaaS.com rate is indicative, contract-based, and quote-dependent on cluster size, contract length, and region.
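The per-GPU figures in the table follow from the instance-level rates in the note above. As a minimal sketch of that arithmetic (the constant and function names are illustrative, and the GPUaaS.com rate is the indicative figure quoted in this article, not a fixed price):

```python
# Figures quoted in the table note; names are illustrative, not from any API.
ON_DEMAND_INSTANCE_HR = 55.04   # AWS p5.48xlarge on-demand, USD/hr for 8 GPUs
GPUS_PER_INSTANCE = 8
SAVINGS_PLAN_DISCOUNT = 0.45    # 1-year EC2 Instance Savings Plan discount, per the note
GPUAAS_GPU_HR = 2.50            # indicative GPUaaS.com contract rate, USD/GPU/hr


def per_gpu_rate(instance_rate: float, gpus: int = GPUS_PER_INSTANCE) -> float:
    """Convert an instance-level hourly rate into a per-GPU hourly rate."""
    return instance_rate / gpus


def premium_pct(rate: float, baseline: float) -> float:
    """How much more `rate` costs than `baseline`, as a percentage."""
    return (rate / baseline - 1) * 100


on_demand_gpu = per_gpu_rate(ON_DEMAND_INSTANCE_HR)
savings_plan_gpu = per_gpu_rate(ON_DEMAND_INSTANCE_HR * (1 - SAVINGS_PLAN_DISCOUNT))

for label, rate in [("on-demand", on_demand_gpu), ("1-yr Savings Plan", savings_plan_gpu)]:
    print(f"{label}: ${rate:.2f}/GPU/hr, +{premium_pct(rate, GPUAAS_GPU_HR):.0f}% vs GPUaaS.com")
```

Swapping in a quoted contract rate for `GPUAAS_GPU_HR` re-bases the comparison column on your own numbers.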

The gap comes from cost structure, not silicon. Hyperscalers layer platform fees, compliance infrastructure, support tiers, and egress pricing on top of the raw compute cost. The H100 inside every one of these instances is identical NVIDIA hardware. For the full breakdown, see the wholesale vs hyperscale GPU pricing guide.

◆ On flexibility

GPUaaS.com offers both short-term and long-term contracts without the multi-year lock-in that hyperscaler Savings Plans and Reserved Instances require. AWS's 1-year Savings Plan is the shortest commitment to access a meaningful discount. GPUaaS.com's commit terms start shorter than that.

According to Vantage.sh data updated June 1, 2026, the AWS p5.48xlarge on-demand rate in us-east-1 is $55.04/hr for 8x H100 SXM5 GPUs. AWS's 1-year EC2 Instance Savings Plan brings this to ~$30.27/hr (~$3.78/GPU/hr), compared to ~$2.50/GPU/hr through GPUaaS.com on a contract basis.

◆ SXM5 VS PCIe

H100 SXM5 vs PCIe: which one you're actually renting

Not all H100 GPUs are equal. NVIDIA ships two form factors: SXM5, which mounts directly to the HGX baseboard with NVLink 4.0 at 900 GB/s inter-GPU bandwidth, and PCIe, which slots into standard server motherboards. The specs differ in ways that matter for production workloads.

Spec	H100 SXM5	H100 PCIe
Memory bandwidth	3.35 TB/s	2.0 TB/s
GPU memory	80 GB HBM3	80 GB HBM2e
FP8 TFLOPS	3,958 TFLOPS	3,026 TFLOPS
Inter-GPU interconnect	NVLink 4.0 (900 GB/s)	PCIe Gen5 (~128 GB/s)
TDP	700W	350W

The PCIe form factor is 25 to 35% cheaper per hour and uses half the power. For single-GPU inference on sub-30B models, that's a reasonable trade. Once your job needs multiple H100 GPUs working together (tensor parallelism for a 70B model, or pipeline-parallel training), PCIe's ~128 GB/s inter-GPU bandwidth becomes the bottleneck. SXM5 with NVLink 4.0 at 900 GB/s stays out of the way.
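To get a feel for the interconnect gap, here is a rough sketch that compares idealised transfer times at the two peak bandwidths from the spec table. The payload size is a hypothetical placeholder, and real collectives rarely reach peak bandwidth, so treat the ratio as an upper-level intuition rather than a benchmark:

```python
NVLINK4_GB_S = 900.0    # NVLink 4.0 peak inter-GPU bandwidth, from the spec table
PCIE_GEN5_GB_S = 128.0  # PCIe Gen5 approximate peak, from the spec table


def transfer_seconds(payload_gb: float, bandwidth_gb_s: float) -> float:
    """Idealised time to move `payload_gb` at a given peak bandwidth."""
    if bandwidth_gb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return payload_gb / bandwidth_gb_s


payload_gb = 10.0  # hypothetical amount of data exchanged between GPUs

nvlink_s = transfer_seconds(payload_gb, NVLINK4_GB_S)
pcie_s = transfer_seconds(payload_gb, PCIE_GEN5_GB_S)
slowdown = pcie_s / nvlink_s  # independent of payload size

print(f"NVLink: {nvlink_s * 1000:.1f} ms, PCIe: {pcie_s * 1000:.1f} ms, ratio: {slowdown:.1f}x")
```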

When you see a provider advertising "H100 GPUs" without specifying SXM5 or PCIe, ask. The price difference is substantial, and the performance difference matters for most production workloads. All H100 clusters on GPUaaS.com specify the form factor explicitly.

Which form factor do you need?

Single-GPU inference on models under 30B parameters, embedding generation, or development environments: H100 PCIe is fine and 25 to 35% cheaper. Multi-GPU tensor-parallel inference, training runs above 13B parameters, or any workload needing NVLink bandwidth: H100 SXM5 only. Paying for SXM5 and running single-GPU workloads is the most common form of GPU overprovisioning.

◆ HIDDEN COSTS

The hidden costs that inflate the real bill

The GPU hourly rate is what gets quoted. It's not always what gets billed. On hyperscalers especially, four cost categories consistently catch teams off guard the first time they run a production workload.

Egress fees

AWS charges $0.09/GB for data leaving the region. Azure runs $0.08/GB. GCP charges $0.08 to $0.12/GB depending on destination. For a team moving model outputs, logs, or checkpoints at scale, egress can add $1,000 to $8,000/month to a mid-sized H100 deployment. It's buried in a separate pricing page and rarely factored into the initial budget. For the full breakdown, see the wholesale vs hyperscale pricing guide.

Attached storage

AWS EBS gp3 runs $0.08/GB/month. Azure Premium SSD runs $0.17/GB/month. GCP Persistent Disk runs $0.17/GB/month. A team storing 10 TB of model weights, datasets, and checkpoints pays $800 to $1,700/month in storage before billing a single GPU hour. Worth modelling before you sign a hyperscaler contract.

GPU utilisation waste

Cast AI's 2026 State of Kubernetes Optimisation Report measured average GPU utilisation across 23,000 production clusters at 5%. You're paying for 100% of H100 capacity and using 5% of it. At ~$2.50/hr, that's an effective cost of $50 per hour of compute you actually use ($2.50 divided by 0.05). Continuous batching, proper vLLM configuration, and right-sizing clusters to workload get utilisation to 70 to 85%.
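The effective-cost figure is simply the hourly rate divided by the fraction of capacity doing useful work. A minimal sketch, with an illustrative function name:

```python
def effective_hourly_rate(hourly_rate: float, utilisation: float) -> float:
    """Cost per hour of useful compute, given average utilisation in (0, 1]."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return hourly_rate / utilisation


rate = 2.50  # indicative H100 SXM5 contract rate, USD/GPU/hr

for utilisation in (0.05, 0.50, 0.80):
    print(f"{utilisation:.0%} utilised: ${effective_hourly_rate(rate, utilisation):.2f} per useful GPU-hour")
```

Moving from 5% to 80% utilisation cuts the effective rate by a factor of 16, which is why utilisation usually matters more than the last few cents on the headline rate.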

Support tiers

AWS Business Support starts at 10% of monthly bill, minimum $100/month. Enterprise Support starts at 10% on the first $150K of monthly spend, minimum $15,000/month. A team running $50K/month of H100 compute on AWS pays $5,000/month in support fees before a single call is made. Worth checking the support tier terms before signing.

⚡ Model total cost, not just the GPU rate

Before committing to any GPU contract, build a total cost model that includes egress volume, storage requirements, support tier costs, and your realistic utilisation rate. The compute line item is visible. The rest isn't, until the bill arrives.
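One way to structure such a model is sketched below. Everything here is an assumption for illustration: the class and field names are hypothetical, the example volumes (20 TB egress, 10 TB storage) are placeholders, and support is simplified to a flat percentage of the bill even though real support plans are tiered.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCostModel:
    """Rough monthly cost model for a GPU cluster (illustrative only)."""
    gpus: int
    gpu_hourly_rate: float      # USD per GPU per hour
    hours: float = 730.0        # approximate hours in a month
    egress_gb: float = 0.0
    egress_rate: float = 0.0    # USD per GB
    storage_gb: float = 0.0
    storage_rate: float = 0.0   # USD per GB per month
    support_pct: float = 0.0    # flat share of the bill; a simplification
    utilisation: float = 1.0    # average fraction of GPU capacity in use

    def compute(self) -> float:
        return self.gpus * self.gpu_hourly_rate * self.hours

    def extras(self) -> float:
        return self.egress_gb * self.egress_rate + self.storage_gb * self.storage_rate

    def support(self) -> float:
        return (self.compute() + self.extras()) * self.support_pct

    def total(self) -> float:
        return self.compute() + self.extras() + self.support()

    def per_useful_gpu_hour(self) -> float:
        return self.total() / (self.gpus * self.hours * self.utilisation)


# Hypothetical 8-GPU deployment priced at the AWS rates quoted in this article.
model = MonthlyCostModel(
    gpus=8, gpu_hourly_rate=3.78,
    egress_gb=20_000, egress_rate=0.09,
    storage_gb=10_000, storage_rate=0.08,
    support_pct=0.10, utilisation=0.80,
)
print(f"compute ${model.compute():,.2f} | total ${model.total():,.2f} "
      f"| ${model.per_useful_gpu_hour():.2f} per useful GPU-hour")
```

Running the same model with a second provider's rates puts the two quotes on a like-for-like basis.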

According to Cast AI's 2026 State of Kubernetes Optimisation Report, average GPU utilisation across 23,000 measured production clusters sits at 5%, meaning most teams are paying full H100 rates for 95% idle hardware.

◆ SHORT-TERM VS LONG-TERM

Short-term vs long-term contracts: when each makes sense

GPUaaS.com is contract-based, with both short-term and long-term contracts available. Longer-term commits unlock meaningfully better rates on H100 SXM5 clusters. The break-even utilisation rate is ~68%. If your cluster runs above 68% average utilisation, a longer commit saves money. Below that, shorter-term commits cost less in total. Most production inference clusters run at 70 to 90% utilisation once properly optimised. Most development and research clusters run at 20 to 50%.
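A break-even figure like this can be sketched under one simplifying assumption, which is not stated in the article: a long-term commit bills every hour at the lower rate, while shorter commitments are only held (and billed) for the hours the cluster is actually needed. Under that assumption the break-even utilisation is the ratio of the two rates. The short-term rate below is a hypothetical placeholder chosen for illustration, not a quoted price.

```python
def break_even_utilisation(long_term_rate: float, short_term_rate: float) -> float:
    """Utilisation above which the long-term commit is cheaper (simplified model)."""
    if short_term_rate <= 0:
        raise ValueError("short_term_rate must be positive")
    return long_term_rate / short_term_rate


def cheaper_option(utilisation: float, long_term_rate: float, short_term_rate: float) -> str:
    """Compare cost per calendar hour: always-on long-term vs pay-when-needed short-term."""
    long_cost = long_term_rate                 # billed for every hour
    short_cost = short_term_rate * utilisation # billed only for hours needed
    return "long-term" if long_cost < short_cost else "short-term"


LONG_TERM_RATE = 2.50   # indicative rate from this article, USD/GPU/hr
SHORT_TERM_RATE = 3.70  # hypothetical placeholder, USD/GPU/hr

threshold = break_even_utilisation(LONG_TERM_RATE, SHORT_TERM_RATE)
print(f"break-even utilisation: {threshold:.0%}")
print("80% utilised:", cheaper_option(0.80, LONG_TERM_RATE, SHORT_TERM_RATE))
print("40% utilised:", cheaper_option(0.40, LONG_TERM_RATE, SHORT_TERM_RATE))
```

With real quotes for both contract lengths, the same ratio gives the threshold for your own cluster.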

One thing worth noting: GPUaaS.com's commit terms are shorter and more flexible than the 1 to 3-year commitments hyperscalers require to unlock their best reserved rates. You can start on a shorter contract and extend as your workload matures, without locking in multi-year spend upfront.

The short-term vs long-term decision is covered in full in the reserved vs on-demand GPU guide.

◆ COST PER TOKEN

What an H100 really costs per token

GPU-hour pricing is a procurement metric. For inference workloads, what you actually care about is cost per million tokens. The translation from GPU-hours to tokens depends on model size, quantisation, batch size, and GPU utilisation.

Take a single H100 SXM5 running Llama 3 70B at FP8 with continuous batching at 80% GPU utilisation. At ~$2.50/hr through GPUaaS.com, that works out to roughly $0.021 per 1,000 tokens, or ~$21 per million tokens. Running the same workload at the AWS 1-year Savings Plan rate (~$3.78/hr) gives you ~$0.032 per 1,000 tokens, about 51% more expensive per token for identical output. Cost per token is the hourly rate divided by tokens generated per hour, so the absolute figures move with the sustained throughput you measure on your own stack, while the gap between providers tracks the gap in hourly rates.

Provider / contract	Cost per 1,000 tokens	Basis
GPUaaS.com (contract)	~$0.021	H100 SXM5 at ~$2.50/hr, 80% util
GPUaaS.com (long-term)	Lower	H100 SXM5, long-term contract, 80% util
AWS (1-yr Savings Plan)	~$0.032	p5.48xlarge at ~$3.78/hr, 80% util

These numbers assume 80% GPU utilisation. At 5% utilisation (the industry average), your effective cost per token is 16x higher regardless of provider. Fixing utilisation through proper inference stack configuration is the highest-ROI cost reduction available to most teams. See the KV cache inference cost guide for the optimisation playbook.
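The relationship between hourly rate, throughput, and utilisation can be sketched as follows. The peak throughput constant is a hypothetical placeholder, so the absolute dollar values it produces are not meaningful; measure throughput on your own stack. The two ratios at the end do not depend on it.

```python
def cost_per_1k_tokens(hourly_rate: float, peak_tokens_per_s: float, utilisation: float) -> float:
    """USD per 1,000 tokens, given a rate, a peak throughput and average utilisation."""
    if peak_tokens_per_s <= 0 or not 0 < utilisation <= 1:
        raise ValueError("throughput must be positive and utilisation in (0, 1]")
    tokens_per_hour = peak_tokens_per_s * utilisation * 3600
    return hourly_rate / tokens_per_hour * 1000


PEAK_TOKENS_PER_S = 1000.0  # hypothetical placeholder; replace with a measured value

gpuaas = cost_per_1k_tokens(2.50, PEAK_TOKENS_PER_S, 0.80)
aws = cost_per_1k_tokens(3.78, PEAK_TOKENS_PER_S, 0.80)
idle = cost_per_1k_tokens(2.50, PEAK_TOKENS_PER_S, 0.05)

provider_ratio = aws / gpuaas      # tracks the ratio of hourly rates
utilisation_ratio = idle / gpuaas  # 80% vs 5% utilisation at the same rate

print(f"AWS vs GPUaaS.com per token: {provider_ratio:.2f}x")
print(f"5% vs 80% utilisation per token: {utilisation_ratio:.0f}x")
```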

The tokenmaxxing context

Teams running their own H100 inference clusters have a fixed cost structure that doesn't scale with token volume. Agentic workloads that burn through enterprise AI budgets at per-token API pricing can cost a fraction as much on dedicated H100 clusters. See the full analysis in the enterprise AI cost breakdown.

◆ H100 VS ALTERNATIVES

H100 vs H200 vs B200: which one to choose

The H100 sits in the middle of GPUaaS.com's current GPU lineup. The H200 keeps the same Hopper architecture but carries 141 GB of HBM3e, which makes it the better choice for 70B+ models at long context. The B200 brings the Blackwell architecture and 192 GB of HBM3e, optimised for 405B+ inference and frontier training. The decision comes down to whether your workload actually saturates H100's 80 GB and 3.35 TB/s, or whether you're paying for headroom you don't use.
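A quick first check on whether a model fits is to estimate the memory its weights need. The sketch below counts weights only: it ignores KV cache, activations, and framework overhead, all of which need additional headroom, so treat the results as lower bounds. The names are illustrative, and the memory capacities are the ones quoted in this article.

```python
import math

GPU_MEMORY_GB = {"H100 SXM5": 80, "H200 SXM": 141, "B200 SXM": 192}
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weights_gb(params_billions: float, precision: str) -> float:
    """Approximate weight footprint in GB (1B params at 1 byte each is ~1 GB)."""
    return params_billions * BYTES_PER_PARAM[precision]


def min_gpus_for_weights(params_billions: float, precision: str, gpu: str) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    return math.ceil(weights_gb(params_billions, precision) / GPU_MEMORY_GB[gpu])


for gpu in GPU_MEMORY_GB:
    headroom = GPU_MEMORY_GB[gpu] - weights_gb(70, "fp8")
    print(f"{gpu}: 70B at FP8 leaves ~{headroom:.0f} GB per GPU for KV cache and overhead")
```

On these numbers a 70B model at FP8 leaves about 10 GB free on an H100 and about 71 GB on an H200, which is the headroom that long context windows and large batches consume.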

H100 SXM5: ~$2.50/GPU/hr (GPUaaS.com contract). Best for 70B inference at standard context, multi-GPU training up to 30B, FP8 workloads. 80 GB HBM3, 3.35 TB/s bandwidth. The sweet spot for most production inference teams in 2026.

H200 SXM: ~$3.00/GPU/hr (GPUaaS.com contract). Best for 70B+ inference at high concurrency, 32K+ context, 405B model serving (quantised). 141 GB HBM3e, 4.8 TB/s bandwidth. Worth it when your workload genuinely needs more memory than H100 offers.

B200 SXM: ~$4.50/GPU/hr (GPUaaS.com contract). Best for 405B+ inference at scale, MoE serving, frontier training. 192 GB HBM3e, Blackwell architecture, FP4 support. Only worth it if you genuinely need Blackwell-class throughput.

For the full decision framework on when H200 beats H100, see the H200 vs H100 rental guide. For the full three-way comparison including B200, see the H100 vs H200 vs B200 comparison.


◆ FAQ

Frequently asked questions

Last reviewed: June 2, 2026. GPUaaS.com pricing is indicative, contract-based, and quote-dependent on cluster size, contract length, and region. AWS p5.48xlarge on-demand rate from Vantage.sh (June 1, 2026, $55.04/hr total). AWS 1-year EC2 Instance Savings Plan rate is 45% off on-demand per official AWS pricing announcement (June 2025), giving ~$30.27/hr total (~$3.78/GPU/hr). Utilisation data from Cast AI 2026 State of Kubernetes Optimisation Report. Token throughput based on Llama 3 70B FP8 with continuous batching at 80% GPU utilisation.
