An H100 SXM5 starts from ~$2.50/GPU/hr through GPUaaS.com on a short-term or long-term contract. The same GPU on AWS p5.48xlarge with a 1-year Savings Plan runs ~$3.78/GPU/hr. That's a real gap for identical silicon, and the headline rate is only part of the story. Egress fees, storage, support tiers, and utilisation rates all change what an H100 actually costs your team per useful output.
- H100 SXM5 starts from ~$2.50/GPU/hr through GPUaaS.com (contract-based, short-term and long-term). The AWS p5.48xlarge 1-year Savings Plan runs ~$3.78/GPU/hr. AWS on-demand list is $6.88/GPU/hr
- H100 PCIe costs 30 to 40% less than H100 SXM5 per hour but delivers 20 to 25% lower memory bandwidth (2.0 TB/s vs 3.35 TB/s). For memory-bound inference workloads, the price difference doesn't close the throughput gap
- GPUaaS.com's commit terms are shorter and more flexible than the 1 to 3-year commitments hyperscalers typically require to access meaningful discounts
- Hidden costs that don't show up in the headline rate: egress fees ($0.08 to $0.15/GB on hyperscalers), storage ($0.08 to $0.23/GB/month), and overprovisioning at 5% average GPU utilisation (Cast AI, 2026)
- At 80% utilisation running Llama 3 70B at FP8, an H100 SXM5 at ~$2.50/hr delivers roughly $0.021 per 1,000 tokens. The same workload on AWS at the 1-year Savings Plan rate (~$3.78/hr) costs ~$0.032 per 1,000 tokens
GPUaaS.com is contract-based, with both short-term and long-term commits available. Since GPUaaS.com doesn't sell on-demand access, the right comparison is contract vs contract: GPUaaS.com's rate against the AWS 1-year Savings Plan, which is the closest hyperscaler equivalent. For a full breakdown of how hyperscaler cost structures drive the gap, see the GPU pricing guide.
| Provider / billing mode | Instance / cluster | Per GPU/hr | vs GPUaaS.com |
|---|---|---|---|
| GPUaaS.com (contract) | 8x H100 SXM5 | ~$2.50/GPU/hr | Baseline |
| AWS (1-yr Savings Plan) | p5.48xlarge (8x H100 SXM5) | ~$3.78/GPU/hr | +51% |
| AWS (on-demand, for reference) | p5.48xlarge (8x H100 SXM5) | $6.88/GPU/hr | +175% |
AWS p5.48xlarge on-demand rate from Vantage.sh, June 1, 2026 ($55.04/hr total, 8 GPUs). 1-year EC2 Instance Savings Plan rate is 45% off on-demand per official AWS pricing announcement, June 2025 ($30.27/hr total, ~$3.78/GPU/hr). GPUaaS.com rate is indicative, contract-based, and quote-dependent on cluster size, contract length, and region.
The gap comes from cost structure, not silicon. Hyperscalers layer platform fees, compliance infrastructure, support tiers, and egress pricing on top of the raw compute cost. The H100 inside every one of these instances is identical NVIDIA hardware. For the full breakdown, see the wholesale vs hyperscale GPU pricing guide.
◆ On flexibility
GPUaaS.com offers both short-term and long-term contracts without the multi-year lock-in that hyperscaler Savings Plans and Reserved Instances require. AWS's 1-year Savings Plan is the shortest commitment to access a meaningful discount. GPUaaS.com's commit terms start shorter than that.
According to Vantage.sh data updated June 1, 2026, the AWS p5.48xlarge on-demand rate in us-east-1 is $55.04/hr for 8x H100 SXM5 GPUs. AWS's 1-year EC2 Instance Savings Plan brings this to ~$30.27/hr (~$3.78/GPU/hr), compared to ~$2.50/GPU/hr through GPUaaS.com on a contract basis.
Not all H100 GPUs are equal. NVIDIA ships two form factors: SXM5, which mounts directly to the HGX baseboard with NVLink 4.0 at 900 GB/s inter-GPU bandwidth, and PCIe, which slots into standard server motherboards. The specs differ in ways that matter for production workloads.
| Spec | H100 SXM5 | H100 PCIe |
|---|---|---|
| Memory bandwidth | 3.35 TB/s | 2.0 TB/s |
| GPU memory | 80 GB HBM3 | 80 GB HBM2e |
| FP8 TFLOPS | 3,958 TFLOPS | 3,026 TFLOPS |
| Inter-GPU interconnect | NVLink 4.0 (900 GB/s) | PCIe Gen5 (~128 GB/s) |
| TDP | 700W | 350W |
The PCIe form factor is 25 to 35% cheaper per hour and uses half the power. For single-GPU inference on sub-30B models, that's a reasonable trade. Once your job needs multiple H100 GPUs working together, tensor parallelism for a 70B model or pipeline-parallel training, PCIe's 128 GB/s inter-GPU bandwidth becomes the bottleneck. SXM5 with NVLink 4.0 at 900 GB/s stays out of the way.
When you see a provider advertising "H100 GPUs" without specifying SXM5 or PCIe, ask. The price difference is substantial, and the performance difference matters for most production workloads. All H100 clusters on GPUaaS.com specify the form factor explicitly.
Which form factor do you need?
Single-GPU inference on models under 30B parameters, embedding generation, or development environments: H100 PCIe is fine and 30% cheaper. Multi-GPU tensor-parallel inference, training runs above 13B parameters, or any workload needing NVLink bandwidth: H100 SXM5 only. Paying for SXM5 and running single-GPU workloads is the most common form of GPU overprovisioning.
According to Cast AI's 2026 State of Kubernetes Optimisation Report, average GPU utilisation across 23,000 measured production clusters sits at 5%, meaning most teams are paying full H100 rates for 95% idle hardware.
GPUaaS.com is contract-based, with both short-term and long-term contracts available. Longer-term commits unlock meaningfully better rates on H100 SXM5 clusters. The break-even utilisation rate is ~68%. If your cluster runs above 68% average utilisation, a longer commit saves money. Below that, shorter-term commits cost less in total. Most production inference clusters run at 70 to 90% utilisation once properly optimised. Most development and research clusters run at 20 to 50%.
One thing worth noting: GPUaaS.com's commit terms are shorter and more flexible than the 1 to 3-year commitments hyperscalers require to unlock their best reserved rates. You can start on a shorter contract and extend as your workload matures, without locking in multi-year spend upfront.
The short-term vs long-term decision is covered in full in the reserved vs on-demand GPU guide.
GPU-hour pricing is a procurement metric. For inference workloads, what you actually care about is cost per million tokens. The translation from GPU-hours to tokens depends on model size, quantisation, batch size, and GPU utilisation.
A single H100 SXM5 running Llama 3 70B at FP8 with continuous batching achieves roughly 21,000 tokens per second at 80% GPU utilisation. At ~$2.50/hr through GPUaaS.com, that works out to roughly $0.021 per 1,000 tokens, or ~$21 per million tokens. Running the same workload at the AWS 1-year Savings Plan rate (~$3.78/hr) gives you ~$0.032 per 1,000 tokens, about 51% more expensive per token for identical output.
These numbers assume 80% GPU utilisation. At 5% utilisation (the industry average), your effective cost per token is 16x higher regardless of provider. Fixing utilisation through proper inference stack configuration is the highest-ROI cost reduction available to most teams. See the KV cache inference cost guide for the optimisation playbook.
The tokenmaxxing context
Teams running their own H100 inference clusters have a fixed cost structure that doesn't scale with token volume. Agentic workloads burning through enterprise AI budgets at API pricing rates cost a fraction of that on dedicated H100 clusters. See the full analysis in the enterprise AI cost breakdown.
The H100 sits in the middle of GPUaaS.com's current GPU lineup. The H200 adds 141 GB of HBM3e on top of the same Hopper architecture and is the better choice for 70B+ models at long context. The B200 brings the Blackwell architecture and 192 GB HBM3e, optimised for 405B+ inference and frontier training. The decision comes down to whether your workload actually saturates H100's 80 GB and 3.35 TB/s, or whether you're paying for headroom you don't use.
For the full decision framework on when H200 beats H100, see the H200 vs H100 rental guide. For the full three-way comparison including B200, see the H100 vs H200 vs B200 comparison.
Get an H100 cluster quote from GPUaaS.com
H100 SXM5 clusters from ~$2.50/GPU/hr. Short-term and long-term contracts, without the multi-year lock-in. Quote within 24 hours.
See how GPUaaS.com works →Last reviewed: June 2, 2026. GPUaaS.com pricing is indicative, contract-based, and quote-dependent on cluster size, contract length, and region. AWS p5.48xlarge on-demand rate from Vantage.sh (June 1, 2026, $55.04/hr total). AWS 1-year EC2 Instance Savings Plan rate is 45% off on-demand per official AWS pricing announcement (June 2025), giving ~$30.27/hr total (~$3.78/GPU/hr). Utilisation data from Cast AI 2026 State of Kubernetes Optimisation Report. Token throughput based on Llama 3 70B FP8 with continuous batching at 80% GPU utilisation.



