In 2026, renting an H100 SXM costs $1.49–$3.50/GPU/hr on specialist clouds, an H200 starts at $3.72/hr on-demand, and a B200 runs $2.12–$6.04/hr depending on provider tier and contract length. Cost-per-token tells a completely different story than cost-per-hour.
- H100 SXM carries 80 GB HBM3 at 3.35 TB/s. It remains the most widely available GPU for 7B–70B model workloads in 2026, listed across 36+ providers with on-demand from $1.49/hr (IntuitionLabs, May 2026)
- H200 triples H100's effective KV-cache headroom with 141 GB HBM3e at 4.8 TB/s. Same 700W TDP, drop-in rack replacement with no infrastructure changes
- B200 delivers ~2.5× H200 inference throughput and supports FP4 precision. It requires liquid cooling; on-demand median ~$5.50/hr across 23 providers as of May 2026 (Spheron, May 2026)
- B200 reserved 36-month pricing has dropped to $2.25/GPU/hr across 23 providers as of April 2026; spot pricing from $2.12/hr
- For Llama 3.3 70B in FP16 (~140 GB of weights), a single 141 GB H200 replaces two H100s, cutting inter-GPU overhead and simplifying serving infrastructure
- B200 GPU rental index surged 24% in March 2026 before pulling back. Reserved contracts at $2.25/hr lock out that volatility (Silicon Data, March 2026)
Three GPUs sit at the top of NVIDIA's data-centre lineup: H100, H200, and B200. On a spec sheet they look like a clean generational ladder. In practice, each one optimises for a different bottleneck: compute density (H100), memory capacity (H200), or raw throughput at FP4 precision (B200). Picking the wrong one doesn't just cost money; it can add weeks of wall-clock time to a training run or force an unplanned cluster migration mid-project.
This guide compares all three on the metrics that actually matter for GPU rental decisions: memory, bandwidth, inference throughput, power draw, and effective cost-per-workload. For live cluster pricing, see the GPUaaS.com cluster page.
In this article
- 01 Full Spec Comparison: H100 vs H200 vs B200
- 02 Architecture Differences That Change Workload Economics
- 03 H100 vs H200 vs B200 Rental Pricing in 2026
- 04 2026 GPU Supply: What the Shortage Means for Procurement
- 05 Workload Matching: Which GPU for Which Job
- 06 Power Draw and Cooling Requirements
- 07 Head-to-Head Scorecard
- 08 Frequently Asked Questions
Full Spec Comparison: H100 vs H200 vs B200 SXM GPUs in 2026
The numbers below use SXM form factors throughout: the high-bandwidth interconnect variant used in 8-GPU cluster nodes. PCIe variants are cheaper per card but sacrifice NVLink for multi-GPU jobs.
According to GPUaaS.com infrastructure research, the H200 SXM's 76% VRAM increase over the H100, delivered at an identical 700W TDP, makes it the only Hopper-generation upgrade that pays for itself purely through reduced multi-GPU overhead on 70B+ model serving, with no rack or cooling changes required.
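The 76% figure follows directly from the capacities quoted in this guide. A minimal sketch of the arithmetic, using only the memory and bandwidth numbers stated above (the helper name `pct_increase` is illustrative):

```python
# Spec figures as quoted in this guide (SXM variants).
H100 = {"vram_gb": 80, "bandwidth_tbs": 3.35, "tdp_w": 700}
H200 = {"vram_gb": 141, "bandwidth_tbs": 4.8, "tdp_w": 700}


def pct_increase(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new / old - 1) * 100


vram_gain = pct_increase(H200["vram_gb"], H100["vram_gb"])
bandwidth_gain = pct_increase(H200["bandwidth_tbs"], H100["bandwidth_tbs"])

print(f"VRAM: +{vram_gain:.0f}%")            # about 76%
print(f"Bandwidth: +{bandwidth_gain:.0f}%")  # about 43%
print(f"TDP unchanged: {H100['tdp_w'] == H200['tdp_w']}")
```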
Architecture Differences Between H100, H200, and B200 That Change Workload Economics
The H100 and H200 share the same Hopper die. The H200 is a memory refresh, not a compute upgrade: same 1,979 TFLOPS FP8, same Transformer Engine, same NVLink 4th Gen fabric. The meaningful change is the jump from 80 GB HBM3 to 141 GB HBM3e, which expands usable KV-cache for long-context workloads and removes the need for tensor parallelism on models of up to roughly 100B parameters when the weights are held in FP8.
The B200 is a different chip entirely. NVIDIA's Blackwell architecture packs 208 billion transistors across a dual-die chiplet design, 2.6x more silicon than the H100. Fifth-generation Tensor Cores add native FP4 precision, and a second-generation Transformer Engine handles per-layer quantisation between FP4 and FP8 automatically during inference. For serving DeepSeek V3-class 671B MoE models, the B200's 192 GB VRAM handles the full model on four cards where the H100 requires nine.
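The four-versus-nine card count can be reproduced with a weights-only estimate. The sketch below assumes FP8 weights (1 byte per parameter) and ignores KV cache, activations, and framework overhead, so real deployments will need more headroom. The function name is illustrative, and the precision is an assumption chosen because it matches the figures above.

```python
import math


def min_gpus_for_weights(params_billion: float, bytes_per_param: float, vram_gb: float) -> int:
    """Smallest GPU count whose combined VRAM holds the model weights alone."""
    # 1e9 parameters x bytes per parameter = that many GB (1 GB = 1e9 bytes).
    weights_gb = params_billion * bytes_per_param
    return math.ceil(weights_gb / vram_gb)


# Assumption: 671B parameters stored in FP8 (1 byte each).
b200_cards = min_gpus_for_weights(671, 1.0, 192)
h200_cards = min_gpus_for_weights(671, 1.0, 141)
h100_cards = min_gpus_for_weights(671, 1.0, 80)

print(b200_cards, h200_cards, h100_cards)
```

Under these assumptions the B200 needs four cards and the H100 nine, matching the comparison above; in practice tensor-parallel layouts usually round up to a power of two or a full 8-GPU node.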
2.5x: B200 inference throughput advantage over H200 on FP8 workloads, rising to ~4.5x when FP4 is enabled (NVIDIA Blackwell architecture brief, 2025).
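To see how a throughput multiple changes the economics, hourly rates can be converted to cost per million tokens. This is a rough sketch: the H200 baseline of 1,000 tokens/s is a hypothetical placeholder, not a benchmark, and the rates are the lowest H200 on-demand price and the B200 on-demand median quoted elsewhere in this guide, which are not strictly like-for-like.

```python
def cost_per_million_tokens(rate_per_hr: float, tokens_per_sec: float) -> float:
    """Dollars per one million generated tokens at a sustained throughput."""
    tokens_per_hr = tokens_per_sec * 3600
    return rate_per_hr / tokens_per_hr * 1_000_000


H200_RATE = 3.72    # $/GPU/hr, lowest on-demand figure quoted in this guide
B200_RATE = 5.50    # $/GPU/hr, on-demand median quoted in this guide
H200_TPS = 1_000.0  # hypothetical baseline throughput, for illustration only

h200_cost = cost_per_million_tokens(H200_RATE, H200_TPS)
b200_fp8_cost = cost_per_million_tokens(B200_RATE, H200_TPS * 2.5)  # 2.5x on FP8
b200_fp4_cost = cost_per_million_tokens(B200_RATE, H200_TPS * 4.5)  # ~4.5x with FP4

print(f"H200:     ${h200_cost:.2f} per 1M tokens")
print(f"B200 FP8: ${b200_fp8_cost:.2f} per 1M tokens")
print(f"B200 FP4: ${b200_fp4_cost:.2f} per 1M tokens")
print(f"B200 FP8 relative to H200: {b200_fp8_cost / h200_cost:.2f}")
```

The ratio does not depend on the hypothetical baseline: (5.50 / 3.72) / 2.5 is roughly 0.59, so under these assumptions the B200 costs about 48% more per hour but about 41% less per token.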
B200 on-demand rates are volatile. The rental index moved 24% in March 2026 alone before pulling back, according to Silicon Data's SDB200RT benchmark. Reserved contracts at $2.25/hr eliminate that risk for teams that can commit to a term.
H100 vs H200 vs B200 GPU Rental Pricing in 2026: Specialist Cloud vs Hyperscaler
May 2026 on-demand rates across specialist clouds. Hyperscaler rates (AWS, Azure, GCP) are included for comparison, running 2-5x higher for identical hardware. Sources: Spheron GPU pricing index, IntuitionLabs H100 tracker.
Enterprises running 8xH200 reserved clusters through wholesale GPU procurement save up to 30% vs hyperscaler on-demand rates, collapsing the effective H200 cost well below H100 hyperscaler pricing. See why wholesale GPU pricing beats hyperscale for the full breakdown.
2026 GPU Supply Context: What the H100, H200, and B200 Shortage Means for Procurement
The GPU rental market in early 2026 is the tightest it has been since 2023. H100 and H200 contract pricing climbed roughly 40% between October 2025 and March 2026, driven by HBM3e cost pass-throughs from Samsung and SK Hynix, surging multi-agent workload demand, and hyperscaler forward purchases that consumed most Blackwell allocation through Q3 2026. According to SemiAnalysis, finding 8 nodes of H100 or H200 capacity on short notice is no longer routine. Half of tracked providers report no Hopper GPU capacity coming off contract at all.
The B300 (Blackwell Ultra) began shipping in early 2026 with 288 GB HBM3e at $4.95-$18.00/hr across early-access providers. Only 6% of B300 listings report confirmed stock as of May 2026. Teams evaluating B200 procurement should factor this into contract-length decisions: locking in a reserved B200 rate now (from $2.25/hr on 36-month terms) may be preferable to waiting for B300 supply to normalise in Q3-Q4 2026.
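One way to frame the reserved-versus-on-demand side of that decision is break-even utilisation. The sketch below assumes a reserved GPU is billed for every hour of the term while on-demand is billed only for hours actually used, and it takes a month as 730 hours. The rates are the $2.25/hr reserved and ~$5.50/hr on-demand median figures quoted in this guide; the function names are illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def breakeven_utilisation(reserved_rate: float, on_demand_rate: float) -> float:
    """Fraction of hours a GPU must be busy before reserved becomes cheaper."""
    return reserved_rate / on_demand_rate


def monthly_cost_reserved(rate: float, gpus: int) -> float:
    """Reserved capacity is paid for whether or not it is used."""
    return rate * gpus * HOURS_PER_MONTH


def monthly_cost_on_demand(rate: float, gpus: int, utilisation: float) -> float:
    """On-demand capacity is paid for only while running."""
    return rate * gpus * HOURS_PER_MONTH * utilisation


breakeven = breakeven_utilisation(2.25, 5.50)
reserved = monthly_cost_reserved(2.25, gpus=8)
on_demand_half = monthly_cost_on_demand(5.50, gpus=8, utilisation=0.5)

print(f"Break-even utilisation: {breakeven:.0%}")
print(f"8x B200 reserved:          ${reserved:,.0f}/month")
print(f"8x B200 on-demand at 50%:  ${on_demand_half:,.0f}/month")
```

Under these assumptions, a team that keeps its B200s busy more than about 41% of the time pays less on the reserved rate, before accounting for the risk of being locked into B200 while B300 supply improves.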
According to GPUaaS.com wholesale procurement data, reserved H200 clusters booked through vetted specialist providers remain the most predictable path to sustained capacity in 2026, with access locked at contracted rates regardless of spot market swings.
According to GPUaaS.com wholesale procurement data, H100 and H200 reserved cluster availability in US-East and EU-West regions tightened by roughly 40% between October 2025 and March 2026, with over half of tracked specialist providers reporting no Hopper capacity coming off contract in Q2 2026.
Workload Matching: H100, H200, or B200 for Which AI Job in 2026
The GPU selection decision collapses to three variables: model size, precision requirements, and whether you're compute-bound or memory-bound.
H100. Best for: Fine-tuning 7B-70B models, QLoRA experiments, batch inference under 80 GB, HPC workloads needing mature FP64 support.
Widely available. Mature CUDA stack. Lowest barrier to entry for teams moving from A100.
H200. Best for: 70B-100B inference at scale, 32K+ context windows, multi-model colocation, memory-bottlenecked training runs.
Drop-in H100 replacement. No infrastructure changes. Often cheaper than two H100s for 70B serving.
B200. Best for: 100B+ model inference, frontier pre-training, FP4 production serving, 128K+ context Llama 4 / DeepSeek V3 deployments.
Requires liquid cooling. Highest on-demand cost-per-hour but lowest cost-per-token on massive models.
A team serving Llama 3.3 70B in FP16 needs approximately 140 GB of VRAM for the weights alone. A single 141 GB H200 at $3.72-$6.00/hr holds what would otherwise be split across two H100s billed at twice the single-GPU rate, though at FP16 it leaves little room for KV cache. That model-size crossover is why H200 has become the default recommendation for production inference teams running open-weight 70B models in 2026.
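The memory budget behind that crossover can be estimated from the model shape. The sketch below is a rough sizing aid, not a serving benchmark: it assumes the published Llama 3 70B configuration (80 layers, 8 KV heads under grouped-query attention, head dimension 128), a nominal 70B parameter count, an FP16 KV cache, and no activation or framework overhead. All function names are illustrative.

```python
GB = 1e9  # decimal gigabytes, matching vendor VRAM figures


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight footprint in GB for a given precision."""
    return params_billion * bytes_per_param


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_value: int) -> int:
    """KV-cache bytes per token: one K and one V vector per layer per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_cached_tokens(vram_gb: float, model_gb: float, per_token_bytes: int) -> int:
    """Tokens of KV cache that fit in the VRAM left after loading the weights."""
    headroom_bytes = (vram_gb - model_gb) * GB
    return int(headroom_bytes // per_token_bytes)


# Assumed Llama 3 70B shape; FP16 cache = 2 bytes per value.
per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_value=2)

fp16_on_h200 = max_cached_tokens(141, weights_gb(70, 2), per_token)
fp16_on_2x_h100 = max_cached_tokens(160, weights_gb(70, 2), per_token)
fp8_on_h200 = max_cached_tokens(141, weights_gb(70, 1), per_token)

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"FP16 weights, 1x H200: ~{fp16_on_h200:,} cached tokens")
print(f"FP16 weights, 2x H100: ~{fp16_on_2x_h100:,} cached tokens")
print(f"FP8 weights,  1x H200: ~{fp8_on_h200:,} cached tokens")
```

Under these assumptions FP16 weights leave a single H200 only a few thousand tokens of cache, so the single-card setup is most comfortable with FP8 weights or a shorter context budget.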
Power Draw and Cooling Requirements for H100, H200, and B200 GPU Clusters
The H100 and H200 share a 700W TDP and use identical thermal management. Any rack built for H100 runs H200 without modification. This matters for teams planning an incremental upgrade path: swapping H100 nodes for H200 requires no infrastructure changes, no cooling redesign, and no re-certification of existing HGX trays.
The B200 breaks from that pattern. Its 1,000W TDP is a 43% increase over Hopper, and dense 8-GPU B200 racks can exceed 50 kW, well beyond what standard air-cooled infrastructure handles efficiently. Liquid cooling is a reliability requirement, not an option, for sustained B200 workloads at full capacity. The B300 pushes this further to 1,400W per GPU, making cooling infrastructure an even more critical factor for next-generation procurement planning.
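The percentages and per-node loads follow from the TDP figures above. A minimal sketch, counting GPU TDP only: CPUs, NICs, fans, and power-conversion losses are deliberately ignored, so real node draw is higher. The helper names are illustrative.

```python
# Per-GPU TDP in watts, as quoted in this guide.
TDP_W = {"H100": 700, "H200": 700, "B200": 1000, "B300": 1400}


def tdp_increase_pct(new_gpu: str, old_gpu: str) -> float:
    """Percentage increase in per-GPU TDP from one GPU to another."""
    return (TDP_W[new_gpu] / TDP_W[old_gpu] - 1) * 100


def node_gpu_power_kw(gpu: str, gpus_per_node: int = 8) -> float:
    """GPU-only power for one node in kW (excludes CPUs, NICs, and cooling)."""
    return TDP_W[gpu] * gpus_per_node / 1000


b200_vs_hopper = tdp_increase_pct("B200", "H100")
b300_vs_b200 = tdp_increase_pct("B300", "B200")

print(f"B200 vs Hopper TDP: +{b200_vs_hopper:.0f}%")
for gpu in TDP_W:
    print(f"{gpu}: {node_gpu_power_kw(gpu):.1f} kW of GPU power per 8-GPU node")
```

An 8-GPU B200 node carries 8 kW of GPU load before any other components, against 5.6 kW for Hopper, which is why a rack with several such nodes approaches the 50 kW range mentioned above.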
⚠ Watch out
Not all B200 cloud listings specify cooling tier. If a provider's B200 instance runs in an air-cooled facility, sustained FP4 workloads will trigger thermal throttling, reducing effective throughput below the advertised spec. Always confirm the cooling infrastructure before signing a reserved B200 contract.
Head-to-Head Scorecard: H100 vs H200 vs B200 GPU Comparison 2026
The H200 is the most cost-efficient GPU for enterprise teams running production 70B inference in 2026, delivering H100-class compute with 76% more VRAM and no infrastructure overhead. Explore H200 cluster options on GPUaaS.com, compare H200 vs B200 cluster configurations, or view all available GPU clusters to get started.
Frequently Asked Questions: H100 vs H200 vs B200 GPU Rental in 2026
Last reviewed: May 19, 2026. GPU pricing and availability data from [1] Spheron GPU pricing index (May 14, 2026), [2] IntuitionLabs H100 tracker, and [3] Silicon Data SDB200RT benchmark. Direct wholesale provider quotes via GPUaaS.com. Compare H100, H200, and B200 cluster options at GPUaaS.com.



