BlogThe Real TCO of a GPU Cluster in 2026

Procurement

Hardware is 25-35% of GPU cluster TCO. A 100-GPU H100 cluster runs $3M in hardware but $8.6M over 5 years all-in. Here's the line-by-line for 2026.

The Real TCO of a GPU Cluster in 2026

GPUaaS.com Team
GPUaaS.com Team
Procurement Research
June 4, 2026
Blog post cover image

A 100-GPU H100 cluster looks like a $3M decision on the spreadsheet. The actual five-year cost is closer to $8.6M once you add power, cooling, networking, staff, and maintenance, according to Introl's April 2026 TCO model. Hardware is only 25-35% of GPU cluster TCO. The rest is what kills budgets that didn't account for it. This is the line-by-line breakdown of what a real GPU cluster costs to run, why hyperscaler "you don't see it" pricing isn't free either, and where GPUaaS.com lands against both.

Key takeaways
  • GPU hardware is 25-35% of 5-year TCO on a self-built cluster. Power, cooling, networking, staff, and maintenance make up the other 65-75% (Introl, April 2026; KAD, April 2026)
  • A 100-GPU H100 cluster runs ~$3M in hardware but $8.6M over 5 years all-in. Gartner: 73% of enterprises underestimate AI infrastructure costs by failing to account for opex (Introl, April 2026)
  • On-prem effective rate for an 8-GPU H100 node is ~$4.75/GPU/hr before networking and facility lease. Spheron's on-demand H100 is $2.90/hr (Spheron, April 2026)
  • A 256-GPU B200 cluster owned and colocated in Texas costs ~$15.5M over 3 years with loan + opex, or ~$2.88/GPU/hr at 80% utilisation. Buying beats renting above ~70% utilisation (American Compute, March 2026)
  • Hyperscaler "all-in" pricing isn't free either. Egress, storage, support tiers, and reserved-but-idle capacity stack 20-40% on top of the compute rate
  • GPUaaS.com offers both short-term and long-term contracts at rates hyperscalers won't offer you. H100 from ~$2.50/GPU/hr, H200 from ~$3.00/GPU/hr, B200 and B300 from ~$4.50/GPU/hr, with no multi-year lock-in

GPU cluster TCO conversations get derailed at the same point every time: someone quotes the hardware sticker price, someone else quotes a per-hour cloud rate, and the comparison breaks because they aren't comparing the same things. The real TCO of a cluster is hardware plus power plus cooling plus networking plus storage plus staff plus depreciation plus the cost of getting it wrong. This guide walks through each line item with 2026 numbers and shows where the contract-based GPU rental model fits. For the broader pricing structure, see the GPU pricing guide.

◆ HARDWARE
Hardware: the easy 25-35% of the bill

Hardware is the line most teams build the whole budget around. It's the wrong line. On a five-year horizon, GPU hardware is roughly 25-35% of total cost (KAD, April 2026; Silicon Analysts cluster TCO calculator). The rest is what the cloud provider absorbs and what the on-prem team forgets to model.

Reference numbers as of mid-2026, from Silicon Analysts' cluster TCO data and io.net's January 2026 buyer's guide:

ClusterHardware capexPower drawNotes
DGX H100 (8 GPUs)$200K-$500K~10 kWAir-cooled single node
64x H100 training cluster~$4.0M~90 kW+ $0.4M storage, $0.5M network, $0.3M racks
256-GPU H100 SuperPOD$7-10M~540 kWChilled-water cooling, 20 racks
2,048-GPU H100 datacenter~$15.8M capex~2.2 MWRedundant chillers, InfiniBand spines

Capex ranges from Silicon Analysts cluster TCO calculator and io.net January 2026 buyer's guide. Power draw includes server overhead but excludes PUE.

The numbers compress with scale. Per-GPU hardware cost drops about 8% on bulk orders at 512 GPUs versus 64. The opex doesn't compress the same way: facility, power, cooling, networking, and staff costs are largely fixed regardless of how you got the discount on the silicon.

There's also a depreciation problem nobody likes to model. Cyfuture's January 2026 analysis pegs GPU resale value loss at roughly 50% over two years as new generations land. An H100 purchased in early 2024 is worth meaningfully less in 2026 with B200 and B300 in market and H200 reserved contracts expiring. If you depreciate over 5 years, the back half of that schedule is on hardware that's already obsolescent.

◆ POWER + COOLING
Power and cooling: the line that compounds

An H100 SXM5 draws roughly 1,400W of total system power per GPU once you account for CPU, networking, and cooling overhead, per Silicon Analysts. A 1,024-GPU B200 cluster draws about 1.8 MW. A GB200 NVL72 rack draws 120-130 kW and requires liquid cooling.

Spheron's April 2026 power cost guide breaks down the math for a 1,000-GPU H100 cluster: 1,000 GPUs x 700W x 1.80 server overhead multiplier x 1.4 PUE = approximately 1.76 MW continuous draw. At $0.12/kWh (a conservative floor versus the current US commercial average of $0.13-0.14/kWh per EIA), power alone runs about $152/month per GPU. For a single 8-GPU node, on-prem TCO works out to roughly $1.69/hr depreciation + $0.21/hr power + $2.85/hr everything else = ~$4.75/GPU/hr before networking and facility lease.

$420K/yr

Power bill for a 100-GPU cluster, before cooling overhead (Introl, April 2026)

For context, GPUaaS.com H100 at ~$2.50/GPU/hr buys you the GPU and all of this without separate line items.

Cooling is the line that bites teams late. Air-cooled racks top out around 30 kW. Once you push past that (and B200 already does at the rack level), you need direct-to-chip or immersion liquid cooling. 3exhosting's May 2026 analysis pegs the capex delta for direct-to-chip at $2,500-$4,500 per kW above air-cooled, but operational costs are lower because liquid achieves PUE below 1.2 versus 1.4-1.6 for air. Over 10 years, single-phase immersion has roughly 20% lower TCO than air with rear-door heat exchangers.

According to Spheron's April 2026 power consumption analysis, on-prem TCO for an 8-GPU H100 node works out to approximately $4.75/GPU/hr before networking, maintenance contracts, and facility lease costs, while Spheron's on-demand H100 rate is $2.90/hr.

There's also PUE risk that's rarely modelled honestly. A colocation facility promising PUE 1.3 but delivering 1.5 in summer means your electricity bill is roughly 15% higher than the spreadsheet said. If you're modelling TCO at 1.2 PUE because that's what the brochure says, you're underbudgeting by 15-25% on the power line.

◆ NETWORKING + STORAGE
Networking, storage and the facility

Networking is 10-15% of cluster TCO, per Silicon Analysts. InfiniBand spines for a 2,048-GPU cluster need 10 spine switches and the cabling that goes with them. For a 64-node 512-GPU cluster, networking equipment alone runs around $500K capex.

There's a subtle correctness issue on top of the cost. io.net's January 2026 buyer's guide flags that a single Broadcom PEX8900 96-lane switch hanging eight GPUs behind one NUMA node can add 8 microseconds of DMA contention, reducing throughput by 12%. The fix is a 2-root-complex topology with two CPU sockets, but that's a hardware decision that has to be made at purchase time. Get it wrong and you've paid for 12% throughput you'll never see.

Storage is the other line that compounds. Datasets for foundation model training routinely sit in the petabyte range. High-performance parallel filesystems (Lustre, BeeGFS, WekaFS) cost more than vanilla object storage. A 64-GPU cluster needs roughly $400K of storage capex to feed the GPUs without bottlenecking.

Datacenter space adds another 10-15% of TCO. For a 64-node 512-GPU H100 cluster needing roughly 540 kW and 20 racks, colocation in a tier-3 facility runs anywhere from $300K to $600K per year depending on region and power pricing. Owned datacenter is heavier capex but lower per-kW opex over a 10+ year horizon.

⚠ Watch out

Idle facility costs while waiting for GPUs to arrive are the most-missed line. io.net flags that every month of facility lease before the first GPU lands costs ~$180K. If your supplier slips by 3 months, that's $540K spent on empty racks. Build-from-scratch projects regularly hit this; rented capacity doesn't.

◆ STAFF + MAINTENANCE
Staff, maintenance and the obsolescence clock

Introl's April 2026 TCO model pegs a GPU infrastructure engineer at roughly $275K fully loaded. You need at least two for a 100-GPU cluster, more for anything north of 500 GPUs. io.net's model assumes 5 FTEs for a 24x7 ops team at 512 GPUs.

Maintenance contracts on the hardware itself land in the 8-12% of capex per year range. Cyfuture's January 2026 analysis lumps maintenance and staffing together at $150K+ annually for a single-cluster team. Downtime from failures runs 5-10% utilisation loss in their model. That's not noise: at 80% target utilisation, a 5% downtime hit knocks effective utilisation to 76%, which moves the per-GPU-hour cost up by about 5%.

Obsolescence is the line that nobody likes to write down. Silicon Data's pricing trends analysis notes that many organisations who reserved A100 and H100 hardware under 3-year contracts in 2023-2024 will see those reservations expire in 2026, with additional inventory returning to the market as buyers transition to B200 and GB300. Translation: the resale value of H100 hardware bought in 2024 is meaningfully lower in 2026 than the depreciation curve assumed. Cyfuture estimates 50% value loss over two years.

According to Introl's April 2026 TCO model, a single GPU engineer commands roughly $275K annually fully loaded, and Gartner data cited in the same report shows 73% of enterprises underestimate AI infrastructure costs by failing to account for operational expenses.

◆ WORKED EXAMPLE
The full-stack TCO: worked example for a 100-GPU H100 cluster

Introl's April 2026 TCO model walks through a 100-GPU H100 deployment. Hardware: $3M. Five-year all-in cost: $8.6M. Hardware is 35% of the bill. The rest:

Line item5-year cost% of TCOSource
Hardware (100 x H100 + racks)$3.0M35%Introl April 2026
Power (5 yrs at $420K/yr)$2.1M24%Introl April 2026
Cooling, networking, storage$1.4M16%Silicon Analysts cluster TCO
Staff (2 eng x 5 yrs)$1.5M17%Introl April 2026
Maintenance, downtime, software$0.6M8%Introl April 2026

5-year TCO: $8.6M. Effective rate at 80% utilisation: roughly $2.45/GPU/hr.

Compounding effect at scale matters. A 256-GPU B200 cluster owned and air-cooled colocated in Texas runs about $12M in hardware and $15.5M all-in TCO over 3 years with a loan and operating costs, per American Compute's March 2026 analysis. At 80% utilisation, that's about $2.88/GPU/hr actually used. Most enterprises finance ~70% of hardware via a loan, which means $4M in cash up front plus monthly loan repayments around $300K. The TCO is the obvious number. The cash flow timing is the one that breaks teams.

⚡ The 165% rule

Introl's research on real deployments shows organisations that model only hardware costs discover budget overruns averaging 165% by year three. Examples in their dataset: a biotech that budgeted $2M for 50 H100s and hit $7.8M actual TCO; an autonomous vehicle startup that budgeted $6M for 200 GPUs and hit $28M. Both achieved ROI eventually, but both needed emergency funding mid-deployment.

Hyperscaler "you don't see it" pricing isn't free either. Hivenet's June 2026 analysis estimates that when storage ($0.10-0.50/hr workspace), bandwidth ($0.12-1.00/GB egress), and platform fees are included, the effective hyperscaler GPU hourly cost can reach 2-3x the advertised rate. A $4 H100 hourly rate that becomes $8-12 in practice changes the rent-vs-buy maths entirely. See the GPUaaS.com vs hyperscaler pricing breakdown for the full line-by-line.

◆ BUILD VS RENT
Build vs rent: where the break-even actually sits in 2026

American Compute's March 2026 analysis lands at a clean conclusion for B200 clusters: ownership becomes cheaper than renting somewhere above 70% utilisation. Below that, the idle capacity you've paid for outweighs the per-hour discount. KAD's April 2026 analysis lands at a similar 60-70% break-even threshold.

The break-even shifts with three variables: electricity rate (lower = ownership wins sooner), GPU generation lifespan (longer = ownership wins), and reserved cloud pricing (lower reserved rates push break-even higher). Introl's analysis flags that break-even has shifted to favour cloud below 60-70% utilisation thanks to neo-cloud reserved rates approaching $2/hr for H100 by mid-2026.

Most teams sit below the break-even and don't know it. The dominant 2026 procurement pattern is hybrid: a steady-state reserved base for production load, plus on-demand or short-term contract capacity for everything else. Run a base cluster for steady-state workloads, and add flexible capacity for peaks and experiments. The base can be owned hardware or a long-term reserved contract; the flex layer rides on top.

Option for 256-GPU B200 base3-yr costEffective $/GPU/hrNotes
Owned hardware + colo (TX)~$15.5M~$2.8880% utilisation, with loan
Hyperscaler 3-yr reserved B200~$18-25M~$3.50-4.50Includes egress + support, non-cancellable
GPUaaS.com contract B200Quote-dependentfrom ~$4.50Short-term and long-term, no multi-year lock-in

Owned hardware: American Compute March 2026. Hyperscaler reserved estimated from public 1-year tier pricing + typical 3-yr discount. GPUaaS.com rates are indicative.

The case for ownership is utilisation discipline. You commit capital, you make sure it runs hot. The case against ownership is everything that goes wrong before you get there: facility build delays, GPU shipment delays, the ops team you have to hire before you have anything to operate, the obsolescence curve that's faster than the depreciation schedule says. The case for GPUaaS.com is that none of those risks land on you. You get a contract rate, you get dedicated NVIDIA infrastructure, and the contract length matches your utilisation confidence rather than the 36-month minimum your CFO has to write off if the workload changes.

Your search for enterprise GPU compute ends here.

NVIDIA infrastructure at rates hyperscalers won't offer you. H100, H200, B200, B300 clusters. Short-term and long-term contracts. No multi-year lock-in. Quotes within 24 hours.

Get a quote on your cluster
◆ FAQ
Frequently asked questions

GPU hardware represents only 25-35% of 5-year total cost of ownership. Power and cooling (24-34%), networking and storage (10-15%), staff and maintenance (15-20%), and software, datacenter space, and downtime make up the rest. A 100-GPU H100 cluster with $3M in hardware typically reaches $8.6M in 5-year TCO once everything is modelled (Introl, April 2026).

Owned hardware typically beats renting above 70% sustained utilisation, depending on GPU model, electricity rates, and cloud reserved pricing (American Compute, March 2026). Below 70%, the idle capacity outweighs the per-hour discount. Most teams sit well below this threshold without knowing it. The 2026 default is hybrid: a reserved or owned base for steady-state, contract or PAYG for everything else.

Roughly $420,000 per year for a 100-GPU H100 cluster, per Introl's April 2026 model. That assumes $0.12/kWh electricity and 1.4 PUE. At $0.13-0.14/kWh (closer to current US commercial average per EIA), it lands higher. Per GPU, power alone runs about $152/month before cooling overhead.

Five lines most teams miss: facility lease idle time (~$180K/month while waiting for GPUs to arrive), PUE overrun risk (modelled 1.2, delivered 1.5 = 25% higher power bill), downtime/failure utilisation hit (5-10% per Cyfuture), obsolescence depreciation (50% value loss in 2 years), and software/licensing (2-5% of TCO). On hyperscalers, add egress at $0.08-0.12/GB and support tiers at 10% of monthly spend. Combined, these typically inflate the TCO 20-40% above the spreadsheet number.

GPUaaS.com contract rates run from ~$2.50/GPU/hr for H100, ~$3.00/GPU/hr for H200, and ~$4.50/GPU/hr for B200 and B300. Owned hardware lands at ~$2.88/GPU/hr effective for a 256-GPU B200 cluster at 80% utilisation (American Compute, March 2026). The trade-off is risk: GPUaaS.com absorbs the facility, the staff, the cooling, the GPU obsolescence, and the cash flow timing. You get dedicated NVIDIA infrastructure on a contract length that matches your utilisation confidence, with short-term and long-term contracts and no multi-year lock-in. Get a quote.

Roughly 70% sustained utilisation for B200 clusters (American Compute, March 2026), and 60-70% across most GPU generations (KAD, April 2026). Below that threshold, the per-hour discount from ownership doesn't compensate for the idle capacity you've already paid for. Cast AI's 2026 data shows the average GPU utilisation across 23,000 measured clusters sits at just 5%. Most teams aren't anywhere near the break-even threshold, but model the project as if they will be.

Cyfuture's January 2026 analysis pegs payback on an 8-H100 cluster ($300K+ upfront) versus renting at $20/hour at roughly 7-8 months of full-time use, before hidden ops costs. Once you add staff, maintenance, cooling, and the obsolescence curve, the real payback on a similar workload moves out to 18-24 months. For most teams considering ownership, the question isn't "when does payback hit" but "will my workload still be running at the same utilisation level in 24 months". If the model architecture or workload pattern is likely to change, the payback case weakens.

Last reviewed: June 5, 2026. Hardware cost ranges from Silicon Analysts cluster TCO calculator and io.net GPU Cluster Buyer's Guide (January 2026). Power and cooling figures from Spheron AI Inference Power Consumption guide (April 2026) and 3exhosting Data Center Power and Cooling Costs (May 2026). TCO percentages and worked example from Introl GPU Infrastructure TCO model (April 2026) and KAD GPU Cluster TCO analysis (April 2026). 256-GPU B200 build-vs-rent figures from American Compute (March 2026). Hyperscaler hidden cost ranges from Hivenet (June 2026) and Cyfuture (January 2026). GPUaaS.com rates are indicative, contract-based, and quote-dependent.

Share this article:LinkedInX / TwitterCopy link
FIND THE BEST GPU DEAL

Get a wholesale GPU quote in a few hours

NVIDIA B200, H200, H100, A100, RTX Pro 6000 — N. America, EU, MEA, APAC. No buyer fees.

Related articles