B200 SXM availability is tightening. Lead times across the GPUaaS.com provider network have extended to 6–8 weeks for new cluster requests, up from 3–4 weeks in Q1 2026. The primary driver is demand from inference serving deployments, which are scaling faster than anticipated.
What is driving the tightening
Inference workloads have a fundamentally different capacity profile than training. Training runs have defined start and end dates, so the capacity they occupy is released on a known schedule. Inference serving runs continuously and scales with traffic. Several large deployments that began in Q1 are now consuming capacity that would otherwise be available for new customers.
What this means for Q3 planning
Enterprise teams planning GPU procurement for Q3 should begin the sourcing process now rather than in June. GPUaaS.com can source quotes within 24 hours, but the lead time for actual cluster availability is being set by provider inventory, not by our sourcing speed.
H200 SXM remains more readily available with 2–3 week lead times for most configurations. Teams with flexibility on GPU model should factor this into their planning.
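As a rough planning aid, the lead times above can be turned into a latest sourcing date by working backwards from the day a cluster needs to be live. The sketch below is illustrative only. The function name `latest_start`, the July 1, 2026 target date, and treating the 24-hour quote turnaround as one calendar day are assumptions made for the example, not GPUaaS.com tooling or commitments.

```python
from datetime import date, timedelta

# Lead-time ranges in weeks (best case, worst case), from the figures quoted above.
LEAD_TIME_WEEKS = {
    "B200 SXM": (6, 8),
    "H200 SXM": (2, 3),
}

# Assumption: the 24-hour quote turnaround is budgeted as one calendar day.
QUOTE_TURNAROUND = timedelta(days=1)


def latest_start(target: date, gpu_model: str) -> tuple[date, date]:
    """Return (conservative, optimistic) latest dates to begin sourcing.

    conservative assumes the long end of the lead-time range,
    optimistic assumes the short end.
    """
    best_weeks, worst_weeks = LEAD_TIME_WEEKS[gpu_model]
    conservative = target - timedelta(weeks=worst_weeks) - QUOTE_TURNAROUND
    optimistic = target - timedelta(weeks=best_weeks) - QUOTE_TURNAROUND
    return conservative, optimistic


if __name__ == "__main__":
    # Assumption: the cluster must be available on the first day of Q3 2026.
    target = date(2026, 7, 1)
    for model in LEAD_TIME_WEEKS:
        conservative, optimistic = latest_start(target, model)
        print(f"{model}: begin sourcing between {conservative} and {optimistic}")
```

Under these assumptions, a B200 SXM cluster needed on July 1 would call for sourcing to begin in early to mid May, while an H200 SXM cluster could wait until mid June. Lead times are provider-dependent and may move, so the ranges are worth re-running against current quotes.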