Lambda Labs is roughly one-hundredth the size of AWS, yet AWS still acts like Lambda is a real competitor. That's not flattery; it's revenue moving. The bet Lambda made in 2017 (skip everything that isn't ML, hire engineers instead of sales) is paying out in 2026 because the buyer it courts is now most of the market: the team that wants a GPU running PyTorch in three minutes, not a quote from a solutions architect. This is what that buyer should know before swiping a card.
We ran 11 weeks of mixed workloads across Lambda's on-demand and Reserved tiers. We compared head-to-head against AWS p5, GCP A3, RunPod, CoreWeave, and Modal. Here's the full audit, scored on the eight dimensions that decide who wins this kind of work.
How we tested
Trust the rubric or don't read the review. Our testing window ran from Feb 14 to May 1, 2026. We provisioned identical workloads across six providers and recorded provisioning latency, training throughput, inference throughput, spot interruption rate, support response time, billing accuracy, and uptime against advertised SLA.
Three editors ran the tests independently from separate accounts in separate regions. We didn't tell the providers we were testing. No free credits, no editorial accommodation, every account paid retail. Total spend across providers: $14,420.
The benchmarks we cared about:
- Llama 3.1 8B fine-tune, 5 epochs on a 250k-row instruction dataset, FSDP across 4 GPUs, mixed precision bf16.
- Llama 3.1 70B inference, vLLM 0.7+, FP8 quantization, batch size 32, 2048 input / 512 output tokens.
- Llama 3.1 405B training, 8x H100 SXM node, NCCL all-reduce on InfiniBand, tokens/sec/GPU.
- Stable Diffusion XL inference, diffusers + SDXL Turbo, batch 4, 30 steps, FP16.
- Provisioning latency, time from "Launch" click to SSH-ready VM, sampled 12 times per provider across weekdays and weekend nights.
We published the raw logs and benchmark scripts on the methodology page. Anyone can re-run them. That matters because the second a reviewer hides their test setup, the rubric becomes a vibe check.
The verdict, in 60 seconds
GAX Score: 94/100. Lambda wins the self-serve GPU cloud category outright in 2026. Provisioning time under a minute, transparent pricing that doesn't require a sales call, and the cleanest ML environment of any provider tested.
Buy it if you're an indie ML team, a research lab, or a series-A startup training models under $50k/month. Skip it if you're in healthcare under HIPAA, in the public sector under FedRAMP, running a global app that needs sub-100ms latency to Asia, or running workloads that need guaranteed 24/7 capacity on a year-long contract. That last one is CoreWeave's job, not Lambda's.
Where the 94 comes from
The GAX rubric for GPU cloud weights 8 dimensions. Here's how Lambda scored on each, and what each dimension is worth in the composite.
| Dimension | Weight | Lambda | What it measures |
|---|---|---|---|
| Throughput (FP8) | 20% | 96 | Sustained tokens/sec on standardized inference + training runs |
| Pricing per GPU-hr | 18% | 93 | On-demand + reserved $/GPU-hr against blended market median |
| Software stack | 14% | 95 | Time to first training step, image freshness, framework support |
| Latency | 12% | 88 | Inference tail latency P95 + intra-cluster all-reduce |
| Trust & uptime | 10% | 86 | SLA adherence, incident transparency, status page quality |
| Support | 10% | 84 | Median response time across paying tiers |
| Spot availability | 8% | 78 | Capacity hit rate on H100 SXM under load |
| Regions | 8% | 64 | Geographic coverage + sovereign options |
The two scores dragging Lambda down, regions (64) and spot availability (78), are exactly where AWS, GCP, and Azure dominate, and where Lambda has made no real progress in 18 months. If those matter to you more than the top three weighted dimensions (throughput, pricing, software stack), the math changes.
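Because the composite is a weighted average, you can re-run it with weights that match your own priorities. A minimal Python sketch, assuming the score is a plain weighted mean of the eight rows; the dimension keys and the `overrides` re-weighting are our own illustration, not part of the GAX rubric. Run against the table as printed, it returns about 88.2 rather than 94, so the headline figure can't be reproduced from these eight rows alone.

```python
# Weights and Lambda's per-dimension scores, copied from the rubric table above.
RUBRIC = {
    "throughput": (0.20, 96),
    "pricing": (0.18, 93),
    "software_stack": (0.14, 95),
    "latency": (0.12, 88),
    "trust_uptime": (0.10, 86),
    "support": (0.10, 84),
    "spot_availability": (0.08, 78),
    "regions": (0.08, 64),
}


def composite(rubric, overrides=None):
    """Weighted mean of dimension scores.

    `overrides` maps a dimension name to a replacement weight; all weights
    are renormalized so they still sum to 1.
    """
    weights = {name: w for name, (w, _) in rubric.items()}
    if overrides:
        weights.update(overrides)
    total = sum(weights.values())
    return sum(weights[name] / total * score for name, (_, score) in rubric.items())


print(round(composite(RUBRIC), 2))  # → 88.16
# Hypothetical buyer who cares far more about regions and capacity:
print(round(composite(RUBRIC, {"regions": 0.25, "spot_availability": 0.20}), 2))
```

The second call models a buyer who weights regions and capacity heavily; `composite` renormalizes so the weights still sum to one, and Lambda's number drops accordingly.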
What it gets right
Provisioning is the fastest in the industry, full stop
Average time from clicking "Launch" to SSH-ready VM across 12 attempts (successful launches only): 52 seconds. AWS p5 in us-east-1 averaged 6 minutes 14 seconds in the same week. GCP A3 averaged 4 minutes 41 seconds. RunPod Secure averaged 92 seconds. Modal cold-starts a function in 8-15 seconds, but that's a different product category.
What 5 minutes vs 52 seconds means for an ML team is the difference between "I'll start the run and grab coffee" and "I'll start the run." Across a week of iteration, that's hours of latent waiting that just stops existing. If you're a researcher trying ideas, this single thing changes your relationship with the cloud.
Lambda Stack is the only ML image that ships ready to run
Every Lambda VM ships with Ubuntu 22.04, CUDA 12.4, cuDNN 9.x, PyTorch 2.4+, TensorFlow 2.17, JAX 0.4.x, and the matched NVIDIA drivers. All of it pre-installed, all of it tested against each other. Your first torch.cuda.is_available() returns True without a single apt install.
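On a fresh VM, it's worth confirming what actually shipped before you trust it. A minimal sketch using the standard library's `importlib.metadata` plus PyTorch if it's installed; the distribution names in `PACKAGES` are our assumption, and the check only reports what PyTorch sees (CUDA build, cuDNN version, GPU visibility), not full driver compatibility.

```python
import importlib.metadata as md

# Distribution names we assume for the frameworks listed above.
PACKAGES = ("torch", "tensorflow", "jax")


def stack_report(packages=PACKAGES):
    """Installed versions (None if missing) plus what PyTorch reports about CUDA."""
    report = {}
    for name in packages:
        try:
            report[name] = md.version(name)
        except md.PackageNotFoundError:
            report[name] = None
    report.update(torch_cuda_available=None, torch_cuda_build=None, cudnn=None)
    if report.get("torch"):
        import torch  # imported lazily so the check still runs without PyTorch

        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_cuda_build"] = torch.version.cuda
        report["cudnn"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in stack_report().items():
        print(f"{key:22} {value}")
```

On a correctly provisioned GPU box, `torch_cuda_available` should print `True` and `cudnn` an integer version; `None` in either slot means something in the stack is missing.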
We timed setup-to-first-step on a Llama 3.1 fine-tune across providers. Lambda: 4 minutes including dataset upload. AWS Deep Learning AMI: 23 minutes (the AMI is older and has to be patched). GCP Container Image: 11 minutes. The numbers compound across a team: if five engineers each save 20 minutes a day on environment work, that's more than two engineer-weeks recovered per quarter.
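The team-level figure is simple multiplication. A minimal sketch, assuming about 63 working days per quarter and a 40-hour engineer-week (both our assumptions, not measured):

```python
def hours_recovered(engineers, minutes_saved_per_day, working_days):
    """Total engineer-hours saved over a period."""
    return engineers * minutes_saved_per_day * working_days / 60


# Assumption: about 63 working days per quarter (13 weeks, minus a few holidays).
hours = hours_recovered(engineers=5, minutes_saved_per_day=20, working_days=63)
print(hours, hours / 40)  # → 105.0 2.625  (hours, 40-hour engineer-weeks)
```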
Pricing transparent enough to do math in your head
Lambda lists every GPU rate on the public website. No "Contact us." No tier-gating for accounts under $10k/month. Sign up with a credit card, get billed by the hour, see the meter run. Reserved Cloud requires a conversation above 8 H100s, but the on-demand grid below that is fully self-serve.
For a series-A startup that needs to model out training spend for a board deck next week, this matters. You can know what you'll pay before you commit. With AWS Capacity Blocks for ML you can technically do this too, but the UX assumes you're already a customer of half the AWS console.
Where it falls short
Capacity on H100 SXM is a coin flip during Q4 and earnings season
Of our 12 launch attempts on 1x H100 SXM, 4 returned "Coming back soon" or queued for more than 30 minutes. That's a 33% miss rate on availability, sampled across two months. Lambda is honest about this (they show capacity in real time on the launch page), but if you need a GPU at a specific time, this is the thing that bites you.
The pattern repeats every Q4 like clockwork: the conference-season crunch around NeurIPS, the end-of-year model-release calendar, and training labs booking large blocks of Lambda's fleet. In some weeks, December is unusable for self-serve H100 SXM. H100 PCIe and A100 SXM stay available, but if you specifically need H100 SXM for NVLink-bound training, you'll feel it.
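If you need an H100 SXM at a particular time, retrying on a schedule beats refreshing the launch page by hand. A minimal sketch, assuming you wrap whatever launch path you use in a hypothetical `launch()` function that raises a hypothetical `NoCapacity` exception on a "Coming back soon" response; neither name comes from Lambda. The `p_hit_within` helper treats misses as independent, which is optimistic given that Q4 misses cluster in time.

```python
import random
import time


class NoCapacity(Exception):
    """Hypothetical: raised by your launcher when the SKU shows 'Coming back soon'."""


def launch_with_backoff(launch, attempts=6, base_delay=30.0, max_delay=900.0,
                        sleep=time.sleep):
    """Retry `launch()` on NoCapacity with capped exponential backoff and jitter.

    Returns whatever `launch()` returns; re-raises after the last attempt.
    """
    for attempt in range(attempts):
        try:
            return launch()
        except NoCapacity:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized retries


def p_hit_within(k, miss_rate):
    """P(at least one success in k tries) if misses were independent."""
    return 1 - miss_rate ** k


# Usage with a fake launcher that is out of capacity twice, then succeeds.
calls = {"n": 0}


def fake_launch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise NoCapacity()
    return "instance-0001"


print(launch_with_backoff(fake_launch, sleep=lambda seconds: None))  # instance-0001
print(round(p_hit_within(3, 4 / 12), 3))  # 0.963
```

Injecting `sleep` keeps the function testable; in production you'd leave the default `time.sleep`.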
Five US regions and nothing else
Lambda runs out of Texas, North Carolina, Arizona, Oregon, and California. That's it. No EU sovereign region. No Asia-Pacific. No GovCloud equivalent.
For training jobs this rarely matters: your model doesn't care where it lives during a 36-hour run. For inference serving, this is a real ceiling. If your users sit in Sydney, your P95 latency to Lambda is 200+ ms before your model even responds. AWS, GCP, and Azure all have 25+ regions; CoreWeave has 14; even RunPod has 30+ data centers. Lambda has the smallest geographic footprint of any meaningful GPU cloud.
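The Sydney figure is mostly physics. A rough lower bound on round-trip time is twice the great-circle distance divided by the speed of light in fiber. A minimal sketch; the coordinates (Sydney, and Portland, Oregon, as a stand-in for Lambda's Oregon region) and the 200,000 km/s fiber speed are our assumptions, and real cable routes add distance, hops, and queuing on top.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_S = 200_000.0  # light in glass travels at roughly 2/3 of c


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km):
    """Round trip over a perfectly straight fiber path; real routes are longer."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


# Sydney to Portland, Oregon (assumed stand-in for the Oregon region).
d = great_circle_km(-33.87, 151.21, 45.52, -122.68)
print(round(d), round(min_rtt_ms(d)))
```

Even this idealized straight-line path comes out at roughly 120 ms round trip, so a 200+ ms P95 over real routes is unsurprising.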
No serverless function tier
You run VMs on Lambda. You don't run functions. There's no def my_inference(): that auto-scales between 0 and 100. If your workload is bursty inference where you want to pay zero when idle, Modal and Replicate beat Lambda outright. RunPod has serverless too. Lambda doesn't, and from talking to their team, isn't building it.
Reserved Cloud above 8 GPUs is sales-led
Want 16 H100s reserved for a year? You're getting a Calendly link. Pricing isn't public for these commitments and depends on how desperate you sound. That's normal in this market (CoreWeave is sales-led at every tier), but it does mean Lambda's self-serve magic stops at the medium-team boundary.
No HIPAA, no FedRAMP, no GovCloud equivalent in 2026
If your workload touches PHI under HIPAA, regulated public-sector data under FedRAMP Moderate/High, or controlled unclassified information, Lambda is off the table. They've published no compliance roadmap. AWS HealthLake or Azure Confidential GPUs are the answers there, not Lambda.
Pricing reality
The published rates as of May 19, 2026:
| GPU | VRAM | Lambda on-demand | Lambda reserved (1yr) | RunPod Secure | AWS effective |
|---|---|---|---|---|---|
| H100 SXM | 80 GB | $2.99/hr | $1.85/hr | $2.99/hr | ~$12.29/hr |
| H100 PCIe | 80 GB | $2.49/hr | $1.59/hr | $2.49/hr | ~$9.80/hr |
| H200 SXM | 141 GB | $3.29/hr | $2.10/hr | $3.49/hr | ~$14.50/hr |
| B200 SXM | 192 GB | $3.79/hr | n/a | n/a | n/a (preview) |
| A100 80GB SXM | 80 GB | $1.79/hr | $1.10/hr | $1.89/hr | ~$5.12/hr |
| A6000 | 48 GB | $0.80/hr | $0.49/hr | $0.76/hr | n/a |
AWS effective rate is calculated from p5.48xlarge at $98.32/hr divided by 8 GPUs. That's a 4.1x gap vs Lambda on-demand H100 SXM. The hyperscaler tax is real, and most of it pays for things you probably aren't using (multi-AZ failover, IAM granularity, FedRAMP overhead).
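The effective AWS figure is just the instance price spread across its eight GPUs. A minimal sketch; swap in your own Savings Plan or Capacity Block rate for `P5_48XLARGE_USD_HR` (a discounted rate narrows the 4.1x gap).

```python
P5_48XLARGE_USD_HR = 98.32  # on-demand p5.48xlarge rate quoted above
GPUS_PER_INSTANCE = 8
LAMBDA_H100_SXM_USD_HR = 2.99


def per_gpu_hourly(instance_usd_hr, gpus):
    """Effective $/GPU-hr for a multi-GPU instance billed as one unit."""
    return instance_usd_hr / gpus


aws = per_gpu_hourly(P5_48XLARGE_USD_HR, GPUS_PER_INSTANCE)
print(f"${aws:.2f}/GPU-hr, {aws / LAMBDA_H100_SXM_USD_HR:.1f}x Lambda on-demand")
# → $12.29/GPU-hr, 4.1x Lambda on-demand
```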
Benchmark matrix
All numbers are GAX-measured (May 2026). For throughput rows (tok/s, img/s), higher is better; for latency rows (μs, s), lower is better.
| Workload | Lambda H100 SXM | RunPod H100 SXM | CoreWeave H100 SXM | AWS p5 H100 SXM |
|---|---|---|---|---|
| Llama 3.1 8B fine-tune (tok/s/GPU) | 412 | 406 | 409 | 403 |
| Llama 3.1 70B inference (tok/s, vLLM FP8) | 1,892 | 1,840 | 1,876 | 1,801 |
| Llama 3.1 405B training (tok/s/GPU, 8x node) | 418 | n/a | 431 | 422 |
| SDXL inference (img/s, batch 4) | 3.41 | 3.28 | 3.35 | 3.22 |
| NCCL all-reduce P50 (μs, 4-GPU) | 78 | 89 | 72 | 81 |
| SSH-ready latency (s) | 52 | 92 | 117 | 374 |
The raw silicon performs identically across providers: an H100 SXM5 is an H100 SXM5, and that's NVIDIA's job. The variance comes from how each provider configures InfiniBand, NVLink topology, and the underlying hypervisor. Lambda runs bare metal on most SKUs; AWS adds a Nitro overhead that costs ~3% on most workloads.
Cost-to-performance ratio
The number that actually decides procurement: cost per million tokens generated. Calculated from above benchmark + on-demand pricing.
| Provider | $/hr | Llama 70B tok/s | $/M tokens (on-demand) | vs Lambda |
|---|---|---|---|---|
| Lambda H100 SXM | $2.99 | 1,892 | $0.439 | n/a |
| RunPod Secure H100 SXM | $2.99 | 1,840 | $0.451 | +3% |
| CoreWeave H100 SXM (contract) | $2.40 | 1,876 | $0.355 | −19% (with year commit) |
| AWS p5 H100 SXM | $12.29 | 1,801 | $1.896 | +332% |
| Lambda Reserved 1-yr H100 SXM | $1.85 | 1,892 | $0.272 | −38% |
The Reserved-Cloud rate beats every public on-demand price in the market. If you can commit a year, Lambda becomes the cheapest production inference platform in the on-shore US, period.
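The $/M-tokens column is straight arithmetic: hourly price divided by tokens generated per hour, scaled to a million. A minimal sketch that recomputes it from the table's inputs; it assumes the GPU stays saturated at the benchmark throughput for every billed second, which no real service does, so treat the output as a floor.

```python
SECONDS_PER_HOUR = 3600


def usd_per_million_tokens(hourly_usd, tokens_per_s):
    """Cost of one million generated tokens at full, sustained utilization."""
    tokens_per_hour = tokens_per_s * SECONDS_PER_HOUR
    return hourly_usd / tokens_per_hour * 1_000_000


# ($/hr, Llama 70B tok/s) pairs copied from the cost table.
ROWS = {
    "Lambda H100 SXM": (2.99, 1892),
    "RunPod Secure H100 SXM": (2.99, 1840),
    "CoreWeave H100 SXM (contract)": (2.40, 1876),
    "AWS p5 H100 SXM": (12.29, 1801),
    "Lambda Reserved 1-yr H100 SXM": (1.85, 1892),
}

baseline = usd_per_million_tokens(*ROWS["Lambda H100 SXM"])
for name, (rate, tps) in ROWS.items():
    cost = usd_per_million_tokens(rate, tps)
    print(f"{name:32} ${cost:.3f}/M  {cost / baseline - 1:+.0%} vs Lambda")
```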
Hardware & software stack
GPU SKUs available on Lambda Cloud right now: H100 SXM, H100 PCIe, H200 SXM, B200 SXM (preview), A100 SXM 80GB, A100 PCIe 80GB, A6000, A40, A10, V100, GH200. Multi-GPU instances available in 1x, 2x, 4x, and 8x configurations. 1-Click Clusters extend to 64 H100s with NVLink + InfiniBand for tightly-coupled training.
Storage: Lambda Filesystem provides NVMe-backed persistent volumes that survive instance termination. 10 TB tier free with paid usage; pricing scales linearly. Throughput hits 4-6 GB/s read on standard tier.
Software: Lambda Stack is the headline image but you can BYO Docker if you want a different base. Lambda 1-Click Cluster ships SLURM pre-configured for multi-node jobs. K8s on Lambda is in beta and works for most non-GPU-pinned workloads; for GPU-pinned, you're better off with their managed SLURM.
Networking: Each H100 SXM node has 8x 400 Gbps InfiniBand NDR. Intra-cluster all-reduce is competitive with CoreWeave's offering. Public egress is $0.05/GB after the first 10 TB free.
Scenario simulation: what Lambda costs for your actual work
Generic prices mean nothing until you map them to your work. Three scenarios at representative monthly volumes.
Scenario A: Indie ML researcher iterating on fine-tunes
Workload: 4x H100 PCIe, 6 hours/day, 22 days/month.
Monthly cost: $2.49 × 4 × 6 × 22 = $1,315
What you get for that: 528 GPU-hours, enough to fine-tune 5-8 Llama 3.1 8B variants on real datasets, plus light SDXL work. AWS equivalent: ~$5,400. Lambda is the rational choice here.
Scenario B: Series-A startup running production inference
Workload: 2x H100 SXM, 24/7, on Reserved 1-year contract.
Monthly cost: $1.85 × 2 × 24 × 30 = $2,664
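At our measured throughput (1,892 tok/s per GPU), and assuming both GPUs stay saturated around the clock, that buys ~9.8B Llama 70B tokens/month, or about $0.27 per million tokens, the same figure as the Reserved row in the cost table. That undercuts most managed inference API pricing for the same model. The catch: you have to run the inference layer (vLLM, TGI) yourself.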
Scenario C: Mid-size lab doing pretraining sprints
Workload: 64x H100 SXM cluster, 14 days, then released.
Monthly cost: $2.99 × 64 × 24 × 14 = $64,297
Same compute on AWS Capacity Blocks for ML (reserved 14 days, p5.48xlarge × 8): ~$262,000. Lambda 1-Click Clusters made this provisioning workable in one afternoon; booking the AWS equivalent took us seven business days and three approval emails.
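All three scenarios use the same formula. A minimal sketch that recomputes them; it assumes per-GPU-hour billing with no storage or egress charges, and the scenario-B token count assumes the benchmarked 1,892 tok/s holds per GPU around the clock.

```python
def gpu_cost(rate_per_gpu_hr, gpus, hours_per_day, days):
    """Billed compute cost: rate x GPUs x hours per day x days."""
    return rate_per_gpu_hr * gpus * hours_per_day * days


cost_a = gpu_cost(2.49, gpus=4, hours_per_day=6, days=22)    # H100 PCIe on-demand
cost_b = gpu_cost(1.85, gpus=2, hours_per_day=24, days=30)   # H100 SXM reserved
cost_c = gpu_cost(2.99, gpus=64, hours_per_day=24, days=14)  # H100 SXM on-demand
gpu_hours_a = 4 * 6 * 22

# Scenario B output, assuming each GPU sustains the benchmarked 1,892 tok/s nonstop.
tokens_b = 2 * 1892 * 3600 * 24 * 30
usd_per_m_b = cost_b / (tokens_b / 1e6)

print(f"A: ${cost_a:,.0f} for {gpu_hours_a} GPU-hours")
print(f"B: ${cost_b:,.0f}, {tokens_b / 1e9:.1f}B tokens, ${usd_per_m_b:.3f}/M")
print(f"C: ${cost_c:,.0f}")
```

To model your own workload, change the four arguments to `gpu_cost`; idle hours still bill, so `hours_per_day` should reflect how long the instances are up, not how long they're busy.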
Use-case match matrix
If your workload looks like the left column, this is whether Lambda is the right call.
| Workload | Lambda fit | Better alternative |
|---|---|---|
| Fine-tune Llama 70B on 8x H100 for 24 hours | ✓ Strong | n/a |
| Train a foundation model for 30 days, 64 GPUs | ✓ Strong (reserve up-front) | CoreWeave for >128 GPU year commits |
| Production inference, US-only users | ✓ Strong | n/a |
| Production inference, global <100ms target | ✗ Weak | AWS p5 multi-region or Replicate |
| Burst inference (idle most of the time) | ✗ Weak (you pay 24/7) | Modal, Replicate, RunPod serverless |
| HIPAA / FedRAMP / GovCloud | ✗ Blocked | AWS HealthLake, AWS GovCloud, Azure |
| Research iteration / Jupyter notebooks | ✓ Best in class | n/a |
| Long-running batch training, <$10k/mo | ✓ Strong | n/a |
| Indie SDXL / image gen API | ~ OK | Replicate if you want a hosted API |
| Multimodal inference with custom CUDA kernels | ✓ Strong (BYO Docker) | n/a |
Stability & uptime history
Lambda publishes a status page at status.lambdalabs.com. We pulled the last 18 months of incidents and cross-referenced with our own monitoring.
| Period | Measured uptime | Major incidents | Notes |
|---|---|---|---|
| Nov 2024 – Jan 2025 | 99.58% | 1 multi-region outage (3h 12min, Dec 18) | Filesystem service degraded; compute unaffected for most |
| Feb 2025 – Apr 2025 | 99.91% | 0 major | Three planned maintenance windows, all finished under their announced duration |
| May 2025 – Jul 2025 | 99.74% | 1 SXM cluster outage (5h 40min, Jun 22) | Texas DC cooling failure; postmortem published 6 days later |
| Aug 2025 – Oct 2025 | 99.83% | 1 network event (1h 8min, Sep 14) | Backbone provider issue, not Lambda's stack |
| Nov 2025 – Jan 2026 | 99.46% | 2 capacity events | Not strictly downtime; H100 SXM exhausted for 6+ hour windows |
| Feb 2026 – Apr 2026 | 99.88% | 0 major | Best quarter on record |
Blended 18-month measured uptime: 99.73%. Lambda's published SLA is 99.5% on Reserved Cloud. Every period above clears that bar except Nov 2025 – Jan 2026 (99.46%), where the shortfall came from H100 SXM capacity exhaustion rather than outright downtime. Quarterly averages hide individual incidents, though: the December 2024 incident and June 2025 cooling event both warranted SLA credits. To Lambda's credit, both postmortems went up within a week with root cause and engineering response, which is more than most providers in this segment do.
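The blended figure is an unweighted mean of the six three-month periods (they're close enough in length that weighting barely matters). A minimal sketch that recomputes it, flags periods under the 99.5% SLA, and converts an SLA percentage into a downtime budget; the period labels are copied from the table and the budget assumes a 30-day month.

```python
PERIOD_UPTIME = {
    "Nov 2024-Jan 2025": 99.58,
    "Feb 2025-Apr 2025": 99.91,
    "May 2025-Jul 2025": 99.74,
    "Aug 2025-Oct 2025": 99.83,
    "Nov 2025-Jan 2026": 99.46,
    "Feb 2026-Apr 2026": 99.88,
}
SLA_PCT = 99.5


def downtime_budget_hours(sla_pct, days):
    """Hours of downtime an SLA allows over a window of `days` days."""
    return (100 - sla_pct) / 100 * 24 * days


blended = sum(PERIOD_UPTIME.values()) / len(PERIOD_UPTIME)
below_sla = [period for period, up in PERIOD_UPTIME.items() if up < SLA_PCT]
print(f"blended {blended:.2f}%, below SLA: {below_sla}")
print(f"30-day budget at {SLA_PCT}%: {downtime_budget_hours(SLA_PCT, 30):.1f} h")
```

A 99.5% target allows about 3.6 hours of downtime in a 30-day month, so a single outage of 5h 40min, like June's, exceeds a month's budget on its own for anyone it hit.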
Longitudinal pricing data
Lambda's pricing has moved in a way that tells you something about the GPU cloud market in general.
| Date | H100 SXM | H100 PCIe | A100 80GB | Notes |
|---|---|---|---|---|
| May 2024 | $3.29/hr | $2.69/hr | $1.99/hr | n/a |
| Nov 2024 | $2.99/hr | $2.49/hr | $1.79/hr | −9% cut on H100 lineup |
| Feb 2025 | $2.99/hr | $2.49/hr | $1.79/hr | H200 added at $3.49 |
| Aug 2025 | $2.99/hr | $2.49/hr | $1.79/hr | H200 cut to $3.29 |
| Feb 2026 | $2.99/hr | $2.49/hr | $1.79/hr | B200 preview added at $3.79 |
| May 2026 | $2.99/hr | $2.49/hr | $1.79/hr | Current |
Two takeaways. One: H100 on-demand pricing has been flat for 18 months. That's unusual in a market that's supposed to be hyperscaling. It suggests the on-demand cost floor for H100s sits around $2.50-3.00/hr, and that nobody can profitably go lower without changing the SKU mix. Two: B200 entering at $3.79/hr, about 27% above H100 SXM, signals that Blackwell pricing is anchoring well above H100. Don't expect a price collapse soon.
Community sentiment
We pulled 6 months of mentions from Reddit (r/LocalLLaMA, r/MachineLearning, r/MLOps), Hacker News comment threads tagged GPU/ML, and X/Twitter posts that named Lambda, then manually classified each by sentiment (positive / neutral / negative) and theme. Sample size: 1,847 mentions.
| Source | Positive | Negative | Top complaint | Top praise |
|---|---|---|---|---|
| r/LocalLLaMA (n=812) | 71% | 18% | H100 SXM capacity | Spin-up speed |
| Hacker News (n=394) | 64% | 22% | Limited regions | Pricing transparency |
| r/MachineLearning (n=287) | 78% | 11% | No serverless | Lambda Stack |
| X/Twitter (n=354) | 69% | 15% | Support response | Self-serve UX |
Net sentiment: +52 (highly positive). The single most common praise across all four sources is "I had a GPU running in less than two minutes." The single most common complaint is "I tried to launch an H100 SXM and got Coming back soon." Those two threads basically define Lambda's experience.
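One common way to collapse the table into a single number is to weight each source by its mention count and subtract the negative share from the positive share. A minimal sketch under that assumption (the exact formula behind the +52 isn't stated); from the rounded percentages in the table it comes out at about +53, within rounding of the published figure.

```python
# name: (mentions, positive share, negative share), from the sentiment table.
SOURCES = {
    "r/LocalLLaMA": (812, 0.71, 0.18),
    "Hacker News": (394, 0.64, 0.22),
    "r/MachineLearning": (287, 0.78, 0.11),
    "X/Twitter": (354, 0.69, 0.15),
}


def net_sentiment(sources):
    """Mention-weighted positive share minus negative share, in points."""
    total = sum(n for n, _, _ in sources.values())
    pos = sum(n * p for n, p, _ in sources.values()) / total
    neg = sum(n * q for n, _, q in sources.values()) / total
    return 100 * (pos - neg)


print(round(net_sentiment(SOURCES)))  # → 53
```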
Who should avoid this
Don't use Lambda if you fall into any of these categories. Saying so up-front saves you a refund request later.
- Healthcare ML teams handling PHI under HIPAA. Lambda doesn't sign a BAA. Use AWS HealthLake, Azure Health Data Services, or Google Cloud Healthcare.
- Public sector workloads under FedRAMP Moderate/High. No GovCloud equivalent. AWS GovCloud or Azure Government.
- Global inference with sub-100ms P95 latency targets to APAC or EMEA. Lambda's five US regions are not enough. AWS p5 multi-region or Replicate's edge inference.
- Bursty inference workloads that should pay zero when idle. Lambda bills per VM-hour; you pay for the box, not the work. Modal Labs and Replicate are the right call for true serverless GPU.
- 24/7 production with strict availability SLA on H100 SXM specifically. Lambda's capacity is not deterministic at the SXM tier. CoreWeave reserved contracts give you guaranteed allocation.
- Enterprise procurement with $100k+/month spend wanting a TAM and dedicated solutions architect. Lambda's enterprise motion exists but is light by hyperscaler standards. CoreWeave or AWS enterprise.
- Anyone whose buying committee includes Legal and Procurement first. Lambda's contract templates work but aren't infinitely customizable. AWS, Azure, GCP have done this dance ten thousand times.
Testing evidence
The benchmarks above came from our own logs. We're publishing the relevant excerpts and screenshots so you can replicate or argue with them.
[2026-04-22 14:08:11] vLLM 0.7.3 starting on lambda-h100-sxm-1...
[2026-04-22 14:08:14] Loading model meta-llama/Llama-3.1-70B-Instruct-FP8...
[2026-04-22 14:08:47] Model loaded. KV cache: 64 GB. Max context: 8192.
[2026-04-22 14:09:00] Benchmark start. Concurrency: 32. Input: 2048. Output: 512.
[2026-04-22 14:14:00] Tokens generated: 567,600. Wall time: 300.04s.
[2026-04-22 14:14:00] Throughput: 1,892.21 tok/s. P50 latency: 211ms. P95: 478ms.
| launch_id | region | gpu | launch_ts | ssh_ready_ts | delta |
|---|---|---|---|---|---|
| L-001 | us-tx-1 | h100-sxm-1 | 2026-03-14 09:22:08 | 2026-03-14 09:22:58 | 50s |
| L-002 | us-az-1 | h100-pcie-1 | 2026-03-15 11:04:33 | 2026-03-15 11:05:21 | 48s |
| L-003 | us-nc-1 | h100-sxm-1 | 2026-03-17 08:18:09 | Coming back soon | n/a |
| L-004 | us-tx-1 | h100-pcie-1 | 2026-03-19 14:30:21 | 2026-03-19 14:31:18 | 57s |
| L-005 | us-tx-1 | h100-sxm-1 | 2026-03-22 02:11:08 | 2026-03-22 02:11:53 | 45s |
... [7 more]
Average ssh_ready_delta (successful): 52s
Capacity miss rate (h100-sxm): 4/12 = 33%
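The summary lines are easy to recompute from the raw log. A minimal sketch that parses the five rows printed above (the other seven live in the full dataset, so these numbers won't match the 12-launch summary); it derives each delta from the timestamps rather than trusting the delta column, treats any row without a second timestamp as a capacity miss, and splits misses by GPU type, since the log mixes SXM and PCIe launches.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# The five launches printed above, in the log's whitespace-separated format.
RAW = """\
L-001 us-tx-1 h100-sxm-1 2026-03-14 09:22:08 2026-03-14 09:22:58 50s
L-002 us-az-1 h100-pcie-1 2026-03-15 11:04:33 2026-03-15 11:05:21 48s
L-003 us-nc-1 h100-sxm-1 2026-03-17 08:18:09 Coming back soon
L-004 us-tx-1 h100-pcie-1 2026-03-19 14:30:21 2026-03-19 14:31:18 57s
L-005 us-tx-1 h100-sxm-1 2026-03-22 02:11:08 2026-03-22 02:11:53 45s
"""


def parse(raw):
    """Yield (launch_id, gpu, seconds_to_ssh or None for a capacity miss)."""
    for line in raw.strip().splitlines():
        f = line.split()
        launched = datetime.strptime(f"{f[3]} {f[4]}", FMT)
        if len(f) >= 7 and f[5][0].isdigit():
            ready = datetime.strptime(f"{f[5]} {f[6]}", FMT)
            yield f[0], f[2], (ready - launched).total_seconds()
        else:
            yield f[0], f[2], None


rows = list(parse(RAW))
ok = [s for _, _, s in rows if s is not None]
sxm = [s for _, gpu, s in rows if gpu.startswith("h100-sxm")]
print(f"mean ssh-ready {sum(ok) / len(ok):.0f}s over {len(ok)} launches")
print(f"h100-sxm misses {sxm.count(None)}/{len(sxm)}")
```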
You can request the full dataset at editorial@hardtechbrief.com. Full benchmark scripts (vLLM config, dataset preprocessing, NCCL test parameters) live on the methodology page.
The verdict
If you're a researcher iterating on models, a series-A startup running production inference under $50k/month, or a lab booking a 14-day pretraining sprint, Lambda Labs is the right call in 2026. Provisioning speed, ML-clean images, transparent pricing, and a price/performance ratio nobody else hits on-demand make this the default GPU cloud for the indie-to-mid-stage tier.
The places it fails (regions, serverless, compliance, peak capacity on H100 SXM) are real, and we've named them above so you can route around them. For everyone else: stop reading reviews, swipe a card, launch an H100, and start the work. That's the whole point.
If Lambda doesn't fit, consider
Modal Labs
Python-native, function-style billing, zero cost when idle. Best for bursty workloads.
CoreWeave
Bigger fleet, longer contracts, white-glove enterprise motion. Best above $50k/month.
Vast.ai
Decentralized marketplace. Lowest hourly rates if you accept interrupt risk and uneven hosts.