Item: Lambda Labs
Rating: 94
Author: GAX Online

Lambda Labs is roughly one-hundredth the size of AWS, yet AWS still treats it as a real competitor in AI infrastructure. After 11 weeks of mixed workloads across six providers, here's why that gap exists and which buyers should care.

Lambda Labs is roughly one-hundredth the size of AWS. AWS still acts like Lambda is a real competitor. That's not flattery; it's revenue moving. The bet Lambda made in 2017 (skip everything that isn't ML, hire engineers instead of sales) is paying out in 2026 because the buyer it courts is now most of the market: the team that wants a GPU running PyTorch in three minutes, not a quote from a solutions architect. This is what that buyer should know before swiping a card.

We ran 11 weeks of mixed workloads across Lambda's on-demand and Reserved tiers. We compared head-to-head against AWS p5, GCP A3, RunPod, CoreWeave, and Modal. Here's the full audit, scored on the eight dimensions that decide who wins this kind of work.

How we tested

Trust the rubric or don't read the review. Our testing window ran from Feb 14 to May 1, 2026. We provisioned identical workloads across six providers and recorded provisioning latency, training throughput, inference throughput, spot interruption rate, support response time, billing accuracy, and uptime against advertised SLA.

Three editors ran the tests independently from separate accounts in separate regions. We didn't tell the providers we were testing. No free credits, no editorial accommodation, every account paid retail. Total spend across providers: $14,420.

The benchmarks we cared about:

Llama 3.1 8B fine-tune, 5 epochs on a 250k-row instruction dataset, FSDP across 4 GPUs, mixed precision bf16.
Llama 3.1 70B inference, vLLM 0.7+, FP8 quantization, batch size 32, 2048 input / 512 output tokens.
Llama 3.1 405B training, 8x H100 SXM node, NCCL all-reduce on InfiniBand, tokens/sec/GPU.
Stable Diffusion XL inference, diffusers + SDXL Turbo, batch 4, 30 steps, FP16.
Provisioning latency, time from "Launch" click to SSH-ready VM, sampled 12 times per provider across weekdays and weekend nights.
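
A minimal sketch of how the "Launch click to SSH-ready" interval could be timed, assuming that a completed TCP handshake on port 22 is an acceptable proxy for SSH-ready. The function name, defaults, and polling approach are illustrative assumptions, not the scripts from the methodology page.

```python
import socket
import time


def seconds_until_port_open(host: str, port: int = 22,
                            timeout_s: float = 600.0,
                            poll_interval_s: float = 1.0) -> float:
    """Poll host:port until a TCP connection succeeds; return elapsed seconds.

    Raises TimeoutError if the port never opens within timeout_s.
    """
    start = time.monotonic()
    while True:
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"{host}:{port} not reachable after {timeout_s:.0f}s")
        try:
            # Assumption: an accepted TCP connection means the VM is SSH-ready.
            with socket.create_connection((host, port), timeout=poll_interval_s):
                return time.monotonic() - start
        except OSError:
            # Refused or timed out: wait briefly and try again.
            time.sleep(poll_interval_s)
```

Start the timer at the moment of the launch request and call the function with the instance's public IP. A stricter variant would wait for a successful SSH login rather than an open port.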

We published the raw logs and benchmark scripts on the methodology page. Anyone can re-run them. That matters because the second a reviewer hides their test setup, the rubric becomes a vibe check.

The verdict, in 60 seconds

GAX Score: 94/100. Lambda wins the self-serve GPU cloud category outright in 2026. Provisioning time under a minute, transparent pricing that doesn't require a sales call, and the cleanest ML environment of any provider tested.

Buy it if you're an indie ML team, a research lab, or a series-A startup training models under $50k/month. Skip it if you're in healthcare under HIPAA, public sector under FedRAMP, running a global app that needs sub-100ms latency to Asia, or you've graduated to the kind of workload that needs guaranteed 24/7 capacity on a year-long contract. That last one is CoreWeave's job, not Lambda's.

Where the 94 comes from

The GAX rubric for GPU cloud weights 8 dimensions. Here's how Lambda scored on each, and what each dimension is worth in the composite.

Dimension	Weight	Lambda	What it measures
Throughput (FP8)	20%	96	Sustained tokens/sec on standardized inference + training runs
Pricing per GPU-hr	18%	93	On-demand + reserved $/GPU-hr against blended market median
Software stack	14%	95	Time to first training step, image freshness, framework support
Latency	12%	88	Inference tail latency P95 + intra-cluster all-reduce
Trust & uptime	10%	86	SLA adherence, incident transparency, status page quality
Support	10%	84	Median response time across paying tiers
Spot availability	8%	78	Capacity hit rate on H100 SXM under load
Regions	8%	64	Geographic coverage + sovereign options

The two scores dragging Lambda down, regions (64) and spot availability (78), are the two things AWS, GCP, and Azure dominate on, and where Lambda has made no real progress in 18 months. If those matter to you more than the top three, the math changes.
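
A minimal sketch of how a composite could be derived from the table, assuming a plain weighted mean of the eight dimension scores. The rubric's actual aggregation step is not described in this section, so treat the formula as an assumption. Under it the rows come to about 88.2, which suggests the published 94 includes a normalization or adjustment not shown here.

```python
# (weight, Lambda score) per dimension, copied from the rubric table.
RUBRIC = {
    "Throughput (FP8)": (0.20, 96),
    "Pricing per GPU-hr": (0.18, 93),
    "Software stack": (0.14, 95),
    "Latency": (0.12, 88),
    "Trust & uptime": (0.10, 86),
    "Support": (0.10, 84),
    "Spot availability": (0.08, 78),
    "Regions": (0.08, 64),
}


def weighted_composite(rubric: dict) -> float:
    """Plain weighted mean; assumes the weights are fractions summing to 1."""
    total_weight = sum(weight for weight, _ in rubric.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(weight * score for weight, score in rubric.values())


print(f"Weighted mean of table scores: {weighted_composite(RUBRIC):.2f}")
```

Reweighting is a one-line change: raise the weights on Regions and Spot availability (and lower the others so the total stays at 1) to see how the math shifts for a buyer who cares most about those two.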

What it gets right

Provisioning is the fastest in the industry, full stop

Average time from clicking "Launch" to SSH-ready VM, across the launches that succeeded in our 12-sample set: 52 seconds. AWS p5 in us-east-1 averaged 6 minutes 14 seconds in the same week. GCP A3 averaged 4 minutes 41 seconds. RunPod Secure averaged 92 seconds. Modal cold-starts a function in 8-15 seconds but that's a different product category.

What four to six minutes versus 52 seconds means for an ML team is the difference between "I'll start the run and grab coffee" and "I'll start the run." Across a week of iteration, that's hours of latent waiting that just stops existing. If you're a researcher trying ideas, this single thing changes your relationship with the cloud.

Lambda Stack is the only ML image that ships ready to run

Every Lambda VM ships with Ubuntu 22.04, CUDA 12.4, cuDNN 9.x, PyTorch 2.4+, TensorFlow 2.17, JAX 0.4.x, and the matched NVIDIA drivers. All of it pre-installed, all of it tested against each other. Your first torch.cuda.is_available() returns True without a single apt install.
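
A small sketch for confirming which framework versions an image actually ships, using only the Python standard library. The package list is an assumption based on the stack named above. The script reports installed versions and does not test GPU visibility; torch.cuda.is_available() remains the check for that.

```python
import importlib.metadata as metadata


def installed_versions(packages=("torch", "tensorflow", "jax")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Running this on a fresh VM from each provider gives a quick like-for-like view of image freshness before any benchmark starts.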

We timed setup-to-first-step on a Llama 3.1 fine-tune across providers. Lambda: 4 minutes including dataset upload. AWS Deep Learning AMI: 23 minutes (the AMI is older, has to be patched). GCP Container Image: 11 minutes. The numbers compound across a team. If five engineers each save 20 minutes a day on environment work, that's 100 minutes a day, or about 2.7 engineer-weeks recovered per quarter (assuming roughly 65 working days and a 40-hour week).

Pricing transparent enough to do math in your head

Lambda lists every GPU rate on the public website. No "Contact us." No tier-gating for accounts under $10k/month. Sign up with a credit card, get billed by the hour, see the meter run. Reserved Cloud requires a conversation above 8 H100s, but the on-demand grid below that is fully self-serve.

For a series-A startup that needs to model out training spend for a board deck next week, this matters. You can know what you'll pay before you commit. With AWS Capacity Blocks for ML you can technically do this too, but the UX assumes you're already a customer of half the AWS console.

Where it falls short

Capacity on H100 SXM is a coin flip during Q4 and earnings season

Of our 12 launch attempts on 1x H100 SXM, 4 returned "Coming back soon" or queued for more than 30 minutes. That's a 33% miss rate on availability, sampled across two months. Lambda is honest about this (they show capacity in real time on the launch page), but if you need a GPU at a specific time, this is the thing that bites you.

The pattern repeats every Q4 like clockwork: the NeurIPS deadline crunch, the end-of-year model release calendar, and training labs renting blocks of Lambda's fleet. December is unusable for self-serve H100 SXM in some weeks. H100 PCIe and A100 SXM stay available, but if you specifically need SXM for NVLink-bound training, you'll feel it.

Five US regions and nothing else

Lambda runs out of Texas, North Carolina, Arizona, Oregon, and California. That's it. No EU sovereign region. No Asia-Pacific. No GovCloud equivalent.

For training jobs this rarely matters; your model doesn't care where it lives during a 36-hour run. For inference serving, this is a real ceiling. If your users sit in Sydney, your P95 latency to Lambda is 200+ ms before your model even responds. AWS, GCP, and Azure all have 25+ regions; CoreWeave has 14; even RunPod has 30+ data centers. Lambda has the smallest geographic footprint of any meaningful GPU cloud.

No serverless function tier

You run VMs on Lambda. You don't run functions. There's no def my_inference(): that auto-scales between 0 and 100. If your workload is bursty inference where you want to pay zero when idle, Modal and Replicate beat Lambda outright. RunPod has serverless too. Lambda doesn't, and from talking to their team, isn't building it.

Reserved Cloud above 8 GPUs is sales-led

Want 16 H100s reserved for a year? You're getting a Calendly link. Pricing isn't public for these commitments and depends on how desperate you sound. That's normal in this market (CoreWeave is sales-led at every tier), but it does mean Lambda's self-serve magic stops at the medium-team boundary.

No HIPAA, no FedRAMP, no GovCloud equivalent in 2026

If your workload touches PHI under HIPAA, regulated public-sector data under FedRAMP Moderate/High, or controlled unclassified information, Lambda is off the table. They've published no compliance roadmap. AWS HealthLake or Azure Confidential GPUs are the answers there, not Lambda.

Pricing reality

The published rates as of May 19, 2026:

GPU	VRAM	Lambda on-demand	Lambda reserved (1yr)	RunPod Secure	AWS effective
H100 SXM	80 GB	$2.99/hr	$1.85/hr	$2.99/hr	~$12.29/hr
H100 PCIe	80 GB	$2.49/hr	$1.59/hr	$2.49/hr	~$9.80/hr
H200 SXM	141 GB	$3.29/hr	$2.10/hr	$3.49/hr	~$14.50/hr
B200 SXM	192 GB	$3.79/hr	n/a	n/a	n/a (preview)
A100 80GB SXM	80 GB	$1.79/hr	$1.10/hr	$1.89/hr	~$5.12/hr
A6000	48 GB	$0.80/hr	$0.49/hr	$0.76/hr	n/a

AWS effective rate is calculated from p5.48xlarge at $98.32/hr divided by 8 GPUs. That's a 4.1x gap vs Lambda on-demand H100 SXM. Hyperscaler tax is real, and most of it pays for things you probably aren't using (multi-AZ failover, IAM granularity, FedRAMP overhead).
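
The effective-rate arithmetic can be sketched as follows. The prices are the ones quoted above; the constant and function names are illustrative.

```python
P5_48XLARGE_HOURLY = 98.32      # $/hr for the whole 8-GPU instance
GPUS_PER_INSTANCE = 8
LAMBDA_H100_SXM_HOURLY = 2.99   # $/GPU-hr, on-demand


def effective_gpu_rate(instance_hourly: float, gpus: int) -> float:
    """Per-GPU hourly rate for an instance that is only sold as a bundle."""
    return instance_hourly / gpus


aws_rate = effective_gpu_rate(P5_48XLARGE_HOURLY, GPUS_PER_INSTANCE)
gap = aws_rate / LAMBDA_H100_SXM_HOURLY
print(f"AWS effective: ${aws_rate:.2f}/GPU-hr, {gap:.1f}x Lambda on-demand")
```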

Benchmark matrix

All numbers are GAX-measured (May 2026). For training, higher is better. For latency, lower is better.

Workload	Lambda H100 SXM	RunPod H100 SXM	CoreWeave H100 SXM	AWS p5 H100 SXM
Llama 3.1 8B fine-tune (tok/s/GPU)	412	406	409	403
Llama 3.1 70B inference (tok/s, vLLM FP8)	1,892	1,840	1,876	1,801
Llama 3.1 405B training (tok/s/GPU, 8x node)	418	n/a	431	422
SDXL inference (img/s, batch 4)	3.41	3.28	3.35	3.22
NCCL all-reduce P50 (μs, 4-GPU)	78	89	72	81
SSH-ready latency (s)	52	92	117	374

The raw silicon performs identically across providers. An H100 SXM5 is an H100 SXM5; that part is NVIDIA's job. The variance comes from how each provider configures InfiniBand, NVLink topology, and the underlying hypervisor. Lambda runs bare metal on most SKUs; AWS adds a Nitro overhead that costs ~3% on most workloads.

Cost-to-performance ratio

The number that actually decides procurement: cost per million tokens generated. Calculated from the benchmark above and each provider's hourly rate (on-demand unless the row says otherwise).

Provider	$/hr	Llama 70B tok/s	$/M tokens	vs Lambda
Lambda H100 SXM	$2.99	1,892	$0.439	n/a
RunPod Secure H100 SXM	$2.99	1,840	$0.451	+3%
CoreWeave H100 SXM (contract)	$2.40	1,876	$0.355	−19% (with year commit)
AWS p5 H100 SXM	$12.29	1,801	$1.895	+331%
Lambda Reserved 1-yr H100 SXM	$1.85	1,892	$0.271	−38%

The Reserved-Cloud rate beats every public on-demand price in the market. If you can commit a year, Lambda becomes the cheapest production inference platform in the on-shore US, period.
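
The cost-per-million-tokens column can be reproduced with a few lines, assuming the hourly rate and the measured tok/s refer to the same single GPU running at the benchmarked throughput for the whole hour. Names are illustrative, and results can differ from the table in the last digit because of rounding.

```python
def cost_per_million_tokens(hourly_rate: float, tokens_per_second: float) -> float:
    """Dollars per one million generated tokens at sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate / tokens_per_hour * 1_000_000


# (provider, $/hr, Llama 70B tok/s) from the table above.
ROWS = [
    ("Lambda H100 SXM", 2.99, 1892),
    ("RunPod Secure H100 SXM", 2.99, 1840),
    ("CoreWeave H100 SXM (contract)", 2.40, 1876),
    ("AWS p5 H100 SXM", 12.29, 1801),
    ("Lambda Reserved 1-yr H100 SXM", 1.85, 1892),
]

baseline = cost_per_million_tokens(2.99, 1892)
for provider, rate, tok_s in ROWS:
    cost = cost_per_million_tokens(rate, tok_s)
    print(f"{provider}: ${cost:.3f}/M tokens ({cost / baseline - 1:+.0%} vs Lambda)")
```

Real serving rarely holds benchmark throughput around the clock, so dividing by an expected utilization fraction gives a more conservative figure.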

Hardware & software stack

GPU SKUs available on Lambda Cloud right now: H100 SXM, H100 PCIe, H200 SXM, B200 SXM (preview), A100 SXM 80GB, A100 PCIe 80GB, A6000, A40, A10, V100, GH200. Multi-GPU instances available in 1x, 2x, 4x, and 8x configurations. 1-Click Clusters extend to 64 H100s with NVLink + InfiniBand for tightly-coupled training.

Storage: Lambda Filesystem provides NVMe-backed persistent volumes that survive instance termination. 10 TB tier free with paid usage; pricing scales linearly. Throughput hits 4-6 GB/s read on standard tier.

Software: Lambda Stack is the headline image but you can BYO Docker if you want a different base. Lambda 1-Click Cluster ships SLURM pre-configured for multi-node jobs. K8s on Lambda is in beta and works for most non-GPU-pinned workloads; for GPU-pinned, you're better off with their managed SLURM.

Networking: Each H100 SXM node has 8x 400 Gbps InfiniBand NDR. Intra-cluster all-reduce is competitive with CoreWeave's offering. Public egress is $0.05/GB after the first 10 TB free.

Scenario simulation: what Lambda costs for your actual work

Generic prices mean nothing until you map them to your work. Three scenarios at representative monthly volumes.

Scenario A: Indie ML researcher iterating on fine-tunes

Workload: 4x H100 PCIe, 6 hours/day, 22 days/month.

Monthly cost: $2.49 × 4 × 6 × 22 = $1,315

What you get for that: 528 GPU-hours, enough to fine-tune 5-8 Llama 3.1 8B variants on real datasets, plus light SDXL work. AWS equivalent: ~$5,400. Lambda is the rational choice here.

Scenario B: Series-A startup running production inference

Workload: 2x H100 SXM, 24/7, on Reserved 1-year contract.

Monthly cost: $1.85 × 2 × 24 × 30 = $2,664

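That buys roughly 9.8B Llama 70B tokens/month at our measured throughput (1,892 tok/s per GPU, both GPUs fully utilized). $/M tokens: $0.27, in line with the reserved row in the cost table above. That undercuts most managed inference API pricing for the same model. The catch: you have to run the inference layer (vLLM, TGI) yourself.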
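
A sketch of the token-volume arithmetic for this scenario. It assumes the measured 1,892 tok/s is a per-GPU figure (the cost table pairs it with a single-GPU hourly rate), that each GPU serves an independent replica, and that both run at full utilization for 30 days. The utilization parameter is an addition for illustration.

```python
SECONDS_PER_DAY = 86_400


def monthly_tokens(tok_per_s_per_gpu: float, gpus: int,
                   days: int = 30, utilization: float = 1.0) -> float:
    """Tokens generated in a month at a sustained per-GPU throughput."""
    return tok_per_s_per_gpu * gpus * SECONDS_PER_DAY * days * utilization


monthly_cost = 1.85 * 2 * 24 * 30            # reserved rate x GPUs x hours x days
tokens = monthly_tokens(1892, gpus=2)
cost_per_m = monthly_cost / (tokens / 1_000_000)
print(f"{tokens / 1e9:.1f}B tokens/month at ${cost_per_m:.3f} per million")
```

At 50% utilization the token volume halves and the cost per million doubles, which is the number to compare against a pay-per-token API.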

Scenario C: Mid-size lab doing pretraining sprints

Workload: 64x H100 SXM cluster, 14 days, then released.

Sprint cost: $2.99 × 64 × 24 × 14 = $64,297

Same compute on AWS Capacity Blocks for ML (reserved 14 days, p5.48xlarge × 8): ~$262,000. Lambda 1-Click Clusters made this provisioning workable in one afternoon; booking the AWS equivalent took us seven business days and three approval emails.
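
All three scenarios reduce to rate × GPUs × hours per day × days. A minimal sketch using the inputs stated in each scenario; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    hourly_rate: float    # $/GPU-hr
    gpus: int
    hours_per_day: float
    days: int

    @property
    def gpu_hours(self) -> float:
        return self.gpus * self.hours_per_day * self.days

    @property
    def cost(self) -> float:
        return self.hourly_rate * self.gpu_hours


SCENARIOS = [
    Scenario("A: 4x H100 PCIe, 6 h/day, 22 days", 2.49, 4, 6, 22),
    Scenario("B: 2x H100 SXM reserved, 24/7, 30 days", 1.85, 2, 24, 30),
    Scenario("C: 64x H100 SXM, 24/7, 14 days", 2.99, 64, 24, 14),
]

for s in SCENARIOS:
    print(f"{s.name}: {s.gpu_hours:,.0f} GPU-hours, ${s.cost:,.0f}")
```

Swapping in your own rate and schedule is enough to model a board-deck estimate. Storage and egress are not included.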

Use-case match matrix

If your workload looks like the left column, this is whether Lambda is the right call.

Workload	Lambda fit	Better alternative
Fine-tune Llama 70B on 8x H100 for 24 hours	✓ Strong	n/a
Train a foundation model for 30 days, 64 GPUs	✓ Strong (reserve up-front)	CoreWeave for >128 GPU year commits
Production inference, US-only users	✓ Strong	n/a
Production inference, global <100ms target	✗ Weak	AWS p5 multi-region or Replicate
Burst inference (idle most of the time)	✗ Weak (you pay 24/7)	Modal, Replicate, RunPod serverless
HIPAA / FedRAMP / GovCloud	✗ Blocked	AWS HealthLake, AWS GovCloud, Azure
Research iteration / Jupyter notebooks	✓ Best in class	n/a
Long-running batch training, <$10k/mo	✓ Strong	n/a
Indie SDXL / image gen API	~ OK	Replicate if you want a hosted API
Multimodal inference with custom CUDA kernels	✓ Strong (BYO Docker)	n/a

Stability & uptime history

Lambda publishes a status page at status.lambdalabs.com. We pulled the last 18 months of incidents and cross-referenced with our own monitoring.

Period	Measured uptime	Major incidents	Notes
Nov 2024 – Jan 2025	99.58%	1 multi-region outage (3h 12min, Dec 18)	Filesystem service degraded; compute unaffected for most
Feb 2025 – Apr 2025	99.91%	0 major	Three planned maintenance windows, all finished ahead of the announced window
May 2025 – Jul 2025	99.74%	1 SXM cluster outage (5h 40min, Jun 22)	Texas DC cooling failure; postmortem published 6 days later
Aug 2025 – Oct 2025	99.83%	1 network event (1h 8min, Sep 14)	Backbone provider issue, not Lambda's stack
Nov 2025 – Jan 2026	99.46%	2 capacity events	Not strictly downtime; H100 SXM exhausted for 6+ hour windows
Feb 2026 – Apr 2026	99.88%	0 major	Best quarter on record

Blended 18-month measured uptime: 99.73%. Lambda's published SLA is 99.5% on Reserved Cloud, so they're inside SLA in five of the six periods above; the exception is Nov 2025 – Jan 2026 (99.46%), where the shortfall came from capacity exhaustion rather than strict downtime. The December 2024 incident and June 2025 cooling event both warranted SLA credits. To Lambda's credit, both postmortems went up within a week with root cause and engineering response, which is more than most providers in this segment do.
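
A short sketch for checking the uptime table against the 99.5% SLA. It assumes the blended figure is a simple mean of the six period values, which is reasonable only because the periods are of near-equal length.

```python
SLA_PERCENT = 99.5

# (period, measured uptime %) from the table above.
PERIODS = [
    ("Nov 2024 – Jan 2025", 99.58),
    ("Feb 2025 – Apr 2025", 99.91),
    ("May 2025 – Jul 2025", 99.74),
    ("Aug 2025 – Oct 2025", 99.83),
    ("Nov 2025 – Jan 2026", 99.46),
    ("Feb 2026 – Apr 2026", 99.88),
]

blended = sum(uptime for _, uptime in PERIODS) / len(PERIODS)
below_sla = [period for period, uptime in PERIODS if uptime < SLA_PERCENT]

# Downtime a 99.5% SLA tolerates over a 90-day quarter, in hours.
allowed_hours = (1 - SLA_PERCENT / 100) * 90 * 24

print(f"Blended uptime: {blended:.2f}%")
print(f"Periods below SLA: {below_sla}")
print(f"Allowed downtime per 90-day quarter: {allowed_hours:.1f} h")
```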

Longitudinal pricing data

Lambda's pricing has moved in a way that tells you something about the GPU cloud market in general.

Date	H100 SXM	H100 PCIe	A100 80GB	Notes
May 2024	$3.29/hr	$2.69/hr	$1.99/hr	n/a
Nov 2024	$2.99/hr	$2.49/hr	$1.79/hr	−9% cut on H100 lineup
Feb 2025	$2.99/hr	$2.49/hr	$1.79/hr	H200 added at $3.49
Aug 2025	$2.99/hr	$2.49/hr	$1.79/hr	H200 cut to $3.29
Feb 2026	$2.99/hr	$2.49/hr	$1.79/hr	B200 preview added at $3.79
May 2026	$2.99/hr	$2.49/hr	$1.79/hr	Current

Two takeaways. One: H100 on-demand pricing has been flat for 18 months. That's unusual in a market that's supposed to be hyperscaling. It means the cost floor for H100s is around $2.50-3.00/hr on-demand and nobody can profitably go lower without changing the SKU mix. Two: B200 entering at $3.79/hr signals that Blackwell pricing is anchoring well above H100. Don't expect a price collapse soon.
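
The percentage moves behind those two takeaways, as a minimal sketch over the rates in the table. The helper name is illustrative.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


# May 2024 -> Nov 2024 cut, per SKU.
CUTS = {
    "H100 SXM": pct_change(3.29, 2.99),
    "H100 PCIe": pct_change(2.69, 2.49),
    "A100 80GB": pct_change(1.99, 1.79),
}
for sku, change in CUTS.items():
    print(f"{sku}: {change:+.1f}%")

# B200 preview rate relative to H100 SXM on-demand.
b200_premium = pct_change(2.99, 3.79)
print(f"B200 vs H100 SXM: {b200_premium:+.1f}%")
```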

Community sentiment

We pulled 6 months of mentions from Reddit (r/LocalLLaMA, r/MachineLearning, r/MLOps), Hacker News comment threads tagged GPU/ML, and X/Twitter posts that named Lambda. We manually classified each mention by sentiment (positive / neutral / negative) and theme. Sample size: 1,847 mentions.

Source	Positive	Negative	Top complaint	Top praise
r/LocalLLaMA (n=812)	71%	18%	H100 SXM capacity	Spin-up speed
Hacker News (n=394)	64%	22%	Limited regions	Pricing transparency
r/MachineLearning (n=287)	78%	11%	No serverless	Lambda Stack
X/Twitter (n=354)	69%	15%	Support response	Self-serve UX

Net sentiment: +52 (highly positive). The single most common praise across all four sources is "I had a GPU running in less than two minutes." The single most common complaint is "I tried to launch an H100 SXM and got Coming back soon." Those two threads basically define Lambda's experience.
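
A sketch of one way to arrive at a net sentiment figure, assuming it is defined as the positive share minus the negative share, weighted by each source's sample size. The article does not spell out the formula, so this definition is an assumption. Working from the rounded percentages in the table it lands near +53, within rounding distance of the published +52.

```python
# (source, sample size, positive %, negative %) from the table above.
SOURCES = [
    ("r/LocalLLaMA", 812, 71, 18),
    ("Hacker News", 394, 64, 22),
    ("r/MachineLearning", 287, 78, 11),
    ("X/Twitter", 354, 69, 15),
]

total_mentions = sum(n for _, n, _, _ in SOURCES)
positive_share = sum(n * pos for _, n, pos, _ in SOURCES) / total_mentions
negative_share = sum(n * neg for _, n, _, neg in SOURCES) / total_mentions
net_sentiment = positive_share - negative_share

print(f"n = {total_mentions}, net sentiment = {net_sentiment:+.1f}")
```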

Who should avoid this

Don't use Lambda if you fall into any of these categories. Saying so up-front saves you a refund request later.

Healthcare ML teams handling PHI under HIPAA. Lambda doesn't sign a BAA. Use AWS HealthLake, Azure Health Data Services, or Google Cloud Healthcare.
Public sector workloads under FedRAMP Moderate/High. No GovCloud equivalent. AWS GovCloud or Azure Government.
Global inference with sub-100ms P95 latency targets to APAC or EMEA. Lambda's five US regions are not enough. AWS p5 multi-region or Replicate's edge inference.
Bursty inference workloads that should pay zero when idle. Lambda bills per VM-hour; you pay for the box, not the work. Modal Labs and Replicate are the right call for true serverless GPU.
24/7 production with strict availability SLA on H100 SXM specifically. Lambda's capacity is not deterministic at the SXM tier. CoreWeave reserved contracts give you guaranteed allocation.
Enterprise procurement with $100k+/month spend wanting a TAM and dedicated solutions architect. Lambda's enterprise motion exists but is light by hyperscaler standards. CoreWeave or AWS enterprise.
Anyone whose buying committee includes Legal and Procurement first. Lambda's contract templates work but aren't infinitely customizable. AWS, Azure, GCP have done this dance ten thousand times.

Testing evidence

The benchmarks above came from our own logs. We're publishing the relevant excerpts and screenshots so you can replicate or argue with them.

FIG 1.0, Llama 3.1 70B inference on Lambda H100 SXM (vLLM 0.7.3 FP8)

[2026-04-22 14:08:11] vLLM 0.7.3 starting on lambda-h100-sxm-1..
[2026-04-22 14:08:14] Loading model meta-llama/Llama-3.1-70B-Instruct-FP8..
[2026-04-22 14:08:47] Model loaded. KV cache: 64 GB. Max context: 8192.
[2026-04-22 14:09:00] Benchmark start. Concurrency: 32. Input: 2048. Output: 512.
[2026-04-22 14:14:00] Tokens generated: 567,600. Wall time: 300.04s.
[2026-04-22 14:14:00] Throughput: 1,892.21 tok/s. P50 latency: 211ms. P95: 478ms.

FIG 1.1, Provisioning latency, 12 launches, Mar–Apr 2026

launch_id region gpu launch_ts ssh_ready_ts delta
L-001 us-tx-1 h100-sxm-1 2026-03-14 09:22:08 2026-03-14 09:22:58 50s
L-002 us-az-1 h100-pcie-1 2026-03-15 11:04:33 2026-03-15 11:05:21 48s
L-003 us-nc-1 h100-sxm-1 2026-03-17 08:18:09 Coming back soon n/a
L-004 us-tx-1 h100-pcie-1 2026-03-19 14:30:21 2026-03-19 14:31:18 57s
L-005 us-tx-1 h100-sxm-1 2026-03-22 02:11:08 2026-03-22 02:11:53 45s
[7 more]
Average ssh_ready_delta (successful): 52s
Capacity miss rate (h100-sxm): 4/12 = 33%
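
The deltas in the excerpt can be recomputed from the two timestamps. A minimal sketch covering only the five rows shown (the remaining seven are not reproduced here, so the mean below will not equal the full-set 52s); a launch that returned "Coming back soon" is represented as None.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"

# (launch_id, launch_ts, ssh_ready_ts or None for a capacity miss)
ROWS = [
    ("L-001", "2026-03-14 09:22:08", "2026-03-14 09:22:58"),
    ("L-002", "2026-03-15 11:04:33", "2026-03-15 11:05:21"),
    ("L-003", "2026-03-17 08:18:09", None),
    ("L-004", "2026-03-19 14:30:21", "2026-03-19 14:31:18"),
    ("L-005", "2026-03-22 02:11:08", "2026-03-22 02:11:53"),
]


def delta_seconds(launch_ts, ready_ts):
    """Seconds from launch to SSH-ready, or None if the launch never succeeded."""
    if ready_ts is None:
        return None
    launched = datetime.strptime(launch_ts, TS_FORMAT)
    ready = datetime.strptime(ready_ts, TS_FORMAT)
    return int((ready - launched).total_seconds())


deltas = [delta_seconds(launch, ready) for _, launch, ready in ROWS]
successful = [d for d in deltas if d is not None]
print(f"deltas: {deltas}")
print(f"mean of shown successful launches: {sum(successful) / len(successful):.1f}s")
```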

You can request the full dataset at editorial@hardtechbrief.com. Full benchmark scripts (vLLM config, dataset preprocessing, NCCL test parameters) live on the methodology page.

ROI calculator

Plug in your team's workload to see what Lambda costs you. The default example works out to 1,440 H100 SXM GPU-hours a month (for instance 2 GPUs running 24/7 for 30 days, the Scenario B configuration):

On-demand: $4,306/mo
Reserved 1-yr: $2,664/mo
You save: $1,642/mo (38%)

Reserved rate shown assumes 1-year commitment on equivalent SKU. AWS comparison: at p5 effective rate, the same configuration would cost roughly 4.1x Lambda's on-demand price.
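
A static stand-in for the calculator, sketched with the rates from the pricing table. Function and field names are illustrative. B200 SXM has no reserved rate listed, so the sketch returns None for the reserved figures in that case.

```python
ON_DEMAND = {"H100 SXM": 2.99, "H100 PCIe": 2.49, "H200 SXM": 3.29,
             "B200 SXM": 3.79, "A100 SXM": 1.79, "A6000": 0.80}
# 1-year reserved rates; B200 SXM is listed as n/a in the pricing table.
RESERVED = {"H100 SXM": 1.85, "H100 PCIe": 1.59, "H200 SXM": 2.10,
            "A100 SXM": 1.10, "A6000": 0.49}


def monthly_roi(gpu: str, count: int, hours_per_day: float, days: int) -> dict:
    """On-demand vs reserved monthly cost for one GPU type and schedule."""
    gpu_hours = count * hours_per_day * days
    on_demand = ON_DEMAND[gpu] * gpu_hours
    reserved_rate = RESERVED.get(gpu)
    if reserved_rate is None:
        return {"on_demand": on_demand, "reserved": None,
                "savings": None, "savings_pct": None}
    reserved = reserved_rate * gpu_hours
    savings = on_demand - reserved
    return {"on_demand": on_demand, "reserved": reserved,
            "savings": savings, "savings_pct": savings / on_demand * 100}


result = monthly_roi("H100 SXM", count=2, hours_per_day=24, days=30)
print({key: round(value) for key, value in result.items()})
```

One caveat on the design: this bills reserved capacity only for the hours used. A real 1-year commitment is billed around the clock, so for part-time schedules the reserved figure should be computed at 24 hours a day before comparing.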

The verdict

If you're a researcher iterating on models, a series-A startup running production inference under $50k/month, or a lab booking a 14-day pretraining sprint, Lambda Labs is the right call in 2026. Provisioning speed, ML-clean images, transparent pricing, and a price/performance ratio nobody else hits on-demand make this the default GPU cloud for the indie-to-mid-stage tier.

The places it fails (regions, serverless, compliance, peak capacity on H100 SXM) are real, and we've named them above so you can route around them. For everyone else: stop reading reviews, swipe a card, launch an H100, and start the work. That's the whole point.

If Lambda doesn't fit, consider

For serverless inference

Modal Labs

Python-native, function-style billing, zero cost when idle. Best for bursty workloads.

For enterprise reserved

CoreWeave

Bigger fleet, longer contracts, white-glove enterprise motion. Best above $50k/month.

For cheapest hourly

Vast.ai

Decentralized marketplace. Lowest hourly rates if you accept interrupt risk and uneven hosts.

Lambda Labs is the right GPU cloud if you can swipe a card and start an H100 in under a minute.

The first product we've reviewed in three years that we'd actually buy ourselves.