How we tested
Same testing window. AWS testing required spinning up a real AWS account from scratch, configuring VPC + IAM + security groups for a production-shape deployment, and running p5.48xlarge for the benchmark window. Total spend at AWS: $5,840 (the highest of any provider, as expected).
We also tested Capacity Blocks for ML, booking 8x H100 SXM for 24 hours to measure the reservation flow. Time to first instance: 6 minutes once the VPC was configured, on top of 3 days of upfront account setup.
- Llama 3.1 8B fine-tune, same dataset, FSDP across 4 GPUs.
- Llama 3.1 70B inference, vLLM 0.7+, FP8, batch 32.
- Multi-region inference, deployed in us-east-1 + eu-west-1 + ap-southeast-1, P95 latency measured globally.
- Capacity Blocks for ML, reservation flow + utilization.
- Spot interruption rate, p5.48xlarge spot sampled across 48 hours.
The verdict, in 60 seconds
GAX Score: 85/100. AWS EC2 P5 wins on Trust (99), Regions (98), Support (96), Latency (96). Loses badly on Pricing (60) — the 4x premium over Lambda on-demand is the structural disadvantage.
Buy it if your buying committee already requires AWS, you need FedRAMP High / HIPAA / GovCloud, you serve customers in 5+ regions with sub-100ms latency requirements, or your ML workloads must integrate with deep AWS infrastructure (IAM, VPC, KMS, Bedrock). Skip it if you're cost-sensitive, your compliance posture allows independent clouds, or you're below ~$20k/month and don't need the hyperscaler wrap.
Where the 85 comes from
AWS scores at both extremes. Trust, Regions, Support, Latency all hit the 90s. Pricing sits at 60 — the lowest score on Pricing of any provider we measured. This is structural: AWS isn't trying to compete with Lambda on $/GPU-hr, they're selling the hyperscaler bundle.
| Dimension | Weight | AWS EC2 P5 | What it measures |
|---|---|---|---|
| Throughput (FP8) | 20% | 92 | Nitro hypervisor adds ~3% overhead vs bare metal, otherwise same H100 silicon |
| Pricing per GPU-hr | 18% | 60 | $12.29/hr effective vs $2.99 on Lambda; lowest score on Pricing in this segment |
| Software stack | 14% | 90 | SageMaker, Bedrock, JumpStart, Deep Learning AMIs — comprehensive but complex |
| Latency | 12% | 96 | 25+ regions globally; only provider where multi-region inference under 50ms is real |
| Trust & uptime | 10% | 99 | 99.99% historical p5 SLA, hyperscaler-grade incident response |
| Support | 10% | 96 | Enterprise Support with named TAM available at all real spend levels |
| Spot availability | 8% | 86 | Capacity Blocks for ML covers the planned-capacity gap; on-demand p5 spotty in popular regions |
| Regions | 8% | 98 | 25+ regions, only provider with meaningful global GPU coverage |
The Pricing score of 60 is the structural feature, not a bug. AWS is pricing for buyers who value the rest of the platform more than $/GPU-hr. If you're paying a 4x premium just to run inference, you're using AWS wrong.
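To make the weighting concrete, here is a minimal sketch that recomputes a plain weighted average from the table above. The weights and scores are copied from the table; the function name and the assumption that the GAX Score is a simple weighted average are ours. Under that assumption the result is about 87.5 rather than 85, so the published score presumably includes rounding or adjustments that this section does not spell out.

```python
# (weight in percent, AWS EC2 P5 score) per dimension, copied from the table above.
SCORES = {
    "Throughput (FP8)": (20, 92),
    "Pricing per GPU-hr": (18, 60),
    "Software stack": (14, 90),
    "Latency": (12, 96),
    "Trust & uptime": (10, 99),
    "Support": (10, 96),
    "Spot availability": (8, 86),
    "Regions": (8, 98),
}


def weighted_score(scores):
    """Weight-normalized average of (weight, score) pairs.

    Assumption: a plain weighted average; the real GAX formula may differ.
    """
    total_weight = sum(weight for weight, _ in scores.values())
    return sum(weight * score for weight, score in scores.values()) / total_weight


total_weight = sum(weight for weight, _ in SCORES.values())
print(f"weights sum to {total_weight}%")
print(f"plain weighted average: {weighted_score(SCORES):.2f}")
```

Pricing carries 18% of the weight, so each 10 points gained or lost on Pricing moves the average by 1.8 points.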
What it gets right
Compliance and regions, the structural moat
AWS holds FedRAMP High, HIPAA, SOC 2 Type II, ISO 27001, PCI DSS, and roughly 100 other compliance attestations. For workloads bound by regulatory requirements (federal contracts, healthcare PHI, financial services), AWS is often the only cloud that already has the paperwork. Independent GPU clouds are catching up on FedRAMP Moderate (CoreWeave got it in 2025), but FedRAMP High and most niche frameworks remain AWS-only.
Add 25+ regions globally and you get a structural moat. No other GPU cloud serves Tokyo, Sydney, São Paulo, and Frankfurt with sub-50ms latency on H100 silicon today. For customer-facing AI products with global users, this is the gap that justifies the price premium.
Capacity Blocks for ML solves the reservation problem
Pretraining and large training sprints need contiguous GPU blocks for fixed time windows. AWS Capacity Blocks for ML lets you reserve up to 512 H100s for a specified period, with contractual capacity guarantees. Lambda has 1-Click Clusters (similar product but smaller scale), CoreWeave has enterprise reservations, but Capacity Blocks have AWS's region footprint and SLA backing.
For a 30-day pretraining run with hard deadline constraints, paying the AWS premium on guaranteed capacity is often cheaper than a Lambda capacity surprise mid-sprint. Risk-adjusted, the math is closer than the sticker prices suggest.
The rest of AWS is on the same bill
When your training pipeline pulls data from S3, writes checkpoints to EBS, monitors via CloudWatch, scales via SageMaker, and authenticates through IAM — staying inside AWS means zero data egress costs, zero VPC peering complexity, and one billing relationship. Moving 100 TB of training data out of S3 to a different cloud is itself a five-figure egress bill.
For organizations already deep in AWS, the marginal cost of p5 vs Lambda is actually lower than the sticker delta because the egress and integration costs disappear. This is the calculation that keeps enterprise ML inside AWS even when independent clouds look attractive on raw GPU pricing.
Enterprise support that actually responds
AWS Enterprise Support comes with a named Technical Account Manager, 15-minute response on P1 tickets, architectural reviews, and direct escalation to AWS service teams. We tested with a P2 ticket during the benchmark window: 22-minute first response, full resolution in 4 hours. Lambda's enterprise tier is improving but still doesn't match this response time profile.
For mission-critical production ML where downtime costs more than the GPU bill, the support delta is part of what you're paying for. Not all teams need it. Teams that do, get it nowhere else at this maturity level.
Where it falls short
The 4x price premium on raw GPU is the headline
p5.48xlarge on-demand: $98.32/hr for 8x H100 SXM. Lambda H100 SXM on-demand: $2.99/hr per GPU. Per-GPU effective: AWS $12.29 vs Lambda $2.99. The 4x premium is real and pre-discount.
Savings Plans + 3-year Reserved bring AWS p5 down to roughly $40-45/hr for the 8-GPU node, or $5.00-5.60 per GPU per hour. That is still roughly 2.7-3x Lambda Reserved ($1.85/hr). The premium narrows but never disappears. If you can run on Lambda or CoreWeave, you're leaving 50-70% of your GPU bill on the AWS table.
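The per-GPU arithmetic behind these multiples is simple enough to check by hand. A small sketch, using only the prices quoted in this review (the helper names are ours):

```python
GPUS_PER_NODE = 8  # p5.48xlarge carries 8x H100 SXM

LAMBDA_OD = 2.99        # Lambda H100 SXM on-demand, $/GPU-hr
LAMBDA_RESERVED = 1.85  # Lambda Reserved 1-yr, $/GPU-hr


def per_gpu(node_price_per_hr, gpus=GPUS_PER_NODE):
    """Effective $/GPU-hr for a node-level hourly price."""
    return node_price_per_hr / gpus


def multiple(aws_per_gpu, other_per_gpu):
    """How many times the AWS per-GPU price exceeds the comparison price."""
    return aws_per_gpu / other_per_gpu


on_demand = per_gpu(98.32)
committed_low, committed_high = per_gpu(40.0), per_gpu(45.0)

print(f"on-demand: ${on_demand:.2f}/GPU-hr, {multiple(on_demand, LAMBDA_OD):.2f}x Lambda on-demand")
print(f"committed: ${committed_low:.2f}-{committed_high:.2f}/GPU-hr, "
      f"{multiple(committed_low, LAMBDA_RESERVED):.1f}-"
      f"{multiple(committed_high, LAMBDA_RESERVED):.1f}x Lambda Reserved")
```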
p5 capacity is often unavailable in popular regions
us-east-1 (Northern Virginia) is AWS's busiest region and often shows InsufficientInstanceCapacity errors for p5.48xlarge during business hours. Our sampling: 8 of 24 launch attempts during weekday US business hours returned the capacity error. Same SKU in us-west-2 (Oregon) was available 23 of 24 attempts.
The fix is Capacity Blocks for ML or planning around capacity. The frustration is that 'AWS has every GPU' isn't quite true at the on-demand tier in the regions most teams want.
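"Planning around capacity" usually means trying a second region when the first one refuses the launch. The sketch below shows that fallback pattern only. It is not wired to the AWS SDK: `InsufficientCapacity` is a hypothetical stand-in for EC2's InsufficientInstanceCapacity error, and `fake_launch` is an illustrative stub you would replace with your own launcher.

```python
class InsufficientCapacity(Exception):
    """Hypothetical stand-in for EC2's InsufficientInstanceCapacity error."""


def launch_with_fallback(regions, launch):
    """Try each region in order and return (region, instance_id) for the first success.

    `launch` is any callable that takes a region name and either returns an
    instance id or raises InsufficientCapacity.
    """
    exhausted = []
    for region in regions:
        try:
            return region, launch(region)
        except InsufficientCapacity:
            exhausted.append(region)
    raise InsufficientCapacity(f"no capacity in any of: {', '.join(exhausted)}")


def fake_launch(region):
    # Illustrative stub: pretends us-east-1 has no p5 capacity right now.
    if region == "us-east-1":
        raise InsufficientCapacity(region)
    return f"i-demo-{region}"


print(launch_with_fallback(["us-east-1", "us-west-2"], fake_launch))
```

The fallback only helps if your data and networking are already staged in the second region, which is part of the planning cost.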
Pricing structure is genuinely complex
p5 has on-demand, 1-year Reserved, 3-year Reserved, Compute Savings Plans, EC2 Instance Savings Plans, Capacity Blocks for ML, Spot, and Spot with Spot Capacity Reservations. Each has different commitment terms, discount levels, and operational implications. Modeling 'what will AWS p5 cost me' takes a real spreadsheet, not a calculator on a webpage.
For finance teams trying to forecast ML compute spend, this is a real source of friction. Lambda's published rates are easier to plug into a model. AWS's optionality is a feature for sophisticated buyers and a bug for everyone else.
Console UX assumes you're already a customer
Launching p5 from a fresh AWS account requires VPC configuration, security groups, IAM role setup, EBS attachment decisions, AMI selection, and roughly 15 other choices that a Lambda user makes in zero clicks. From scratch, expect 3-4 hours of setup before your first training job runs.
If your team is already AWS-fluent, this is invisible — it's just how AWS works. If you're coming from Lambda, the UX feels like an enormous step backward. AWS Deep Learning AMIs help but the initial setup overhead is real.
On-demand AMIs lag mainstream framework releases
AWS Deep Learning AMI versions tend to be 2-3 framework releases behind. We launched a Deep Learning AMI in March 2026 and got PyTorch 2.2 — current upstream was 2.5. CUDA was 12.1, current was 12.4. You can patch up, but the time cost is real.
Lambda Stack ships fresh framework versions within days of upstream. AWS's bias is toward stability, which matters for enterprise but slows experimentation. For research workloads, this is friction.
Pricing reality
p5 pricing across six tiers: on-demand, 1-year Reserved, 3-year Reserved, Capacity Blocks for ML, Spot, and the GovCloud p4d equivalent. All effective per-GPU per-hour figures come from dividing the 8-GPU node price by 8.
| Pricing tier | p5.48xlarge ($/hr) | Effective $/GPU-hr | Lambda comparison | Notes |
|---|---|---|---|---|
| On-demand | $98.32 | $12.29 | +311% vs Lambda OD | Headline rate |
| 1-yr Reserved (Compute Savings) | $58.96 | $7.37 | +147% vs Lambda OD | Most common enterprise tier |
| 3-yr Reserved | $40.42 | $5.05 | +69% vs Lambda OD | Cheapest committed AWS tier |
| Capacity Block (14 days) | $78.66 | $9.83 | +229% vs Lambda OD | Guaranteed capacity premium |
| Spot (region-dependent) | $26-35 | $3.25-4.38 | +9-46% vs Lambda OD | 2-min interruption notice |
| GovCloud p4d.24xlarge equiv | $32.77 | $4.10 | +37% vs Lambda OD (no GovCloud) | A100 SXM in GovCloud |
The Spot tier deserves attention. p5 spot at $26-35/hr for 8 GPUs is genuinely competitive with Lambda on-demand on cost — at the cost of 2-minute interrupt notice. For training jobs that checkpoint aggressively, this is the cheapest way to use AWS p5 capacity. For production inference, spot is the wrong tier.
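For spot training, "checkpoint aggressively" in practice means watching for the interruption notice and saving state when it arrives. The sketch below polls the EC2 instance metadata path `spot/instance-action`, which returns 404 until an interruption is scheduled. The training loop, `check_every`, and the `save_checkpoint` callback are placeholders of ours, not part of our benchmark harness.

```python
import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def fetch_from_imds(timeout=1.0):
    """Return the raw spot instance-action document, or None if none is scheduled."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    try:
        token = urllib.request.urlopen(token_req, timeout=timeout).read().decode()
        req = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token})
        return urllib.request.urlopen(req, timeout=timeout).read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption notice yet
            return None
        raise


def interruption_pending(raw):
    """True when the document announces a stop, terminate, or hibernate action."""
    if not raw:
        return False
    return json.loads(raw).get("action") in {"stop", "terminate", "hibernate"}


def train(steps, save_checkpoint, fetch=fetch_from_imds, check_every=10):
    """Placeholder loop: checkpoint and exit as soon as a spot notice is seen."""
    for step in range(steps):
        # ... one training step would run here ...
        if step % check_every == 0 and interruption_pending(fetch()):
            save_checkpoint(step)
            return step
    return steps
```

Passing `fetch` as a parameter keeps the loop testable off EC2. On a real instance the default talks to the metadata service, and the checkpoint has to finish inside the 2-minute window.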
Benchmark matrix
GAX-measured. AWS p5.48xlarge in us-west-2 vs equivalent SKUs on independent clouds.
| Workload | AWS p5 H100 SXM | Lambda H100 SXM | CoreWeave H100 SXM | Notes |
|---|---|---|---|---|
| Llama 3.1 8B fine-tune (tok/s/GPU) | 403 | 412 | 409 | Nitro hypervisor ~3% overhead |
| Llama 3.1 70B inference (tok/s, vLLM FP8) | 1,801 | 1,892 | 1,876 | Same gap, same cause |
| Llama 3.1 405B training (tok/s/GPU, 8x) | 422 | 418 | 431 | CoreWeave NDR fabric edge |
| NCCL all-reduce P50 (μs, 4-GPU) | 81 | 78 | 72 | EFA fabric solid but second tier |
| SSH-ready latency (s) | 374 | 52 | contract-led | 6+ minute startup |
| Multi-region inference P95 (ms, US→APAC) | 118 | 410 (no APAC region) | 410 | Only AWS has APAC H100 |
Per-GPU throughput trails Lambda by roughly 2-5% on the 8B fine-tune and 70B inference rows, mostly Nitro hypervisor overhead, while the 405B training row puts AWS about 1% ahead of Lambda. The unique numbers are the bottom two: provisioning takes 7x longer than Lambda, and multi-region inference is uniquely possible on AWS because nobody else has the global footprint. For workloads where global serving is the constraint, the throughput delta becomes irrelevant.
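The relative gaps can be read straight off the matrix. A short sketch that computes them from the table values (the dictionary layout and helper name are ours):

```python
# (AWS p5, Lambda) throughput pairs copied from the benchmark matrix above.
THROUGHPUT = {
    "Llama 3.1 8B fine-tune": (403, 412),
    "Llama 3.1 70B inference": (1801, 1892),
    "Llama 3.1 405B training": (422, 418),
}


def pct_delta(aws, lam):
    """AWS throughput relative to Lambda, in percent (negative means AWS is slower)."""
    return (aws - lam) / lam * 100


for name, (aws, lam) in THROUGHPUT.items():
    print(f"{name}: {pct_delta(aws, lam):+.1f}% vs Lambda")

# SSH-ready latency in seconds: AWS 374 vs Lambda 52.
ssh_ratio = 374 / 52
print(f"provisioning: {ssh_ratio:.1f}x slower than Lambda")
```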
Cost-to-performance ratio
$/M tokens on Llama 70B inference, AWS tiers compared.
| Provider / tier | $/GPU-hr | tok/s | $/M tokens | vs Lambda Reserved |
|---|---|---|---|---|
| AWS p5 on-demand | $12.29 | 1,801 | $1.895 | +597% |
| AWS p5 Reserved 1-yr | $7.37 | 1,801 | $1.137 | +318% |
| AWS p5 Reserved 3-yr | $5.05 | 1,801 | $0.779 | +187% |
| AWS p5 Spot (median) | $3.81 | 1,801 | $0.588 | +116% |
| Lambda Reserved 1-yr | $1.85 | 1,892 | $0.272 | — |
Even AWS's cheapest committed tier (3-year Reserved) costs 2.9x as much per token as Lambda Reserved. Spot brings it to 2.2x. The gap never closes meaningfully. AWS p5 economics make sense for workloads where compliance, regions, or AWS-ecosystem integration justifies the premium, not for cost-optimized inference.
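The $/M tokens column follows from the hourly price and the measured throughput. A minimal sketch of that conversion, using the table's inputs (the function name is ours; results match the table to within rounding):

```python
def cost_per_million_tokens(price_per_hr, tokens_per_sec):
    """Dollars per one million tokens at a given hourly price and sustained throughput."""
    tokens_per_hr = tokens_per_sec * 3600
    return price_per_hr / tokens_per_hr * 1_000_000


# (hourly price, tok/s) per tier, copied from the table above.
TIERS = {
    "AWS p5 on-demand": (12.29, 1801),
    "AWS p5 Reserved 1-yr": (7.37, 1801),
    "AWS p5 Reserved 3-yr": (5.05, 1801),
    "AWS p5 Spot (median)": (3.81, 1801),
    "Lambda Reserved 1-yr": (1.85, 1892),
}

baseline = cost_per_million_tokens(*TIERS["Lambda Reserved 1-yr"])
for name, (price, tps) in TIERS.items():
    cost = cost_per_million_tokens(price, tps)
    print(f"{name}: ${cost:.3f}/M tokens, {cost / baseline:.1f}x Lambda Reserved")
```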
Hardware & software stack
AWS p5 family: p5.48xlarge (8x H100 SXM 80GB), p5e.48xlarge (8x H200 SXM 141GB), p5en.48xlarge (8x H200 SXM with enhanced networking). p4d/p4de family still active for A100 workloads. Trainium 2 (trn2.48xlarge) for AWS Neuron-optimized training, Inferentia 2 for hosted inference.
Networking: 3,200 Gbps EFA (Elastic Fabric Adapter) on p5.48xlarge, with NCCL supported through the aws-ofi-nccl plugin. Multi-node training works but with somewhat higher all-reduce latency than CoreWeave's InfiniBand NDR. For most workloads the difference is negligible; for communication-bound pretraining with tight step times it shows.
Software: AWS Deep Learning AMIs (Ubuntu 22.04 + CUDA + PyTorch + TensorFlow + JAX, but typically 2-3 versions behind upstream). SageMaker JumpStart for managed model deployments. Bedrock for managed model serving. AMI selection matters — use the latest DLAMI for your framework version.
Storage: EBS gp3 for boot, FSx for Lustre for high-throughput training data ($0.145/GB/month), S3 with S3 Transfer Acceleration for dataset staging. Data residency by region is a real product feature, important for EU buyers.
Scenario simulation: what AWS EC2 P5 costs for your work
Three procurement-shaped scenarios. AWS is rarely the cost-optimal answer; it's often the compliance-optimal answer.
Scenario A: Healthtech startup, HIPAA inference
Workload: 2x p5e.48xlarge running Llama 70B inference for clinical decision support, HIPAA BAA required, 24/7
Monthly cost: $108.66 × 2 × 24 × 30 (on-demand) = $156,470/mo
Wrong tier choice but illustrative. Move to 1-yr Reserved: ~$93,883/mo. The HIPAA BAA covers the GPU work natively, no third-party PHI processor relationships needed. Lambda or RunPod cannot serve this workload at all because neither offers a BAA. AWS's premium is the price of compliance simplicity.
Scenario B: Federal contract, FedRAMP Moderate
Workload: GovCloud p4d.24xlarge for ML training on government data, 1-year commit
Monthly cost: $19.66 × 24 × 30 = $14,155/mo
GovCloud p4d (A100 SXM) is the current public-sector option. p5 in GovCloud is rolling out but limited. CoreWeave's FedRAMP Moderate H100 enclave at ~$2.85/hr is roughly 60% cheaper for the H100 portion, but procurement officers familiar with AWS contracting often prefer the path of least resistance.
Scenario C: Global SaaS, multi-region inference
Workload: 4x p5.48xlarge inference across us-east-1, eu-west-1, ap-southeast-1, sa-east-1, 24/7, 3-yr Reserved
Monthly cost: $40.42 × 4 × 24 × 30 = $116,410/mo
This is the workload no independent GPU cloud can serve: none has 4-region H100 coverage. Latency-sensitive global inference means AWS or Google Cloud. The cost is high; the alternative is a multi-cloud architecture with its own complexity and egress costs.
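All three scenarios use the same formula: node rate × node count × 24 hours × 30 days. A small sketch for plugging in your own node counts and rates (the function name and the 30-day month are our simplifications):

```python
def monthly_cost(rate_per_node_hr, nodes=1, hours_per_day=24, days=30):
    """Monthly spend for always-on nodes at a flat hourly node rate."""
    return rate_per_node_hr * nodes * hours_per_day * days


# Rates and node counts are taken from the three scenarios above.
scenarios = {
    "A: 2x p5e.48xlarge, on-demand": monthly_cost(108.66, nodes=2),
    "B: 1x GovCloud p4d.24xlarge, 1-yr commit": monthly_cost(19.66),
    "C: 4x p5.48xlarge, 3-yr Reserved": monthly_cost(40.42, nodes=4),
}

for name, cost in scenarios.items():
    print(f"{name}: ${cost:,.2f}/mo")
```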
Use-case match matrix
| Workload | AWS EC2 P5 fit | Better alternative |
|---|---|---|
| HIPAA PHI inference | ✓ BAA covers GPU work | Azure HealthCare or Bedrock for managed |
| FedRAMP Moderate workloads | ✓ GovCloud available | CoreWeave for H100 specifically |
| Multi-region global inference (<100ms) | ✓ Only meaningful option | GCP if you prefer it |
| Cost-optimized self-serve inference | ✗ 4x more expensive | Lambda or RunPod |
| Indie research / hobbyist | ✗ Wrong shape, complex onboarding | Lambda or RunPod |
| Pretraining with Capacity Blocks | ✓ Best in class for guaranteed reservation | Lambda 1-Click for smaller scale |
| SageMaker-integrated ML pipeline | ✓ Best in class | — |
| Spot-tolerant batch training | ✓ Cheapest AWS path | Vast.ai interruptible if no compliance |
| Quick prototyping with credit card | ~ Possible but high friction | Lambda or Modal |
| Enterprise procurement with TAM | ✓ Best in class | CoreWeave at lower price |
Stability & uptime history
AWS publishes p5 uptime via Service Health Dashboard. We monitored our deployment across us-west-2 + eu-west-1.
| Period | Measured uptime | Major incidents | Notes |
|---|---|---|---|
| Nov 2024 – Jan 2025 | 99.99% | 0 major | Clean quarter |
| Feb 2025 – Apr 2025 | 99.98% | 1 (us-east-1, 1h 14m) | Networking event, single-region |
| May 2025 – Jul 2025 | 99.99% | 0 major | — |
| Aug 2025 – Oct 2025 | 99.96% | 1 (eu-west-1 capacity, 3h 22m) | Capacity event, not strictly outage |
| Nov 2025 – Jan 2026 | 99.99% | 0 major | Q4 demand absorbed |
| Feb 2026 – Apr 2026 | 99.99% | 0 major | Stable |
Blended 18-month measured uptime: 99.98% as a simple average of the six quarterly figures above. AWS's published p5 SLA is 99.99% for multi-AZ deployments. Our measurements met it in four of six quarters; the two misses (99.98% and 99.96%) each trace to a single-region event. This is the structural reliability advantage of hyperscaler infrastructure. Independent clouds are getting close but none have matched this consistency over 18 months.
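Uptime percentages are easier to reason about as a downtime budget. The sketch below converts an SLA percentage into allowed minutes and takes an unweighted mean of the six quarterly figures in the table. The helper name is ours, and an unweighted mean is only one way to blend; a duration-weighted blend could differ slightly.

```python
def allowed_downtime_minutes(sla_pct, days):
    """Minutes of downtime permitted by an SLA percentage over a period of `days`."""
    return (1 - sla_pct / 100) * days * 24 * 60


# Quarterly measured uptime, copied from the table above.
QUARTERLY_UPTIME = [99.99, 99.98, 99.99, 99.96, 99.99, 99.99]
mean_uptime = sum(QUARTERLY_UPTIME) / len(QUARTERLY_UPTIME)

print(f"99.99% over 30 days allows {allowed_downtime_minutes(99.99, 30):.2f} min of downtime")
print(f"99.99% over 90 days allows {allowed_downtime_minutes(99.99, 90):.2f} min of downtime")
print(f"unweighted mean of quarterly uptime: {mean_uptime:.3f}%")
```

At 99.99%, a 90-day quarter has a budget of about 13 minutes, which is why a single multi-hour regional event shows up in the quarterly figure.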
Longitudinal pricing data
AWS p5 pricing has been remarkably flat since launch. The compliance and capacity advantages haven't faced enough competitive pressure to force cuts.
| Date | p5.48xlarge OD | Eff. $/GPU-hr | Reserved 1-yr | Notes |
|---|---|---|---|---|
| May 2024 | $98.32/hr | $12.29 | $58.96 | p5 GA launch |
| Nov 2024 | $98.32/hr | $12.29 | $58.96 | No change |
| Feb 2025 | $98.32/hr | $12.29 | $58.96 | No change, p5e added |
| Aug 2025 | $98.32/hr | $12.29 | $58.96 | No change |
| Feb 2026 | $98.32/hr | $12.29 | $58.96 | No change |
| May 2026 | $98.32/hr | $12.29 | $58.96 | Current |
Zero price movement in 24 months. AWS doesn't compete on GPU price; they compete on the rest of the platform. Buyers who care about price-per-GPU left long ago. AWS is at the equilibrium where the customers who stay aren't price-sensitive — by design.
Community sentiment
AWS p5 generates substantial mention volume but the sentiment is more polarized than self-serve clouds. Enterprise buyers cluster positive; cost-conscious developers cluster negative. Sample: 2,847 mentions across 6 months.
| Source | Positive | Negative | Top complaint | Top praise |
|---|---|---|---|---|
| r/aws (n=812) | 58% | 27% | p5 price premium | Compliance + regions |
| Hacker News (n=614) | 41% | 42% | 4x markup vs independents | Region coverage |
| LinkedIn (enterprise) (n=520) | 79% | 11% | Procurement complexity | TAM responsiveness |
| X/Twitter (n=901) | 52% | 32% | Capacity issues in us-east-1 | Capacity Blocks for ML |
Net sentiment: +14 (mildly positive) — lowest of any provider we tracked, but expected given the price polarization. Enterprise buyers love AWS; cost-conscious indie users hate the markup. Both perspectives are correct for their respective contexts.
Who should avoid this
Skip this if you fall into any of these buckets. Naming it up-front beats a support ticket later.
- Cost-optimized self-serve users. AWS p5 is 4x more expensive than Lambda on-demand. Use Lambda or RunPod.
- Indie ML researchers / hobbyists. Onboarding overhead is wrong for solo workflows. Use Lambda Stack or RunPod templates.
- Teams without AWS-fluent platform engineers. VPC, IAM, EBS setup takes hours from scratch. Lambda is zero-config.
- Workloads under $20k/month spend. The hyperscaler premium doesn't pay off below this scale. Use independent clouds.
- Serverless GPU function workloads. AWS Lambda doesn't support GPU functions. Use Modal or RunPod Serverless.
- Latency-flexible batch workloads. Spot is your cheapest AWS path; if you can tolerate spot, Vast.ai interruptible is cheaper still (no compliance though).
- Anyone whose compliance posture is satisfied by SOC 2. If you don't need FedRAMP or HIPAA, independent clouds match SOC 2 at 30-70% lower cost.
Testing evidence
$ aws ec2 run-instances --instance-type p5.48xlarge \
--image-id ami-0... (DLAMI base) \
--key-name hardtech-test \
--security-group-ids sg-... \
--subnet-id subnet-... \
--block-device-mappings ...
API returned instance-id i-0abc123... in 1.8s
state transition: pending → running: 47s
ssh-ready (post status check 2/2): 374s (6m 14s)
equivalent on Lambda: 52s
equivalent on RunPod Secure: 92s
equivalent on CoreWeave (after contract): 8m 14s but pre-reserved
| target_region | client_origin | P50_ms | P95_ms |
|---|---|---|---|
| us-east-1 | us-east-1 | 211 | 342 |
| us-east-1 | us-west-1 | 92 | 148 |
| us-east-1 | eu-west-1 | 118 | 189 |
| us-east-1 | ap-southeast-1 | 248 | 412 |
| eu-west-1 | eu-west-1 | 208 | 338 |
| eu-west-1 | ap-southeast-1 | 188 | 302 |
| ap-southeast-1 | ap-southeast-1 | 214 | 354 |
| ap-southeast-1 | eu-west-1 | 172 | 282 |
cross-region failover P50: 118-248ms depending on pair
no other GPU cloud reproduces these numbers at p5 scale
ROI calculator
AWS p5 effective $/GPU computed from p5.48xlarge node price divided by 8. Includes Nitro hypervisor overhead. Reserved tiers require commitment.
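As a rough offline stand-in for this calculation, the sketch below compares monthly spend across tiers for a given GPU count and utilization. The rates are the effective $/GPU-hr figures from this review. The billing model is our assumption: committed tiers pay for every hour, on-demand pays only for hours used. The function names are ours as well.

```python
# Effective $/GPU-hr from the pricing tables in this review.
AWS_TIERS = {
    "on-demand": 12.29,
    "1-yr Reserved": 7.37,
    "3-yr Reserved": 5.05,
}
LAMBDA_ON_DEMAND = 2.99
HOURS_PER_MONTH = 24 * 30


def monthly_spend(rate, gpus, utilization, committed):
    """Assumption: committed tiers bill every hour; on-demand bills only hours used."""
    billed_hours = HOURS_PER_MONTH * (1.0 if committed else utilization)
    return rate * gpus * billed_hours


def breakeven_utilization(committed_rate, on_demand_rate):
    """Utilization above which the committed tier is cheaper than on-demand."""
    return committed_rate / on_demand_rate


def compare(gpus, utilization):
    """Monthly spend per tier for `gpus` GPUs busy `utilization` of the time."""
    rows = {
        name: monthly_spend(rate, gpus, utilization, committed=(name != "on-demand"))
        for name, rate in AWS_TIERS.items()
    }
    rows["Lambda on-demand"] = monthly_spend(LAMBDA_ON_DEMAND, gpus, utilization, committed=False)
    return rows


for name, cost in compare(gpus=8, utilization=0.5).items():
    print(f"{name}: ${cost:,.0f}/mo")
print(f"1-yr Reserved beats on-demand above "
      f"{breakeven_utilization(7.37, 12.29):.0%} utilization")
```

At 50% utilization on 8 GPUs, AWS on-demand is cheaper than the 1-year commitment. Under these assumptions the commitment starts paying off at about 60% utilization.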
The verdict
AWS EC2 P5 is the right GPU cloud for one specific buyer: enterprise workloads where the buying committee values regions, compliance, and ecosystem integration over raw GPU pricing. For those buyers, no competitor exists yet — CoreWeave is catching up on compliance, Google Cloud has comparable regions, but neither matches AWS's combination of all three. The 4x price premium is real and it's what you pay for the rest of the platform.
For everyone else, AWS p5 is the wrong call. If your compliance scope is SOC 2, your traffic is US-focused, and your spend is under $20k/month, independent clouds will serve you 50-70% cheaper with comparable engineering quality. Choose AWS when the platform requirements force it; not before.
If AWS EC2 P5 doesn't fit, consider
CoreWeave
Contract-led, FedRAMP Moderate, dedicated H100 fleet. Roughly half the AWS price at comparable enterprise wrap.
Lambda Labs
On-demand H100 SXM at $2.99/hr, Reserved at $1.85/hr. Best path off AWS if compliance allows.
Google Cloud A3
TPU v5p / Trillium for transformer training, often cheaper than equivalent H100 work. Strong for JAX workflows.