How we tested
Same testing window. GCP testing covered A3 Mega (H100), A3 Ultra (H200), and Trillium TPU v6. We benchmarked equivalent training workloads on GPU vs TPU to surface the economics question. Total spend at GCP: $4,820.
We tested Vertex AI's training pipeline and deployment endpoints alongside raw A3 instances to capture the full GCP ML experience.
- Llama 3.1 8B fine-tune, FSDP on 4x H100 (A3 Mega) and equivalent on TPU v5p.
- Llama 3.1 70B inference, vLLM 0.7+ FP8 on A3 Mega, JAX-based serving on Trillium.
- Foundation-model training proxy, 7B model training run for 24 hours on H200 vs Trillium.
- Vertex AI deployment flow, training job to endpoint via Vertex SDK.
- Multi-region inference, deployed in us-central1 + europe-west4 + asia-southeast1.
The verdict, in 60 seconds
GAX Score: 84/100. Google Cloud A3 wins on Trillium TPU economics, Regions (90), Trust (96), and Vertex AI integration. Loses on raw GPU pricing — A3 tracks AWS, both 3-4x more expensive than Lambda on H100 SXM.
Buy it if you're committed to JAX or TPUs, you're already on Google Cloud for data, you need Vertex AI MLOps, or your workload genuinely maps better to TPU. Skip it if you're H100-focused, cost-sensitive, or PyTorch-first without GCP ecosystem dependencies. Trillium economics are real for the right workloads; GPU economics aren't.
Where the 84 comes from
GCP's profile mirrors AWS in structure: high on Regions, Trust, Latency; mid on Pricing because of the hyperscaler premium. Unique upside: TPU pricing reshapes the cost picture for transformer training workloads.
| Dimension | Weight | Google Cloud A3 | What it measures |
|---|---|---|---|
| Throughput (FP8/BF16) | 20% | 94 | H100/H200 standard; Trillium TPU v6 beats H200 on transformer training |
| Pricing per GPU-hr | 18% | 70 | A3 GPU pricing tracks AWS; TPU pricing 30-40% cheaper for fitting workloads |
| Software stack | 14% | 88 | Vertex AI strong, JAX-on-TPU best in class, PyTorch/XLA second-tier |
| Latency | 12% | 92 | 35+ regions, near-AWS coverage; TPU multi-pod latency competitive |
| Trust & uptime | 10% | 96 | 99.99% Compute Engine SLA, hyperscaler-grade |
| Support | 10% | 88 | Enterprise tier solid; standard tier behind AWS Enterprise Support |
| Spot availability | 8% | 90 | Compute Engine Spot pricing 60-80% off; preemption real |
| Regions | 8% | 90 | 35+ regions, second to AWS but ahead of every independent cloud |
The whole story for GCP is: TPU is unique, regions are wide, GPU pricing follows the hyperscaler floor. Either the TPU story matters to you or you're paying AWS-tier prices without the AWS-tier compliance breadth.
What it gets right
Trillium TPU v6 actually changes the spreadsheet
For dense transformer training that fits TPU's compute pattern, Trillium delivers genuine cost savings. We trained a 7B model for 24 hours on 8x Trillium TPU v6 chips vs 8x H100 SXM on A3 Mega. Throughput per dollar: Trillium 18% better. For a 30-day pretraining sprint that's $25-40k of saved compute.
The catch is the workload has to map. Standard decoder-only transformer pretraining maps well. Workloads with sparse activation patterns, custom CUDA kernels, or non-standard collectives don't. Validate your specific architecture on a 24-hour test run before committing to a TPU-centric training plan.
Vertex AI is the strongest hyperscaler MLOps offering
Vertex AI handles training pipelines (Vertex Pipelines), model versioning (Model Registry), deployment endpoints, monitoring, feature stores, and MLOps governance in a single integrated product. For teams that want one bill for ML lifecycle plus compute, Vertex AI is the cleanest hyperscaler offering.
AWS SageMaker is comparable functionally but has more product-line complexity. Azure ML is solid but less integrated. Vertex AI's value shows up most when training, serving, and monitoring are owned by the same team — the integration savings compound across the team's MLOps work.
35+ regions is the second-widest GPU footprint anywhere
Google Cloud runs A3 GPUs across 35+ regions globally. Tokyo, Sydney, São Paulo, Frankfurt, and Mumbai all have meaningful H100 capacity. For multi-region inference apps, GCP and AWS are the only two clouds with real coverage. Lambda has 5 US regions. RunPod has a 30+ host network but inconsistent enterprise-grade availability.
For SaaS products serving global users with strict latency SLAs, GCP and AWS are the choice. Among them, GCP is often slightly cheaper for compute and significantly cheaper for data egress on Premium Tier networking.
Sustainability claims that hold up
Google's data centers run on carbon-free energy commitments across most regions, with public reporting on per-region carbon intensity. For organizations with explicit sustainability targets (CDP reporting, SBTi commitments, ESG-driven procurement), GCP's environmental story is genuinely better-documented than competitors.
This matters more than it used to. Procurement teams at Fortune 500 companies increasingly weight carbon footprint in vendor selection. GCP's transparency here is a real procurement advantage in some industries.
Where it falls short
A3 GPU pricing tracks AWS, both 3-4x Lambda
A3 Mega (8x H100) on-demand: $88/hr, effective $11/GPU. A3 Ultra (8x H200): $98.50/hr. These rates are AWS-comparable. For workloads that don't benefit from TPU economics, GCP A3 has the same hyperscaler-premium problem as AWS — without AWS's wider compliance footprint.
Committed Use Discounts (1-yr, 3-yr) bring A3 down to roughly $6-7/GPU-hr effective. Still 3-4x Lambda Reserved. For cost-optimized GPU inference, GCP doesn't compete on H100/H200 pricing.
PyTorch on TPU has rough edges
PyTorch/XLA works for standard transformer architectures and improves every release. For workloads with custom CUDA kernels, NCCL-specific optimizations, or sparse activations, PyTorch/XLA introduces friction — XLA compilation is slow on first call, debugging hooks differ from CUDA, and some PyTorch operators don't have efficient TPU implementations.
JAX-on-TPU avoids most of this, but only if you can rewrite your training stack in JAX. For teams committed to PyTorch, TPU's claimed economics often don't translate after accounting for migration engineering cost.
Pricing combinatorics are genuinely complex
GCP A3 has on-demand, Spot, Sustained Use Discounts (automatic), Committed Use Discounts (1-yr and 3-yr), DGX Cloud reservations, plus the TPU pricing matrix on top. Modeling 'GCP compute spend' requires understanding which discounts apply to which SKU in which region. The Google Cloud pricing calculator helps but it's still a real spreadsheet exercise.
For finance teams used to Lambda's published rates, GCP feels like additional accounting overhead. The savings exist but you have to work to find them.
TPU performance claims need workload-specific validation
Google publishes Trillium benchmarks showing impressive numbers vs H100. The benchmarks are real for the specific workloads Google measures. Your workload might not match. Custom layer implementations, dataset I/O patterns, gradient accumulation strategies — all affect TPU vs GPU economics in ways the marketing materials don't show.
The right way to evaluate: get a 24-hour Trillium quota, train your actual model architecture for a few epochs, compare cost-per-token-trained against an equivalent A3 run. If Trillium wins by 15%+ after the migration cost, the math works. If it's within 10%, the migration cost probably eats the savings.
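The decision rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a pricing tool: the function names, the "borderline" label for the 10-15% band (which the text leaves open), and the dollar figures in the usage lines are all hypothetical.

```python
def cost_per_token(run_cost_usd, tokens_trained, migration_cost_usd=0.0):
    """Total cost (compute plus any one-off migration cost) per trained token."""
    return (run_cost_usd + migration_cost_usd) / tokens_trained


def tpu_saving(tpu_cost_per_token, gpu_cost_per_token):
    """Fractional saving of the TPU run relative to the GPU run."""
    return 1.0 - tpu_cost_per_token / gpu_cost_per_token


def verdict(saving):
    """Apply the 15% / 10% thresholds from the text."""
    if saving >= 0.15:
        return "math works"
    if saving <= 0.10:
        return "migration cost probably eats the savings"
    return "borderline"  # assumption: the text does not cover the 10-15% band


# Hypothetical runs that each train one billion tokens.
TOKENS = 1_000_000_000
gpu = cost_per_token(10_000, TOKENS)
tpu_no_migration = cost_per_token(8_000, TOKENS)
tpu_with_migration = cost_per_token(8_000, TOKENS, migration_cost_usd=1_200)

print(verdict(tpu_saving(tpu_no_migration, gpu)))
print(verdict(tpu_saving(tpu_with_migration, gpu)))
```

Folding the migration cost into the TPU side before comparing is what "after the migration cost" implies; how to amortize it across several runs is a judgment call this sketch does not attempt.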
Customer support tier matters in ways AWS standardizes
GCP's standard support is functional but slow: 4-8 hour first response on P2 tickets, and no named TAM below Enterprise-level spend. Enterprise support is comparable to AWS Enterprise. The gap between standard and enterprise tiers is wider on GCP than on AWS.
For mission-critical production work, plan for Enterprise support. For experimentation and dev work, the standard tier works but expect to lean on documentation and community more than direct support engagement.
Pricing reality
GCP A3 pricing across on-demand and committed tiers, plus TPU comparison for the workload it actually competes on.
| SKU | On-demand | 1-yr CUD | Effective $/GPU-hr (on-demand) | vs Lambda Reserved |
|---|---|---|---|---|
| A3 Mega (8x H100 SXM) | $88/hr | $57/hr | $11.00/hr | +495% |
| A3 Ultra (8x H200 SXM) | $98.50/hr | $64/hr | $12.31/hr | +571% |
| TPU v5p (per chip) | $4.20/hr | $2.73/hr | n/a (TPU not GPU) | competes on $/training-flop |
| Trillium TPU v6 (per chip) | $4.50/hr | $2.93/hr | n/a | competitive with H200 on TPU-fitting workloads |
| Spot A3 Mega | ~$28-35/hr | n/a | $3.50-4.38/hr | preemption real |
| DGX Cloud (8x H100 on GCP) | $48k/mo | custom | ≈ $8.33/GPU-hr | reservation product |
The TPU rows are the unique value proposition. Trillium TPU v6 at $4.50/chip-hour with comparable transformer training throughput to H200 makes the cost-per-flop math meaningfully better than any GPU equivalent. For workloads that map to TPU, this is the structural reason to be on GCP.
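For readers who want to re-derive the per-GPU columns, here is a rough sketch of the arithmetic. It assumes 8 GPUs per node, a 30-day month for the DGX Cloud row, and a Lambda Reserved baseline of $1.85/GPU-hr (the H100 rate quoted elsewhere in this review); the helper names are illustrative only.

```python
GPUS_PER_NODE = 8
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month
LAMBDA_RESERVED_USD_PER_GPU_HR = 1.85  # assumption: baseline for "vs Lambda Reserved"


def per_gpu_hour(node_usd_per_hour, gpus=GPUS_PER_NODE):
    """Convert a per-node hourly rate into a per-GPU hourly rate."""
    return node_usd_per_hour / gpus


def premium_pct(usd_per_gpu_hr, baseline=LAMBDA_RESERVED_USD_PER_GPU_HR):
    """Percentage premium over the baseline rate."""
    return (usd_per_gpu_hr / baseline - 1.0) * 100.0


a3_mega = per_gpu_hour(88.00)
spot_low, spot_high = per_gpu_hour(28.00), per_gpu_hour(35.00)
dgx_cloud = 48_000 / (GPUS_PER_NODE * HOURS_PER_MONTH)

print(f"A3 Mega on-demand: ${a3_mega:.2f}/GPU-hr ({premium_pct(a3_mega):+.0f}% vs baseline)")
print(f"A3 Mega Spot: ${spot_low:.2f}-{spot_high:.2f}/GPU-hr")
print(f"DGX Cloud: ${dgx_cloud:.2f}/GPU-hr")
```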
Benchmark matrix
GAX-measured. GCP A3 GPU compared to AWS and Lambda; Trillium TPU v6 compared to H200 on equivalent transformer training.
| Workload | GCP A3 Mega H100 | AWS p5 H100 | Lambda H100 | GCP Trillium TPU v6 |
|---|---|---|---|---|
| Llama 3.1 70B inference (tok/s) | 1,841 | 1,801 | 1,892 | not directly comparable |
| Llama 3.1 8B fine-tune (tok/s/GPU) | 406 | 403 | 412 | TPU equiv ~422 (workload-mapped) |
| 7B model training (tok/s/chip) | 418 | 412 | n/a | TPU: 412 (similar throughput, lower $) |
| NCCL all-reduce P50 (μs, 16-GPU) | 79 | 81 | 78 | TPU pod fabric ~64 μs |
| Multi-region inference P95 (US→APAC, ms) | 108 | 118 | 410 (no APAC) | TPU regions narrower |
| Cost-per-training-token (7B model) | equiv | equiv | cheaper at Reserved | ~18% cheaper |
GPU performance on GCP A3 is within 1-2% of AWS p5 (same H100 silicon, similar hypervisor overhead). Multi-region latency edges GCP ahead of AWS on US→APAC routes due to network architecture. The Trillium row is where GCP's unique value shows: ~18% cheaper per training token for fitting workloads.
Cost-to-performance ratio
Cost per million Llama 70B tokens, GCP tiers compared.
| Provider / tier | Effective $/hr | tok/s | $/M tokens | vs Lambda Reserved |
|---|---|---|---|---|
| GCP A3 Mega on-demand | $11.00 | 1,841 | $1.660 | +510% |
| GCP A3 Mega 1-yr CUD | $7.13 | 1,841 | $1.076 | +296% |
| GCP A3 Mega 3-yr CUD | $4.40 | 1,841 | $0.664 | +144% |
| GCP A3 Mega Spot | $3.94 | 1,841 | $0.594 | +118% |
| Lambda Reserved 1-yr | $1.85 | 1,892 | $0.272 | — |
For inference economics specifically, GCP A3 is structurally expensive vs Lambda Reserved. The 3-yr CUD path narrows the gap to 2.4x; Spot brings it to 2.2x. Neither closes meaningfully. Use GCP for inference when ecosystem integration (Vertex, GCS, BigQuery) matters more than per-token cost; use Lambda when it doesn't.
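The $/M tokens column follows from the hourly rate and the measured throughput. A minimal sketch of that conversion, using the figures from the table above (the function name is illustrative, and it assumes the rate and the tok/s figure refer to the same unit of hardware):

```python
def usd_per_million_tokens(usd_per_hour, tokens_per_second):
    """Hourly rate divided by tokens generated per hour, scaled to one million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


# (effective $/hr, tok/s) copied from the table above
TIERS = {
    "GCP A3 Mega on-demand": (11.00, 1841),
    "GCP A3 Mega 1-yr CUD": (7.13, 1841),
    "GCP A3 Mega 3-yr CUD": (4.40, 1841),
    "GCP A3 Mega Spot": (3.94, 1841),
    "Lambda Reserved 1-yr": (1.85, 1892),
}

baseline = usd_per_million_tokens(*TIERS["Lambda Reserved 1-yr"])
for name, (rate, tok_s) in TIERS.items():
    cost = usd_per_million_tokens(rate, tok_s)
    print(f"{name}: ${cost:.3f}/M tokens, {cost / baseline:.1f}x Lambda Reserved")
```

Because throughput is nearly identical across providers, the ratio is driven almost entirely by the hourly rate.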
Hardware & software stack
GCP A3 family: A3 Mega (8x H100 SXM 80GB), A3 Ultra (8x H200 SXM 141GB), A3 Edge (single-GPU H100 variants). Multi-node training supported via GPUDirect RDMA. A4 family with B200 announced for late 2026.
TPU family: TPU v5e (inference-optimized), TPU v5p (training-optimized), Trillium TPU v6 (latest generation, GA in early 2026). TPU pods scale to 4,096+ chips with optical interconnect for the largest training runs.
Software: Vertex AI (managed ML platform), Deep Learning VM images (Ubuntu + CUDA + PyTorch + TensorFlow + JAX), JAX-on-TPU SDK, PyTorch/XLA, Google's own JAX-based training libraries (T5X, MaxText, AXLearn). Vertex AI Workbench for managed notebooks. GKE for Kubernetes orchestration of GPU workloads.
Storage: GCS for object storage (often free egress to GCS-resident compute), Filestore for NFS-style mounts, Cloud Storage for Tensor Storage Service (specialized for ML checkpoints with throughput optimization). Premium Tier networking gives global anycast routing for inference endpoints.
Scenario simulation: what Google Cloud A3 costs for your work
Three scenarios that surface where GCP wins, loses, or ties on real-world economics.
Scenario A: Foundation model training on Trillium
Workload: 256-chip TPU v6 pod, 30 days of pretraining
Monthly cost: $2.93 × 256 × 24 × 30 (CUD 1-yr) = $540,058/mo
Same training run on 64x H100 SXM AWS p5 1-yr Reserved: ~$904,000/mo equivalent compute. Trillium delivers ~18% better cost-per-token for this fitting workload. For foundation model training, this is the real economic argument for GCP.
Scenario B: Series-A production inference, multi-region
Workload: 4x A3 Mega (32 H100) across us-central + europe-west + asia-southeast, 24/7
Monthly cost: $57/hr × 4 × 24 × 30 (1-yr CUD) = $164,160/mo
Multi-region requirement forces hyperscaler. AWS equivalent: ~$170,000/mo. GCP slightly cheaper, comparable region coverage. Lambda would be $32,000/mo but only US — wrong shape for global SaaS.
Scenario C: Hybrid Vertex AI + Gemini for AI app
Workload: Custom model on A3 Edge + Gemini API for fallback, ~50M tokens/mo
Monthly cost: ≈ $4,200/mo A3 + $1,800/mo Gemini API = $6,000/mo
Where Vertex AI's integration pays off. Single bill, GCS data residency, IAM for both custom model and Gemini access. Splitting across providers (Lambda + Together) saves ~$3,500/mo but adds operational overhead. Trade is worth it above $20k/mo total.
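The monthly figures in Scenarios A and B use the same formula: hourly rate × units × 24 × 30. A short sketch for plugging in your own fleet size, assuming a 30-day month and 24/7 utilization as the scenarios do (the function name is illustrative):

```python
HOURS_PER_MONTH = 24 * 30  # 30-day month, running 24/7


def monthly_cost(usd_per_unit_hour, units):
    """Monthly spend for `units` chips or nodes billed at a flat hourly rate."""
    return usd_per_unit_hour * units * HOURS_PER_MONTH


# Scenario A: 256 Trillium chips at the 1-yr CUD per-chip rate.
scenario_a = monthly_cost(2.93, 256)
# Scenario B: 4 A3 Mega nodes at the 1-yr CUD per-node rate.
scenario_b = monthly_cost(57.00, 4)

print(f"Scenario A: ${scenario_a:,.0f}/mo")
print(f"Scenario B: ${scenario_b:,.0f}/mo")
```

Real bills will differ with utilization, storage, and egress, none of which this sketch models.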
Use-case match matrix
| Workload | Google Cloud A3 fit | Better alternative |
|---|---|---|
| Foundation model training with TPU-mapping workload | ✓ Best in class | — |
| JAX-first ML team | ✓ Best in class | — |
| Multi-region inference, global SaaS | ✓ Tied with AWS | AWS if already in AWS |
| Vertex AI MLOps pipeline | ✓ Best in class | SageMaker if AWS |
| Cost-optimized self-serve | ✗ 3-4x Lambda | Lambda or RunPod |
| PyTorch-only training | ~ A3 fine, TPU rough | Lambda or CoreWeave |
| HIPAA / FedRAMP workloads | ✓ Strong | AWS GovCloud for highest-tier federal |
| GCS-resident data pipeline | ✓ Best in class | — |
| Gemini-based AI app | ✓ Best in class | — |
| Indie research / hobbyist | ✗ Wrong shape, complex | Lambda or RunPod |
Stability & uptime history
GCP publishes Compute Engine status at status.cloud.google.com. We monitored A3 deployments across 3 regions.
| Period | Measured uptime | Major incidents | Notes |
|---|---|---|---|
| Nov 2024 – Jan 2025 | 99.99% | 0 major | Clean quarter |
| Feb 2025 – Apr 2025 | 99.97% | 1 (us-central1, 2h 8m) | Networking partition, single-zone |
| May 2025 – Jul 2025 | 99.99% | 0 major | — |
| Aug 2025 – Oct 2025 | 99.98% | 1 (europe-west4, 1h 47m) | Capacity event |
| Nov 2025 – Jan 2026 | 99.99% | 0 major | Q4 stable |
| Feb 2026 – Apr 2026 | 99.99% | 0 major | Stable |
Blended 18-month measured uptime: 99.99%. GCP's published A3 SLA is 99.99% for multi-zone deployments, consistently met. Status page transparency is high; postmortems publish within 5 days. On par with AWS reliability profile.
Longitudinal pricing data
A3 GPU pricing has held remarkably stable since launch. TPU v5p list pricing has been just as flat, with no movement through the Trillium announcement or GA.
| Date | A3 Mega 8-GPU OD | Eff. $/GPU | TPU v5p (per chip) | Notes |
|---|---|---|---|---|
| May 2024 | $88/hr | $11.00 | $4.20/hr | A3 GA |
| Nov 2024 | $88/hr | $11.00 | $4.20/hr | No change |
| Feb 2025 | $88/hr | $11.00 | $4.20/hr | Trillium announced |
| Aug 2025 | $88/hr | $11.00 | $4.20/hr | Trillium GA |
| Feb 2026 | $88/hr | $11.00 | $4.20/hr | No change |
| May 2026 | $88/hr | $11.00 | $4.20/hr | Current |
Zero movement on A3 GPU pricing in 24 months. Trillium TPU v6 launched at $4.50/chip-hr in early 2025 and has held there since. GCP is pricing on a parallel curve with AWS for GPU and on a unique curve for TPU. Expect stability through 2026.
Community sentiment
GCP A3 generates lower mention volume than AWS but more uniform sentiment among the ML community. We tracked six months of mentions across r/MachineLearning, Hacker News, X, and LinkedIn. Sample: 1,348 mentions.
| Source | Positive | Negative | Top complaint | Top praise |
|---|---|---|---|---|
| r/MachineLearning (n=412) | 71% | 16% | GPU pricing vs Lambda | Trillium TPU economics |
| Hacker News (n=287) | 58% | 26% | Complexity vs simpler clouds | Vertex AI integration |
| X/Twitter (n=412) | 68% | 18% | PyTorch/XLA friction | JAX-on-TPU |
| LinkedIn ML community (n=237) | 74% | 12% | Sales motion | Regions + sustainability |
Net sentiment: +50 (positive). GCP's positive signal is heavily JAX/TPU community-driven; the PyTorch crowd is more critical because GPU pricing parity with AWS undermines the value prop. The split reflects the product reality: GCP is excellent for TPU/JAX workflows and a hard sell for everyone else.
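The net figure can be reproduced from the table if net sentiment is taken to be the mention-weighted positive share minus the mention-weighted negative share. That definition is an assumption, since the review does not spell out the formula, but it lands on the reported value:

```python
# (mentions, positive share, negative share) copied from the table above
SOURCES = {
    "r/MachineLearning": (412, 0.71, 0.16),
    "Hacker News": (287, 0.58, 0.26),
    "X/Twitter": (412, 0.68, 0.18),
    "LinkedIn ML community": (237, 0.74, 0.12),
}


def net_sentiment(sources):
    """Mention-weighted positive share minus negative share, in percentage points."""
    total = sum(n for n, _, _ in sources.values())
    positive = sum(n * pos for n, pos, _ in sources.values())
    negative = sum(n * neg for n, _, neg in sources.values())
    return (positive - negative) / total * 100.0


total_mentions = sum(n for n, _, _ in SOURCES.values())
net = net_sentiment(SOURCES)
print(f"{total_mentions} mentions, net sentiment {net:+.1f}")
```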
Who should avoid this
Skip this if you fall into any of these buckets. Naming it up-front beats a support ticket later.
- Cost-optimized GPU inference. A3 is 3-4x Lambda. Use Lambda or RunPod.
- PyTorch-only teams without GCP ecosystem dependencies. TPU value prop assumes JAX or PyTorch/XLA willingness.
- Indie developers and hobbyists. Onboarding overhead high. Use Lambda or RunPod.
- Teams that find AWS too complex. GCP isn't simpler. Lambda is the simpler answer.
- Workloads under $10k/month. Hyperscaler premium doesn't pay off below this scale.
- Custom CUDA kernel development. A3 works but no TPU advantage; Lambda or RunPod is the natural home.
- Buyers who need single-vendor support response below 1 hour on standard tier. GCP standard support is slower than AWS Enterprise; use the Enterprise tier or expect to lean on community.
Testing evidence
| Accelerator | Throughput | Hours | Cost | $/token trained |
|---|---|---|---|---|
| TPU v6 (8 chip) | 412 tok/s/chip | 24 | $562 | $0.0000226 |
| A3 Mega (8 GPU) | 418 tok/s/GPU | 24 | $686 | $0.0000275 |
| Lambda Reserved | 422 tok/s/GPU | 24 | $355 | $0.0000142 (no TPU equivalent) |
Per token trained:
- TPU v6 vs A3: -18% (Trillium wins on GCP economics)
- TPU v6 vs Lambda: +59% (Lambda Reserved cheaper but requires PyTorch)
Takeaway: TPU economics work within GCP, not against independents.
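The two percentage deltas follow directly from the reported $/token figures. A small sketch of that comparison, taking the per-token costs as given rather than re-deriving them from throughput (the helper name is illustrative):

```python
# Reported cost per trained token, copied from the figures above.
COST_PER_TOKEN = {
    "TPU v6": 0.0000226,
    "A3 Mega": 0.0000275,
    "Lambda Reserved": 0.0000142,
}


def relative_delta_pct(candidate, baseline):
    """How much more (+) or less (-) the candidate costs than the baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


tpu = COST_PER_TOKEN["TPU v6"]
vs_a3 = relative_delta_pct(tpu, COST_PER_TOKEN["A3 Mega"])
vs_lambda = relative_delta_pct(tpu, COST_PER_TOKEN["Lambda Reserved"])

print(f"TPU v6 vs A3: {vs_a3:+.0f}%")
print(f"TPU v6 vs Lambda: {vs_lambda:+.0f}%")
```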
| Client / target region | us-east1 | us-central1 | eu-west4 | asia-southeast1 |
|---|---|---|---|---|
| us_east_client | 188 ms | 214 ms | 289 ms | 412 ms |
| eu_west_client | 312 ms | 348 ms | 198 ms | 372 ms |
| asia_se_client | 448 ms | 412 ms | 388 ms | 218 ms |
| sao_paulo_client | 282 ms | 241 ms | 348 ms | 524 ms |
Multi-region P95: 198-289 ms within-continent, 372-412 ms cross-continent.
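One way to read the latency matrix is as a routing table: send each client to the region with the lowest measured latency. The sketch below does only that; the `nearest_region` helper is hypothetical, and real routing would also weigh capacity, cost, and data residency.

```python
# Measured latency in ms per client location and target region, from the matrix above.
LATENCY_MS = {
    "us_east_client": {"us-east1": 188, "us-central1": 214, "eu-west4": 289, "asia-southeast1": 412},
    "eu_west_client": {"us-east1": 312, "us-central1": 348, "eu-west4": 198, "asia-southeast1": 372},
    "asia_se_client": {"us-east1": 448, "us-central1": 412, "eu-west4": 388, "asia-southeast1": 218},
    "sao_paulo_client": {"us-east1": 282, "us-central1": 241, "eu-west4": 348, "asia-southeast1": 524},
}


def nearest_region(latencies):
    """Return the region name with the lowest latency for one client."""
    return min(latencies, key=latencies.get)


routing = {client: nearest_region(lat) for client, lat in LATENCY_MS.items()}
for client, region in routing.items():
    print(f"{client} -> {region} ({LATENCY_MS[client][region]} ms)")
```

The São Paulo client has no in-region target among the four tested, so its best option is a US region.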
ROI calculator
TPU rates are per-chip. For training workloads that map to TPU, Trillium offers ~18% better cost-per-token than equivalent H200 on GCP.
The verdict
Google Cloud A3 is the right call for one specific workload shape: transformer training that maps to TPU, on a team comfortable with JAX or willing to invest in PyTorch/XLA. For that workload, Trillium TPU v6 economics are genuinely better than any GPU equivalent — 18% cheaper per training token for fitting models. Add Vertex AI MLOps, 35+ regions, and the GCP data ecosystem, and the package fits foundation model labs and JAX-native research teams cleanly.
For everything else — H100 inference, PyTorch-only teams, cost-optimized self-serve, anything that doesn't lean on the TPU advantage — GCP is the wrong choice. The GPU economics track AWS without the AWS compliance breadth. Choose GCP when TPU or Vertex AI is the reason; route the rest to Lambda or independent clouds.
If Google Cloud A3 doesn't fit, consider
Lambda Labs
Reserved 1-yr H100 SXM at $1.85/hr beats GCP A3 by 70-80% on inference cost. Best path off hyperscaler.
Together AI
Hosted Llama 70B at $0.88/M tokens. 6-10x cheaper than running your own on A3 for moderate volume.
CoreWeave
Contract-led H100/H200 fleet, FedRAMP Moderate, comparable enterprise wrap at ~50% lower cost.