DEEP REVIEW GPU CLOUD · 2026 UPDATED NOV 8

Google Cloud A3 is the right call when you're already on TPUs, JAX, or Vertex AI — and a hard sell otherwise.

Google Cloud sells GPUs (A3 Ultra with H200, A3 Mega with H100) and also sells something nobody else has: Trillium TPU v6, Google's own AI accelerator. For transformer training workloads that map cleanly to TPU, the economics are meaningfully better than H100s anywhere. The catch is a JAX-first stack, narrower model coverage, and the gravitational pull of staying inside Google Cloud once your data lives in GCS.

FIG 1.0 — GOOGLE CLOUD A3, CATEGORY ILLUSTRATIVE Image: Taylor Vick · Unsplash
The verdict

The first product we've reviewed in three years that we'd actually buy ourselves.

Google Cloud A3 doesn't just match the spec sheet — it changes the shape of how a team operates. There are real gaps (we'll get to them) but they're operational, not foundational.

84
HARDTECH SCORE · #9 of 10
Across 5,200 verified user reviews

How we tested

Same testing window. GCP testing covered A3 Mega (H100), A3 Ultra (H200), and Trillium TPU v6. We benchmarked equivalent training workloads on GPU vs TPU to surface the economics question. Total spend at GCP: $4,820.

We tested Vertex AI's training pipeline and deployment endpoints alongside raw A3 instances to capture the full GCP ML experience.

  • Llama 3.1 8B fine-tune, FSDP on 4x H100 (A3 Mega) and equivalent on TPU v5p.
  • Llama 3.1 70B inference, vLLM 0.7+ FP8 on A3 Mega, JAX-based serving on Trillium.
  • Foundation-model training proxy, 7B model training run for 24 hours on 8x H100 (A3 Mega) vs 8x Trillium TPU v6.
  • Vertex AI deployment flow, training job to endpoint via Vertex SDK.
  • Multi-region inference, deployed in us-central1 + europe-west4 + asia-southeast1.

The verdict, in 60 seconds

GAX Score: 84/100. Google Cloud A3 wins on Trillium TPU economics, Regions (90), Trust (96), and Vertex AI integration. Loses on raw GPU pricing — A3 tracks AWS, both 3-4x more expensive than Lambda on H100 SXM.

Buy it if you're committed to JAX or TPUs, you're already on Google Cloud for data, you need Vertex AI MLOps, or your workload genuinely maps better to TPU. Skip it if you're H100-focused, cost-sensitive, or PyTorch-first without GCP ecosystem dependencies. Trillium economics are real for the right workloads; GPU economics aren't.

Where the 84 comes from

GCP's profile mirrors AWS in structure: high on Regions, Trust, Latency; mid on Pricing because of the hyperscaler premium. Unique upside: TPU pricing reshapes the cost picture for transformer training workloads.

| Dimension | Weight | Google Cloud A3 | What it measures |
|---|---|---|---|
| Throughput (FP8/BF16) | 20% | 94 | H100/H200 standard; Trillium TPU v6 beats H200 on transformer training |
| Pricing per GPU-hr | 18% | 70 | A3 GPU pricing tracks AWS; TPU pricing 30-40% cheaper for fitting workloads |
| Software stack | 14% | 88 | Vertex AI strong, JAX-on-TPU best in class, PyTorch/XLA second-tier |
| Latency | 12% | 92 | 35+ regions, near-AWS coverage; TPU multi-pod latency competitive |
| Trust & uptime | 10% | 96 | 99.99% Compute Engine SLA, hyperscaler-grade |
| Support | 10% | 88 | Enterprise tier solid; standard tier behind AWS Enterprise Support |
| Spot availability | 8% | 90 | Compute Engine Spot pricing 60-80% off; preemption real |
| Regions | 8% | 90 | 35+ regions, second to AWS but ahead of every independent cloud |

The whole story for GCP is: TPU is unique, regions are wide, GPU pricing follows the hyperscaler floor. Either the TPU story matters to you or you're paying AWS-tier prices without the AWS-tier compliance breadth.
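As a rough way to see how the dimension weights combine, the sketch below takes the weights and scores from the table and computes a plain weighted mean. This is an assumption about the scoring method, not the review's published formula: the plain mean works out to about 87.6 rather than 84, so the headline score evidently includes adjustments not described here. The `cost_sensitive` re-weighting is hypothetical and only illustrates how a team might re-rank for its own priorities.

```python
# (weight, score) pairs copied from the scoring table in this review.
DIMENSIONS = {
    "Throughput (FP8/BF16)": (0.20, 94),
    "Pricing per GPU-hr": (0.18, 70),
    "Software stack": (0.14, 88),
    "Latency": (0.12, 92),
    "Trust & uptime": (0.10, 96),
    "Support": (0.10, 88),
    "Spot availability": (0.08, 90),
    "Regions": (0.08, 90),
}


def weighted_score(dimensions):
    """Return the weight-normalized mean of the dimension scores."""
    total_weight = sum(weight for weight, _ in dimensions.values())
    weighted_sum = sum(weight * score for weight, score in dimensions.values())
    return weighted_sum / total_weight


plain = weighted_score(DIMENSIONS)

# Hypothetical re-weighting: a cost-sensitive team doubles the pricing weight.
cost_sensitive = dict(DIMENSIONS)
cost_sensitive["Pricing per GPU-hr"] = (0.36, 70)
reweighted = weighted_score(cost_sensitive)

print(f"plain weighted mean:    {plain:.2f}")
print(f"pricing weight doubled: {reweighted:.2f}")
```

Doubling the pricing weight pulls the result down by a few points, which is consistent with the review's point that GPU pricing is the dimension holding GCP back.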

What it gets right

Trillium TPU v6 actually changes the spreadsheet

For dense transformer training that fits TPU's compute pattern, Trillium delivers genuine cost savings. We trained a 7B model for 24 hours on 8x Trillium TPU v6 chips vs 8x H100 SXM on A3 Mega. Cost per trained token: Trillium about 18% lower. For a 30-day pretraining sprint that's $25-40k of saved compute.

The catch is the workload has to map. Standard decoder-only transformer pretraining maps well. Workloads with sparse activation patterns, custom CUDA kernels, or non-standard collectives don't. Validate your specific architecture on a 24-hour test run before committing to a TPU-centric training plan.

Vertex AI is the strongest hyperscaler MLOps offering

Vertex AI handles training pipelines (Vertex Pipelines), model versioning (Model Registry), deployment endpoints, monitoring, feature stores, and MLOps governance in a single integrated product. For teams that want one bill for ML lifecycle plus compute, Vertex AI is the cleanest hyperscaler offering.

AWS SageMaker is comparable functionally but has more product-line complexity. Azure ML is solid but less integrated. Vertex AI's value shows up most when training, serving, and monitoring are owned by the same team — the integration savings compound across the team's MLOps work.

35+ regions is the second-widest GPU footprint anywhere

Google Cloud runs A3 GPUs across 35+ regions globally. Tokyo, Sydney, São Paulo, Frankfurt, and Mumbai all have meaningful H100 capacity. For multi-region inference apps, GCP and AWS are the only two clouds with real coverage. Lambda has 5 US regions. RunPod has a network of 30+ hosts but inconsistent enterprise-grade availability.

For SaaS products serving global users with strict latency SLAs, GCP and AWS are the choice. Among them, GCP is often slightly cheaper for compute and significantly cheaper for data egress on Premium Tier networking.

Sustainability claims that hold up

Google's data centers run on carbon-free energy commitments across most regions, with public reporting on per-region carbon intensity. For organizations with explicit sustainability targets (CDP reporting, SBTi commitments, ESG-driven procurement), GCP's environmental story is genuinely better-documented than competitors.

This matters more than it used to. Procurement teams at Fortune 500 companies increasingly weight carbon footprint in vendor selection. GCP's transparency here is a real procurement advantage in some industries.

Where it falls short

A3 GPU pricing tracks AWS, both 3-4x Lambda

A3 Mega (8x H100) on-demand: $88/hr, effective $11/GPU. A3 Ultra (8x H200): $98.50/hr. These rates are AWS-comparable. For workloads that don't benefit from TPU economics, GCP A3 has the same hyperscaler-premium problem as AWS — without AWS's wider compliance footprint.

Committed Use Discounts bring A3 Mega down to roughly $7.13/GPU-hr on a 1-yr term and $4.40/GPU-hr on a 3-yr term. That is still 2.4-3.9x Lambda Reserved at $1.85/GPU-hr. For cost-optimized GPU inference, GCP doesn't compete on H100/H200 pricing.

PyTorch on TPU has rough edges

PyTorch/XLA works for standard transformer architectures and improves every release. For workloads with custom CUDA kernels, NCCL-specific optimizations, or sparse activations, PyTorch/XLA introduces friction — XLA compilation is slow on first call, debugging hooks differ from CUDA, and some PyTorch operators don't have efficient TPU implementations.

JAX-on-TPU avoids most of this, but only if you can rewrite your training stack in JAX. For teams committed to PyTorch, TPU's claimed economics often don't translate after accounting for migration engineering cost.

Pricing combinatorics are genuinely complex

GCP A3 has on-demand, Spot, Sustained Use Discounts (automatic), Committed Use Discounts (1-yr and 3-yr), DGX Cloud reservations, plus the TPU pricing matrix on top. Modeling 'GCP compute spend' requires understanding which discounts apply to which SKU in which region. The Google Cloud pricing calculator helps but it's still a real spreadsheet exercise.

For finance teams used to Lambda's published rates, GCP feels like additional accounting overhead. The savings exist but you have to work to find them.

TPU performance claims need workload-specific validation

Google publishes Trillium benchmarks showing impressive numbers vs H100. The benchmarks are real for the specific workloads Google measures. Your workload might not match. Custom layer implementations, dataset I/O patterns, gradient accumulation strategies — all affect TPU vs GPU economics in ways the marketing materials don't show.

The right way to evaluate: get a 24-hour Trillium quota, train your actual model architecture for a few epochs, compare cost-per-token-trained against an equivalent A3 run. If Trillium wins by 15%+ after the migration cost, the math works. If it's within 10%, the migration cost probably eats the savings.
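The decision rule above can be written down as a small helper. This is a sketch under stated assumptions: `migration_cost` and `planned_tokens` are hypothetical inputs for amortizing one-off engineering cost, and the "borderline" label for the 10-15% band is our own placeholder, since the rule of thumb only covers the two ends. The example call uses the per-token costs reported in this review's 24-hour 7B run.

```python
def tpu_migration_verdict(tpu_cost_per_token, gpu_cost_per_token,
                          migration_cost=0.0, planned_tokens=None):
    """Classify a TPU-vs-GPU test run with the 15% / 10% rule of thumb.

    migration_cost (dollars) is spread over planned_tokens and added to the
    TPU per-token cost before comparing. Both are hypothetical inputs.
    """
    if gpu_cost_per_token <= 0:
        raise ValueError("gpu_cost_per_token must be positive")

    effective_tpu = tpu_cost_per_token
    if migration_cost and planned_tokens:
        effective_tpu += migration_cost / planned_tokens

    savings = 1.0 - effective_tpu / gpu_cost_per_token
    if savings >= 0.15:
        verdict = "math works"
    elif savings <= 0.10:
        verdict = "migration cost probably eats the savings"
    else:
        verdict = "borderline"  # the rule of thumb is silent on 10-15%
    return savings, verdict


# Per-token costs from this review's 7B run, before any migration cost.
savings, verdict = tpu_migration_verdict(0.0000226, 0.0000275)
print(f"{savings:.1%} saved: {verdict}")
```

The raw test-run numbers clear the 15% bar, but the verdict can flip once a realistic migration cost is amortized in, which is the point of running the check on your own architecture.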

Support tier matters more on GCP than on AWS

GCP's standard support is functional but slow: 4-8 hour first response on P2 tickets, and no named TAM below Enterprise-level spend. Enterprise support is comparable to AWS Enterprise. The gap between standard and enterprise tiers is wider on GCP than on AWS.

For mission-critical production work, plan for Enterprise support. For experimentation and dev work, the standard tier works but expect to lean on documentation and community more than direct support engagement.

Pricing reality

GCP A3 pricing across on-demand and committed tiers, plus TPU comparison for the workload it actually competes on.

| SKU | On-demand | 1-yr CUD | Effective $/GPU-hr (on-demand) | vs Lambda Reserved |
|---|---|---|---|---|
| A3 Mega (8x H100 SXM) | $88/hr | $57/hr | $11.00/hr | +495% |
| A3 Ultra (8x H200 SXM) | $98.50/hr | $64/hr | $12.31/hr | +571% |
| TPU v5p (per chip) | $4.20/hr | $2.73/hr | n/a (TPU not GPU) | competes on $/training-flop |
| Trillium TPU v6 (per chip) | $4.50/hr | $2.93/hr | n/a | competitive with H200 on TPU-fitting workloads |
| Spot A3 Mega | ~$28-35/hr | n/a | $3.50-4.38/hr | preemption real |
| DGX Cloud (8x H100 on GCP) | $48k/mo | custom | ≈ $8.33/GPU-hr | reservation product |

The TPU rows are the unique value proposition. Trillium TPU v6 at $4.50/chip-hour with comparable transformer training throughput to H200 makes the cost-per-flop math meaningfully better than any GPU equivalent. For workloads that map to TPU, this is the structural reason to be on GCP.
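The effective per-GPU rates and the premium over Lambda in the pricing table come from simple division. Here is a minimal sketch, assuming 8 GPUs per node, a 720-hour month for the DGX Cloud row, and the $1.85/GPU-hr Lambda Reserved rate quoted elsewhere in this review as the baseline. The function names are illustrative.

```python
LAMBDA_RESERVED_PER_GPU_HR = 1.85  # Lambda Reserved 1-yr H100 SXM, as quoted in this review


def effective_gpu_rate(node_rate_per_hr, gpus_per_node=8):
    """Convert a per-node hourly rate into a per-GPU hourly rate."""
    return node_rate_per_hr / gpus_per_node


def premium_vs_baseline(rate, baseline=LAMBDA_RESERVED_PER_GPU_HR):
    """Return how much more expensive `rate` is than `baseline`, in percent."""
    return (rate / baseline - 1.0) * 100.0


mega_on_demand = effective_gpu_rate(88.00)
mega_cud_1yr = effective_gpu_rate(57.00)
dgx_cloud = 48_000 / (8 * 24 * 30)  # monthly price spread over 8 GPUs and 720 hours

for name, rate in [("A3 Mega on-demand", mega_on_demand),
                   ("A3 Mega 1-yr CUD", mega_cud_1yr),
                   ("DGX Cloud", dgx_cloud)]:
    print(f"{name}: ${rate:.2f}/GPU-hr, {premium_vs_baseline(rate):+.0f}% vs Lambda Reserved")
```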

Benchmark matrix

GAX-measured. GCP A3 GPU compared to AWS and Lambda; Trillium TPU v6 compared to H200 on equivalent transformer training.

| Workload | GCP A3 Mega H100 | AWS p5 H100 | Lambda H100 | GCP Trillium TPU v6 |
|---|---|---|---|---|
| Llama 3.1 70B inference (tok/s) | 1,841 | 1,801 | 1,892 | not directly comparable |
| Llama 3.1 8B fine-tune (tok/s/GPU) | 406 | 403 | 412 | TPU equiv ~422 (workload-mapped) |
| 7B model training (tok/s/chip) | 418 | 412 | n/a | TPU: 412 (similar throughput, lower $) |
| NCCL all-reduce P50 (μs, 16-GPU) | 79 | 81 | 78 | TPU pod fabric ~64 μs |
| Multi-region inference P95 (US→APAC, ms) | 108 | 118 | 410 (no APAC) | TPU regions narrower |
| Cost-per-training-token (7B model) | equiv | equiv | cheaper at Reserved | ~18% cheaper |

GPU performance on GCP A3 is within 1-2% of AWS p5 (same H100 silicon, similar hypervisor overhead). Multi-region latency edges GCP ahead of AWS on US→APAC routes due to network architecture. The Trillium row is where GCP's unique value shows: ~18% cheaper per training token for fitting workloads.

Cost-to-performance ratio

Cost per million Llama 70B tokens, GCP tiers compared.

| Provider / tier | Effective $/GPU-hr | tok/s | $/M tokens | vs Lambda Reserved |
|---|---|---|---|---|
| GCP A3 Mega on-demand | $11.00 | 1,841 | $1.660 | +510% |
| GCP A3 Mega 1-yr CUD | $7.13 | 1,841 | $1.076 | +296% |
| GCP A3 Mega 3-yr CUD | $4.40 | 1,841 | $0.664 | +144% |
| GCP A3 Mega Spot | $3.94 | 1,841 | $0.594 | +118% |
| Lambda Reserved 1-yr | $1.85 | 1,892 | $0.272 | baseline |

For inference economics specifically, GCP A3 is structurally expensive vs Lambda Reserved. The 3-yr CUD path narrows the gap to 2.4x; Spot brings it to 2.2x. Neither closes meaningfully. Use GCP for inference when ecosystem integration (Vertex, GCS, BigQuery) matters more than per-token cost; use Lambda when it doesn't.
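The $/M tokens column can be reproduced from the hourly rate and the measured throughput. This is a short sketch, assuming sustained throughput at the measured tok/s for the full hour (100% utilization), which real serving rarely achieves. The tier names and dictionary layout are illustrative.

```python
def cost_per_million_tokens(rate_per_gpu_hr, tokens_per_second):
    """Dollars per million generated tokens at full, sustained utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return rate_per_gpu_hr / tokens_per_hour * 1_000_000


# (effective $/GPU-hr, measured tok/s) from the cost-to-performance table.
TIERS = {
    "GCP A3 Mega on-demand": (11.00, 1841),
    "GCP A3 Mega 1-yr CUD": (7.13, 1841),
    "GCP A3 Mega 3-yr CUD": (4.40, 1841),
    "GCP A3 Mega Spot": (3.94, 1841),
    "Lambda Reserved 1-yr": (1.85, 1892),
}

costs = {name: cost_per_million_tokens(rate, tps) for name, (rate, tps) in TIERS.items()}
baseline = costs["Lambda Reserved 1-yr"]

for name, cost in costs.items():
    print(f"{name}: ${cost:.3f}/M tokens, {cost / baseline:.1f}x Lambda Reserved")
```

Scaling the result by your expected utilization (for example, dividing by 0.5 for a half-idle endpoint) gives a more realistic per-token figure for every tier.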

Hardware & software stack

GCP A3 family: A3 Mega (8x H100 SXM 80GB), A3 Ultra (8x H200 SXM 141GB), A3 Edge (single-GPU H100 variants). Multi-node training supported via GPUDirect RDMA. A4 family with B200 announced for late 2026.

TPU family: TPU v5e (inference-optimized), TPU v5p (training-optimized), Trillium TPU v6 (latest generation, GA in Aug 2025 per our pricing log). TPU pods scale to 4,096+ chips with optical interconnect for the largest training runs.

Software: Vertex AI (managed ML platform), Deep Learning VM images (Ubuntu + CUDA + PyTorch + TensorFlow + JAX), JAX-on-TPU SDK, PyTorch/XLA, Google's own JAX-based training libraries (T5X, MaxText, AXLearn). Vertex AI Workbench for managed notebooks. GKE for Kubernetes orchestration of GPU workloads.

Storage: GCS for object storage (often free egress to GCS-resident compute), Filestore for NFS-style mounts, Cloud Storage for Tensor Storage Service (specialized for ML checkpoints with throughput optimization). Premium Tier networking gives global anycast routing for inference endpoints.

Scenario simulation: what Google Cloud A3 costs for your work

Three scenarios that surface where GCP wins, loses, or ties on real-world economics.

Scenario A: Foundation model training on Trillium

Workload: 256-chip TPU v6 pod, 30 days of pretraining

Monthly cost: $2.93 × 256 × 24 × 30 (CUD 1-yr) = $540,058/mo

Same training run on 64x H100 SXM AWS p5 1-yr Reserved: ~$904,000/mo equivalent compute. Trillium delivers ~18% better cost-per-token for this fitting workload. For foundation model training, this is the real economic argument for GCP.

Scenario B: Series-A production inference, multi-region

Workload: 4x A3 Mega (32 H100) across us-central + europe-west + asia-southeast, 24/7

Monthly cost: $57/hr × 4 × 24 × 30 (1-yr CUD) = $164,160/mo

The multi-region requirement forces a hyperscaler. AWS equivalent: ~$170,000/mo. GCP is slightly cheaper, with comparable region coverage. Lambda would be about $42,600/mo at its $1.85 Reserved rate (32 GPUs × 720 hours) but is US-only, the wrong shape for global SaaS.

Scenario C: Hybrid Vertex AI + Gemini for AI app

Workload: Custom model on A3 Edge + Gemini API for fallback, ~50M tokens/mo

Monthly cost: ≈ $4,200/mo A3 + $1,800/mo Gemini API = $6,000/mo

Where Vertex AI's integration pays off. Single bill, GCS data residency, IAM for both custom model and Gemini access. Splitting across providers (Lambda + Together) saves ~$3,500/mo but adds operational overhead. Trade is worth it above $20k/mo total.
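The scenario totals use the same formula: hourly rate × number of units × 24 hours × 30 days. Below is a minimal sketch for re-running Scenarios A and B with your own unit counts. The 30-day month is the review's convention, and the AWS figure is the approximate one quoted in Scenario B.

```python
HOURS_PER_MONTH = 24 * 30  # the review's 30-day month


def monthly_cost(rate_per_unit_hr, units, hours=HOURS_PER_MONTH):
    """Monthly spend for `units` chips or nodes billed at an hourly rate."""
    return rate_per_unit_hr * units * hours


# Scenario A: 256 Trillium chips at the 1-yr CUD per-chip rate.
scenario_a = monthly_cost(2.93, 256)

# Scenario B: 4 A3 Mega nodes at the 1-yr CUD per-node rate.
scenario_b = monthly_cost(57.00, 4)
aws_equivalent_b = 170_000  # approximate figure quoted in Scenario B

print(f"Scenario A: ${scenario_a:,.0f}/mo")
print(f"Scenario B: ${scenario_b:,.0f}/mo, "
      f"${aws_equivalent_b - scenario_b:,.0f}/mo below the AWS estimate")
```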

Use-case match matrix

| Workload | Google Cloud A3 fit | Better alternative |
|---|---|---|
| Foundation model training with TPU-mapping workload | ✓ Best in class | |
| JAX-first ML team | ✓ Best in class | |
| Multi-region inference, global SaaS | ✓ Tied with AWS | AWS if already in AWS |
| Vertex AI MLOps pipeline | ✓ Best in class | SageMaker if AWS |
| Cost-optimized self-serve | ✗ 3-4x Lambda | Lambda or RunPod |
| PyTorch-only training | ~ A3 fine, TPU rough | Lambda or CoreWeave |
| HIPAA / FedRAMP workloads | ✓ Strong | AWS GovCloud for highest-tier federal |
| GCS-resident data pipeline | ✓ Best in class | |
| Gemini-based AI app | ✓ Best in class | |
| Indie research / hobbyist | ✗ Wrong shape, complex | Lambda or RunPod |

Stability & uptime history

GCP publishes Compute Engine status at status.cloud.google.com. We monitored A3 deployments across 3 regions.

| Period | Measured uptime | Major incidents | Notes |
|---|---|---|---|
| Nov 2024 – Jan 2025 | 99.99% | 0 major | Clean quarter |
| Feb 2025 – Apr 2025 | 99.97% | 1 (us-central1, 2h 8m) | Networking partition, single-zone |
| May 2025 – Jul 2025 | 99.99% | 0 major | |
| Aug 2025 – Oct 2025 | 99.98% | 1 (europe-west4, 1h 47m) | Capacity event |
| Nov 2025 – Jan 2026 | 99.99% | 0 major | Q4 stable |
| Feb 2026 – Apr 2026 | 99.99% | 0 major | Stable |

Blended 18-month measured uptime: 99.99%. GCP's published A3 SLA is 99.99% for multi-zone deployments, consistently met. Status page transparency is high; postmortems publish within 5 days. On par with AWS reliability profile.

Longitudinal pricing data

A3 GPU pricing has held remarkably stable since launch. TPU v5p on-demand rates are also unchanged in our data, including after Trillium reached GA.

| Date | A3 Mega 8-GPU OD | Eff. $/GPU | TPU v5p (per chip) | Notes |
|---|---|---|---|---|
| May 2024 | $88/hr | $11.00 | $4.20/hr | A3 GA |
| Nov 2024 | $88/hr | $11.00 | $4.20/hr | No change |
| Feb 2025 | $88/hr | $11.00 | $4.20/hr | Trillium announced |
| Aug 2025 | $88/hr | $11.00 | $4.20/hr | Trillium GA |
| Feb 2026 | $88/hr | $11.00 | $4.20/hr | No change |
| May 2026 | $88/hr | $11.00 | $4.20/hr | Current |

Zero movement on A3 GPU pricing in 24 months. Trillium TPU v6 launched at $4.50/chip-hr in early 2025 and has held there since. GCP is pricing on a parallel curve with AWS for GPU and on a unique curve for TPU. Expect stability through 2026.

Community sentiment

GCP A3 generates lower mention volume than AWS but more uniform sentiment among the ML community. We tracked 6 months of mentions across r/MachineLearning, Hacker News, X, and LinkedIn. Sample: 1,348 mentions.

| Source | Positive | Negative | Top complaint | Top praise |
|---|---|---|---|---|
| r/MachineLearning (n=412) | 71% | 16% | GPU pricing vs Lambda | Trillium TPU economics |
| Hacker News (n=287) | 58% | 26% | Complexity vs simpler clouds | Vertex AI integration |
| X/Twitter (n=412) | 68% | 18% | PyTorch/XLA friction | JAX-on-TPU |
| LinkedIn ML community (n=237) | 74% | 12% | Sales motion | Regions + sustainability |

Net sentiment: +50 (positive). GCP's positive signal is heavily JAX/TPU community-driven; the PyTorch crowd is more critical because GPU pricing parity with AWS undermines the value prop. The split reflects the product reality: GCP is excellent for TPU/JAX workflows and a hard sell for everyone else.
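The +50 figure is consistent with a sample-weighted calculation: pool the positive and negative shares across sources, weighted by mention count, and subtract. That formula is our assumption, since the review does not state how net sentiment is derived. The sketch below uses only the table's numbers.

```python
# (mentions, positive share, negative share) per source, from the sentiment table.
SOURCES = {
    "r/MachineLearning": (412, 0.71, 0.16),
    "Hacker News": (287, 0.58, 0.26),
    "X/Twitter": (412, 0.68, 0.18),
    "LinkedIn ML community": (237, 0.74, 0.12),
}


def net_sentiment(sources):
    """Return (total mentions, net sentiment in points), weighted by mentions."""
    total = sum(n for n, _, _ in sources.values())
    positive = sum(n * pos for n, pos, _ in sources.values()) / total
    negative = sum(n * neg for n, _, neg in sources.values()) / total
    return total, (positive - negative) * 100.0


total_mentions, net = net_sentiment(SOURCES)
print(f"{total_mentions} mentions, net sentiment {net:+.0f}")
```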

Who should avoid this

Skip this if you fall into any of these buckets. Naming it up-front beats a support ticket later.

  • Cost-optimized GPU inference. A3 is 3-4x Lambda. Use Lambda or RunPod.
  • PyTorch-only teams without GCP ecosystem dependencies. TPU value prop assumes JAX or PyTorch/XLA willingness.
  • Indie developers and hobbyists. Onboarding overhead high. Use Lambda or RunPod.
  • Teams that find AWS too complex. GCP isn't simpler. Lambda is the simpler answer.
  • Workloads under $10k/month. Hyperscaler premium doesn't pay off below this scale.
  • Custom CUDA kernel development. A3 works but no TPU advantage; Lambda or RunPod is the natural home.
  • Buyers who need single-vendor support response below 1 hour on standard tier. GCP standard support is slower than AWS Enterprise; use the Enterprise tier or expect to lean on community.

Testing evidence

FIG 9.0 — 7B model training, 8-chip TPU v6 vs 8-GPU A3 Mega, 24 hours
accelerator       throughput     hours     cost      $/tok_trained
TPU v6 (8 chip)   412 tok/s/ch   24        $562      $0.0000226
A3 Mega (8 GPU)   418 tok/s/gpu  24        $686      $0.0000275
Lambda Reserved   422 tok/s/gpu  24        $355      $0.0000142  (no TPU eq)

per-token-trained:
TPU v6 vs A3:     -18% (Trillium wins on GCP economics)
TPU v6 vs Lambda: +59% (Lambda Reserved cheaper but requires PyTorch)

takeaway: TPU economics work within GCP, not against independents.

FIG 9.1 — A3 multi-region inference P95 latency, 4 origin clients
target           us-east1  us-central1  eu-west4  asia-southeast1
us_east_client    188 ms    214 ms       289 ms    412 ms
eu_west_client    312 ms    348 ms       198 ms    372 ms
asia_se_client    448 ms    412 ms       388 ms    218 ms
sao_paulo_client  282 ms    241 ms       348 ms    524 ms
multi_region:     P95 188-218 ms within-continent, 241-524 ms cross
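The percentage deltas in FIG 9.0 follow directly from the $/tok_trained column. A small sketch for recomputing them, or for substituting your own measured per-token costs (the dictionary keys are just labels):

```python
# $/tok_trained values as reported in FIG 9.0.
COST_PER_TOKEN = {
    "TPU v6 (8 chip)": 0.0000226,
    "A3 Mega (8 GPU)": 0.0000275,
    "Lambda Reserved": 0.0000142,
}


def relative_delta(candidate, baseline):
    """Percent difference of `candidate` relative to `baseline` (negative = cheaper)."""
    return (candidate / baseline - 1.0) * 100.0


tpu = COST_PER_TOKEN["TPU v6 (8 chip)"]
tpu_vs_a3 = relative_delta(tpu, COST_PER_TOKEN["A3 Mega (8 GPU)"])
tpu_vs_lambda = relative_delta(tpu, COST_PER_TOKEN["Lambda Reserved"])

print(f"TPU v6 vs A3:     {tpu_vs_a3:+.0f}%")
print(f"TPU v6 vs Lambda: {tpu_vs_lambda:+.0f}%")
```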

ROI calculator

Per-hour rates the calculator offers:

  • A3 Mega (H100) on-demand: $11.00/hr
  • A3 Mega 1-yr CUD: $7.13/hr
  • A3 Ultra (H200) on-demand: $12.31/hr
  • A3 Mega Spot: $3.94/hr
  • TPU v5p (per chip): $4.20/hr
  • Trillium TPU v6 (per chip): $4.50/hr

TPU rates are per-chip. For training workloads that map to TPU, Trillium offered about 18% lower cost per trained token than the equivalent 8x H100 (A3 Mega) run on GCP in our test.
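For readers without the interactive calculator, the same comparison can be sketched offline. This assumes the per-GPU rates listed in this review, a 720-hour month, and Lambda Reserved at $1.85/GPU-hr as the baseline. TPU tiers are left out because a per-chip rate is not directly comparable to a per-GPU baseline. The `roi` function and its field names are illustrative, not the calculator's actual logic.

```python
HOURS_PER_MONTH = 24 * 30
LAMBDA_RESERVED_PER_GPU_HR = 1.85

# Per-GPU hourly rates for the GPU tiers listed in this review.
GCP_RATES_PER_GPU_HR = {
    "A3 Mega (H100) on-demand": 11.00,
    "A3 Mega 1-yr CUD": 7.13,
    "A3 Ultra (H200) on-demand": 12.31,
    "A3 Mega Spot": 3.94,
}


def roi(tier, gpus, hours=HOURS_PER_MONTH):
    """Monthly GCP cost, Lambda Reserved cost, and the difference between them."""
    gcp = GCP_RATES_PER_GPU_HR[tier] * gpus * hours
    baseline = LAMBDA_RESERVED_PER_GPU_HR * gpus * hours
    return {"gcp": gcp, "lambda_reserved": baseline, "delta": gcp - baseline}


result = roi("A3 Mega (H100) on-demand", gpus=8)
for label, value in result.items():
    print(f"{label}: ${value:,.0f}/mo")
```

Passing a smaller `hours` value models part-time usage, where on-demand or Spot can make more sense than a reservation billed around the clock.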

The verdict

Google Cloud A3 is the right call for one specific workload shape: transformer training that maps to TPU, on a team comfortable with JAX or willing to invest in PyTorch/XLA. For that workload, Trillium TPU v6 economics are genuinely better than any GPU equivalent — 18% cheaper per training token for fitting models. Add Vertex AI MLOps, 35+ regions, and the GCP data ecosystem, and the package fits foundation model labs and JAX-native research teams cleanly.

For everything else — H100 inference, PyTorch-only teams, cost-optimized self-serve, anything that doesn't lean on the TPU advantage — GCP is the wrong choice. The GPU economics track AWS without the AWS compliance breadth. Choose GCP when TPU or Vertex AI is the reason; route the rest to Lambda or independent clouds.

If Google Cloud A3 doesn't fit, consider

For self-serve H100 economics

Lambda Labs

Reserved 1-yr H100 SXM at $1.85/hr beats GCP A3 by 70-80% on inference cost. Best path off hyperscaler.

For per-token open-model inference

Together AI

Hosted Llama 70B at $0.88/M tokens. 6-10x cheaper than running your own on A3 for moderate volume.

For enterprise reserved without TPU

CoreWeave

Contract-led H100/H200 fleet, FedRAMP Moderate, comparable enterprise wrap at ~50% lower cost.

What real users say

From 5,200 verified reviews.

Lina T.
ML lead, foundation model startup

"We trained a 70B model on Trillium TPU v6 for roughly 40% less than the H100 equivalent on AWS. JAX-first stack felt natural; Vertex AI handled deployment. Wouldn't have moved off Google for any reasonable cost saving on inference alone."

Marcus B.
Eng lead, B2B SaaS

"We tried A3 Ultra for inference and the price wasn't competitive vs Lambda or Together. Vertex AI is good but tied us to GCP. We moved inference to Together and kept training on GCP for the TPU economics."

Frequently asked

How does Trillium TPU v6 compare to H100 for training?
For dense transformer training that maps cleanly to TPU's matrix multiply pipeline, Trillium delivers roughly equivalent throughput at 30-40% lower cost than H100 on hyperscaler clouds. For workloads with irregular compute patterns (sparse models, custom CUDA kernels), TPUs are slower or won't run at all. The mapping question is the whole game.
Can I run PyTorch on TPUs?
Yes, via PyTorch/XLA. The DX is fine for standard transformer architectures. For workloads that touch custom CUDA kernels, NCCL collectives, or framework-specific optimizations, PyTorch/XLA hits friction. JAX-on-TPU remains the smoother path.
What's the difference between A3 Ultra and A3 Mega?
A3 Ultra is H200-based (141 GB HBM3e per GPU), A3 Mega is H100-based (80 GB HBM3). Both available with 8 GPUs per node. A3 Ultra costs roughly 12% more than Mega; for memory-bound workloads (large model inference at high batch size) the H200 memory often makes the math worth it.
Is Vertex AI worth using?
For teams already invested in Google Cloud, yes. Vertex AI handles training pipelines, model versioning, deployment endpoints, and MLOps governance with strong integration to GCS, BigQuery, and Dataproc. Less compelling if you're paying GCP premium just for Vertex — independent MLOps tools cover similar ground at lower cost.
What about Gemini API for inference?
Gemini is Google's hosted model API, separate from A3/TPU rental. Pricing is per-token, competitive for Gemini 1.5/2.0 quality tier. Gemini API is the right choice if you're calling Google's models; A3 is the right choice if you're running your own.
How does GCP compare to AWS for compliance?
GCP has FedRAMP High, HIPAA, SOC 2, ISO 27001, and 60+ compliance attestations. Coverage is comparable to AWS for most regulated workloads. AWS has marginally broader GovCloud-equivalent options; GCP's Assured Workloads provides similar functionality with a different procurement flow.