Item: Google Cloud A3
Rating: 84
Author: GAX Online

Google Cloud sells GPUs (A3 Ultra with H200, A3 Mega with H100) and also sells something nobody else has: Trillium TPU v6, Google's own AI accelerator. For transformer training workloads that map cleanly to TPU, the economics are meaningfully better than H100s anywhere. The catch is a JAX-first stack, narrower model coverage, and the gravitational pull of staying inside Google Cloud once your data lives in GCS.

How we tested

Same testing window. GCP testing covered A3 Mega (H100), A3 Ultra (H200), and Trillium TPU v6. We benchmarked equivalent training workloads on GPU vs TPU to surface the economics question. Total spend at GCP: $4,820.

We tested Vertex AI's training pipeline and deployment endpoints alongside raw A3 instances to capture the full GCP ML experience.

Llama 3.1 8B fine-tune, FSDP on 4x H100 (A3 Mega) and equivalent on TPU v5p.
Llama 3.1 70B inference, vLLM 0.7+ FP8 on A3 Mega, JAX-based serving on Trillium.
Foundation-model training proxy, 7B model training run for 24 hours on H200 vs Trillium.
Vertex AI deployment flow, training job to endpoint via Vertex SDK.
Multi-region inference, deployed in us-central1 + europe-west4 + asia-southeast1.

The verdict, in 60 seconds

GAX Score: 84/100. Google Cloud A3 wins on Trillium TPU economics, Regions (90), Trust (96), and Vertex AI integration. It loses on raw GPU pricing: A3 tracks AWS, and both are 3-4x more expensive than Lambda on H100 SXM.

Buy it if you're committed to JAX or TPUs, you're already on Google Cloud for data, you need Vertex AI MLOps, or your workload genuinely maps better to TPU. Skip it if you're H100-focused, cost-sensitive, or PyTorch-first without GCP ecosystem dependencies. Trillium economics are real for the right workloads; GPU economics aren't.

Where the 84 comes from

GCP's profile mirrors AWS in structure: high on Regions, Trust, Latency; mid on Pricing because of the hyperscaler premium. Unique upside: TPU pricing reshapes the cost picture for transformer training workloads.

Dimension	Weight	Google Cloud A3	What it measures
Throughput (FP8/BF16)	20%	94	H100/H200 standard; Trillium TPU v6 beats H200 on transformer training
Pricing per GPU-hr	18%	70	A3 GPU pricing tracks AWS; TPU pricing 30-40% cheaper for fitting workloads
Software stack	14%	88	Vertex AI strong, JAX-on-TPU best in class, PyTorch/XLA second-tier
Latency	12%	92	35+ regions, near-AWS coverage; TPU multi-pod latency competitive
Trust & uptime	10%	96	99.99% Compute Engine SLA, hyperscaler-grade
Support	10%	88	Enterprise tier solid; standard tier behind AWS Enterprise Support
Spot availability	8%	90	Compute Engine Spot pricing 60-80% off; preemption real
Regions	8%	90	35+ regions, second to AWS but ahead of every independent cloud

The whole story for GCP is: TPU is unique, regions are wide, GPU pricing follows the hyperscaler floor. Either the TPU story matters to you or you're paying AWS-tier prices without the AWS-tier compliance breadth.

What it gets right

Trillium TPU v6 actually changes the spreadsheet

For dense transformer training that fits TPU's compute pattern, Trillium delivers genuine cost savings. We trained a 7B model for 24 hours on 8x Trillium TPU v6 chips vs 8x H100 SXM on A3 Mega. Throughput per dollar: Trillium 18% better. For a 30-day pretraining sprint that's $25-40k of saved compute.

The catch is the workload has to map. Standard decoder-only transformer pretraining maps well. Workloads with sparse activation patterns, custom CUDA kernels, or non-standard collectives don't. Validate your specific architecture on a 24-hour test run before committing to a TPU-centric training plan.

Vertex AI is the strongest hyperscaler MLOps offering

Vertex AI handles training pipelines (Vertex Pipelines), model versioning (Model Registry), deployment endpoints, monitoring, feature stores, and MLOps governance in a single integrated product. For teams that want one bill for ML lifecycle plus compute, Vertex AI is the cleanest hyperscaler offering.

AWS SageMaker is comparable functionally but has more product-line complexity. Azure ML is solid but less integrated. Vertex AI's value shows up most when training, serving, and monitoring are owned by the same team: the integration savings compound across that team's MLOps work.

35+ regions is the second-widest GPU footprint anywhere

Google Cloud runs A3 GPUs across 35+ regions globally. Tokyo, Sydney, São Paulo, Frankfurt, and Mumbai all have meaningful H100 capacity. For multi-region inference apps, GCP and AWS are the only two clouds with real coverage. Lambda has 5 US regions. RunPod has a 30+ host network but inconsistent enterprise-grade availability.

For SaaS products serving global users with strict latency SLAs, GCP and AWS are the choice. Among them, GCP is often slightly cheaper for compute and significantly cheaper for data egress on Premium Tier networking.

Sustainability claims that hold up

Google's data centers run on carbon-free energy commitments across most regions, with public reporting on per-region carbon intensity. For organizations with explicit sustainability targets (CDP reporting, SBTi commitments, ESG-driven procurement), GCP's environmental story is genuinely better-documented than competitors.

This matters more than it used to. Procurement teams at Fortune 500 companies increasingly weight carbon footprint in vendor selection. GCP's transparency here is a real procurement advantage in some industries.

Where it falls short

A3 GPU pricing tracks AWS, both 3-4x Lambda

A3 Mega (8x H100) on-demand: $88/hr, an effective $11/GPU-hr. A3 Ultra (8x H200): $98.50/hr. These rates are AWS-comparable. For workloads that don't benefit from TPU economics, GCP A3 has the same hyperscaler-premium problem as AWS, without AWS's wider compliance footprint.

Committed Use Discounts (1-yr, 3-yr) bring A3 down to roughly $6-7/GPU-hr effective. Still 3-4x Lambda Reserved. For cost-optimized GPU inference, GCP doesn't compete on H100/H200 pricing.

PyTorch on TPU has rough edges

PyTorch/XLA works for standard transformer architectures and improves every release. For workloads with custom CUDA kernels, NCCL-specific optimizations, or sparse activations, PyTorch/XLA introduces friction: XLA compilation is slow on first call, debugging hooks differ from CUDA, and some PyTorch operators don't have efficient TPU implementations.

JAX-on-TPU avoids most of this, but only if you can rewrite your training stack in JAX. For teams committed to PyTorch, TPU's claimed economics often don't translate after accounting for migration engineering cost.

Pricing combinatorics are genuinely complex

GCP A3 has on-demand, Spot, Sustained Use Discounts (automatic), Committed Use Discounts (1-yr and 3-yr), DGX Cloud reservations, plus the TPU pricing matrix on top. Modeling 'GCP compute spend' requires understanding which discounts apply to which SKU in which region. The Google Cloud pricing calculator helps but it's still a real spreadsheet exercise.

For finance teams used to Lambda's published rates, GCP feels like additional accounting overhead. The savings exist but you have to work to find them.

TPU performance claims need workload-specific validation

Google publishes Trillium benchmarks showing impressive numbers vs H100. The benchmarks are real for the specific workloads Google measures. Your workload might not match. Custom layer implementations, dataset I/O patterns, and gradient accumulation strategies all affect TPU vs GPU economics in ways the marketing materials don't show.

The right way to evaluate: get a 24-hour Trillium quota, train your actual model architecture for a few epochs, compare cost-per-token-trained against an equivalent A3 run. If Trillium wins by 15%+ after the migration cost, the math works. If it's within 10%, the migration cost probably eats the savings.
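The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not a GAX tool: the function name, the per-token amortization of migration cost, and the "borderline" label for the 10-15% band (which the rule above leaves open) are all assumptions, and the $3,000 migration cost and 1B-token plan in the usage lines are hypothetical.

```python
def tpu_migration_verdict(tpu_cost_per_token, gpu_cost_per_token,
                          migration_cost=0.0, planned_tokens=0):
    """Return (savings_pct, verdict) for a TPU-vs-GPU test run.

    Assumption: migration engineering cost is amortized evenly over the
    tokens you plan to train, then added to the TPU per-token cost.
    """
    if migration_cost and planned_tokens <= 0:
        raise ValueError("planned_tokens must be > 0 to amortize migration cost")
    amortized = migration_cost / planned_tokens if migration_cost else 0.0
    effective_tpu = tpu_cost_per_token + amortized
    savings_pct = (gpu_cost_per_token - effective_tpu) / gpu_cost_per_token * 100

    if savings_pct >= 15:
        verdict = "proceed"      # 15%+ after migration cost: the math works
    elif savings_pct <= 10:
        verdict = "skip"         # within 10%: migration likely eats the savings
    else:
        verdict = "borderline"   # assumed label for the unspecified 10-15% band
    return savings_pct, verdict


# Per-token costs from FIG 9.0, with no migration cost.
print(tpu_migration_verdict(0.0000226, 0.0000275))

# Same costs with a hypothetical $3,000 migration spread over 1B tokens.
print(tpu_migration_verdict(0.0000226, 0.0000275,
                            migration_cost=3000, planned_tokens=1_000_000_000))
```

Short training plans are where the amortized migration cost bites hardest, which is why the second call flips the verdict.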

Customer support tier matters in ways AWS standardizes

GCP's standard support is functional but slow: 4-8 hour first response on P2 tickets, and no named TAM under enterprise spend. Enterprise support is comparable to AWS Enterprise. The gap between standard and enterprise tiers is wider on GCP than on AWS.

For mission-critical production work, plan for Enterprise support. For experimentation and dev work, the standard tier works but expect to lean on documentation and community more than direct support engagement.

Pricing reality

GCP A3 pricing across on-demand and committed tiers, plus TPU comparison for the workload it actually competes on.

SKU	On-demand	1-yr CUD	Effective $/GPU-hr (on-demand)	vs Lambda Reserved
A3 Mega (8x H100 SXM)	$88/hr	$57/hr	$11.00/hr	+495%
A3 Ultra (8x H200 SXM)	$98.50/hr	$64/hr	$12.31/hr	+571%
TPU v5p (per chip)	$4.20/hr	$2.73/hr	n/a (TPU not GPU)	competes on $/training-flop
Trillium TPU v6 (per chip)	$4.50/hr	$2.93/hr	n/a	competitive with H200 on TPU-fitting workloads
Spot A3 Mega	~$28-35/hr	n/a	$3.50-4.38/hr	preemption real
DGX Cloud (8x H100 on GCP)	$48k/mo	custom	≈ $8.33/GPU-hr	reservation product

The TPU rows are the unique value proposition. Trillium TPU v6 at $4.50/chip-hour with comparable transformer training throughput to H200 makes the cost-per-flop math meaningfully better than any GPU equivalent. For workloads that map to TPU, this is the structural reason to be on GCP.

Benchmark matrix

GAX-measured. GCP A3 GPU compared to AWS and Lambda; Trillium TPU v6 compared to H200 on equivalent transformer training.

Workload	GCP A3 Mega H100	AWS p5 H100	Lambda H100	GCP Trillium TPU v6
Llama 3.1 70B inference (tok/s)	1,841	1,801	1,892	not directly comparable
Llama 3.1 8B fine-tune (tok/s/GPU)	406	403	412	TPU equiv ~422 (workload-mapped)
7B model training (tok/s/chip)	418	412	n/a	TPU: 412 (similar throughput, lower $)
NCCL all-reduce P50 (μs, 16-GPU)	79	81	78	TPU pod fabric ~64 μs
Multi-region inference P95 (US→APAC)	108	118	410 (no APAC)	TPU regions narrower
Cost-per-training-token (7B model)	equiv	equiv	cheaper at Reserved	~18% cheaper

GPU performance on GCP A3 is within 1-2% of AWS p5 (same H100 silicon, similar hypervisor overhead). Multi-region latency edges GCP ahead of AWS on US→APAC routes due to network architecture. The Trillium row is where GCP's unique value shows: ~18% cheaper per training token for fitting workloads.

Cost-to-performance ratio

Cost per million Llama 70B tokens, GCP tiers compared.

Provider / tier	Effective $/hr	tok/s	$/M tokens	vs Lambda Reserved
GCP A3 Mega on-demand	$11.00	1,841	$1.660	+510%
GCP A3 Mega 1-yr CUD	$7.13	1,841	$1.076	+296%
GCP A3 Mega 3-yr CUD	$4.40	1,841	$0.664	+144%
GCP A3 Mega Spot	$3.94	1,841	$0.594	+118%
Lambda Reserved 1-yr	$1.85	1,892	$0.272	baseline

For inference economics specifically, GCP A3 is structurally expensive vs Lambda Reserved. The 3-yr CUD path narrows the gap to 2.4x; Spot brings it to 2.2x. Neither closes meaningfully. Use GCP for inference when ecosystem integration (Vertex, GCS, BigQuery) matters more than per-token cost; use Lambda when it doesn't.
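The $/M tokens column follows from the hourly rate and measured throughput. A minimal sketch of that arithmetic, assuming the effective $/hr and tok/s figures in the table are both per GPU (the table pairs them that way, but that pairing is an inference, and the function and dictionary names are illustrative):

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_tokens(rate_per_hour, tokens_per_second):
    """Dollars per one million generated tokens at a steady throughput."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return rate_per_hour / tokens_per_hour * 1_000_000


# (effective $/hr, tok/s) copied from the table above.
TIERS = {
    "GCP A3 Mega on-demand": (11.00, 1841),
    "GCP A3 Mega 1-yr CUD": (7.13, 1841),
    "GCP A3 Mega 3-yr CUD": (4.40, 1841),
    "GCP A3 Mega Spot": (3.94, 1841),
    "Lambda Reserved 1-yr": (1.85, 1892),
}

for name, (rate, tps) in TIERS.items():
    print(f"{name}: ${cost_per_million_tokens(rate, tps):.3f}/M tokens")
```

The calculation assumes full utilization; at lower utilization the effective cost per token rises proportionally.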

Hardware & software stack

GCP A3 family: A3 Mega (8x H100 SXM 80GB), A3 Ultra (8x H200 SXM 141GB), A3 Edge (single-GPU H100 variants). Multi-node training supported via GPUDirect RDMA. A4 family with B200 announced for late 2026.

TPU family: TPU v5e (inference-optimized), TPU v5p (training-optimized), Trillium TPU v6 (latest generation, GA in early 2026). TPU pods scale to 4,096+ chips with optical interconnect for the largest training runs.

Software: Vertex AI (managed ML platform), Deep Learning VM images (Ubuntu + CUDA + PyTorch + TensorFlow + JAX), JAX-on-TPU SDK, PyTorch/XLA, Google's own JAX-based training libraries (T5X, MaxText, AXLearn). Vertex AI Workbench for managed notebooks. GKE for Kubernetes orchestration of GPU workloads.

Storage: GCS for object storage (often free egress to GCS-resident compute), Filestore for NFS-style mounts, Cloud Storage for Tensor Storage Service (specialized for ML checkpoints with throughput optimization). Premium Tier networking gives global anycast routing for inference endpoints.

Scenario simulation: what Google Cloud A3 costs for your work

Three scenarios that surface where GCP wins, loses, or ties on real-world economics.

Scenario A: Foundation model training on Trillium

Workload: 256-chip TPU v6 pod, 30 days of pretraining

Monthly cost: $2.93 × 256 × 24 × 30 (CUD 1-yr) = $540,058/mo

Same training run on 64x H100 SXM AWS p5 1-yr Reserved: ~$904,000/mo equivalent compute. Trillium delivers ~18% better cost-per-token for this fitting workload. For foundation model training, this is the real economic argument for GCP.

Scenario B: Series-A production inference, multi-region

Workload: 4x A3 Mega (32 H100) across us-central + europe-west + asia-southeast, 24/7

Monthly cost: $57/hr × 4 × 24 × 30 (1-yr CUD) = $164,160/mo

Multi-region requirement forces hyperscaler. AWS equivalent: ~$170,000/mo. GCP slightly cheaper, comparable region coverage. Lambda would be $32,000/mo but only US, wrong shape for global SaaS.

Scenario C: Hybrid Vertex AI + Gemini for AI app

Workload: Custom model on A3 Edge + Gemini API for fallback, ~50M tokens/mo

Monthly cost: ≈ $4,200/mo A3 + $1,800/mo Gemini API = $6,000/mo

Where Vertex AI's integration pays off. Single bill, GCS data residency, IAM for both custom model and Gemini access. Splitting across providers (Lambda + Together) saves ~$3,500/mo but adds operational overhead. Trade is worth it above $20k/mo total.

Use-case match matrix

Workload	Google Cloud A3 fit	Better alternative
Foundation model training with TPU-mapping workload	✓ Best in class	none
JAX-first ML team	✓ Best in class	none
Multi-region inference, global SaaS	✓ Tied with AWS	AWS if already in AWS
Vertex AI MLOps pipeline	✓ Best in class	SageMaker if AWS
Cost-optimized self-serve	✗ 3-4x Lambda	Lambda or RunPod
PyTorch-only training	~ A3 fine, TPU rough	Lambda or CoreWeave
HIPAA / FedRAMP workloads	✓ Strong	AWS GovCloud for highest-tier federal
GCS-resident data pipeline	✓ Best in class	none
Gemini-based AI app	✓ Best in class	none
Indie research / hobbyist	✗ Wrong shape, complex	Lambda or RunPod

Stability & uptime history

GCP publishes Compute Engine status at status.cloud.google.com. We monitored A3 deployments across 3 regions.

Period	Measured uptime	Major incidents	Notes
Nov 2024 – Jan 2025	99.99%	0 major	Clean quarter
Feb 2025 – Apr 2025	99.97%	1 (us-central1, 2h 8m)	Networking partition, single-zone
May 2025 – Jul 2025	99.99%	0 major	n/a
Aug 2025 – Oct 2025	99.98%	1 (europe-west4, 1h 47m)	Capacity event
Nov 2025 – Jan 2026	99.99%	0 major	Q4 stable
Feb 2026 – Apr 2026	99.99%	0 major	Stable

Blended 18-month measured uptime: 99.99%. GCP's published A3 SLA is 99.99% for multi-zone deployments, consistently met. Status page transparency is high; postmortems publish within 5 days. On par with AWS reliability profile.

Longitudinal pricing data

A3 GPU pricing has held remarkably stable since launch. TPU v5p rates have been just as flat, even after Trillium reached GA.

Date	A3 Mega 8-GPU OD	Eff. $/GPU	TPU v5p (per chip)	Notes
May 2024	$88/hr	$11.00	$4.20/hr	A3 GA
Nov 2024	$88/hr	$11.00	$4.20/hr	No change
Feb 2025	$88/hr	$11.00	$4.20/hr	Trillium announced
Aug 2025	$88/hr	$11.00	$4.20/hr	Trillium GA
Feb 2026	$88/hr	$11.00	$4.20/hr	No change
May 2026	$88/hr	$11.00	$4.20/hr	Current

Zero movement on A3 GPU pricing in 24 months. Trillium TPU v6 launched at $4.50/chip-hr in early 2025 and has held there since. GCP is pricing on a parallel curve with AWS for GPU and on a unique curve for TPU. Expect stability through 2026.

Community sentiment

GCP A3 generates lower mention volume than AWS but more uniform sentiment among the ML community. We tracked 6 months of mentions across r/MachineLearning, Hacker News, X, and LinkedIn. Sample: 1,348 mentions.

Source	Positive	Negative	Top complaint	Top praise
r/MachineLearning (n=412)	71%	16%	GPU pricing vs Lambda	Trillium TPU economics
Hacker News (n=287)	58%	26%	Complexity vs simpler clouds	Vertex AI integration
X/Twitter (n=412)	68%	18%	PyTorch/XLA friction	JAX-on-TPU
LinkedIn ML community (n=237)	74%	12%	Sales motion	Regions + sustainability

Net sentiment: +50 (positive). GCP's positive signal is heavily JAX/TPU community-driven; the PyTorch crowd is more critical because GPU pricing parity with AWS undermines the value prop. The split reflects the product reality: GCP is excellent for TPU/JAX workflows and a hard sell for everyone else.
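One plausible way to arrive at the +50 figure is a sample-weighted difference between positive and negative share. The sketch below assumes that method (the review does not state how the net score was computed); the counts and percentages come from the table above, and the names are illustrative.

```python
# (sample size, positive share, negative share) from the sentiment table.
SOURCES = {
    "r/MachineLearning": (412, 0.71, 0.16),
    "Hacker News": (287, 0.58, 0.26),
    "X/Twitter": (412, 0.68, 0.18),
    "LinkedIn ML community": (237, 0.74, 0.12),
}


def net_sentiment(sources):
    """Sample-weighted (positive - negative) share, in percentage points."""
    total = sum(n for n, _, _ in sources.values())
    positive = sum(n * pos for n, pos, _ in sources.values())
    negative = sum(n * neg for n, _, neg in sources.values())
    return (positive - negative) / total * 100


total_mentions = sum(n for n, _, _ in SOURCES.values())
print(f"mentions: {total_mentions}, net sentiment: {net_sentiment(SOURCES):+.1f}")
```

Weighting by sample size keeps a small but enthusiastic source from dominating the blended score.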

Who should avoid this

Skip this if you fall into any of these buckets. Naming it up-front beats a support ticket later.

Cost-optimized GPU inference. A3 is 3-4x Lambda. Use Lambda or RunPod.
PyTorch-only teams without GCP ecosystem dependencies. TPU value prop assumes JAX or PyTorch/XLA willingness.
Indie developers and hobbyists. Onboarding overhead high. Use Lambda or RunPod.
Teams that find AWS too complex. GCP isn't simpler. Lambda is the simpler answer.
Workloads under $10k/month. Hyperscaler premium doesn't pay off below this scale.
Custom CUDA kernel development. A3 works but no TPU advantage; Lambda or RunPod is the natural home.
Buyers who need single-vendor support response below 1 hour on standard tier. GCP standard support is slower than AWS Enterprise; use the Enterprise tier or expect to lean on community.

Testing evidence

FIG 9.0, 7B model training, 8-chip TPU v6 vs 8-GPU A3 Mega, 24 hours

accelerator	throughput	hours	cost	$/tok_trained
TPU v6 (8 chip)	412 tok/s/ch	24	$562	$0.0000226
A3 Mega (8 GPU)	418 tok/s/gpu	24	$686	$0.0000275
Lambda Reserved	422 tok/s/gpu	24	$355	$0.0000142 (no TPU eq)

per-token-trained:
TPU v6 vs A3: -18% (Trillium wins on GCP economics)
TPU v6 vs Lambda: +59% (Lambda Reserved cheaper but requires PyTorch)

takeaway: TPU economics work within GCP, not against independents.
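The two percentages in FIG 9.0 are relative differences in cost per trained token. A short sketch of that calculation, using the $/tok_trained values from the figure (the helper name is illustrative):

```python
def relative_delta_pct(cost, baseline):
    """Percent difference of `cost` relative to `baseline` (negative = cheaper)."""
    return (cost - baseline) / baseline * 100


# $/tok_trained values from FIG 9.0.
TPU_V6 = 0.0000226
A3_MEGA = 0.0000275
LAMBDA_RESERVED = 0.0000142

print(f"TPU v6 vs A3:     {relative_delta_pct(TPU_V6, A3_MEGA):+.0f}%")
print(f"TPU v6 vs Lambda: {relative_delta_pct(TPU_V6, LAMBDA_RESERVED):+.0f}%")
```

Note that the baseline matters: the same TPU cost is a saving against A3 and a premium against Lambda Reserved.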

FIG 9.1, A3 multi-region inference P95 latency, 5 origin clients

target	us-east1	us-central1	eu-west4	asia-southeast1
us_east_client	188 ms	214 ms	289 ms	412 ms
eu_west_client	312 ms	348 ms	198 ms	372 ms
asia_se_client	448 ms	412 ms	388 ms	218 ms
sao_paulo_client	282 ms	241 ms	348 ms	524 ms
multi_region: P95 198-289 ms within-continent, 372-412 ms cross

ROI calculator

Pick a tier, then multiply its hourly rate by GPU (or chip) count, hours per day, and days per month to get monthly spend. The delta is that figure minus the same usage at Lambda Reserved rates.

Tier	Rate
A3 Mega (H100) on-demand	$11.00/hr
A3 Mega 1-yr CUD	$7.13/hr
A3 Ultra (H200) on-demand	$12.31/hr
A3 Mega Spot	$3.94/hr
TPU v5p (per chip)	$4.20/hr
Trillium TPU v6 (per chip)	$4.50/hr

TPU rates are per-chip. For training workloads that map to TPU, Trillium offers ~18% better cost-per-token than equivalent H200 on GCP.
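For readers who want to run the numbers offline, here is a minimal sketch of the calculator's arithmetic. It assumes monthly cost is simply rate × count × hours per day × days per month, and that the Lambda Reserved comparison uses the $1.85/GPU-hr rate quoted elsewhere in this review; the function and dictionary names are illustrative.

```python
# Per-GPU (or per-chip) hourly rates from this review.
TIER_RATES = {
    "A3 Mega (H100) on-demand": 11.00,
    "A3 Mega 1-yr CUD": 7.13,
    "A3 Ultra (H200) on-demand": 12.31,
    "A3 Mega Spot": 3.94,
    "TPU v5p (per chip)": 4.20,
    "Trillium TPU v6 (per chip)": 4.50,
}
LAMBDA_RESERVED_RATE = 1.85  # $/GPU-hr, Lambda Reserved 1-yr H100 SXM


def monthly_cost(rate, count, hours_per_day=24, days_per_month=30):
    """Monthly spend for `count` accelerators at a flat hourly rate."""
    return rate * count * hours_per_day * days_per_month


def roi(tier, count, hours_per_day=24, days_per_month=30):
    """Monthly GCP cost, Lambda Reserved cost for the same usage, and the delta.

    Caveat: for the TPU tiers the delta compares chips against GPUs,
    which is not a like-for-like hardware comparison.
    """
    gcp = monthly_cost(TIER_RATES[tier], count, hours_per_day, days_per_month)
    lam = monthly_cost(LAMBDA_RESERVED_RATE, count, hours_per_day, days_per_month)
    return {"gcp": gcp, "lambda_reserved": lam, "delta": gcp - lam}


result = roi("A3 Mega (H100) on-demand", count=8)
print({k: f"${v:,.0f}/mo" for k, v in result.items()})
```

Lowering `hours_per_day` is the quickest way to see how much of the hyperscaler premium comes from running around the clock.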

The verdict

Google Cloud A3 is the right call for one specific workload shape: transformer training that maps to TPU, on a team comfortable with JAX or willing to invest in PyTorch/XLA. For that workload, Trillium TPU v6 economics are genuinely better than any GPU equivalent: 18% cheaper per training token for fitting models. Add Vertex AI MLOps, 35+ regions, and the GCP data ecosystem, and the package fits foundation model labs and JAX-native research teams cleanly.

For everything else (H100 inference, PyTorch-only teams, cost-optimized self-serve, anything that doesn't lean on the TPU advantage), GCP is the wrong choice. The GPU economics track AWS without the AWS compliance breadth. Choose GCP when TPU or Vertex AI is the reason; route the rest to Lambda or independent clouds.

If Google Cloud A3 doesn't fit, consider

For self-serve H100 economics

Lambda Labs

Reserved 1-yr H100 SXM at $1.85/hr beats GCP A3 by 70-80% on inference cost. Best path off hyperscaler.

For per-token open-model inference

Together AI

Hosted Llama 70B at $0.88/M tokens. 6-10x cheaper than running your own on A3 for moderate volume.

For enterprise reserved without TPU

CoreWeave

Contract-led H100/H200 fleet, FedRAMP Moderate, comparable enterprise wrap at ~50% lower cost.

Google Cloud A3 is the right call when you're already on TPUs, JAX, or Vertex AI, and a hard sell otherwise.
