NVIDIA GB300 (Grace Blackwell 300)

Overview

NVIDIA GB300 (Grace Blackwell 300) is NVIDIA's third-generation CPU+GPU Superchip, following GH200 and GB200, and launched in 2025 H2. It pairs one B300 Ultra GPU (an upgraded Blackwell) with one Arm Grace/Vera CPU over the high-speed NV-HBI interface. An NVL72 rack combines 72 GB300 units, interconnected via NVLink 5 and ConnectX-9 1.6T networking, and is NVIDIA's flagship rack-scale data center AI product.

Generational evolution:

GH200 (2023-Q3): Grace Hopper, H100 + 72-core Arm Grace
GB200 (2024-Q4): Grace Blackwell, B200 + 72-core Arm Grace
GB300 (2025 H2): B300 Ultra + Arm Vera 88-core CPU + ConnectX-9 1.6T

Core Specifications

Item	Spec
Architecture	Grace Blackwell 300 Superchip
GPU Chip	1× B300 Ultra (upgraded Blackwell)
CPU Chip	1× Arm Vera (88 Olympus cores)
NV-HBI	900 GB/s bidirectional CPU-GPU interconnect
CPU-GPU Coherent Memory	Unified addressing (480GB LPDDR5X on the CPU + 288GB HBM3E on B300)
GPU Memory	288GB HBM3E (B300 Ultra upgrade)
GPU Bandwidth	10 TB/s
CPU Memory	480GB LPDDR5X
CPU Bandwidth	512 GB/s
FP4 sparse	15 PFLOPS (B300 Ultra single GPU)
FP8 dense	3.75 PFLOPS
BF16 dense	1.875 PFLOPS
TDP (Single Superchip)	1200 W (B300 1000W + Vera 200W)
Form Factor	Board-integrated (non-removable)
Production	2025 H2
Unit Price	~$70,000-80,000 (Superchip module)

GB300 NVL72 Rack

Item	Configuration
Superchip Count	72× GB300
GPU Count	72× B300 Ultra
CPU Count	72× Arm Vera (88 cores × 72 = 6,336 cores)
Total HBM	20.7 TB HBM3E
Total LPDDR5X	34.6 TB
NVLink 5 Intra-domain	130 TB/s full interconnect
ConnectX-9 Egress	72× 1.6T = 115 Tb/s
FP4 sparse Total	1.08 EFLOPS
FP8 dense Total	270 PFLOPS
Rack TDP	~120 kW (including cooling)
Rack Count	8 (standard data center row)
Price	~$3.3M / rack (estimated)
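
The rack-level totals in the table follow from multiplying the per-Superchip figures by 72. The following is a minimal Python sketch of that arithmetic, using only numbers from the tables on this page. The constant and function names are illustrative, and decimal units (1 TB = 1000 GB) are assumed.

```python
# Per-Superchip figures taken from the Core Specifications table.
SUPERCHIPS_PER_RACK = 72
HBM_GB = 288            # HBM3E per GPU
LPDDR_GB = 480          # LPDDR5X per CPU
CPU_CORES = 88
FP4_SPARSE_PF = 15.0
FP8_DENSE_PF = 3.75
NIC_TBPS = 1.6          # ConnectX-9, terabits per second


def rack_totals(n: int = SUPERCHIPS_PER_RACK) -> dict:
    """Scale per-Superchip figures to a rack of n Superchips (decimal units)."""
    return {
        "hbm_tb": n * HBM_GB / 1000,
        "lpddr_tb": n * LPDDR_GB / 1000,
        "cpu_cores": n * CPU_CORES,
        "fp4_sparse_ef": n * FP4_SPARSE_PF / 1000,
        "fp8_dense_pf": n * FP8_DENSE_PF,
        "egress_tbps": n * NIC_TBPS,
    }


totals = rack_totals()
for name, value in totals.items():
    print(f"{name}: {value:,.2f}")
```

Keeping the per-Superchip values as the single source of truth makes it easy to spot a rack-level figure that does not match its inputs.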

Generational Comparison

Metric	GB300 (2025 H2)	GB200 (2024-Q4)	GH200 (2023-Q3)	Note
GPU	B300 Ultra	B200	H100	Upgrade
CPU	Arm Vera 88-cores	Arm Grace 72-cores	Arm Grace 72-cores	New gen
GPU Memory	288GB HBM3E	192GB HBM3E	96GB HBM3	+50%
GPU Bandwidth	10 TB/s	8 TB/s	3.35 TB/s	+25%
NV-HBI	900 GB/s	900 GB/s	900 GB/s	Same
NVLink Interconnect	130 TB/s	130 TB/s	60 TB/s	Same/2×
Networking	ConnectX-9 1.6T	ConnectX-8 800G	ConnectX-7 400G	2×/4×
FP4 sparse	15 PF (per GPU)	10 PF	N/A (FP8 2 PF)	1.5×
FP8 dense	3.75 PF	2.25 PF	1 PF	1.67×/3.75×
TDP	1200W	1000W	1000W	+20%
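
The "Note" column gives generation-over-generation ratios. As a rough check, they can be recomputed from the table values. This sketch assumes the figures above are per Superchip; the dictionary keys and the `ratio` helper are hypothetical names chosen for illustration.

```python
# Figures copied from the Generational Comparison table.
SPECS = {
    "GB300": {"gpu_mem_gb": 288, "gpu_bw_tbs": 10.0, "fp8_dense_pf": 3.75, "tdp_w": 1200},
    "GB200": {"gpu_mem_gb": 192, "gpu_bw_tbs": 8.0, "fp8_dense_pf": 2.25, "tdp_w": 1000},
    "GH200": {"gpu_mem_gb": 96, "gpu_bw_tbs": 3.35, "fp8_dense_pf": 1.0, "tdp_w": 1000},
}


def ratio(metric: str, new: str = "GB300", old: str = "GB200") -> float:
    """Return how many times larger `metric` is on `new` than on `old`."""
    return SPECS[new][metric] / SPECS[old][metric]


for metric in SPECS["GB300"]:
    vs_gb200 = ratio(metric)
    vs_gh200 = ratio(metric, old="GH200")
    print(f"{metric}: {vs_gb200:.2f}x vs GB200, {vs_gh200:.2f}x vs GH200")
```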

NVL72 vs NVL576 Rack-Scale Comparison

Dimension	NVL72 (1 Rack)	NVL576 (8 Racks)
Superchip Count	72	576
GPU Count	72 B300 Ultra	576 B300 Ultra
CPU Count	72 Vera	576 Vera
Total HBM	20.7 TB	165.9 TB
FP8 dense Total	270 PF	2.16 EF
FP4 sparse Total	1.08 EF	8.64 EF
NVLink Domain	Single rack 130 TB/s	Cross-rack 130 TB/s
Domain Size	72 GPUs	576 GPUs
Total TDP	120 kW	960 kW
Price	$3.3M	$26M

GB300 NVL576 advantage: 8 racks form a single NVLink domain of 576 GPUs sharing a 130 TB/s interconnect. This is the largest NVLink domain in the industry and is critical for training ultra-large (trillion-parameter) LLMs.
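
The NVL576 column is the NVL72 column scaled by 8 for every additive quantity (GPU count, memory, compute, power, price). A small sketch of that scaling, assuming the rack-level values listed on this page; the `Rack` class and `scale` function are illustrative names, not part of any NVIDIA tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rack:
    """Additive rack-level quantities for one GB300 NVL72 rack."""
    gpus: int = 72
    hbm_tb: float = 20.736        # 72 x 288 GB, decimal units
    fp8_dense_pf: float = 270.0
    fp4_sparse_pf: float = 1080.0
    power_kw: float = 120.0
    price_musd: float = 3.3       # estimated price from the table


def scale(rack: Rack, racks: int) -> Rack:
    """Multiply every additive quantity by the number of racks."""
    return Rack(
        gpus=rack.gpus * racks,
        hbm_tb=rack.hbm_tb * racks,
        fp8_dense_pf=rack.fp8_dense_pf * racks,
        fp4_sparse_pf=rack.fp4_sparse_pf * racks,
        power_kw=rack.power_kw * racks,
        price_musd=rack.price_musd * racks,
    )


# NVLink bandwidth is deliberately not scaled: the table lists
# 130 TB/s for both the 72-GPU and the 576-GPU domain.
nvl576 = scale(Rack(), 8)
print(nvl576)
```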

Arm Vera CPU 88-Core

Dimension	Spec
Architecture	Arm Olympus (Armv9.4)
Cores	88 cores (vs Grace 72 cores)
L3 Cache	Shared 256 MB
LPDDR5X	480GB
Bandwidth	512 GB/s
TDP	200W
PCIe	Gen5 ×32
Features	Enhanced SVE2 + hardware confidential computing
Generation	Armv9.4 successor to Grace v9.0

Vera vs Grace upgrade: Cores +22% (72→88), L3 Cache +33% (192→256MB), Memory +25% (384→480GB). The key improvement is the memory subsystem (critical for the CPU-side decode steps of LLM inference).
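
The upgrade percentages can be checked directly from the before and after values. A minimal sketch, with the helper name `pct_increase` chosen for illustration:

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase going from `old` to `new`."""
    return (new - old) / old * 100.0


# (Grace value, Vera value) pairs as quoted in the paragraph above.
upgrades = {
    "cores": (72, 88),
    "l3_cache_mb": (192, 256),
    "memory_gb": (384, 480),
}

for name, (grace, vera) in upgrades.items():
    print(f"{name}: {grace} -> {vera} (+{pct_increase(grace, vera):.0f}%)")
```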

ConnectX-9 1.6T Networking

Dimension	Spec
Speed	1.6 Tb/s single port
Ports	2-4 per Superchip
Protocol	InfiniBand NDR / NDR400 + RoCE v2
Latency	< 0.5 μs
GPUDirect	GPU-NIC direct DMA
In-Network Computing	SHARP v4
PCIe	Gen6 ×16 (GB300 upgraded to Gen6)
2025 Deployment	Mainstream supercomputers (ORNL Frontier successor)

vs ConnectX-8 800G:

Bandwidth 2×
Latency -50%
GPUDirect RDMA 3.0
Supports NVLink over IB (cross-rack NVLink)
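
To put the port speeds in perspective, the sketch below converts the quoted line rates from terabits per second to gigabytes per second and estimates how long it would take to move one GPU's worth of HBM (288 GB) over a single port. This is an idealized figure that assumes full line rate with no protocol overhead; the function names are illustrative.

```python
def tbps_to_gbytes_per_s(tbps: float) -> float:
    """Convert terabits per second to gigabytes per second (decimal units)."""
    return tbps * 1000 / 8


def transfer_seconds(payload_gb: float, link_gbytes_per_s: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and congestion."""
    return payload_gb / link_gbytes_per_s


HBM_GB = 288                              # HBM3E capacity of one B300 Ultra
cx9_port = tbps_to_gbytes_per_s(1.6)      # ConnectX-9 single port
cx8_port = tbps_to_gbytes_per_s(0.8)      # ConnectX-8 single port

print(f"ConnectX-9: {cx9_port:.0f} GB/s, {transfer_seconds(HBM_GB, cx9_port):.2f} s for {HBM_GB} GB")
print(f"ConnectX-8: {cx8_port:.0f} GB/s, {transfer_seconds(HBM_GB, cx8_port):.2f} s for {HBM_GB} GB")
```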

Vendor Information

Item	Detail
Company	NVIDIA Corporation
Product Page	https://www.nvidia.com/en-us/data-center/grace-blackwell/
CEO	Jensen Huang
Foundry	TSMC 4NP (B300 Ultra) + TSMC N3 (Vera)
2025 H2 Production	Yes
2026 Roadmap	Rubin + Vera Rubin (RV200)
Price	Superchip ~$70-80K, NVL72 rack ~$3.3M

Key Features

NVL576 domain: 8 racks, 576 GPUs sharing 130 TB/s interconnect (largest in industry)
ConnectX-9 1.6T: 1.6 Tb/s single-port cross-rack networking
Arm Vera 88-core: 1 per Superchip, +30% CPU performance
FP4 15 PFLOPS: Single GPU compute (sparse), optimized for inference
Unified memory: GPU 288GB HBM + CPU 480GB LPDDR5X coherent addressing
Drawbacks: 1200W TDP per Superchip; the software stack is CUDA-only
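
As a rough illustration of what the memory capacities mean for model size, the sketch below estimates how many parameters fit in HBM alone and in the combined coherent pool at different precisions. It counts weights only (no KV cache, activations, or optimizer state) and assumes 1 GB = 10^9 bytes; the names are illustrative.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"FP4": 0.5, "FP8": 1.0, "BF16": 2.0}

HBM_GB = 288      # GPU HBM3E per Superchip
LPDDR_GB = 480    # CPU LPDDR5X per Superchip


def max_params_billions(capacity_gb: float, precision: str) -> float:
    """Upper bound on parameter count (in billions) for weights only.

    With 1 GB = 1e9 bytes, GB divided by bytes-per-parameter gives
    billions of parameters directly.
    """
    return capacity_gb / BYTES_PER_PARAM[precision]


for precision in BYTES_PER_PARAM:
    hbm_only = max_params_billions(HBM_GB, precision)
    unified = max_params_billions(HBM_GB + LPDDR_GB, precision)
    print(f"{precision}: {hbm_only:.0f}B in HBM, {unified:.0f}B in HBM + LPDDR5X")
```

Real deployments fit fewer parameters than this bound, and weights placed in LPDDR5X are read at CPU memory bandwidth rather than HBM bandwidth.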

NVL72 Use Cases

✅ Trillion-parameter LLM training (576 GPU domain, 130 TB/s NVLink)
✅ MoE model training (expert parallel + tensor parallel hybrid)
✅ Ultra-large-scale RLHF (576 GPU synchronous)
✅ Multimodal large models (video + text + image)
✅ AI for Science (climate, materials, life sciences)
❌ Small-scale inference (prohibitively expensive)
❌ China market (export controls)

GB300 vs AMD MI400 (2026)

Metric	NVIDIA GB300 (2025 H2)	AMD MI400 (2026)	Difference
GPU	B300 Ultra (single)	MI400 (single)	-
Memory	288GB HBM3E	432GB HBM4	MI400 +50%
Bandwidth	10 TB/s	19.6 TB/s	MI400 2×
FP4 dense	7.5 PF	40 PF	MI400 5×
Interconnect	NVLink 5 1.8 TB/s	UALoF 1.3 TB/s	GB300 1.4×
Networking	ConnectX-9 1.6T	Pensando 800G	GB300 2×
Software	CUDA	ROCm	NVIDIA advantage
TDP	1200W (Superchip)	1000W	MI400 -17%

Note: the MI400 column describes a single GPU, while the GB300 column describes a Superchip (GPU plus CPU). In a pure GPU-to-GPU comparison, MI400 has clear advantages (HBM4 memory and the open UALoF interconnect), but the NVL72 rack and ConnectX-9 networking remain NVIDIA's strengths.
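
The "Difference" column can be derived mechanically by dividing the larger value by the smaller one for each metric. A short sketch using the table's numbers; the `lead` helper is a hypothetical name, and networking speeds are expressed in Gb/s so both columns share a unit.

```python
# (metric, GB300 value, MI400 value) copied from the comparison table.
ROWS = [
    ("Memory (GB)", 288, 432),
    ("Bandwidth (TB/s)", 10.0, 19.6),
    ("FP4 dense (PF)", 7.5, 40.0),
    ("Interconnect (TB/s)", 1.8, 1.3),
    ("Networking (Gb/s)", 1600, 800),
]


def lead(gb300: float, mi400: float) -> tuple:
    """Return (name of the part with the larger value, ratio larger/smaller)."""
    if gb300 >= mi400:
        return "GB300", gb300 / mi400
    return "MI400", mi400 / gb300


for metric, gb300, mi400 in ROWS:
    winner, factor = lead(gb300, mi400)
    print(f"{metric}: {winner} {factor:.2f}x")
```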

Related Cards

NVIDIA B200 - Previous-gen flagship GPU
NVIDIA B300 Ultra - Same-gen B300 GPU
NVIDIA H100 - Previous-gen mainstream
NVIDIA Rubin R200 - 2026 next-gen
NVIDIA Groq 3 LPX - Inference LPU
AMD MI400 - Competitor
AMD MI355X - Competitor HBM3E
Huawei Ascend 920 - Domestic comparison
