Microsoft Maia 200 (Microsoft 2nd-gen AI inference accelerator)#

Product Overview#

Microsoft Maia 200 is Microsoft's second-generation in-house AI accelerator, announced on January 26, 2026 and designed specifically for hyperscale AI inference. Built on TSMC's 3nm process with over 140 billion transistors, each chip delivers more than 10 PFLOPS of FP4 compute and more than 5 PFLOPS of FP8 compute. It is the first Microsoft-designed chip with native FP8/FP4 tensor cores, and it pairs them with 216 GB of HBM3e memory at 7 TB/s of bandwidth.

Positioning: Maia 200 is Microsoft's highest-performance in-house silicon to date and the most cost-effective inference system deployed on Azure so far, delivering 30% better performance per dollar than the latest-generation hardware in the Azure fleet.

Core Specifications#

Item	Parameter
Architecture	Maia 200 SoC (Tile-Cluster-SoC three-tier hierarchical architecture)
Process	TSMC 3nm (N3P)
Transistor Count	Over 140 billion
FP4 Compute	10+ PFLOPS (native tensor cores)
FP8 Compute	5+ PFLOPS (native tensor cores)
HBM Type	HBM3e
HBM Capacity	216 GB
HBM Bandwidth	7 TB/s
On-chip SRAM	272 MB
Scale-up Bandwidth	2.8 TB/s (bidirectional per accelerator)
TDP	750 W (SoC)
Cluster Scale	Up to 6,144 accelerators
Network	Standards-based Ethernet two-tier scale-up network
Launch Date	January 26, 2026
First Deployment	US Central (near Des Moines, Iowa)
Subsequent Deployment	US West 3 (near Phoenix, Arizona)

Architecture Details#

Tile-Cluster-SoC Three-Tier Hierarchical Architecture#

Tile: The basic compute unit, containing tensor cores, SRAM, and DMA engines
Cluster: Multiple Tiles connected by an on-chip network (NoC) and sharing L2 SRAM
SoC (System-on-Chip): Multiple Clusters connected by a global NoC, which also interfaces with HBM3e and the high-speed network

Memory Subsystem Optimization#

Optimized for narrow-precision data types: FP4 and FP8 values are small, so memory bandwidth rather than compute becomes the key bottleneck
Dedicated DMA engines: High-bandwidth data transfer with less CPU intervention
272 MB on-chip SRAM: Holds hot weights and activations, reducing how often HBM must be accessed
Dedicated on-chip network (NoC): High-bandwidth, low-latency on-chip communication
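
The claim that memory bandwidth is the key bottleneck can be illustrated with a simple roofline estimate. The sketch below uses only the headline figures from the specification table (treated as lower bounds, since they are quoted as "10+" and "5+"). The batch-1 matrix-vector kernel and its intensity of about 4 FLOP per byte are illustrative assumptions, not measurements of Maia 200.

```python
# Headline per-chip figures from the specification table.
PEAK_FLOPS = {"fp4": 10e15, "fp8": 5e15}  # FLOP/s
HBM_BANDWIDTH = 7e12                       # bytes/s


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Arithmetic intensity (FLOP per HBM byte) where compute time equals memory time."""
    return peak_flops / bandwidth


def attainable_flops(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Classic roofline bound: min(peak compute, intensity * bandwidth)."""
    return min(peak_flops, intensity * bandwidth)


for dtype, peak in PEAK_FLOPS.items():
    print(f"{dtype}: ridge point ~{ridge_point(peak, HBM_BANDWIDTH):.0f} FLOP/byte")

# Hypothetical kernel: a batch-1 matrix-vector product on FP4 weights performs
# about 2 FLOPs per weight, and each weight is half a byte, so ~4 FLOP/byte.
bound = attainable_flops(4.0, PEAK_FLOPS["fp4"], HBM_BANDWIDTH)
print(f"batch-1 FP4 GEMV bound: {bound / 1e12:.0f} TFLOP/s")
```

Under these assumptions a kernel needs well over a thousand FLOPs per HBM byte to become compute-bound at FP4, which is why keeping hot data in on-chip SRAM matters for low-batch inference.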

Scale-up Network Design#

Standards-based Ethernet: No dependency on proprietary network architectures (e.g., NVIDIA NVLink)
Two-tier scale-up network: Built on a custom transport layer and a tightly integrated NIC
Unified Maia AI transport protocol: Seamless communication within a node, within a rack, and across racks, with a minimal number of network hops
Clusters of up to 6,144 accelerators: Enables predictable, high-performance collective communication operations
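
To get a feel for what the scale-up bandwidth means for collective operations, the sketch below applies the textbook bandwidth-only cost model for a ring all-reduce. It is an illustration under stated assumptions: the source does not say which collective algorithm Maia 200 uses, the 2.8 TB/s bidirectional figure is assumed to split evenly into 1.4 TB/s per direction, per-hop latency is ignored, and the 1 GB message size is hypothetical.

```python
def ring_allreduce_seconds(message_bytes: float, n_devices: int,
                           link_bytes_per_s: float) -> float:
    """Bandwidth-only cost of a ring all-reduce.

    Each device sends 2 * (n - 1) / n of the message in total
    (reduce-scatter followed by all-gather). Latency is ignored.
    """
    if n_devices < 2:
        return 0.0
    return 2 * (n_devices - 1) / n_devices * message_bytes / link_bytes_per_s


SCALE_UP_BIDIRECTIONAL = 2.8e12              # bytes/s, from the specification table
PER_DIRECTION = SCALE_UP_BIDIRECTIONAL / 2   # assumption: symmetric split

message = 1e9  # hypothetical 1 GB tensor
for n in (8, 64, 6144):
    t = ring_allreduce_seconds(message, n, PER_DIRECTION)
    print(f"{n:>5} devices: {t * 1e3:.3f} ms (bandwidth term only)")
```

The bandwidth term approaches twice the message size divided by the link rate as the device count grows. At thousands of devices the ignored latency term would dominate a plain ring, so hierarchical algorithms are normally preferred at that scale.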

Comparison with Competitors#

Metric	Maia 200	AWS Trainium 3	Google TPU v7	NVIDIA H200
FP4 Compute	10+ PFLOPS	~3.3 PFLOPS	~5 PFLOPS (estimated)	Not supported natively (Hopper has no FP4 tensor cores)
FP8 Compute	5+ PFLOPS	~6.6 PFLOPS	~5 PFLOPS	~1.98 PFLOPS (dense)
HBM Capacity	216 GB	128 GB (estimated)	192 GB	141 GB
HBM Bandwidth	7 TB/s	~3.5 TB/s (estimated)	~4 TB/s	4.8 TB/s
Process	TSMC 3nm	TSMC 4nm (estimated)	TSMC 4nm	TSMC 4N
Cluster Scale	6,144	16,384 (Trn2 UltraCluster)	9,216 (Ironwood)	576 (NVL576)
Performance per Dollar	+30% (vs. latest-generation Azure fleet hardware)	N/A	N/A	N/A

Key Advantage: Maia 200's FP4 performance is roughly 3× that of AWS Trainium 3, and its FP8 performance exceeds that of Google TPU v7.

Azure Deployment & Ecosystem#

First Deployment Regions#

US Central (near Des Moines, Iowa): Starting January 2026
US West 3 (near Phoenix, Arizona): Coming soon
Future expansion: Additional Azure regions will follow

Supported Workloads#

OpenAI GPT-5.2 series: Provides compute for Microsoft Foundry and Microsoft 365 Copilot
Microsoft Superintelligence Team: Used for synthetic data generation and reinforcement learning to improve next-generation in-house models
Synthetic data pipeline: Designed to speed up the generation and filtering of high-quality, domain-specific data

Maia SDK (Preview)#

Triton compiler: Kernel compilation optimized for the Maia 200 architecture
PyTorch support: Seamless migration of existing PyTorch models
NPL low-level programming language: For workloads that need fine-grained control
Maia simulator and cost calculator: Help optimize efficiency early in the code lifecycle

Energy Efficiency & TCO#

Metric	Maia 200	Latest-generation Azure fleet hardware
Performance per dollar	+30%	Baseline
Power (single accelerator)	750W	~800-1,000W (estimated)
Cooling Solution	2nd-gen closed-loop liquid cooling (HXU)	Air/liquid hybrid
TCO (total cost of ownership)	Reduced (higher efficiency plus standards-based Ethernet networking)	Baseline
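
The performance-per-dollar figure is easy to misread, so the short calculation below spells it out. It assumes the 30% gain applies uniformly to the workload in question, and it uses the headline peak FP4 rate together with the 750 W SoC TDP. Those are peak values from the specification table rather than sustained measurements.

```python
def cost_ratio_for_same_work(perf_per_dollar_gain: float) -> float:
    """Relative cost to deliver the same throughput after a perf/$ gain (0.30 = +30%)."""
    return 1.0 / (1.0 + perf_per_dollar_gain)


def peak_tflops_per_watt(peak_flops: float, tdp_watts: float) -> float:
    """Peak throughput per watt of TDP, in TFLOPS/W."""
    return peak_flops / tdp_watts / 1e12


ratio = cost_ratio_for_same_work(0.30)
print(f"cost for equal throughput: {ratio:.1%} of baseline ({1 - ratio:.1%} lower)")
print(f"peak FP4 per watt: {peak_tflops_per_watt(10e15, 750):.1f} TFLOPS/W")
```

A 30% gain in performance per dollar therefore corresponds to about a 23% lower cost for the same amount of work, not a 30% lower cost.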

Comparison with Previous-Gen Maia 100#

Metric	Maia 100 (2023)	Maia 200 (2026)	Improvement
Process	TSMC 5nm	TSMC 3nm	More advanced
Transistor Count	105 billion	140 billion+	~1.3×
FP4 Support	❌ Not supported	✅ Supported	New
FP8 Support	✅ Supported (non-native)	✅ Native tensor cores	Optimized
HBM Capacity	64 GB (estimated)	216 GB	3.4×
HBM Bandwidth	~1.6 TB/s (estimated)	7 TB/s	4.4×
TDP	500W (estimated)	750W	1.5×
Deployment Scale	Thousands (Azure)	Up to 6,144 per cluster	Expanded

Technical Highlights#

1. Native FP4/FP8 Tensor Cores#

FP4: 4-bit floating point; weight memory footprint is 75% smaller than with FP16, and inference throughput can be up to 4× higher
FP8: 8-bit floating point; precision close to FP16 at roughly 2× the FP16 compute throughput
Sparsity optimization: Supports structured sparsity, FP4 sparse mode can reach 20+ PFLOPS
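
The footprint figures above follow directly from the bit widths. As a minimal sketch, the code below computes weight-only storage for a hypothetical 70B-parameter model and the largest parameter count whose weights would fit in the 216 GB of HBM on one accelerator. It deliberately ignores the KV cache, activations, and any quantization scale factors, all of which consume additional memory in practice.

```python
BITS_PER_WEIGHT = {"fp16": 16, "fp8": 8, "fp4": 4}
HBM_CAPACITY_BYTES = 216e9  # 216 GB per accelerator, from the specification table


def weight_bytes(n_params: float, dtype: str) -> float:
    """Bytes needed for the weights alone (no scales, KV cache, or activations)."""
    return n_params * BITS_PER_WEIGHT[dtype] / 8


def max_params_in_hbm(dtype: str, capacity_bytes: float = HBM_CAPACITY_BYTES) -> float:
    """Largest parameter count whose weights alone fit in the given capacity."""
    return capacity_bytes * 8 / BITS_PER_WEIGHT[dtype]


n_params = 70e9  # hypothetical 70B-parameter model
for dtype, bits in BITS_PER_WEIGHT.items():
    gb = weight_bytes(n_params, dtype) / 1e9
    saving = 1 - bits / BITS_PER_WEIGHT["fp16"]
    print(f"{dtype}: {gb:.0f} GB ({saving:.0%} smaller than FP16), "
          f"weights of up to {max_params_in_hbm(dtype) / 1e9:.0f}B params fit in HBM")
```

The source does not say which FP4 encoding Maia 200 uses. Block-scaled FP4 formats generally store per-block scale factors alongside the 4-bit values, so real savings are slightly below the ideal 75%.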

2. Standards-Based Ethernet Network#

No proprietary network: The scale-up fabric is built on standard Ethernet, reducing deployment cost and complexity
Custom transport layer: Optimized for AI workloads, with performance close to that of proprietary interconnects
Two-tier network topology: Minimizes network hops, improving large-scale cluster performance

3. Native Liquid-Cooled Design#

2nd-gen HXU: Closed-loop liquid-cooling heat exchanger unit, designed for data center deployment from the outset
Chip-level telemetry: Real-time monitoring of temperature, voltage, and frequency to improve reliability
Azure control plane integration: Security, telemetry, diagnostics, and management at chip and rack levels

Launch Date & Availability#

Official Launch: January 26, 2026 (announced by Microsoft Executive Vice President Scott Guthrie on the official Microsoft blog)
First Deployment: US Central region, starting January 2026
Availability: Through Azure cloud services only (the chips are not sold separately)
- Microsoft Foundry (formerly Azure AI Foundry)
- Microsoft 365 Copilot
- Azure Virtual Machines (Maia 200 instances)
Maia SDK Preview: Applications are open
