Microsoft Maia 200 (Microsoft 2nd-gen AI inference accelerator)

Product Overview

Microsoft Maia 200 is Microsoft's second-generation in-house AI accelerator, officially launched on January 26, 2026 and designed specifically for hyperscale AI inference. It is built on TSMC's 3nm process, integrates over 140 billion transistors, and delivers 10+ PFLOPS of FP4 and 5+ PFLOPS of FP8 compute per chip. It is the first Microsoft-designed chip with native FP8/FP4 tensor cores, and it pairs them with 216 GB of HBM3e memory at 7 TB/s of bandwidth.

Positioning: Maia 200 is Microsoft's highest-performance in-house silicon to date and the most cost-effective inference system deployed on Azure so far, delivering 30% better performance per dollar than the latest-generation hardware in the Azure fleet.

Core Specifications

| Item | Parameter |
| --- | --- |
| Architecture | Maia 200 SoC (Tile-Cluster-SoC three-tier hierarchical architecture) |
| Process | TSMC 3nm (N3P) |
| Transistor Count | Over 140 billion |
| FP4 Compute | 10+ PFLOPS (native tensor cores) |
| FP8 Compute | 5+ PFLOPS (native tensor cores) |
| HBM Type | HBM3e |
| HBM Capacity | 216 GB |
| HBM Bandwidth | 7 TB/s |
| On-chip SRAM | 272 MB |
| Scale-up Bandwidth | 2.8 TB/s (bidirectional, per accelerator) |
| TDP | 750 W (SoC) |
| Cluster Scale | Up to 6,144 accelerators |
| Network | Standards-based Ethernet, two-tier scale-up network |
| Launch Date | January 26, 2026 |
| First Deployment | US Central (near Des Moines, Iowa) |
| Subsequent Deployment | US West 3 (near Phoenix, Arizona) |

Architecture Details

Tile-Cluster-SoC Three-Tier Hierarchical Architecture

  • Tile: The basic compute unit, containing tensor cores, SRAM, and DMA engines
  • Cluster: Multiple Tiles connected via an on-chip network (NoC) and sharing L2 SRAM
  • SoC (System-on-Chip): Multiple Clusters connected via a global NoC, interfacing with HBM3e and the high-speed network

Memory Subsystem Optimization

  • Optimized for narrow-precision data types: FP4/FP8 values are narrow, so the tensor cores consume data faster than it can be fetched and memory bandwidth becomes the key bottleneck
  • Dedicated DMA engines: High-bandwidth data transfer with less CPU intervention
  • 272 MB on-chip SRAM: Holds hot weights and activations, reducing how often HBM is accessed
  • Dedicated on-chip network (NoC): High-bandwidth, low-latency on-chip communication
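
The bandwidth argument can be made concrete with a back-of-the-envelope roofline estimate. This is only a sketch: it treats the "10+" and "5+" PFLOPS figures as exactly 10 and 5 PFLOPS and the 7 TB/s HBM figure as a sustained peak, and the function names are illustrative rather than part of any Maia tooling.

```python
# Rough roofline sketch using the headline figures quoted in this document.
# Assumption: "10+" and "5+" PFLOPS are treated as exactly 10 and 5 PFLOPS.

HBM_BANDWIDTH_BYTES_PER_S = 7e12            # 7 TB/s
PEAK_FLOPS = {"fp4": 10e15, "fp8": 5e15}    # per-chip peaks


def ridge_point(dtype: str) -> float:
    """FLOPs needed per byte read from HBM for a kernel to be compute-bound."""
    return PEAK_FLOPS[dtype] / HBM_BANDWIDTH_BYTES_PER_S


def attainable_flops(dtype: str, flops_per_byte: float) -> float:
    """Classic roofline: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(PEAK_FLOPS[dtype], HBM_BANDWIDTH_BYTES_PER_S * flops_per_byte)


if __name__ == "__main__":
    for dtype in ("fp4", "fp8"):
        print(f"{dtype}: ridge point ~{ridge_point(dtype):.0f} FLOPs/byte")
    # A kernel that performs 100 FLOPs per HBM byte is bandwidth-bound in FP4.
    print(f"{attainable_flops('fp4', 100) / 1e15:.1f} PFLOPS attainable")
```

Under these assumptions, a kernel needs on the order of 1,400 FP4 operations per byte fetched from HBM to saturate the tensor cores, which is why serving hot data from on-chip SRAM matters.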

Scale-up Network Design

  • Standards-based Ethernet: No dependency on proprietary interconnects (e.g., NVIDIA NVLink)
  • Two-tier scale-up network: Built on a custom transport layer and a tightly integrated NIC
  • Unified Maia AI transport protocol: The same protocol carries traffic within a node, within a rack, and across racks, minimizing network hops
  • Clusters of up to 6,144 accelerators: Enables predictable, high-performance collective communication operations
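
As a rough illustration of what collective performance depends on at this scale, the sketch below estimates the bandwidth-bound time of a ring all-reduce. The ring algorithm is a textbook choice and is not confirmed as what the Maia transport uses. Splitting the 2.8 TB/s bidirectional figure into 1.4 TB/s per direction, and ignoring latency and protocol overhead, are also assumptions.

```python
def ring_allreduce_seconds(message_bytes: float, num_devices: int,
                           link_bytes_per_s: float = 1.4e12) -> float:
    """Bandwidth-only lower bound for a ring all-reduce.

    Each device sends 2 * (N - 1) / N * message_bytes in total:
    one reduce-scatter pass followed by one all-gather pass.
    link_bytes_per_s defaults to an ASSUMED 1.4 TB/s per direction
    (half of the quoted 2.8 TB/s bidirectional scale-up bandwidth).
    """
    if num_devices < 2:
        return 0.0  # nothing to exchange with a single device
    traffic = 2 * (num_devices - 1) / num_devices * message_bytes
    return traffic / link_bytes_per_s


if __name__ == "__main__":
    for n in (8, 6144):
        ms = ring_allreduce_seconds(1e9, n) * 1e3
        print(f"1 GB all-reduce over {n} devices: at least {ms:.2f} ms")
```

The per-device traffic approaches twice the message size as the device count grows, so in this simple model the bandwidth term barely changes between 8 and 6,144 accelerators; hop count and latency are what a two-tier topology is meant to keep in check.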

Comparison with Competitors

| Metric | Maia 200 | AWS Trainium 3 | Google TPU v7 | NVIDIA H200 |
| --- | --- | --- | --- | --- |
| FP4 Compute | 10+ PFLOPS | ~3.3 PFLOPS | ~5 PFLOPS (estimated) | 1.98 PFLOPS |
| FP8 Compute | 5+ PFLOPS | ~6.6 PFLOPS | ~5 PFLOPS | 1.97 PFLOPS |
| HBM Capacity | 216 GB | 128 GB (estimated) | 192 GB | 141 GB |
| HBM Bandwidth | 7 TB/s | ~3.5 TB/s (estimated) | ~4 TB/s | 4.8 TB/s |
| Process | TSMC 3nm | TSMC 4nm (estimated) | TSMC 4nm | TSMC 4NP |
| Cluster Scale | 6,144 | 16,384 (Trn2 UltraCluster) | 9,216 (Ironwood) | 576 (NVL576) |
| Performance per Dollar | +30% (vs Azure prev-gen) | - | - | - |

Key Advantage: Maia 200's FP4 performance is roughly 3× that of AWS Trainium 3, and its FP8 performance exceeds that of Google TPU v7.
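
The headline ratios can be recomputed from the per-chip figures in the table. The sketch below copies those figures as listed, drops the "+" and "~" qualifiers, and treats estimated values as exact, so the results are indicative only. The dictionary and function names are illustrative.

```python
# Per-chip peak compute in PFLOPS, copied from the comparison table.
# Assumption: "10+" -> 10, "5+" -> 5, and "~" values are taken at face value.
FP4_PFLOPS = {
    "Maia 200": 10.0,
    "AWS Trainium 3": 3.3,
    "Google TPU v7": 5.0,
    "NVIDIA H200": 1.98,
}
FP8_PFLOPS = {
    "Maia 200": 5.0,
    "AWS Trainium 3": 6.6,
    "Google TPU v7": 5.0,
    "NVIDIA H200": 1.97,
}


def ratio_vs(table: dict, baseline: str, reference: str = "Maia 200") -> float:
    """How many times the reference chip's figure exceeds the baseline's."""
    return table[reference] / table[baseline]


if __name__ == "__main__":
    print(f"FP4 vs Trainium 3: {ratio_vs(FP4_PFLOPS, 'AWS Trainium 3'):.2f}x")
    print(f"FP8 vs TPU v7:     {ratio_vs(FP8_PFLOPS, 'Google TPU v7'):.2f}x")
```

With the table's values, the FP4 ratio against Trainium 3 comes out at about 3.0, while the FP8 comparison with TPU v7 is a tie at 1.0 once the qualifiers are dropped, so that part of the claim depends on how far above 5 PFLOPS the real Maia 200 figure is.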

Azure Deployment & Ecosystem

First Deployment Regions

  • US Central (near Des Moines, Iowa): Starting January 2026
  • US West 3 (near Phoenix, Arizona): Coming soon
  • Future expansion: Additional Azure regions will follow

Supported Workloads

  • OpenAI GPT-5.2 series: Provides compute for Microsoft Foundry and Microsoft 365 Copilot
  • Microsoft Superintelligence Team: Uses Maia 200 for synthetic data generation and reinforcement learning to improve next-generation in-house models
  • Synthetic data pipeline: The design accelerates the generation and filtering of high-quality, domain-specific data

Maia SDK (Preview)

  • Triton compiler: Kernel compilation optimized for the Maia 200 architecture
  • PyTorch support: Migration path for existing PyTorch models
  • NPL low-level programming language: For workloads that need fine-grained control
  • Maia simulator and cost calculator: Help optimize efficiency early in the code lifecycle

Energy Efficiency & TCO

| Metric | Maia 200 | Azure previous-gen hardware |
| --- | --- | --- |
| Performance per dollar | +30% | Baseline |
| Power (single accelerator) | 750 W | ~800-1,000 W (estimated) |
| Cooling Solution | 2nd-gen closed-loop liquid cooling (HXU) | Air/liquid hybrid |
| TCO (total cost of ownership) | Reduced (efficiency improvement + standards-based Ethernet network) | Baseline |
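
Two quick conversions help put these rows in context: what "+30% performance per dollar" means for the cost of a fixed amount of work, and what the peak compute figures imply per watt of TDP. The sketch below is simple arithmetic on numbers quoted in this document. It treats "10+" PFLOPS as exactly 10 and uses TDP as a stand-in for actual power draw, both of which are assumptions.

```python
def relative_cost_for_same_work(perf_per_dollar_gain: float) -> float:
    """Cost of a fixed workload relative to the baseline.

    A gain of 0.30 (+30% performance per dollar) means the same work
    costs 1 / 1.30 of the baseline.
    """
    return 1.0 / (1.0 + perf_per_dollar_gain)


def peak_tflops_per_watt(peak_pflops: float, tdp_watts: float) -> float:
    """Peak compute per watt, using TDP as an ASSUMED proxy for power draw."""
    return peak_pflops * 1000.0 / tdp_watts


if __name__ == "__main__":
    cost = relative_cost_for_same_work(0.30)
    print(f"Same work costs {cost:.1%} of baseline ({1 - cost:.1%} less)")
    print(f"FP4: {peak_tflops_per_watt(10, 750):.1f} peak TFLOPS per watt")
    print(f"FP8: {peak_tflops_per_watt(5, 750):.1f} peak TFLOPS per watt")
```

Note that a 30% gain in performance per dollar corresponds to roughly a 23% lower cost for the same work, not 30%.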

Comparison with Previous-Gen Maia 100

| Metric | Maia 100 (2023) | Maia 200 (2026) | Improvement |
| --- | --- | --- | --- |
| Process | TSMC 5nm | TSMC 3nm | More advanced |
| Transistor Count | ~50 billion (estimated) | 140 billion+ | 2.8× |
| FP4 Support | ❌ Not supported | Supported | New |
| FP8 Support | ✅ Supported (non-native) | Native tensor cores | Optimized |
| HBM Capacity | 64 GB (estimated) | 216 GB | 3.4× |
| HBM Bandwidth | ~1.6 TB/s (estimated) | 7 TB/s | 4.4× |
| TDP | 500 W (estimated) | 750 W | 1.5× |
| Deployment Scale | Thousands (Azure) | 6,144+ | Expanded |

Technical Highlights

1. Native FP4/FP8 Tensor Cores

  • FP4: 4-bit floating point, model memory footprint reduced by 75% (vs FP16), inference throughput improved by
  • FP8: 8-bit floating point, with precision close to FP16 and 2× the compute throughput of FP16
  • Sparsity optimization: Supports structured sparsity, FP4 sparse mode can reach 20+ PFLOPS
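
The footprint claim follows directly from the bit widths. The sketch below computes weight storage for a given parameter count and how many parameters fit in the 216 GB of HBM. It counts raw weight bits only and ignores scaling factors, KV cache, and activations, all of which take additional memory. The 70-billion-parameter model is a hypothetical example, not a workload named in the source.

```python
BITS_PER_VALUE = {"fp16": 16, "fp8": 8, "fp4": 4}
HBM_CAPACITY_BYTES = 216e9  # 216 GB, interpreted as decimal gigabytes


def weight_bytes(num_params: float, dtype: str) -> float:
    """Raw storage for the weights alone (no scales, KV cache, or activations)."""
    return num_params * BITS_PER_VALUE[dtype] / 8


def max_params_in_hbm(dtype: str, hbm_bytes: float = HBM_CAPACITY_BYTES) -> float:
    """Upper bound on parameters that fit if HBM held nothing but weights."""
    return hbm_bytes * 8 / BITS_PER_VALUE[dtype]


if __name__ == "__main__":
    params = 70e9  # hypothetical 70B-parameter model
    for dtype in ("fp16", "fp8", "fp4"):
        print(f"{dtype}: {weight_bytes(params, dtype) / 1e9:.0f} GB of weights")
    saving = 1 - weight_bytes(params, "fp4") / weight_bytes(params, "fp16")
    print(f"FP4 vs FP16 reduction: {saving:.0%}")
    print(f"FP4 weights that fit in HBM: {max_params_in_hbm('fp4') / 1e9:.0f}B params")
```

In practice the usable parameter count is lower than this bound, since narrow formats usually carry per-block scale factors and inference also needs room for the KV cache.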

2. Standards-Based Ethernet Network

  • No proprietary network: The scale-up design is built on standards-based Ethernet, reducing deployment cost and complexity
  • Custom transport layer: Optimized for AI workloads, with performance close to that of proprietary networks
  • Two-tier network topology: Minimizes network hops, improving performance in large clusters

3. Liquid-Cooling-Native Design

  • 2nd-gen HXU: Closed-loop liquid cooling heat exchanger unit, designed for data center deployment
  • Chip-level telemetry: Real-time monitoring of temperature, voltage, and frequency to improve reliability
  • Azure control plane integration: Security, telemetry, diagnostics, and management at chip and rack levels

Launch Date & Availability

  • Official Launch: January 26, 2026 (announced by Microsoft Executive Vice President Scott Guthrie on the official Microsoft blog)
  • First Deployment: Starting January 2026, US Central region
  • Availability: Azure cloud services only (physical chips are not sold separately)
    • Microsoft Foundry (formerly Azure AI)
    • Microsoft 365 Copilot
    • Azure Virtual Machines (Maia 200 instances)
  • Maia SDK Preview: Open for applications