Cambricon MLU590 (Siyuan 590)#

Product Overview#

Cambricon MLU590 (product name Siyuan 590) is Cambricon's third-generation cloud AI training/inference chip, released in 2024 with volume shipments in 2025. It uses a 7nm process with chiplet packaging and delivers 256 TFLOPS of FP16 and 512 TOPS of INT8 compute. It is the first Cambricon chip reported to surpass the NVIDIA H20 in energy efficiency (52.3 vs 49.8 TFLOPS/W).

Positioning: An all-round card for both inference and training. Single-card compute is 2× that of the MLU370, with only about a 50W increase in typical power consumption. It is a cost-effective choice for domestic large model training and inference.

Core Specifications#

Item	Parameter
Architecture	3rd-generation MLU architecture (Da Vinci-like)
Process	7nm (TSMC, estimated)
Packaging	Chiplet (multi-die package)
NPU Core Count	128 (or 32 large AI cores, depending on the counting method)
FP16	256 TFLOPS
FP32	~64 TFLOPS (estimated, 1/4 of FP16)
INT8	512 TOPS
HBM Capacity	48 GB (estimated, pending official confirmation)
HBM Bandwidth	~400 GB/s (estimated, pending official confirmation)
TDP	250 W (typical) / 300 W (max)
Interconnect	MLU-Link 3.0 (8-way high-speed interconnect; up to 16 chips per supercompute node)
Board Form Factor	PCIe Gen5 ×16 / OAM
Mass Production	Released 2024; volume shipments 2025
Unit Price (Estimated)	~$8,000–10,000

⚠️ Specification Note: HBM capacity and bandwidth are estimates; official sources have not fully disclosed them. Defer to Cambricon's official data sheet once it is published.
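The capacity and bandwidth estimates above translate into rough sizing limits. The following is a minimal sketch, assuming the estimated 48 GB and ~400 GB/s figures from the table, a hypothetical 80% usable fraction of HBM, and a simple bandwidth-bound model of single-stream decoding (each generated token reads every weight once). It ignores activations and KV cache, so treat the results as ceilings rather than predictions.

```python
import math

HBM_GB = 48          # estimated capacity from the specification table
HBM_BW_GBPS = 400    # estimated bandwidth from the specification table

BYTES_PER_PARAM = {"fp16": 2, "int8": 1}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes); activations and KV cache are ignored."""
    return params_billion * BYTES_PER_PARAM[dtype]


def cards_needed(params_billion: float, dtype: str, usable_fraction: float = 0.8) -> int:
    """Cards required just to hold the weights.

    usable_fraction is a hypothetical allowance for runtime overhead,
    not a published figure.
    """
    return math.ceil(weight_gb(params_billion, dtype) / (HBM_GB * usable_fraction))


def decode_tokens_per_s_bound(params_billion: float, dtype: str) -> float:
    """Upper bound on single-stream decode rate if every token reads all weights once."""
    return HBM_BW_GBPS / weight_gb(params_billion, dtype)


if __name__ == "__main__":
    for size, dtype in [(13, "fp16"), (70, "int8"), (70, "fp16")]:
        print(size, dtype, cards_needed(size, dtype),
              round(decode_tokens_per_s_bound(size, dtype), 1))
```

Under these assumptions a 13B-parameter model in FP16 (26 GB of weights) fits on one card, while a 70B model needs two cards in INT8 and four in FP16.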

Comparison with MLU370#

Metric	MLU370	MLU590	Improvement
Process	7nm	7nm (Chiplet)	Same process, packaging upgrade
FP16	128 TFLOPS	256 TFLOPS	2×
INT8	256 TOPS	512 TOPS	2×
TDP	~200W	250–300W	+25–50%
Interconnect	MLU-Link 2.0	MLU-Link 3.0	Bandwidth improved
Energy Efficiency	~40 TFLOPS/W	52.3 TFLOPS/W	+31%
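The efficiency row is a reported measurement, which is a different quantity from the nameplate ratio of peak FP16 throughput to TDP. As a sanity check, the sketch below derives the nameplate ratio from the FP16 and TDP rows and compares its generational gain with the reported +31%. All inputs are the table's own values; the variable names are illustrative.

```python
def tflops_per_watt(fp16_tflops: float, tdp_w: float) -> float:
    """Nameplate ratio: peak FP16 throughput divided by board power."""
    return fp16_tflops / tdp_w


# Values taken from the comparison table
mlu370 = tflops_per_watt(128, 200)
mlu590_typical = tflops_per_watt(256, 250)
mlu590_max = tflops_per_watt(256, 300)

# Generational gain on a nameplate basis, at typical and maximum TDP
nameplate_gain_typical = mlu590_typical / mlu370 - 1
nameplate_gain_max = mlu590_max / mlu370 - 1

# Gain implied by the reported efficiency row (40 -> 52.3)
reported_gain = 52.3 / 40.0 - 1

if __name__ == "__main__":
    print(f"nameplate: {mlu370:.3f} -> {mlu590_typical:.3f} (typical), {mlu590_max:.3f} (max)")
    print(f"nameplate gain: {nameplate_gain_typical:+.0%} typical, {nameplate_gain_max:+.0%} max")
    print(f"reported gain:  {reported_gain:+.0%}")
```

With these inputs the nameplate gain is +60% at typical TDP and +33% at maximum TDP, against the reported +31%. The nameplate ratios themselves come out near 1 TFLOPS/W, far below the values in the efficiency row, which suggests that row is reported on a different basis than peak FP16 over TDP.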

Comparison with Competitors (2024–2025 Domestic)#

Metric	MLU590	NVIDIA H20	Ascend 910C	Gap
FP16	256 TFLOPS	~300 TFLOPS	~780 TFLOPS	-15% vs H20, -67% vs 910C
INT8	512 TOPS	~600 TOPS	~1,600 TOPS	-15% vs H20, -68% vs 910C
Energy Efficiency	52.3 TFLOPS/W	49.8 TFLOPS/W	Not disclosed	+5% vs H20
Software Ecosystem	Cambricon NeuWare	CUDA	CANN	Ecosystem disadvantage
Price	~$8–10K	~$20K+	~$12K	Price advantage

Energy efficiency breakthrough: MLU590 achieves 52.3 TFLOPS/W in ResNet-50 training, surpassing the H20's 49.8 TFLOPS/W for the first time (test data from the Institute of Computing Technology, Chinese Academy of Sciences).
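The percentages in the Gap column follow directly from the table's values. A short sketch of that arithmetic, using only figures quoted in the table (the dictionary layout and function name are illustrative):

```python
def gap_pct(value: float, reference: float) -> float:
    """Relative difference of value against reference, in percent."""
    return (value / reference - 1) * 100


# Figures as quoted in the competitor table
fp16_tflops = {"MLU590": 256, "H20": 300, "910C": 780}
int8_tops = {"MLU590": 512, "H20": 600, "910C": 1600}
efficiency = {"MLU590": 52.3, "H20": 49.8}

fp16_vs_h20 = gap_pct(fp16_tflops["MLU590"], fp16_tflops["H20"])
fp16_vs_910c = gap_pct(fp16_tflops["MLU590"], fp16_tflops["910C"])
int8_vs_h20 = gap_pct(int8_tops["MLU590"], int8_tops["H20"])
int8_vs_910c = gap_pct(int8_tops["MLU590"], int8_tops["910C"])
eff_vs_h20 = gap_pct(efficiency["MLU590"], efficiency["H20"])

if __name__ == "__main__":
    print(f"FP16: {fp16_vs_h20:+.0f}% vs H20, {fp16_vs_910c:+.0f}% vs 910C")
    print(f"INT8: {int8_vs_h20:+.0f}% vs H20, {int8_vs_910c:+.0f}% vs 910C")
    print(f"Efficiency: {eff_vs_h20:+.0f}% vs H20")
```

The INT8 row works out to roughly the same relative gap as FP16: about -15% against the H20 and -68% against the 910C.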

MLU-Link 3.0 Interconnect#

Item	Parameter
Protocol	MLU-Link 3.0 (Cambricon self-developed)
Max Interconnect	8-way (direct) / 16 chips (supercompute node)
vs NVLink 5	Lower bandwidth, but open standard
Cluster Expansion	Supports PyTorch DistributedDataParallel
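DistributedDataParallel keeps model replicas in sync by all-reducing gradients across devices, so the interconnect mainly carries all-reduce traffic during training. The sketch below simulates a ring all-reduce in pure Python to show the communication pattern. It is an illustration only: it does not call PyTorch or any Cambricon library, and the source does not say which all-reduce algorithm is used over MLU-Link.

```python
from typing import List


def ring_all_reduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over a ring; buffers[r] is rank r's local vector."""
    n = len(buffers)
    if n == 0:
        raise ValueError("need at least one rank")
    length = len(buffers[0])
    if any(len(b) != length for b in buffers) or length % n != 0:
        raise ValueError("all buffers must share a length divisible by the rank count")
    chunk = length // n
    data = [list(b) for b in buffers]

    def span(c: int) -> range:
        """Element indices covered by chunk c."""
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: after n-1 steps rank r owns the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks first so all sends in a step happen "simultaneously".
        outgoing = [[data[r][i] for i in span((r - step) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for k, i in enumerate(span((r - step) % n)):
                data[dst][i] += outgoing[r][k]

    # Phase 2, all-gather: circulate the finished chunks so every rank holds all of them.
    for step in range(n - 1):
        outgoing = [[data[r][i] for i in span((r + 1 - step) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for k, i in enumerate(span((r + 1 - step) % n)):
                data[dst][i] = outgoing[r][k]

    return data


if __name__ == "__main__":
    grads = [[rank * 10 + i for i in range(8)] for rank in range(4)]
    print(ring_all_reduce(grads)[0])
```

In this scheme each rank sends 2 × (n − 1) / n times the buffer size in total, so with 8 directly connected chips every chip transmits 1.75× the gradient size per synchronization, independent of how the chunks are ordered.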

Cambricon NeuWare Software Stack#

Layer	Tool	Description
AI Framework	CNRT (NeuWare runtime)	PyTorch / TensorFlow compatible
Graph Compiler	BANG C Compiler	Similar to XLA, automatic operator fusion
Quantization Tool	NeuWare quantization tooling	INT8 / FP8 post-training quantization
Communication Library	CNCL	Collective communication (similar to NCCL)
Model Library	ModelZoo	Pre-optimized ResNet / BERT / GPT
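To make the quantization row concrete, here is a minimal sketch of symmetric per-tensor INT8 post-training quantization in plain Python. It shows the general technique only; the function names are hypothetical and do not correspond to any Cambricon tool, whose actual calibration method the source does not describe.

```python
from typing import List

QMAX = 127  # symmetric signed 8-bit range is [-127, 127]


def calibrate_scale(values: List[float]) -> float:
    """Choose the scale so the largest magnitude maps to QMAX (max-abs calibration)."""
    max_abs = max(abs(v) for v in values)
    return max_abs / QMAX if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float) -> List[int]:
    """Round to the nearest integer step and clip to the INT8 range."""
    return [max(-QMAX, min(QMAX, round(v / scale))) for v in values]


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate real values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]
    scale = calibrate_scale(weights)
    q = quantize(weights, scale)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(q, f"scale={scale:.4f}", f"max_err={max_err:.2e}")
```

For values inside the calibrated range, the round-trip error per element is at most half a quantization step (scale / 2). That bound is why calibration data that captures the true value range matters for post-training accuracy.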

Suitable Scenarios#

✅ Domestic large model training (below 100B parameters, price-performance advantage)
✅ Inference as a Service (energy efficiency surpasses H20)
✅ Government/SOE AI projects (supply chain security)
✅ Computer vision (ResNet-50 optimized)
❌ Trillion-parameter LLM training (compute disadvantage)
❌ Workloads with a strong CUDA ecosystem dependency (requires migration to Cambricon NeuWare)
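The parameter-count boundary in the list above can be reasoned about with the common approximation that training a dense transformer costs about 6 × N × D floating-point operations for N parameters and D tokens. The sketch below applies it to the 256 TFLOPS FP16 peak from this page. The 40% sustained utilization is a hypothetical placeholder, not a measured MLU590 figure, and the model sizes are examples.

```python
SECONDS_PER_DAY = 86_400


def training_days(params: float, tokens: float, num_cards: int,
                  peak_tflops: float = 256.0, utilization: float = 0.4) -> float:
    """Estimate wall-clock training days using the 6 * N * D FLOPs rule of thumb.

    utilization is an assumed fraction of peak throughput actually sustained.
    """
    total_flops = 6.0 * params * tokens
    sustained_flops_per_s = num_cards * peak_tflops * 1e12 * utilization
    return total_flops / sustained_flops_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # Example: 7B parameters, 1T tokens, on one 16-chip node and on 64 such nodes
    print(round(training_days(7e9, 1e12, 16), 1))
    print(round(training_days(7e9, 1e12, 1024), 1))
```

Under these assumptions a 7B-parameter model trained on 1T tokens would take about 297 days on a single 16-chip node, which is why even models well below 100B parameters call for clusters of many nodes. The estimate scales linearly with parameters and tokens and inversely with card count.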

Product Evolution#

Product	Release	FP16 TFLOPS	Status
MLU270	2020	16 TFLOPS	EOL
MLU370	2022	128 TFLOPS	Current mainstream
MLU590	2024	256 TFLOPS	Current flagship
MLU690	2025+	~512 TFLOPS (estimated)	Next generation

Key Features#

Chiplet packaging: 7nm + Chiplet, yield and cost optimized
Energy efficiency leadership: 52.3 TFLOPS/W, surpasses H20
MLU-Link 3.0: 8-way interconnect, supports medium-scale clusters
Inference + training all-round: Single card handles both scenarios
Weaknesses: FP16 compute still trails the H20 and 910C, and the software ecosystem is about 5 years old versus CUDA's 18 years

Related Cards#

Cambricon MLU370 - Previous generation
Cambricon MLU690 - Next generation (estimated)
Huawei Ascend 910C - Domestic competitor
NVIDIA H20 - Compliance-version competitor
