
d-Matrix CORSair (Digital In-Memory Computing Inference Accelerator)

Product Overview

d-Matrix CORSair is a data center AI inference accelerator that d-Matrix announced on November 19, 2024, at the SC24 Supercomputing conference. It is built on what d-Matrix describes as the industry's first Digital In-Memory Compute (DIMC) architecture. Each card combines 2GB of high-performance on-chip SRAM with up to 256GB of off-chip LPDDR5X memory and delivers 2,400 TFLOPS of peak FP8 compute. The headline memory bandwidth of up to 150 TB/s is the bandwidth of the on-chip SRAM that feeds the DIMC cores, not of the off-chip LPDDR5X.

Core Innovation: CORSair departs from the traditional architecture in which compute and memory are separate, performing matrix operations inside the memory that holds the model weights. This targets the memory bandwidth bottleneck that dominates large-model inference. Compared with GPU-based solutions, d-Matrix claims up to 10× faster interactive speed, 3× better performance per TCO, and 3× better energy efficiency.

Core Specifications

| Item | Parameter |
|---|---|
| Architecture | DIMC (Digital In-Memory Computing) + Chiplet |
| Compute (FP8) | 2,400 TFLOPS (8-bit peak) |
| Compute (FP4) | Not disclosed (supports Micro-scaling/MX formats) |
| On-chip Memory | 2GB high-performance SRAM |
| Off-chip Memory | Up to 256GB LPDDR5X |
| Memory Bandwidth | 150 TB/s (on-chip SRAM, DIMC architecture) |
| Interconnect | DMX Link (die-to-die) + DMX Bridge (card-to-card) |
| Form Factor | PCIe Gen5 full-height full-length (FHFL) |
| TDP | Not disclosed (supports air/liquid cooling) |
| Process | Not disclosed (Chiplet multi-die packaging) |
| Announcement Date | November 19, 2024 (SC24) |
| Market Release | Q2 2025 (broad release) |
| Early Sampling | Q4 2024 onwards |
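As a rough sizing aid, the sketch below checks which memory tier of a single card could hold a model's weights, using only the capacities listed in the table above. The helper names (`weight_gb`, `placement`, `cards_for_sram`) are hypothetical, and the sketch assumes weights dominate memory use, ignoring KV cache and activations. It is an illustration, not a description of how Aviator actually places models.

```python
import math

# Per-card capacities from the specification table (GB).
SRAM_GB = 2.0      # on-chip SRAM
LPDDR_GB = 256.0   # maximum off-chip LPDDR5X

def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weight footprint in GB (1e9 bytes): params x bits / 8."""
    return params_billion * bits_per_weight / 8

def placement(params_billion: float, bits_per_weight: int) -> str:
    """Smallest single-card tier that could hold the weights (assumption: weights only)."""
    size = weight_gb(params_billion, bits_per_weight)
    if size <= SRAM_GB:
        return "fits in on-chip SRAM"
    if size <= LPDDR_GB:
        return "needs off-chip LPDDR5X or SRAM pooled across cards"
    return "exceeds a single card"

def cards_for_sram(params_billion: float, bits_per_weight: int) -> int:
    """Cards needed if all weights had to live in SRAM (hypothetical layout)."""
    return math.ceil(weight_gb(params_billion, bits_per_weight) / SRAM_GB)

for name, params in [("Llama 3 8B", 8.0), ("Llama 3 70B", 70.0)]:
    for bits in (8, 4):
        print(f"{name} @ {bits}-bit: {weight_gb(params, bits):.1f} GB, "
              f"{placement(params, bits)}, {cards_for_sram(params, bits)} cards of SRAM")
```

Even an 8B model in FP8 exceeds one card's 2GB of SRAM, which is why the LPDDR5X capacity tier and multi-card configurations matter for larger models.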

DIMC Architecture in Detail

Traditional Architecture vs DIMC Architecture

| Dimension | Traditional GPU Architecture | d-Matrix DIMC Architecture |
|---|---|---|
| Compute & Memory | Separated (HBM connected to GPU cores via an interface) | Integrated (matrix operations completed inside memory) |
| Memory Bandwidth Bottleneck | Severe (HBM3e reaches at most 8TB/s per GPU, e.g. B200) | Relieved (DIMC provides 150TB/s of on-chip bandwidth) |
| Data Transfer Energy | High (accounts for 60-80% of total energy) | Low (compute executed where data is stored) |
| Inference Latency | Higher (1-10 ms/token) | Extremely low (1 ms/token @ Llama 3 8B) |
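Why bandwidth dominates decode latency can be shown with a back-of-the-envelope bound: at batch size 1, every generated token must read all model weights from memory once, so time per token cannot be lower than weight bytes divided by memory bandwidth. The sketch below assumes FP8 weights (1 byte per parameter) for Llama 3 8B and ignores KV-cache traffic, compute time, and interconnect. The 150 TB/s line only applies to data resident in on-chip SRAM, and 8GB of weights exceeds one card's 2GB, so treat it as an illustrative floor rather than a CORSair measurement.

```python
def min_ms_per_token(weight_gb: float, bandwidth_tb_s: float) -> float:
    """Bandwidth floor on batch-1 decode latency, in milliseconds.

    Assumes every token streams all weights once and nothing else is read.
    """
    seconds = (weight_gb * 1e9) / (bandwidth_tb_s * 1e12)
    return seconds * 1e3

LLAMA3_8B_FP8_GB = 8.0  # ~8B parameters x 1 byte (assumption: FP8 weights)

for label, bw in [("HBM3e, 4.8 TB/s", 4.8),
                  ("HBM3e, 8 TB/s", 8.0),
                  ("DIMC SRAM, 150 TB/s", 150.0)]:
    print(f"{label}: >= {min_ms_per_token(LLAMA3_8B_FP8_GB, bw):.3f} ms/token")
```

The HBM3e floors land around 1-1.7 ms/token before any compute or batching overhead, which is consistent with the table's point that the memory interface, not arithmetic, sets the latency on conventional GPUs.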

Chiplet Packaging Design

  • Each card contains 2 packages
  • Each package integrates 4 chiplets (8 chiplets per card)
  • Each chiplet is equipped with:
    • A share of the card's 2GB high-performance SRAM (256MB per chiplet if split evenly)
    • LPDDR5X channels (together connecting to up to 256GB of off-chip memory)
    • A RISC-V scheduling engine (managing task scheduling and data flow)
  • DMX Link: High-efficiency die-to-die interconnect within the package
  • DMX Bridge: Card-to-card interconnect (two cards can be connected via DMX Bridge)
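The hierarchy above can be captured in a small data structure. This is a hypothetical model (the `CardTopology` class and its fields are illustrative, not a d-Matrix API). It assumes the card-level 2GB SRAM from the specification table is split evenly across the 8 chiplets, and it uses binary units (1 GB = 1024 MB).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CardTopology:
    groups_per_card: int = 2       # the 2 dies/packages per card listed above
    chiplets_per_group: int = 4    # 4 chiplets in each
    card_sram_gb: float = 2.0      # card-level SRAM from the specification table

    @property
    def chiplets(self) -> int:
        """Total chiplets on one card."""
        return self.groups_per_card * self.chiplets_per_group

    @property
    def sram_mb_per_chiplet(self) -> float:
        """Assumption: SRAM is divided evenly across chiplets (1 GB = 1024 MB)."""
        return self.card_sram_gb * 1024 / self.chiplets

    def totals(self, cards: int = 1) -> tuple[int, float]:
        """(chiplets, SRAM in GB) for `cards` cards, e.g. 2 joined by DMX Bridge."""
        return self.chiplets * cards, self.card_sram_gb * cards

card = CardTopology()
print(card.chiplets, card.sram_mb_per_chiplet, card.totals(cards=2))
```

A frozen dataclass keeps the topology immutable, so derived quantities such as per-chiplet SRAM are always computed from the same three inputs.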

Inference Performance

Real-world Workload Performance

| Metric | CORSair Performance | vs GPU Solution |
|---|---|---|
| Llama 3 8B throughput | 60,000 tokens/sec (single server) | Up to 10× interactive speed |
| Llama 3 70B throughput | 30,000 tokens/sec (single rack) | 3× performance per TCO |
| Latency (Llama 3 8B) | 1 ms/token | 10× reduction |
| Latency (Llama 3 70B) | 2 ms/token | 3× reduction |
| Energy Efficiency | Not disclosed (TDP not disclosed) | 3× improvement |
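The throughput and latency figures can be combined with Little's law (L = λ × W) to estimate how many generation streams are in flight at once. This assumes the quoted tokens/sec is an aggregate over streams that each decode continuously at the quoted per-token latency; d-Matrix has not published concurrency figures, so the result is an implied value, not a specification. Function names are illustrative.

```python
def implied_concurrency(tokens_per_sec: float, ms_per_token: float) -> float:
    """Little's law: aggregate token rate (lambda) x per-token latency (W, in s)."""
    return tokens_per_sec * ms_per_token / 1e3

def per_stream_tokens_per_sec(ms_per_token: float) -> float:
    """Token rate seen by a single user at the given per-token latency."""
    return 1e3 / ms_per_token

workloads = {
    "Llama 3 8B (single server)": (60_000, 1.0),
    "Llama 3 70B (single rack)": (30_000, 2.0),
}
for name, (tps, ms) in workloads.items():
    print(f"{name}: ~{implied_concurrency(tps, ms):.0f} concurrent streams "
          f"at {per_stream_tokens_per_sec(ms):.0f} tokens/s each")
```

Under these assumptions both configurations correspond to about 60 concurrent streams, with each 8B user seeing roughly 1,000 tokens/s and each 70B user roughly 500 tokens/s.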

Key Advantage: CORSair is specifically optimized for inference-intensive enterprise AI workloads, addressing the high latency, high power consumption, and high cost of large-model inference.

Comparison with Competitors

| Metric | d-Matrix CORSair | NVIDIA H200 | NVIDIA B200 | Groq LPU |
|---|---|---|---|---|
| FP8 Compute | 2,400 TFLOPS | 1,979 TFLOPS | 4,500 TFLOPS | ~750 TOPS INT8 (no FP8 figure published) |
| Memory Capacity | 256GB LPDDR5X | 141GB HBM3e | 192GB HBM3e | 230MB SRAM per chip |
| Memory Bandwidth | 150TB/s (on-chip SRAM) | 4.8TB/s | 8TB/s | 80TB/s (on-die SRAM) |
| Inference Latency | 1-2 ms/token | 5-10 ms/token | 3-5 ms/token | <1 ms/token |
| TDP | Not disclosed | 700W | 1,000W | ~500W (estimated) |
| Cost | Low (LPDDR5X vs HBM) | High | High | Medium |
| Positioning | Inference (in-memory compute) | Training+inference | Training+inference | Inference (SRAM) |
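A roofline view makes the bandwidth column concrete. The ridge point (peak FLOPS divided by bandwidth) is the arithmetic intensity, in FLOP per byte, above which a kernel becomes compute-bound. The sketch uses the CORSair, H200 (NVIDIA's dense FP8 figure of 1,979 TFLOPS), and B200 rows; Groq is omitted because its figures are per chip and not FP8. It assumes batch-1 decode with FP8 weights performs about 2 FLOPs (a multiply and an add) per weight byte, and CORSair's 150 TB/s applies only to SRAM-resident data.

```python
# Peak FP8 TFLOPS and memory bandwidth in TB/s, from the comparison table.
ACCELERATORS = {
    "d-Matrix CORSair": (2400.0, 150.0),  # bandwidth is on-chip SRAM only
    "NVIDIA H200": (1979.0, 4.8),
    "NVIDIA B200": (4500.0, 8.0),
}

def ridge_point(peak_tflops: float, bw_tb_s: float) -> float:
    """FLOP/byte where the roofline bends (the 1e12 factors cancel)."""
    return peak_tflops / bw_tb_s

def attainable_tflops(peak_tflops: float, bw_tb_s: float, intensity: float) -> float:
    """Classic roofline: min(peak compute, intensity x bandwidth)."""
    return min(peak_tflops, intensity * bw_tb_s)

# Assumption: batch-1 FP8 decode does ~2 FLOPs per weight byte read.
DECODE_INTENSITY = 2.0

for name, (peak, bw) in ACCELERATORS.items():
    print(f"{name}: ridge {ridge_point(peak, bw):.1f} FLOP/B, "
          f"decode roof {attainable_tflops(peak, bw, DECODE_INTENSITY):.1f} TFLOPS")
```

CORSair's ridge point works out to 16 FLOP/byte versus roughly 412 (H200) and 562 (B200), so under these assumptions it leaves the memory-bound regime at much smaller batch sizes.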

Software Stack & Ecosystem

Aviator Software Stack

  • Unified management: All d-Matrix hardware features are managed through the Aviator software stack
  • Ecosystem compatibility: Gives AI developers a user experience and toolchain consistent with the existing ecosystem
  • PyTorch support: Seamless migration of existing PyTorch models
  • ONNX support: Imports models in ONNX format
  • Micro-scaling/MX formats: Industry's first native support for block floating-point numerical formats (now an OCP standard)

OEM/System Integrator Partnerships

  • Supermicro: Liquid-cooled/air-cooled AI servers with CORSair
  • GigaIO: High-density inference nodes
  • Liqid: Composable infrastructure solutions
  • Market release: Q2 2025 (broad release)

Comparison with Previous-Generation Products

| Metric | d-Matrix previous-gen (2023) | CORSair (2025) | Improvement |
|---|---|---|---|
| Architecture | Early DIMC prototype | Production DIMC + Chiplet | Mature |
| Compute (FP8) | ~500 TFLOPS (estimated) | 2,400 TFLOPS | 4.8× |
| Memory Capacity | ~64GB (estimated) | 256GB | 4× |
| Memory Bandwidth | ~10TB/s (DIMC) | 150TB/s | 15× |
| Form Factor | Engineering sample card | PCIe Gen5 FHFL (production) | Commercialized |
| Software Stack | Early version | Aviator (mature) | Complete |

Launch Date & Availability

  • Official Announcement: November 19, 2024 (SC24 Supercomputing conference)
  • Early Sampling: Q4 2024 onwards (early access customers)
  • Broad Market Release: Q2 2025 (already released)
  • Availability:
    • Direct purchase (d-Matrix official website)
    • Through OEM partners (Supermicro, GigaIO, Liqid)
    • Cloud service (expected 2026)

Technical Highlights

1. Digital In-Memory Computing (DIMC)

  • Breaks the memory wall: Performs matrix operations inside memory, eliminating the data transfer bottleneck
  • 150TB/s effective bandwidth: Far exceeds the HBM3e bandwidth of NVIDIA H200 (4.8TB/s) and B200 (8TB/s)
  • Low latency: 1 ms/token (Llama 3 8B) to 2 ms/token (Llama 3 70B), suitable for real-time interactive scenarios

2. Chiplet Architecture

  • Modular design: 4 chiplets form one package, 2 packages form one card
  • Flexible expansion: DMX Link connects dies within a package; DMX Bridge connects cards
  • Cost optimization: LPDDR5X costs roughly 3-5× less per GB than HBM3e

3. Block Floating-Point Format Support

  • Micro-scaling/MX formats: Industry's first native support (now an OCP standard)
  • Precision-efficiency balance: MXFP8 precision is close to FP16, with 2× the compute throughput of FP16
  • OCP standard: Jointly promoted by Meta, AMD, Intel, and other companies
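The core idea of block floating point is that a small block of values shares one power-of-two scale while each element keeps only a few bits. The sketch below is a simplified MXINT8-style encoder: blocks of 32 values (the OCP MX block size) share one exponent, and each element is stored as an 8-bit integer. It follows the general shape of the OCP MX specification but is not a bit-exact implementation of it, and it is not d-Matrix's hardware encoding. Function names are illustrative.

```python
import math

BLOCK_SIZE = 32  # OCP MX block size: 32 elements share one scale

def quantize_block(values: list[float]) -> tuple[int, list[int]]:
    """Simplified MXINT8-style encoding: one shared power-of-two exponent + int8 codes."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0] * len(values)
    # Pick the shared exponent so the largest magnitude maps into [64, 128).
    shared_exp = math.floor(math.log2(amax)) - 6
    scale = 2.0 ** shared_exp
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return shared_exp, codes

def dequantize_block(shared_exp: int, codes: list[int]) -> list[float]:
    """Reconstruct approximate values: code x 2**shared_exp."""
    scale = 2.0 ** shared_exp
    return [c * scale for c in codes]

def quantize(values: list[float]) -> list[tuple[int, list[int]]]:
    """Split a flattened tensor into 32-element blocks and encode each block."""
    return [quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]

weights = [i / 10 for i in range(-16, 16)]  # one 32-element block
exp, codes = quantize_block(weights)
restored = dequantize_block(exp, codes)
print(exp, max(abs(a - b) for a, b in zip(weights, restored)))
```

Storing 32 eight-bit codes plus one 8-bit exponent costs 264 bits per block, about 8.25 bits per value, while the shared scale lets each block adapt to its own dynamic range.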

Suitable Scenarios

  • Real-time dialog AI: 1-2 ms/token latency, suitable for customer service, assistants, etc.
  • Large-scale inference service: 60,000 tokens/sec (8B model), supports high concurrency
  • Enterprise-grade AI workloads: 3× better performance per TCO than GPU solutions
  • Edge inference: PCIe Gen5 standard form factor, suitable for edge data center deployment
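For interactive use, per-token latency translates directly into how long a user waits for a full reply. The sketch assumes a hypothetical 500-token assistant response and sets prefill (time to first token) to zero by default because no prefill figure is published; both values are illustrative, and `response_time_s` is a hypothetical helper.

```python
def response_time_s(output_tokens: int, ms_per_token: float, prefill_ms: float = 0.0) -> float:
    """Wall-clock seconds to stream a reply: optional prefill plus per-token decode time."""
    return (prefill_ms + output_tokens * ms_per_token) / 1e3

# Hypothetical 500-token reply; per-token latencies from the tables above.
for label, ms in [("CORSair, Llama 3 8B", 1.0),
                  ("CORSair, Llama 3 70B", 2.0),
                  ("GPU, upper end of 1-10 ms range", 10.0)]:
    print(f"{label}: {response_time_s(500, ms):.1f} s")
```

At 1-2 ms/token a 500-token answer streams in well under the time a user takes to read it, which is the basis for the real-time dialog scenario above.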