:::warning Partial Information
Some specifications on this page are based on Qualcomm's official press release and media reports. Key parameters, including FP16/BF16/FP8 compute throughput, TDP, and memory capacity, have not been publicly disclosed. This page will be updated once Qualcomm publishes a complete technical white paper.
:::
## Product Overview
Qualcomm AI250 is a data center AI inference solution announced by Qualcomm Technologies in October 2025 alongside the AI200, and is the higher-end follow-on to the AI200. It adopts a near-memory computing architecture that, according to Qualcomm, delivers more than 10× higher effective memory bandwidth by restructuring the memory access path, while substantially lowering power consumption. Qualcomm describes this as a generational leap in efficiency and performance for AI inference workloads, which makes the AI250 suited to applications with stringent real-time requirements. Commercial availability is expected in 2027.
Strategic Position: The AI250's near-memory computing architecture is Qualcomm's main differentiator in the data center AI chip market. In conventional designs (typical CPUs, GPUs, and ASICs), compute units sit far from memory, so weights and activations must travel across a long, power-hungry path. Placing compute closer to memory shortens that path, which reduces both memory access latency and power consumption and makes near-memory computing an important direction for next-generation AI inference chips.
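The power argument can be made concrete with a back-of-the-envelope model: if decoding one token requires reading a fixed number of bytes, the energy spent on data movement is roughly bytes moved times energy per byte, so shortening the memory path lowers energy per token proportionally. The sketch below assumes a hypothetical 70B-parameter model with 8-bit weights and two made-up per-byte costs (`FAR_PJ_PER_BYTE`, `NEAR_PJ_PER_BYTE`); none of these numbers are Qualcomm figures, and the helper `movement_energy_joules` is purely illustrative.

```python
# Back-of-the-envelope model of data-movement energy per decoded token.
# Every constant below is a hypothetical placeholder, NOT a Qualcomm figure.

def movement_energy_joules(bytes_moved: float, pj_per_byte: float) -> float:
    """Energy (J) to move `bytes_moved` bytes at `pj_per_byte` picojoules per byte."""
    return bytes_moved * pj_per_byte * 1e-12

# Hypothetical workload: a 70B-parameter model with 8-bit weights reads
# roughly 70e9 bytes of weights for each decoded token (batch size 1).
BYTES_PER_TOKEN = 70e9

# Hypothetical per-byte costs for a long off-chip path vs. a short near-memory path.
FAR_PJ_PER_BYTE = 20.0
NEAR_PJ_PER_BYTE = 5.0

far_j = movement_energy_joules(BYTES_PER_TOKEN, FAR_PJ_PER_BYTE)
near_j = movement_energy_joules(BYTES_PER_TOKEN, NEAR_PJ_PER_BYTE)

print(f"far-memory path:  {far_j:.2f} J/token")
print(f"near-memory path: {near_j:.2f} J/token")
print(f"reduction factor: {far_j / near_j:.1f}x")
```

The model deliberately ignores compute energy, caches, and batching; it only shows why cutting the per-byte cost of memory access translates directly into lower inference power when data movement dominates.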
## Core Specifications (Partial)

| Item | Parameter |
|---|---|
| Architecture | Near-memory computing |
| Process node | Not disclosed (estimated 3nm) |
| FP16/BF16 | Not disclosed |
| FP8 | Not disclosed |
| INT8 | Not disclosed |
| Memory capacity | Not disclosed (estimated 1-2 TB) |
| Memory bandwidth | >10× higher effective bandwidth (vs. conventional architecture, per Qualcomm) |
| TDP | Not disclosed (Qualcomm claims much lower power consumption) |
| Announced | October 2025 |
| Commercial availability | 2027 (expected) |
| Positioning | Data center AI inference (high-end) |
## Near-Memory Computing Architecture

| Dimension | Description |
|---|---|
| Architecture feature | Compute units placed close to memory, reducing data movement |
| Bandwidth improvement | >10× effective bandwidth (vs. conventional architecture, per Qualcomm) |
| Power reduction | Significant, since memory access accounts for a large share of inference power |
| Latency reduction | Significant, due to shorter memory access paths |
| Suitable scenarios | Large language model inference, real-time AI applications |
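Why effective bandwidth matters for LLM inference can be shown with a simple roofline-style bound: at batch size 1, each decoded token must stream the model weights from memory, so throughput cannot exceed bandwidth divided by bytes read per token. The sketch below assumes a hypothetical 500 GB/s baseline, a hypothetical 70B-parameter model at 1 byte per parameter, and treats Qualcomm's ">10×" claim as exactly 10×; the function `decode_tokens_per_sec_bound` is an illustrative helper, not part of any Qualcomm SDK.

```python
# Memory-bound ceiling on single-stream LLM decode throughput.
# Constants are hypothetical placeholders, NOT AI250/AI200 specifications.

def decode_tokens_per_sec_bound(bandwidth_gb_s: float, bytes_per_token: float) -> float:
    """Upper bound on tokens/s if each token must stream `bytes_per_token` bytes
    from memory at `bandwidth_gb_s` GB/s (1 GB = 1e9 bytes)."""
    return bandwidth_gb_s * 1e9 / bytes_per_token

BASELINE_BW_GB_S = 500.0  # hypothetical effective bandwidth of a conventional design
BYTES_PER_TOKEN = 70e9    # hypothetical: 70B parameters at 1 byte each
MULTIPLIER = 10.0         # Qualcomm's ">10x" claim, taken as exactly 10x for illustration

baseline = decode_tokens_per_sec_bound(BASELINE_BW_GB_S, BYTES_PER_TOKEN)
near_mem = decode_tokens_per_sec_bound(BASELINE_BW_GB_S * MULTIPLIER, BYTES_PER_TOKEN)

print(f"baseline ceiling:    {baseline:.1f} tokens/s")
print(f"near-memory ceiling: {near_mem:.1f} tokens/s")
print(f"per-token latency floor: {1000 / baseline:.0f} ms -> {1000 / near_mem:.0f} ms")
```

This bound ignores KV-cache reads, compute limits, and batching (which amortizes weight reads across concurrent requests), so real throughput will differ; it only illustrates why a bandwidth multiplier raises the ceiling for memory-bound, latency-sensitive decoding by the same factor.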
## Comparison with AI200

| Metric | Qualcomm AI250 | Qualcomm AI200 | Difference |
|---|---|---|---|
| Architecture | Near-memory computing | Conventional memory architecture | New memory architecture |
| Memory bandwidth | >10× higher effective bandwidth (per Qualcomm) | Not disclosed | Significant |
| Power consumption | Significantly reduced | Not disclosed | Lower |
| Commercial availability | 2027 | 2026 | 1 year later |
| Positioning | High-end inference | Mid-range inference | AI250 targets the higher end |
## Suitable Scenarios

- ✅ Large language model (LLM) inference (benefits from near-memory computing)
- ✅ Real-time AI applications (low latency)
- ✅ Large multimodal model (LMM) inference
- ✅ Energy-sensitive deployments (lower power consumption)
- ❌ Model training (positioned for inference only)
- ❌ Deployments required in 2026 (commercial availability expected in 2027)
## External Links