Qualcomm Technologies has unveiled its latest solutions for data center AI inference: the Qualcomm AI200 and Qualcomm AI250 accelerator cards and racks. Both products pair rack-scale performance with large memory capacity for generative AI inference, with the aim of lowering total cost of ownership (TCO) for enterprises.
Focus on generative AI and TCO
The AI200 and AI250 solutions are specifically designed for demanding AI workloads, particularly Large Language Models (LLMs) and Large Multimodal Models (LMMs). Qualcomm is leveraging its experience in Neural Processing Unit (NPU) technology to offer rack-scale performance.
The Qualcomm AI200 is a rack-level AI inference solution built to provide lower TCO. It offers high memory capacity, supporting 768 GB of LPDDR per card. This capacity allows for better flexibility and scale when running large AI models.
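To put the 768 GB figure in perspective, the sketch below estimates how much memory different model sizes need just for their weights; the parameter counts and precisions are illustrative assumptions, not Qualcomm-published configurations, and real deployments also need room for KV cache and activations.

```python
def weight_footprint_gb(num_params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights, in GB."""
    return num_params_billions * 1e9 * bytes_per_param / 1e9

CARD_MEMORY_GB = 768  # LPDDR capacity per AI200 card, per Qualcomm's announcement

# Hypothetical model sizes and precisions, for illustration only.
examples = [(70, "FP16", 2), (180, "FP8", 1), (400, "INT4", 0.5), (1000, "FP16", 2)]
for params_b, precision, bytes_pp in examples:
    need = weight_footprint_gb(params_b, bytes_pp)
    verdict = "fits" if need <= CARD_MEMORY_GB else "does not fit"
    print(f"{params_b}B params @ {precision}: ~{need:.0f} GB of weights -> {verdict} in {CARD_MEMORY_GB} GB")
```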
AI250 introduces novel memory architecture
The Qualcomm AI250 stands out by introducing a new memory architecture based on near-memory computing. This design aims to significantly improve memory bandwidth and power consumption for AI inference tasks. Qualcomm states the AI250 delivers over ten times higher effective memory bandwidth compared to previous generations. This enables disaggregated AI inferencing, which allows hardware resources to be used more effectively, meeting customer needs for performance and cost control.
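Effective memory bandwidth matters because autoregressive decoding has to stream the model's weights (and KV cache) from memory for every generated token, so token throughput is roughly bounded by bandwidth divided by bytes moved per token. The following back-of-envelope sketch applies the "over ten times" claim to an assumed baseline; the bandwidth and model numbers are illustrative assumptions, not Qualcomm specifications.

```python
def max_tokens_per_second(bandwidth_gb_s: float, bytes_per_token_gb: float) -> float:
    """Upper bound on decode throughput when memory traffic is the bottleneck."""
    return bandwidth_gb_s / bytes_per_token_gb

# Illustrative only: a 70B-parameter model at FP16 moves ~140 GB of weights per token.
bytes_per_token_gb = 70e9 * 2 / 1e9

baseline_bw_gb_s = 1000.0            # assumed baseline effective bandwidth
improved_bw_gb_s = baseline_bw_gb_s * 10  # the ">10x" claim applied to that assumed baseline

print(f"baseline bound:   ~{max_tokens_per_second(baseline_bw_gb_s, bytes_per_token_gb):.1f} tokens/s")
print(f"10x bandwidth:    ~{max_tokens_per_second(improved_bw_gb_s, bytes_per_token_gb):.1f} tokens/s")
```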
Rack solutions and security
Both the AI200 and AI250 solutions feature complete rack infrastructure. Key features include:
Direct liquid cooling for thermal management.
PCIe for scaling within a rack (scale up).
Ethernet for connecting multiple racks (scale out).
Confidential computing to secure AI workloads.
A rack-level power consumption of 160 kW (see the rough energy-cost sketch after this list).
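Since operating energy feeds directly into TCO, a quick back-of-envelope calculation from the 160 kW figure is shown below; the utilization and electricity price are placeholder assumptions, not Qualcomm or operator figures.

```python
HOURS_PER_YEAR = 24 * 365
RACK_POWER_KW = 160          # rack-level power draw, per Qualcomm's announcement

# Placeholder assumptions for illustration only.
utilization = 0.8            # fraction of time the rack runs near full power
price_per_kwh_usd = 0.10     # assumed electricity price

annual_kwh = RACK_POWER_KW * HOURS_PER_YEAR * utilization
annual_cost_usd = annual_kwh * price_per_kwh_usd
print(f"~{annual_kwh:,.0f} kWh/year -> ~${annual_cost_usd:,.0f}/year at ${price_per_kwh_usd}/kWh")
```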
Software and developer support
Qualcomm has developed a complete AI software stack tailored for AI inference. This software supports major machine learning (ML) frameworks and inference engines. It also includes support for LLM/LMM inference techniques, such as disaggregated serving.
Developers can onboard models easily and utilize one-click deployment for Hugging Face models through Qualcomm Technologies’ Efficient Transformers Library and Qualcomm AI Inference Suite. The software package also includes tools, libraries, APIs, and pre-built AI applications to help operationalize AI.
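For context, the kind of Hugging Face workflow such tooling builds on looks like the sketch below. This uses the standard transformers library, not Qualcomm's Efficient Transformers Library or AI Inference Suite APIs, and the model ID is only a placeholder; a vendor inference stack would typically take over compilation, placement, and serving of the model on its accelerators.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; any Hugging Face causal LM ID works here

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Standard generate() call on the loaded model.
inputs = tokenizer("What does near-memory computing improve?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```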
Availability and future plans
Qualcomm plans a multi-generation data center AI inference roadmap. The Qualcomm AI200 is expected to be available commercially in 2026, followed by the Qualcomm AI250 in 2027. The company has committed to an annual release cadence for its data center AI products.