Open Compute Project accelerating deployment of next-gen AI clusters

At its inception, the OCP Marketplace AI Portal already has many vendors showcasing their AI wares, a significant resource for AI cluster builders. 

DQI Bureau

Open Compute Project Foundation (OCP), the non-profit organization bringing hyperscale innovations to all, has taken the next important step in establishing itself as the premier open organization accelerating the deployment of open systems for AI, with the opening of an AI portal on the OCP Marketplace.

This site is intended to become the single location where AI cluster designers and builders can find the latest available AI infrastructure products, white papers covering upcoming innovations and standardizations, best practice documents, and the reference material needed to successfully design and build AI clusters. At its inception, the OCP Marketplace AI Portal already has many vendors showcasing their AI wares, a significant resource for AI cluster builders.

Hyperscale operators are encountering unprecedented challenges with compute density, power distribution, interconnect and cooling as they build AI clusters composed of racks consuming as much as 1 MW. In response, OCP's collaborative community of more than 400 corporate members and 6,000 active engineers is developing open standards to address bottlenecks that threaten to constrain AI infrastructure growth.

“Looking ahead, OCP aims to remain the premier organization for AI infrastructure by focusing on three pillars: 
* standardizing silicon, power, cooling, and interconnects; 
* supporting complete open system development; and 
* providing education through technical workshops, the OCP Marketplace and Academy. 

As AI and HPC continue to redefine computing requirements, OCP's role in fostering development of open, sustainable, and scalable infrastructure appears increasingly vital to the industry's ability to deliver on AI's transformative potential while managing its environmental impact,” said George Tchaparian, CEO at the Open Compute Project Foundation.

The OCP Community is working on significant shared problems, including: standardizing rack architectures that support power envelopes of 250 kW to 1 MW; defining advanced cooling solutions (e.g., liquid cooling) for high-density nodes; building high-voltage, high-efficiency power delivery systems; allowing for multiple, evolving scale-up and scale-out interconnect fabrics for performance; and developing comprehensive management frameworks for near-autonomous operations.

The OCP Community, with its Open Systems for AI strategic initiative, endeavors to meet these challenges, and has recently published a Blueprint for Scalable AI Infrastructure and held a workshop on AI Physical Infrastructure.

Alongside the opening of the AI portal on the OCP Marketplace, Meta has completed its contribution of the specification for its Catalina AI Compute Shelf, which is specifically configured to deliver a high-density AI system that supports Nvidia GB200. Catalina is ORv3-based and supports up to 140 kW, including the Meta Wedge fabric switches for the Nvidia NVL72 architecture.

This contribution by Meta complements the previous contribution by Nvidia of its MGX-based GB200-NVL72 Platform covering its reinforced OCP ORv3 rack architecture, and its 1RU liquid-cooled MGX compute and switch trays. 

The OCP Open Systems for AI strategic initiative was launched in January 2024, in recognition that AI is today’s most prominent data center use case driving innovation, followed by HPC and the emerging edge. OCP’s greatest strength is its community-driven model.

By uniting leaders, innovators, and experts from across the technology spectrum, OCP is tackling the multi-dimensional design challenges of AI infrastructure. This initiative brings together the work of the OCP Community to deliver the next generation of data centers and IT equipment to meet AI's scale and workload diversity.

“The AI capable data center build out is now in its third year, with 1st generation systems being deployed and the next generation on the drawing board. Due to the speed with which the market had to move, the 1st generation systems were mostly designed in silos resulting in higher costs due to fragmentations. It is the right time for an organization like the OCP to be facilitating a community to determine commonalities leading to standardizations that can help accelerate the market for future generations of AI cluster deployments,” said Ashish Nadkarni, Group VP and GM, Worldwide Infrastructure at IDC.
