Artificial Intelligence

Reduce Job Completion Time for AI Workloads

The AI Revolution Is Underway

Generative AI has astonished the world and changed how people think about what computers can do. Every hyperscaler is now committed to the AI race, but the traffic patterns and demands of generative AI differ significantly from those of traditional data center workloads. The bottleneck is shifting from compute resources to network resources.

Unleashing the Potential of Networks in AI Advancements

AI training relies on tens of thousands of GPUs working together, computing results while exchanging data with one another in parallel. Data volumes are expected to reach the trillion scale, and network efficiency is emerging as a key measure of training performance. To support AI training clusters and reduce job completion time, the network needs high-radix interfaces, low and consistent latency, and stable interconnectivity. These elements are crucial for keeping communication synchronized among tens of thousands of GPUs.

Optimizing AI Cluster Interconnect with Dual Platform Selection

UfiSpace offers two distinct architectures for the AI cluster interconnect. The S9300 data center switch series, built on the Broadcom XGS platform, offers high-radix interfaces, high-speed switching, and flexible traffic control. The S9700 DDC solution, built on the Broadcom DNX platform, uses internal cell-based switching to achieve perfect load balancing and congestion control. Both solutions are designed to scale out on demand, support hardware-based link failover, and implement end-to-end congestion control. Together, these features contribute to stable, efficient, low-latency AI/ML clusters.

 
