Orange Silicon Valley composed one of the fastest single-node GPU supercomputers in the world with Liqid Matrix… Again

Posted on
July 16, 2021
Written By

Once again, the two companies have collaborated to show that Liqid-composed single-node GPU systems are among the world's fastest and most efficient for AI-driven computing.

Traditional supercomputing architecture silos GPU resources, creating inefficiency and datacenter sprawl. Liqid Matrix composable disaggregated infrastructure (CDI) software brings cloud-like flexibility and agility to on-prem supercomputing deployments.

Liqid is excited today to discuss the details of our ongoing work with Orange Silicon Valley (OrangeSV) to deliver some of the industry's fastest, most adaptive data center-scale performance for AI workloads!

These new results build on earlier work by OrangeSV and Liqid that pooled GPU resources with Liqid composable software (see OrangeSV's Medium article on building, with Liqid, one of the world's first adaptive AI supercomputing blocks with a Liqid heart).

Their previous work demonstrated Liqid's ability to compose a multi-GPU, single-node supercomputer from an off-the-shelf server and NVIDIA RTX 8000 GPUs. Last year, that composable system, powered by Liqid Matrix, achieved one of the fastest ImageNet deep learning benchmark results ever documented.

Building on that achievement, OrangeSV and Liqid recently delivered new single-server results that further demonstrate the performance of composable systems for AI workloads.

Natural Language Processing: OrangeSV and Liqid take AI training to the next level with CDI  

This time, OrangeSV used NVIDIA A100 GPUs supplied by NVIDIA, together with a Liqid PCIe Gen 4.0 composable infrastructure stack and Liqid Matrix composable software.

Working with Liqid and Orange's in-house Innovation Data & AI team, OrangeSV conducted natural language processing (NLP) training on a standardized English-to-German translation task. The OrangeSV team used fairseq, Facebook AI's open source sequence modeling toolkit, whose Transformer model is implemented in PyTorch. PyTorch, the open source framework for accelerated machine learning developed by Facebook, is widely considered state of the art for AI+ML work.

For this exercise, OrangeSV selected a single Dell server and, using Liqid Matrix CDI software and the Liqid composable fabric, assigned it 16 NVIDIA A100 GPUs (40 GB each), housed 8 per JBOG (Just a Bunch of GPUs), along with 16 TB of Liqid NVMe (NVM Express) storage serving as a deep learning cache that held the training data.

  • Server specs: dual-socket AMD EPYC 7H12 (64 cores per socket), 1 TB memory
  • The Liqid Matrix™ composable disaggregated infrastructure software platform
  • Peer-to-peer (P2P) communication enabled for all NVIDIA A100 GPUs across Liqid's PCIe fabric
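To make the composed configuration concrete, here is a small Python sketch that models the node described above and derives a few figures from it: how many JBOG enclosures the GPUs span, the aggregate GPU memory, and how many GPU pairs can talk peer-to-peer when P2P is enabled across the whole fabric. The `ComposedNode` class and its fields are hypothetical and purely illustrative; they are not part of Liqid Matrix or any Liqid API.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ComposedNode:
    """Hypothetical model of the single composed server (illustrative only)."""
    gpu_count: int
    gpu_memory_gb: int   # memory per GPU
    gpus_per_jbog: int
    nvme_cache_tb: int

    def jbog_count(self) -> int:
        # Ceiling division: enclosures needed to hold every GPU.
        return -(-self.gpu_count // self.gpus_per_jbog)

    def total_gpu_memory_gb(self) -> int:
        # Aggregate GPU memory available to the one server.
        return self.gpu_count * self.gpu_memory_gb

    def p2p_pairs(self) -> list[tuple[int, int]]:
        # Every unordered GPU pair that can exchange data directly
        # when P2P is enabled across the entire PCIe fabric.
        return list(combinations(range(self.gpu_count), 2))


node = ComposedNode(gpu_count=16, gpu_memory_gb=40, gpus_per_jbog=8, nvme_cache_tb=16)
print(node.jbog_count())           # 2
print(node.total_gpu_memory_gb())  # 640
print(len(node.p2p_pairs()))       # 120
```

Under these assumptions, one server sees 640 GB of GPU memory spread over two JBOGs, with 120 distinct GPU pairs able to communicate directly rather than through host memory.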

Transformer for PyTorch NLP test results:

OrangeSV ran the standard WMT14_en_de benchmark to establish a baseline:

GPU: NVIDIA A100, running the WMT14_en_de training throughput benchmark with Transformer on PyTorch; batch size = 10,240 tokens

Using this composable configuration, OrangeSV achieved a training throughput of 935,343 tokens/sec and reached minimal validation loss in under 1 hour and 49 minutes on the WMT14 en-de Transformer task, a standardized English-to-German translation benchmark. The result is significant because it is the highest training speed the company has achieved so far using commercially available, general-purpose GPUs.
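A quick back-of-the-envelope calculation puts these figures in per-GPU terms. This Python sketch assumes the 935,343 tokens/sec figure is aggregate throughput spread evenly across all 16 GPUs and held constant for the whole run. Because minimal validation loss was reached in *under* 1 hour 49 minutes, the total-token figure is only an upper-bound estimate under that assumption, not a reported result.

```python
# Inputs taken from the reported results.
REPORTED_TOKENS_PER_SEC = 935_343          # aggregate training throughput
GPU_COUNT = 16                             # A100 GPUs composed to the node
TIME_TO_MIN_LOSS_S = 1 * 3600 + 49 * 60    # "under 1 hour and 49 mins" = 6540 s


def per_gpu_throughput(total_tps: float, gpus: int) -> float:
    """Average tokens/sec per GPU, assuming work is split evenly."""
    return total_tps / gpus


def tokens_processed(tps: float, seconds: float) -> float:
    """Estimated tokens seen in training, assuming constant throughput."""
    return tps * seconds


print(f"{per_gpu_throughput(REPORTED_TOKENS_PER_SEC, GPU_COUNT):,.0f} tokens/sec per GPU")
# 58,459 tokens/sec per GPU
print(f"{tokens_processed(REPORTED_TOKENS_PER_SEC, TIME_TO_MIN_LOSS_S):.3e} tokens")
# 6.117e+09 tokens
```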

OrangeSV also ran full training to minimal validation loss (the objective function the neural network is trying to minimize), which it reached within two hours. Training started with a batch size of 10,240 tokens and eventually scaled to a maximum batch size of 16,000 tokens, which allowed it to achieve the throughput of 935,343 tokens/sec.
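The article does not say whether the batch sizes are counted per GPU or across the whole node. If they follow fairseq's `--max-tokens` convention (a per-GPU token budget, with `--update-freq` accumulating gradients over several steps), the tokens feeding each optimizer update can be estimated with the sketch below. The function name is hypothetical, and the per-GPU interpretation is an assumption, not something the results confirm.

```python
def effective_tokens_per_update(max_tokens_per_gpu: int, num_gpus: int, update_freq: int = 1) -> int:
    """Upper bound on tokens contributing to one optimizer step.

    Assumes data-parallel training where each GPU processes a batch of up to
    `max_tokens_per_gpu` tokens and gradients are accumulated over
    `update_freq` steps before each update (hypothetical interpretation).
    """
    return max_tokens_per_gpu * num_gpus * update_freq


for per_gpu in (10_240, 16_000):
    print(per_gpu, "->", effective_tokens_per_update(per_gpu, num_gpus=16))
# 10240 -> 163840
# 16000 -> 256000
```

Under this interpretation, moving from 10,240 to 16,000 tokens per GPU raises the tokens per update on 16 GPUs from 163,840 to 256,000, which is consistent with larger batches improving throughput on a well-connected multi-GPU node.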

Flexible, fast, and highly efficient: The value of composability speaks for itself in NLP training

Based on these results, OrangeSV concluded, and Liqid concurred, that it has built on its previous work and can again claim to have configured one of the fastest single-node deep learning supercomputers, using composable architecture and commercial off-the-shelf, general-purpose GPUs.

The ability to build this kind of performance in from the ground up is essential to the continued evolution of AI applications. With adaptive performance that is disaggregated and change-ready, IT organizations can prepare for and keep pace with the next wave of AI+ML innovation, whatever its bare-metal resource requirements. Compose servers and GPUs across fabrics for efficient, adaptive performance that can handle whatever new workloads the data demands.

To learn more about OrangeSV's testing models and see further results, go here. To find out why a composable disaggregated system from Liqid is the right infrastructure for your ongoing AI requirements, download this free white paper outlining the benefits CDI brings to organizations that must adapt in real time to a constantly evolving data ecosystem. If you would like to speak with a Liqid CDI expert, go here.

Category: Composable Supercomputing/HPC
