CEITEC VUT INTRODUCES NEW MULTI-GPU SYSTEM FOR ADVANCED MACHINE LEARNING APPLICATIONS

NVIDIA DGX supercomputers are helping CEITEC VUT scientists in Brno process data from manufacturing machines and robots in real time. They will also be available for small and medium-sized enterprises to test their AI applications.

Unique Systems for Scientists and Industrial Enterprises

The CEITEC research center at Brno University of Technology (VUT) has put two newly installed computing systems into operation, the NVIDIA DGX A100 and the NVIDIA DGX H100, which will expand its capabilities in research and in applied artificial intelligence. Together they span two generations of NVIDIA DGX technology, providing high computing capacity and flexibility for a range of research and industrial applications.

Prof. Ing. Pavel Václavek, Ph.D., head of the Cybernetics and Robotics research group and coordinator of the Industrial Cybernetics, Instrumentation, and Systems Integration research program at CEITEC VUT, points to their use in Digital Europe projects. These include EDIH-DIGIMAT, which focuses on the digitization and robotization of manufacturing companies, and AI TEF AI-MATTERS, a network of testing and experimentation facilities for verifying AI in the industrial sector.

“As part of our EDIH and TEF services, we provide companies with the opportunity to experiment with AI, educate themselves, and test artificial intelligence applications on top-notch systems that are part of the newly installed supercomputer,” explains Prof. Václavek. “This allows small and medium-sized enterprises with up to 499 employees to use advanced technologies at a 100% subsidized price. Our goal is also to integrate the DGX system with other technologies of our RICAIP Testbed Brno so that we can process data from manufacturing machines and robots in real time.”

High Performance and Optimized Environment

The new NVIDIA DGX A100 and NVIDIA DGX H100 systems each contain eight interconnected GPU accelerators with a combined 640 GB of GPU memory per system. This makes them well suited to massively parallel computing, which is crucial for processing the large datasets generated by manufacturing equipment.
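To make the memory figure concrete: 640 GB is eight GPUs with 80 GB each, and a data-parallel job typically gives each GPU its own slice of the dataset. The Python sketch below is a hypothetical illustration, not CEITEC's pipeline; the helper name `shard_ranges` and the even-split policy are assumptions.

```python
# Hypothetical sketch: split a dataset into one contiguous shard per GPU,
# as a data-parallel job on an 8-GPU DGX node might do.
NUM_GPUS = 8
GPU_MEMORY_GB = 80  # per-GPU memory in both 640 GB DGX systems


def shard_ranges(num_samples: int, num_gpus: int = NUM_GPUS) -> list[range]:
    """Return one range of sample indices per GPU; sizes differ by at most one."""
    base, extra = divmod(num_samples, num_gpus)
    shards = []
    start = 0
    for rank in range(num_gpus):
        # The first `extra` ranks take one additional sample each.
        size = base + (1 if rank < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards


if __name__ == "__main__":
    print("Total GPU memory:", NUM_GPUS * GPU_MEMORY_GB, "GB")  # 640 GB
    for rank, shard in enumerate(shard_ranges(1_000_003)):
        print(f"GPU {rank}: samples {shard.start}..{shard.stop - 1} ({len(shard)})")
```

Contiguous shards keep each GPU's reads sequential; real frameworks often shuffle indices first, which this sketch deliberately leaves out.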

The two computing nodes are linked by an InfiniBand network running at up to 200 Gb/s, giving fast, low-latency communication between the systems. Beyond raw performance, the systems come with a robust software layer, including pre-installed and tuned machine-learning environments, so they can be put into operation quickly.
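The 200 Gb/s figure translates directly into transfer times between the nodes. The following Python sketch is a back-of-the-envelope estimate only; the function name and the `efficiency` factor are illustrative assumptions, and measured throughput on the actual link will be below line rate.

```python
# Back-of-the-envelope estimate of how long it takes to move data between the
# two DGX nodes over a 200 Gb/s InfiniBand link. The efficiency factor is an
# assumption: real throughput depends on protocol overhead and the software stack.


def transfer_seconds(size_gb: float, link_gbps: float = 200.0,
                     efficiency: float = 1.0) -> float:
    """Time to send `size_gb` gigabytes (10**9 bytes) over a `link_gbps` gigabit/s link."""
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_gb * 8e9  # bytes -> bits (1 byte = 8 bits)
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    # Contents of one 80 GB GPU at the theoretical line rate
    print(f"{transfer_seconds(80):.1f} s")  # 3.2 s
    # Same transfer at an assumed 80% effective throughput
    print(f"{transfer_seconds(80, efficiency=0.8):.1f} s")  # 4.0 s
```

Note the factor of 8: link speeds are quoted in bits per second, while dataset and memory sizes are quoted in bytes.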

Another advantage is direct access to an online catalog of the most widely used AI frameworks and libraries, packaged as containers. Users can download and run these tools with little setup, which speeds up the development and deployment of AI applications.
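As an illustration of the container workflow described above, the Python sketch below builds a `docker run` command for a framework image. It assumes the catalog is NVIDIA NGC, whose images are published under the `nvcr.io` registry; the helper names and the default tag `24.01-py3` are illustrative assumptions, not a prescribed configuration for these systems.

```python
# Hypothetical helper: build (and optionally run) a `docker run` command for an
# AI framework container. Assumes the online catalog is NVIDIA NGC, whose images
# are published under the nvcr.io registry; the default tag is only an example.
import shlex
import subprocess


def container_command(framework: str = "pytorch", tag: str = "24.01-py3",
                      cmd: tuple[str, ...] = ("nvidia-smi",)) -> list[str]:
    """Return the argument list for running `cmd` inside the framework container."""
    image = f"nvcr.io/nvidia/{framework}:{tag}"
    return [
        "docker", "run",
        "--rm",           # remove the container when it exits
        "--gpus", "all",  # expose all GPUs (requires the NVIDIA Container Toolkit)
        image,
        *cmd,
    ]


def run_container(**kwargs) -> int:
    """Run the container and return its exit code (needs Docker and GPU access)."""
    return subprocess.run(container_command(**kwargs), check=False).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Docker.
    print(shlex.join(container_command()))
    # docker run --rm --gpus all nvcr.io/nvidia/pytorch:24.01-py3 nvidia-smi
```

Returning an argument list rather than a shell string avoids quoting problems when the command is passed to `subprocess.run`.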

“Thanks to these systems, we can offer companies and our scientists access to the latest technologies, enabling faster and more efficient research,” adds Prof. Václavek. Following the campus 5G network, the DGX systems are the latest addition to the RICAIP Testbed Brno infrastructure.

CEITEC VUT thus confirms its position as a leading institution in researching and applying the latest technologies in support of science and industry.

Technical Specifications

NVIDIA DGX systems are more than top-of-the-line hardware; they also include tools that simplify infrastructure management and AI deployment. They ship with a tuned Docker environment and the DGX OS operating system, and they now also offer NVIDIA Base Command for managing the entire infrastructure. This makes it easier for research and development teams to deploy AI applications.

The system also includes the NVIDIA AI Enterprise (NVAIE) software stack, which provides a complete set of tools for developing and optimizing AI applications. Together, these tools make it faster and easier to develop and deploy AI solutions in a variety of environments.

| Parameter | NVIDIA DGX H100 640 GB | NVIDIA DGX A100 640 GB |
|---|---|---|
| GPUs | 8× NVIDIA H100 SXM5 80 GB | 8× NVIDIA A100 SXM4 80 GB |
| GPU memory | 640 GB total | 640 GB total |
| CPU | 2× Intel Xeon Platinum 8480C (112 cores total, 2.00 GHz) | 2× AMD EPYC 7742 (128 cores total, 2.25 GHz) |
| Performance (tensor operations) | 32 PetaFLOPS (FP8) | 5 PetaFLOPS (FP16) |
| CUDA cores | 135,168 | 55,296 |
| Tensor Cores | 4,224 | 3,456 |
| Multi-Instance GPU (MIG) | 56 instances | 56 instances |
| RAM | 2 TB | 2 TB |
| Storage | OS: 2× 1.92 TB NVMe; data: 30 TB (8× 3.84 TB) NVMe | OS: 2× 1.92 TB NVMe; data: 30 TB (8× 3.84 TB) NVMe |
| Network | 8× ConnectX-7 400 Gb/s InfiniBand; 4× ConnectX-7 200 Gb/s Ethernet | 8× ConnectX-7 200 Gb/s InfiniBand; 4× ConnectX-7 200 Gb/s Ethernet |
| Max. power consumption | 10.2 kW | 6.5 kW |
| Form factor | rack, 8U | rack, 6U |
| Specification | datasheet | datasheet |
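As a consistency check on the table, every system-level total divides evenly by the eight GPUs per node: 80 GB of memory and 7 MIG instances per GPU on both systems, plus 16,896 CUDA cores and 528 Tensor Cores per H100 versus 6,912 and 432 per A100. The short Python sketch below does that arithmetic; the dictionary keys and the helper name `per_gpu` are illustrative assumptions, and the per-GPU values are derived by division rather than taken from separate datasheet entries.

```python
# Derive per-GPU figures from the system-level totals in the specification table
# (8 GPUs per system). Totals are copied from the table; per-GPU values are
# simple divisions, not separate datasheet entries.
GPUS_PER_SYSTEM = 8

SYSTEM_TOTALS = {
    "DGX H100": {"gpu_memory_gb": 640, "cuda_cores": 135_168,
                 "tensor_cores": 4_224, "mig_instances": 56},
    "DGX A100": {"gpu_memory_gb": 640, "cuda_cores": 55_296,
                 "tensor_cores": 3_456, "mig_instances": 56},
}


def per_gpu(totals: dict[str, int], gpus: int = GPUS_PER_SYSTEM) -> dict[str, int]:
    """Divide each system total by the GPU count, checking that it splits evenly."""
    result = {}
    for key, value in totals.items():
        quotient, remainder = divmod(value, gpus)
        if remainder:
            raise ValueError(f"{key}={value} is not divisible by {gpus}")
        result[key] = quotient
    return result


if __name__ == "__main__":
    for system, totals in SYSTEM_TOTALS.items():
        print(system, per_gpu(totals))
```

The performance row is left out on purpose: the two peak figures use different precisions (FP8 versus FP16), so dividing or comparing them directly would be misleading.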

Author

Petr Plodik