NVIDIA DGX H100 MUNI

CERIT-SC at Masaryk University in Brno got the most advanced AI computing system

Masaryk University has become a pioneer in artificial intelligence (AI) and computing technology by installing the latest and most advanced NVIDIA DGX H100 system. It is the first solution of its kind in the region, bringing extreme computing power and innovative research capabilities.

Who uses the DGX H100 system

With its newly installed NVIDIA DGX H100 system, CERIT-SC at Masaryk University opens the door to close collaboration with scientists from across the region through e-INFRA CZ. This prestigious network brings together leading research centres and institutions from the Czech Republic that focus on advanced computing technologies and research in the field of artificial intelligence. We have contributed to the e-INFRA CZ project through previous infrastructure deliveries.

CERIT-SC is part of the national e-infrastructure, which is a complex system of interconnected network, computing and storage capacities and related services for the research community in the Czech Republic. CERIT-SC complements the other two components of the national e-infrastructure – the CESNET association and the IT4Innovations supercomputer centre.

Scientists connected to e-INFRA CZ will have access to NVIDIA DGX H100 resources at Masaryk University and will be able to use its computing capacity for their projects. This collaboration will provide an environment for innovative research and development in AI and accelerate progress in areas such as machine learning, big data analytics and AI application development.

e-INFRA CZ is a unique e-infrastructure for research and development in the Czech Republic. It offers a transparent environment with comprehensive capacity and resources for transferring, storing and processing scientific data, open to all entities engaged in research and development regardless of sector. It provides the communication, information, storage and computing base for research and development at national and international level, along with a comprehensive portfolio of ICT services that modern research and development depends on.

Why DGX H100

  • Training large deep learning models thanks to the GPUs' memory size
  • Faster processing of large datasets
  • Ability to work on multiple projects simultaneously
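
To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes mixed-precision training with the Adam optimizer at about 16 bytes per parameter, a common rule of thumb that does not come from this article, and it ignores activation memory. The 80 GB per GPU and the 8 GPUs are the figures listed in the technical specification.

```python
# Rough estimate of how many parameters fit in GPU memory during training.
# Assumption (not from the article): mixed-precision training with Adam
# needs about 16 bytes per parameter (fp16 weights and gradients plus
# fp32 master weights and two optimizer moments). Activations are ignored.

BYTES_PER_PARAM = 16   # assumed rule of thumb
GPU_MEMORY_GB = 80     # per H100 GPU, from the technical specification
NUM_GPUS = 8           # GPUs in one DGX H100


def max_trainable_params(memory_gb: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the largest parameter count whose training state fits in memory_gb."""
    return memory_gb * 1e9 / bytes_per_param


single_gpu = max_trainable_params(GPU_MEMORY_GB)
whole_system = max_trainable_params(GPU_MEMORY_GB * NUM_GPUS)

print(f"One GPU:      ~{single_gpu / 1e9:.0f} billion parameters")
print(f"Whole system: ~{whole_system / 1e9:.0f} billion parameters")
```

Under these assumptions a single GPU holds the training state of a model with roughly 5 billion parameters, and all eight GPUs together about 40 billion, provided the training state is sharded across the GPUs. Real limits are lower once activations and framework overhead are counted.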

Areas of application

  • speech recognition analysis
  • 3D image reconstruction
  • detection of neurodegenerative diseases

AI / DL Engines used

  • TensorFlow / Keras
  • PyTorch, PyTorch Lightning
  • CUDA / cuDNN

The purchase and installation of the NVIDIA DGX H100 system is an important milestone for CERIT-SC and the entire e-INFRA CZ infrastructure. This system takes our AI capabilities to a whole new level and opens up a wide range of new research opportunities for our students and scientists.

RNDr. Lukáš Hejtmánek, Ph.D.

What makes the NVIDIA DGX system special?

The NVIDIA DGX H100 is a unique solution for artificial intelligence and research thanks to its powerful hardware and innovative architecture that enables processing of massive amounts of data with high speed and accuracy.

 

This system is equipped with the latest NVIDIA H100 GPUs (graphics processing units), which provide up to 10x more performance compared to the previous generation NVIDIA A100 Ampere GPUs.

 

It offers extreme computing power of up to 32 PFLOPS (petaFLOPS), making it one of the most powerful AI and research solutions on the market.

 

The system supports the latest frameworks and libraries for machine and deep learning, such as TensorFlow, PyTorch, Caffe and others, allowing scientists and researchers to use a wide range of tools for their projects.

 

The system is designed to shorten AI application development through its powerful infrastructure, optimized algorithms and fine-tuned software, backed by professional support from NVIDIA engineers.

 

One of the exceptional features of the NVIDIA DGX H100 is its parallel processing capability, which enables rapid development and training of complex AI models.

 

NVIDIA DGX H100 supports high memory bandwidth and fast data transfers, enabling efficient handling of large datasets and complex analyses.

 

The system is equipped with advanced cooling and energy management, ensuring reliable operation even under high load and contributing to efficient use of electricity.


Technical specification

NVIDIA DGX systems are not just cutting-edge hardware; they also come with enhancements that make infrastructure management and AI deployment easier. They feature a fine-tuned Docker environment and DGX OS, along with the new NVIDIA Base Command tool for efficient management of the entire infrastructure. This simplifies the deployment of AI applications for research and development teams.

The system also includes the NVIDIA AI Enterprise (NVAIE) software stack, which provides a complete set of tools for developing and optimizing AI applications. This combination of technologies facilitates and accelerates the process of developing and deploying AI solutions across the entire infrastructure.

  • 8x NVIDIA H100 GPU with 640 gigabytes of total GPU memory: 18x NVIDIA NVLink per GPU, 900 GB/s bidirectional throughput between GPUs
  • Two Intel Xeon Platinum 8480C processors, 112 cores and 2 TB of system memory: powerful processors for the most demanding AI tasks
  • 30 terabyte NVMe SSD: high-speed storage for maximum performance
  • 4x NVIDIA NVSwitch: 7.2 terabytes per second of bidirectional GPU-to-GPU bandwidth, 1.5x more than the previous generation
  • 10x NVIDIA ConnectX-7 400 Gb/s network interface: 1 TB/s of peak bidirectional network bandwidth
  • Space- and energy-efficient solution with high computing power density: 8U (rack unit) size and maximum system power consumption of 10.2 kW at a theoretical performance of 32 petaFLOPS
  • Complete software layer for AI application development:
    AI Enterprise – optimized software suite for AI
    Base Command – software for orchestration, scheduling and cluster management
    Operating System – DGX OS / Ubuntu / Red Hat Enterprise Linux / Rocky

PARAMETER                 NVIDIA DGX H100 640 GB
GPUs                      8 × NVIDIA H100 SXM5 80 GB
GPU memory                640 GB
CPU                       Dual Intel Xeon Platinum 8480C, 2.0 GHz base / 3.8 GHz turbo
Performance (FP8)         32 PFLOPS
CUDA cores                135,168
Tensor cores              4,224
MIG                       56 instances
RAM                       2 TB
Storage                   OS: 2 × 1.92 TB NVMe; data: 30 TB (8 × 3.84 TB) NVMe
Network                   8 × single-port ConnectX-7 VPI, 400 Gb/s InfiniBand / 200 Gb/s Ethernet
                          2 × dual-port ConnectX-7 VPI, 400 Gb/s InfiniBand / 200 Gb/s Ethernet
Max. consumption          10.2 kW
Form factor               8U
Technical specification   Datasheet
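
The aggregate figures in the specification can be cross-checked with simple arithmetic. The sketch below is only a sanity check. The per-GPU counts for CUDA cores, Tensor Cores and MIG instances are assumed values for the H100 SXM5 that this article does not state, and the per-GPU FP8 figure is simply the system total divided by eight.

```python
# Cross-check the aggregate figures in the specification against per-GPU values.
NUM_GPUS = 8                   # from the specification
MEMORY_PER_GPU_GB = 80         # from the specification
CUDA_CORES_PER_GPU = 16_896    # assumed per-GPU value, not stated in the article
TENSOR_CORES_PER_GPU = 528     # assumed per-GPU value, not stated in the article
MIG_INSTANCES_PER_GPU = 7      # assumed per-GPU value, not stated in the article
SYSTEM_FP8_PFLOPS = 32         # from the specification
NETWORK_ADAPTERS = 10          # from the specification
ADAPTER_SPEED_GBIT_S = 400     # gigabits per second, per direction

totals = {
    "gpu_memory_gb": NUM_GPUS * MEMORY_PER_GPU_GB,
    "cuda_cores": NUM_GPUS * CUDA_CORES_PER_GPU,
    "tensor_cores": NUM_GPUS * TENSOR_CORES_PER_GPU,
    "mig_instances": NUM_GPUS * MIG_INSTANCES_PER_GPU,
    "fp8_pflops_per_gpu": SYSTEM_FP8_PFLOPS / NUM_GPUS,
}

# Convert gigabits to gigabytes (divide by 8), then double for both directions.
one_way_gbyte_s = NETWORK_ADAPTERS * ADAPTER_SPEED_GBIT_S / 8
totals["network_bidirectional_tbyte_s"] = 2 * one_way_gbyte_s / 1000

for name, value in totals.items():
    print(f"{name}: {value}")
```

With these inputs the totals match the specification: 640 GB of GPU memory, 135,168 CUDA cores, 4,224 Tensor Cores, 56 MIG instances and 1 TB/s of peak bidirectional network bandwidth. The last figure only works out with 400 gigabits, not gigabytes, per second per adapter.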
  

Author

Petr Plodik
