AI servers for research and development of LLMs at Charles University, Prague
The Czech Republic is at the forefront of the European Union in the number and quality of its research centres and research infrastructures. One such centre is LINDAT/CLARIAH-CZ, the Digital Research Infrastructure for Language Technology, Arts and Humanities.
What is LINDAT/CLARIAH-CZ
LINDAT/CLARIAH-CZ is a Czech data centre providing certified storage and computer language processing services. It is a unique large research infrastructure, dealing mainly with linguistic but also with other digital resources and tools for their processing.
LINDAT/CLARIAH-CZ also offers know-how, software tools for the processing of language and other digital resources and the development of language technologies for industry and services, including use in new cultural and creative industries. LINDAT/CLARIAH-CZ engages in international collaborations between similar research infrastructures and directly between institutions in all humanities disciplines and emphasises digital and interdisciplinary processing methods, including advanced machine learning and artificial intelligence.
The project is led by the team of Prof. Jan Hajič, PhD, Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics.
About the solution
Charles University requested improvements to the IT infrastructure used to develop the necessary language technologies. These technologies now rely almost exclusively on machine learning and artificial intelligence methods, which are highly computationally intensive in the training phase and cannot be run without specialised hardware, i.e. a large cluster of powerful graphics cards (GPUs).
Within the framework of the ongoing Operational Programme Science, Research, Education project “LINDAT/CLARIAH-CZ – Expansion of the repository, services and computing cluster of research infrastructure”, Charles University announced a contract for the supply of servers to strengthen the application cloud that runs services in high-availability mode and to increase the capacity of the fast Lustre data storage. The contract also included the delivery of several powerful laptops for application development. We won the contract with a solution based on servers from Hewlett Packard Enterprise (HPE) that met all the conditions.
Optimised tailor-made solutions
The solution consisted of the delivery of several different HPE ProLiant servers and HPE Apollo systems using AMD processors.
HPE ProLiant servers are at the heart of systems that automate environment management and optimize performance for a specific type of workload or job to deliver results in less time. The ProLiant server family offers versatility and a wide range of features that make it suitable for any solution – from 5G, edge and AI to hyper-converged infrastructure, containerization and various CPU and GPU architectures.
HPE Apollo systems are specifically designed to support demanding HPC workloads (e.g. modelling or simulation). This helps companies speed up development, especially by processing large volumes of data and using digital models to simulate the real world. The systems are also adapted for artificial intelligence workloads, optimising the system’s ability to learn and provide the highest quality output.
Servers for neural network processing |
---|
2x HPE ProLiant XL675d/Apollo 6500 server |
Each equipped with an AMD CPU (32 cores in total) and 8 interconnected NVIDIA A100 40 GB graphics cards (SXM4 version) |
110,592 CUDA cores in total |
640 GB of GPU memory in total |
Total FP16 performance – 624 TFLOPS |
HPE ProLiant XL675d Gen10 Plus
HPE ProLiant DL385 Gen10 Plus server
Hybrid CPU-GPU server |
---|
10x HPE ProLiant DL385 Gen10+ |
Each equipped with an AMD CPU (32 cores in total) and 3x NVIDIA A40 48 GB graphics accelerators |
322,560 CUDA cores in total |
1,440 GB of GPU memory in total |
Total FP16 performance – 2,244 TFLOPS |
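
The CUDA core and GPU memory totals quoted in the two GPU tables above can be reproduced with simple multiplication. The sketch below is illustrative only: the `GpuPool` class and its field names are made up for this example, and the per-GPU CUDA core counts (6,912 for the A100 and 10,752 for the A40) come from NVIDIA's published specifications rather than from this article. The FP16 figures are left out because they depend on which precision mode of the datasheet is quoted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuPool:
    """One group of identical GPU servers (hypothetical helper type)."""
    name: str
    servers: int
    gpus_per_server: int
    cuda_cores_per_gpu: int  # assumption: taken from NVIDIA's public specs
    memory_gb_per_gpu: int

    @property
    def gpus(self) -> int:
        return self.servers * self.gpus_per_server

    @property
    def cuda_cores(self) -> int:
        return self.gpus * self.cuda_cores_per_gpu

    @property
    def memory_gb(self) -> int:
        return self.gpus * self.memory_gb_per_gpu


# Server and GPU counts as listed in the tables above.
A100_POOL = GpuPool("A100 SXM4", servers=2, gpus_per_server=8,
                    cuda_cores_per_gpu=6912, memory_gb_per_gpu=40)
A40_POOL = GpuPool("A40", servers=10, gpus_per_server=3,
                   cuda_cores_per_gpu=10752, memory_gb_per_gpu=48)

for pool in (A100_POOL, A40_POOL):
    print(f"{pool.name}: {pool.gpus} GPUs, "
          f"{pool.cuda_cores:,} CUDA cores, {pool.memory_gb:,} GB GPU memory")
```

Under these assumptions the script arrives at 16 A100 cards and 30 A40 cards, and the resulting core and memory totals match the figures in the tables.
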
High density CPU server: 2x 4-node |
---|
2x HPE Apollo n2600 Gen10+ / XL225n Gen10+ high density server |
8 nodes in total (each node equipped with 32 computing cores) |
256 GB RAM |
1x MDS server for LustreFS | 4x OSS server for LustreFS |
---|---|
HPE ProLiant DL325 Gen10+ v2 | HPE ProLiant DL325 Gen10+ v2 |
16 computing cores | 16 computing cores |
128 GB RAM | 128 GB RAM, 30.72 TB NVMe SSD capacity |
HPE ProLiant XL225n Gen10 Plus
Installation and commissioning
The project was implemented at two different sites, which placed greater demands on organisational and logistical planning.
The challenge was to integrate our solution into various racks within the data halls of Charles University while taking into account the power and cooling requirements of the individual servers. Thanks to our experience from different types of projects, everything went smoothly and, after the functionality of all components had been verified, the project was successfully handed over to the customer for deployment.