NVIDIA GPU selector
Regularly updated
Always updated with the latest NVIDIA accelerators and information.
All info on one page
A one-page overview of all main parameters and AI use cases.
Benchmarks
We provide real application benchmarks of NVIDIA GPUs for comparison.
GPU selector
Quick guide
This guide will help you choose your AI accelerator quickly.
Expert guide
Performance benchmarks and technical parameters for AI experts.
Older GPUs
You can also find technical specifications and details of older NVIDIA GPUs.
The most powerful GPU - NVIDIA GB200
Compute
AI training and inference, data analytics, HPC
General Purpose
Visualization, rendering, AI, virtual workstations
High-Density VDI
Virtual applications, virtual desktops, virtual workstations
Choose your GPU use case
Expert guide
Scroll the table to the right to see all current NVIDIA GPUs, including the most powerful NVIDIA B200 and NVIDIA H200. To choose which GPU cards to display and compare, click the Column visibility button. Some GPUs are hidden by default.
GPU | A2 | A10 | A16 | A30 | A40 | L4 | L40 | L40S | V100 SXM2 | PCIe | A100 SXM4 | PCIe | H100 PCIe | H100 NVL | H200 NVL | H100 | H200 SXM5 | GH200 | B200 SXM5 | GB200 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Architecture | Ampere | Ampere | Ampere | Ampere | Ampere | Ada Lovelace | Ada Lovelace | Ada Lovelace | Volta | Ampere | Hopper | Hopper | Hopper | Hopper | Grace+Hopper | Blackwell | Grace+Blackwell |
Card chip | GA107 | GA102 | GA107 | GA100 | GA102 | AD104 | AD102 | AD102 | GV100 | GA100 | GH100 | GH100 | GH100 | GH100 | 1xGrace+1xH100 | B200 | 1xGrace+2xB200 |
# CUDA cores | 1 280 | 9 216 | 4x 1 280 | 6 912 | 10 752 | 7 680 | 18 176 | 18 176 | 5 120 | 6 912 | 14 592 | 16 896 | 16 896 | ||||
# Tensor cores | 40 | 288 | 4x 40 | 224 | 336 | 240 | 568 | 568 | 640 | 432 | 456 | 528 | 528 | ||||
FP64 (TFlops) | 0,07 | 0,97 | 0,271 | 5,2 | 1,179 | 0,49 | 1,41 | 1,41 | 7,8 | 7 | 9,69 | 25,6 | 30 | 34 | 34 | 34 | — | — |
FP64 Tensor (TFlops) | — | — | — | 10,3 | — | — | — | — | — | 19,5 | 51 | 60 | 67 | 67 | 67 | 40 | 90 |
FP32 (TFlops) | 4,5 | 31,2 | 4x 4,5 | 10,3 | 37,4 | 30,3 | 90,52 | 91,6 | 15,7 | 14 | 19,5 | 51 | 60 | 67 | 67 | 67 | ||
TF32 Tensor (TFlops) | 18* | 125* | 4x 18* | 165* | 150* | 120* | 181* | 366* | 125 | 112 | 312* | 756* | 835* | 989* | 989* | 989* | 2 200* | 5 000* |
FP16 Tensor (TFlops) | 35,9* | 250* | 4x 35,9* | 330* | 299* | 242* | 362* | 733* | — | 624* | 1 513* | 1 671* | 1 979* | 1 979* | 1 979* | 4 500* | 10 000* |
INT8 Tensor (TOPS) | 71,8* | 500* | 4x 71,8* | 661* | 599* | FP8 485* | 724* | 1 466* | — | 1 248* | 3 026* | 3 341* | 3 958* | 3 958* | 3 958* | 9 000* | 20 000* |
FP8 (TFlops) | — | — | — | — | — | — | — | 1 466* | — | — | — | — | — | — | — | 9 000* | 20 000* |
FP4 (TFlops) | — | — | — | — | — | — | — | 1 466* | — | — | — | — | — | — | — | 18 000* | 40 000* |
GPU memory | 16 GB | 24 GB | 4x 16 GB | 24 GB | 48 GB | 24 GB | 48 GB | 48 GB | 16 or 32 GB | 40 or 80 GB | 80 GB | 94 GB | 141 GB | 80 GB | 141 GB | 96 GB or 144 GB | 180 GB | 360 GB |
Memory technology | GDDR6 | GDDR6 | GDDR6 | HBM2 | GDDR6 | GDDR6 | GDDR6 | GDDR6 | HBM2 | HBM2 | HBM3 | HBM3 | HBM3e | HBM3 | HBM3e | HBM3 or HBM3e | HBM3e | HBM3e |
Memory throughput | 200 GB/s | 600 GB/s | 4x 200 GB/s | 933 GB/s | 696 GB/s | 300 GB/s | 864 GB/s | 864 GB/s | 900 GB/s | 1 935 | 2 039 GB/s | 2 TB/s | 3.9 TB/s | 4.8 TB/s | 3.3 TB/s | 4.8 TB/s | 4 or 4.9 TB/s | 8 TB/s | 16 TB/s |
Multi-Instance GPU | vGPU | vGPU | vGPU | 4 instance | vGPU | vGPU | vGPU | vGPU | vGPU | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 14 instances |
NVENC | NVDEC | JPEG engines | 1 | 2 | 1 | 1 | 4 | 8 | 0 | 4 | 1 | 1 | 2 | 2 | 4 | 4 | 3 | 3 | 3 | 3 | 4 | Yes | Yes | 0 | 5 | 5 | 0 | 7 | 7 | 0 | 7 | 7 | 0 | 7 | 7 | 0 | 7 | 7 | 0 | 7 | 7 | 0 | 14 | 14 | |
GPU link | PCIe 4 | PCIe 4 | PCIe 4 | NVLink 3 | NVLink 3 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 3 | NVLink 3 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 5 | NVLink 5 | NVLink 5 |
Power consumption | 40-60 W | 150 W | 250 W | 165 W | 300 W | 40-72W | 300 W | 350 W | 300W | 250W | 400W | 300W | 350W | 400W | 600W | 700W | 1000W2 | 1000W | 2 |
Form factor | PCIe gen4 1-slot LP | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 1-slot LP | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | SXM2 | PCIe gen4 2-slot FHFL | SXM4 | PCIe gen4 2-slot FHFL | PCIe gen5 2-slot FHFL | PCIe gen5 2-slot FHFL | PCIe gen5 2-slot FHFL | SXM5 card | Superchip | SXM5 card | Superchip |
Spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | |
Announcement | 2021 | 2021 | 2021 | 2021 | 2020 | 2023 | 2022 | 2023 | 2017 | 2020 | 2022 | 2023 | 2024 | 2022 | 2023 | 2023 | 2024 | 2024 |
Availability | | | ||||||||||||||||
1) Preliminary numbers (H200 SXM5, B200 SXM5, GB200)
2) The total power consumption of CPU, GPU, and memory on the superchip
Availability: good (in stock or 4-6 weeks), medium (around 10 weeks), bad (15+ weeks), not available
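The table can be used as a simple filter: state the minimum memory you need and the power budget per card, then keep the GPUs that satisfy both. The following is a minimal sketch of that idea, not a tool offered on this page. The `Gpu` class and the `select_gpus` function are hypothetical names introduced for illustration. The values are copied from the "GPU memory" and "Power consumption" rows for seven of the PCIe cards, using the upper bound where the table gives a power range. The A16 is left out because its memory is listed as 4x 16 GB rather than as one pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    memory_gb: int    # from the "GPU memory" row
    max_power_w: int  # upper bound of the "Power consumption" row


# Values copied from the table above (illustrative subset only).
GPUS = [
    Gpu("A2", 16, 60),
    Gpu("A10", 24, 150),
    Gpu("A30", 24, 165),
    Gpu("A40", 48, 300),
    Gpu("L4", 24, 72),
    Gpu("L40", 48, 300),
    Gpu("L40S", 48, 350),
]


def select_gpus(gpus, min_memory_gb, max_power_w):
    """Return the names of GPUs that meet both limits, lowest power first."""
    matches = [
        g for g in gpus
        if g.memory_gb >= min_memory_gb and g.max_power_w <= max_power_w
    ]
    # Sort by power, then by name so that ties have a stable order.
    return [g.name for g in sorted(matches, key=lambda g: (g.max_power_w, g.name))]


# Example: at least 24 GB of memory within a 200 W budget per card.
print(select_gpus(GPUS, min_memory_gb=24, max_power_w=200))
```

Memory and power are only two of the rows; a real selection would also weigh the precision rows, the GPU link, and the form factor.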
NVIDIA GPU accelerators – the core of AI and HPC in the modern data center.
Solving the world’s most important scientific, industrial, and business challenges with AI and HPC. Visualizing complex content to create cutting-edge products, tell immersive stories, and reimagine cities of the future. Designed for the age of elastic computing, NVIDIA GPU accelerators rise to all these challenges, providing unmatched acceleration at every scale.
Benchmarks
- A16, A100, V100, RTX4000 Ada (CTU Prague)
- A100, L40S (NVIDIA)
- DGX H100, DGX B200 (NVIDIA)
NVIDIA A16, A100, V100, RTX4000 Ada by CTU FEE in Prague
[Charts: PyTorch training time GPU comparison for MnasNet, ResNet, and DenseNet. Axis: time (lower is better).]
NVIDIA A100 vs. NVIDIA L40S application benchmarks
Category | Benchmark | # GPUs | Precision | Metric | A100 | L40S | L40S/A100 |
---|---|---|---|---|---|---|---|
DL Training | GPT 7B (GBS=512) | 8 | FP16/FP8 | Samples/sec | 13.5 | 15.7 | 1.2x |
DL Training | ResNet-50 V1.5 Training (BS=32) | 1 | FP16 | Images/sec | 2707 | 2748 | 1.0x |
DL Training | BERT Large Pre-Training Phase 1 (BS=128, seq 512) | 1 | FP16 | Sequences/sec | 579 | 472 | 0.8x |
DL Training | BERT Large Pre-Training Phase 2 (BS=8, seq 512) | 1 | FP16 | Sequences/sec | 152 | 161 | 1.1x |
DL Inference | ResNet-50 V1.5 Inference (BS=32) | 1 | INT8 | Images/sec | 23439 | 34588 | 1.5x |
DL Inference | BERT Large Inference (BS=8, seq 128) | 1 | INT8 | Sequences/sec | 3011 | 4090 | 1.3x |
DL Inference | BERT Large Inference (BS=8, seq 384) | 1 | INT8 | Sequences/sec | 1116 | 1598 | 1.4x |
DL Inference | BERT Large Inference (BS=128, seq 128) | 1 | INT8 | Sequences/sec | 5065 | 5273 | 1.0x |
DL Inference | BERT Large Inference (BS=128, seq 384) | 1 | INT8 | Sequences/sec | 1445 | 1558 | 1.1x |
Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 512x512) | 1 | FP16 | Pipeline Latency (ms) | 827 | 743 | 1.1x |
Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 1024x1024) | 1 | FP16 | Pipeline Latency (ms) | 4186 | 3582 | 1.2x |
Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch native) | 1 | FP16 | Pipeline Latency (ms) | 10450 | 11194 | 0.9x |
Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch optimized) | 1 | FP16 | Pipeline Latency (ms) | 7353 | 7382 | 1.0x |
Stable Diffusion | Stable Diffusion XL (BS=1, TRT optimized) | 1 | FP16 | Pipeline Latency (ms) | 5251 | 5547 | 1.0x |
DL Inference | GPT2 Inference (BS=1) | 1 | FP16 | Samples/sec | 1333 | 1828 | 1.4x |
DL Inference | GPT2 Inference (BS=32) | 1 | FP16 | Samples/sec | 6502 | 7578 | 1.2x |
DL Inference | GPT2 Inference (BS=128) | 1 | FP16 | Samples/sec | 6850 | 6701 | 1.0x |
DL Inference | DLRM (BS=1) | 1 | TF32 | Records/sec | 6495 | 9458 | 1.5x |
DL Inference | DLRM (BS=64) | 1 | TF32 | Records/sec | 319131 | 517072 | 1.6x |
DL Inference | DLRM (BS=2048) | 1 | TF32 | Records/sec | 4668287 | 6980429 | 1.5x |
DL Inference | ViT Inference (BS=32, seq 224) | 1 | FP16 | Samples per Second | 1556 | 1477 | 1.0x |
DL Inference | ViT Inference (BS=32, seq 384) | 1 | FP16 | Samples per Second | 501 | 404 | 0.8x |
DL Inference | HF Swin Base Inference (BS=1, seq 224) | 1 | INT8 | Samples per Second | 633 | 920 | 1.5x |
DL Inference | HF Swin Base Inference (BS=32, seq 224) | 1 | INT8 | Samples per Second | 2998 | 3564 | 1.2x |
DL Inference | HF Swin Large Inference (BS=1, seq 384) | 1 | INT8 | Samples per Second | 411 | 345 | 0.8x |
DL Inference | HF Swin Large Inference (BS=32, seq 384) | 1 | INT8 | Samples per Second | 478 | 570 | 1.2x |
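The L40S/A100 column reads as a speedup factor, where values above 1.0 favor the L40S. The page does not state how the factor is derived, so the sketch below is an assumption that fits the listed values: for throughput metrics it divides the L40S result by the A100 result, and for latency metrics (lower is better) it inverts the division. The `speedup` function is a hypothetical helper, and the three sample rows are copied from the table.

```python
def speedup(a100, l40s, lower_is_better=False):
    """Return the L40S result relative to the A100, rounded to one decimal.

    Values above 1.0 favor the L40S. For latency metrics the ratio is
    inverted so that a shorter time still counts as a gain.
    """
    ratio = a100 / l40s if lower_is_better else l40s / a100
    return round(ratio, 1)


# (benchmark, A100 result, L40S result, lower_is_better), copied from the table.
rows = [
    ("ResNet-50 V1.5 Inference (BS=32), images/sec", 23439, 34588, False),
    ("BERT Large Pre-Training Phase 1, sequences/sec", 579, 472, False),
    ("Demo Diffusion 2.1 (512x512), latency in ms", 827, 743, True),
]

for name, a100, l40s, lower_is_better in rows:
    print(f"{name}: {speedup(a100, l40s, lower_is_better)}x")
```

Because the rounding rule used on the page is not documented, a recomputed factor can differ from the listed one by 0.1 on some rows.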
NVIDIA B200 GPUs theoretical performance in DGX systems
Still not sure which GPU is best for you? We are ready to assist you.