Regularly updated

Always updated with the latest NVIDIA accelerators and information.

All info on one page

A one-page overview with all the main parameters and AI use cases.

Benchmarks

We provide real application benchmarks of NVIDIA GPUs for comparison.

// choose your best accelerator

GPU selector

Quick guide

This guide will help you choose your AI accelerator quickly.

Expert guide

Performance benchmarks and technical parameters for AI experts.

Older GPUs

You can also find technical specifications and details of older NVIDIA GPUs.

// NVIDIA BLACKWELL - GRACE SUPERCHIP

The most powerful GPU - NVIDIA GB200

NVIDIA GB200 superchip
GPU memory per card
Up to 180 GB
Compute

AI training and inferencing, data analytics, HPC

General Purpose

Visualization, rendering, AI, virtual workstations

High-Density VDI

Virtual applications, virtual desktops, virtual workstations

// GPU selector

Expert guide

You can scroll the table to the right to see all current NVIDIA GPUs, including the most powerful NVIDIA B200 and NVIDIA H200. You can also choose which GPU cards to display and compare by clicking the Column visibility button; some GPUs are hidden by default.

| GPU | A2 | A10 | A16 | A30 | A40 | L4 | L40 | L40S | V100 SXM2 / PCIe | A100 SXM4 / PCIe | H100 PCIe | H100 NVL | H200 NVL | H100 / H200 SXM5 1) | GH200 | B200 SXM5 1) | GB200 1) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Architecture | Ampere | Ampere | Ampere | Ampere | Ampere | Ada Lovelace | Ada Lovelace | Ada Lovelace | Volta | Ampere | Hopper | Hopper | Hopper | Hopper | Grace+Hopper | Blackwell | Grace+Blackwell |
| Card chip | GA107 | GA102 | GA107 | GA100 | GA102 | AD104 | AD102 | AD102 | GV100 | GA100 | GH100 | GH100 | GH100 | GH100 | 1x Grace + 1x H100 | B200 | 1x Grace + 2x B200 |
| # CUDA cores | 1 280 | 9 216 | 4x 1 280 | 6 912 | 10 752 | 7 680 | 18 176 | 18 176 | 5 120 | 6 912 | 14 592 | 16 896 | 16 896 | | | | |
| # Tensor cores | 40 | 288 | 4x 40 | 224 | 336 | 240 | 568 | 568 | 640 | 432 | 456 | 528 | 528 | | | | |
| FP64 (TFlops) | 0.07 | 0.97 | 0.271 | 5.2 | 1.179 | 0.49 | 1.41 | 1.41 | 7.8 / 7 | 9.69 | 25.6 | 30 | 34 | 34 | 34 | | |
| FP64 Tensor (TFlops) | | | | 10.3 | | | | | | 19.5 | 51 | 60 | 67 | 67 | 67 | 40 | 90 |
| FP32 (TFlops) | 4.5 | 31.2 | 4x 4.5 | 10.3 | 37.4 | 30.3 | 90.5 | 91.6 | 15.7 / 14 | 19.5 | 51 | 60 | 67 | 67 | 67 | | |
| TF32 Tensor (TFlops) | 18* | 125* | 4x 18* | 165* | 150* | 120* | 181* | 366* | 125 / 112 | 312* | 756* | 835* | 989* | 989* | 989* | 2 200* | 5 000* |
| FP16 Tensor (TFlops) | 35.9* | 250* | 4x 35.9* | 330* | 299* | 242* | 362* | 733* | | 624* | 1 513* | 1 671* | 1 979* | 1 979* | 1 979* | 4 500* | 10 000* |
| INT8 Tensor (TOPS) | 71.8* | 500* | 4x 71.8* | 661* | 599* | 485* (FP8) | 724* | 1 466* | | 1 248* | 3 026* | 3 341* | 3 958* | 3 958* | 3 958* | 9 000* | 20 000* |
| FP8 (TFlops) | | | | | | | | 1 466* | | | | | | | | 9 000* | 20 000* |
| FP4 (TFlops) | | | | | | | | 1 466* | | | | | | | | 18 000* | 40 000* |
| GPU memory | 16 GB | 24 GB | 4x 16 GB | 24 GB | 48 GB | 24 GB | 48 GB | 48 GB | 16 or 32 GB | 40 or 80 GB | 80 GB | 94 GB | 141 GB | 80 / 141 GB | 96 or 144 GB | 180 GB | 360 GB |
| Memory technology | GDDR6 | GDDR6 | GDDR6 | HBM2 | GDDR6 | GDDR6 | GDDR6 | GDDR6 | HBM2 | HBM2 | HBM3 | HBM3 | HBM3e | HBM3 / HBM3e | HBM3 or HBM3e | HBM3e | HBM3e |
| Memory throughput | 200 GB/s | 600 GB/s | 4x 200 GB/s | 933 GB/s | 696 GB/s | 300 GB/s | 864 GB/s | 864 GB/s | 900 GB/s | 1 935 / 2 039 GB/s | 2 TB/s | 3.9 TB/s | 4.8 TB/s | 3.3 / 4.8 TB/s | 4 or 4.9 TB/s | 8 TB/s | 16 TB/s |
| Multi-Instance GPU | vGPU | vGPU | vGPU | 4 instances | vGPU | vGPU | vGPU | vGPU | vGPU | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 14 instances |
| NVENC / NVDEC / JPEG engines | 1 / 2 | 1 / 1 | 4 / 8 | 0 / 4 / 1 | 1 / 2 | 2 / 4 / 4 | 3 / 3 | 3 / 3 / 4 | Yes / Yes | 0 / 5 / 5 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | | 0 / 14 / 14 |
| GPU link | PCIe 4 | PCIe 4 | PCIe 4 | NVLink 3 | NVLink 3 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 3 | NVLink 3 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 5 | NVLink 5 | NVLink 5 |
| Power consumption | 40-60 W | 150 W | 250 W | 165 W | 300 W | 40-72 W | 300 W | 350 W | 300 W / 250 W | 400 W / 300 W | 350 W | 400 W | 600 W | 700 W | 1 000 W 2) | | 1 000 W 2) |
| Form factor | PCIe gen4, 1-slot LP | PCIe gen4, 2-slot FHFL | PCIe gen4, 2-slot FHFL | PCIe gen4, 2-slot FHFL | PCIe gen4, 2-slot FHFL | PCIe gen4, 1-slot LP | PCIe gen4, 2-slot FHFL | PCIe gen4, 2-slot FHFL | SXM2 / PCIe gen4, 2-slot FHFL | SXM4 / PCIe gen4, 2-slot FHFL | PCIe gen5, 2-slot FHFL | PCIe gen5, 2-slot FHFL | PCIe gen5, 2-slot FHFL | SXM5 card | Superchip | SXM5 card | Superchip |
| Announcement | 2021 | 2021 | 2021 | 2021 | 2020 | 2023 | 2022 | 2023 | 2017 | 2020 | 2022 | 2023 | 2024 | 2022 / 2023 | 2023 | 2024 | 2024 |
* Theoretical performance with the Sparsity feature
1) Preliminary numbers
2) Total power consumption of CPU, GPU, and memory on the superchip

Availability categories: good (in stock or 4-6 weeks), medium (around 10 weeks), bad (15+ weeks), not available.
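If you prefer to shortlist cards programmatically, here is a minimal sketch of the selector logic in Python. The spec values are copied from the table above (the FP16 Tensor figures are the sparsity-accelerated peaks marked with *); the `GPUS` subset and the thresholds in the example are hypothetical illustrations, not recommendations.

```python
# Minimal GPU-selector sketch. Spec values are copied from the table above;
# FP16 Tensor TFlops are the sparsity-accelerated peak numbers (marked *).
GPUS = {
    # name:      (memory_gb, fp16_tensor_tflops, max_power_w)
    "L4":        (24,   242,  72),
    "L40S":      (48,   733, 350),
    "A100 SXM4": (80,   624, 400),
    "H100 NVL":  (94,  1671, 400),
    "H200 NVL":  (141, 1979, 600),
}

def shortlist(min_memory_gb: int, max_power_w: int):
    """Return cards meeting the memory and power budget, fastest first."""
    fits = [
        (name, mem, tflops, watts)
        for name, (mem, tflops, watts) in GPUS.items()
        if mem >= min_memory_gb and watts <= max_power_w
    ]
    return sorted(fits, key=lambda card: card[2], reverse=True)

# Hypothetical requirement: at least 40 GB and at most 400 W per card.
for name, mem, tflops, watts in shortlist(40, 400):
    print(f"{name}: {mem} GB, {tflops} TFlops FP16, {watts} W")
```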

NVIDIA GPU accelerators – the core of AI and HPC in the modern data center.
Solving the world’s most important scientific, industrial, and business challenges with AI and HPC. Visualizing complex content to create cutting-edge products, tell immersive stories, and reimagine cities of the future. Designed for the age of elastic computing, NVIDIA GPUs rise to all these challenges, providing unmatched acceleration at every scale.

// GPU selector

Benchmarks

NVIDIA A16, A100, V100, and RTX 4000 Ada, benchmarked by CTU FEE in Prague

PyTorch training time GPU comparison

MnasNet: training time (lower is better)

ResNet: training time (lower is better)

DenseNet: training time (lower is better)
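For context, a training-time comparison like the one above can be produced with a loop of timed training steps. The sketch below uses the torchvision implementations of these models on synthetic data; the batch size, step count, and warm-up are arbitrary choices for illustration, and the CTU FEE setup may have differed.

```python
import time

import torch
import torchvision.models as models

def seconds_per_step(model_fn, steps=50, batch_size=32, device="cuda"):
    """Average seconds per training step on synthetic ImageNet-shaped data
    (lower is better). Assumes a CUDA device is available."""
    model = model_fn().to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    y = torch.randint(0, 1000, (batch_size,), device=device)

    def step():
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

    for _ in range(5):  # warm-up iterations, excluded from timing
        step()
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(steps):
        step()
    torch.cuda.synchronize()  # flush queued GPU work before stopping the clock
    return (time.perf_counter() - start) / steps

for name, fn in [("MnasNet", models.mnasnet1_0),
                 ("ResNet-50", models.resnet50),
                 ("DenseNet-121", models.densenet121)]:
    print(f"{name}: {seconds_per_step(fn):.3f} s/step")
```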

NVIDIA A100 vs. NVIDIA L40S application benchmarks


| Category | Benchmark | # GPUs | Precision | Metric | A100 1) | L40S | L40S/A100 |
|---|---|---|---|---|---|---|---|
| DL Training | GPT 7B 2) (GBS=512) | 8 | FP16/FP8 | Samples/sec | 13.5 | 15.7 | 1.2x |
| DL Training | ResNet-50 V1.5 Training (BS=32) | 1 | FP16 | Images/sec | 2707 | 2748 | 1.0x |
| DL Training | BERT Large Pre-Training Phase 1 (BS=128, seq 512) | 1 | FP16 | Sequences/sec | 579 | 472 | 0.8x |
| DL Training | BERT Large Pre-Training Phase 2 (BS=8, seq 512) | 1 | FP16 | Sequences/sec | 152 | 161 | 1.1x |
| DL Inference | ResNet-50 V1.5 Inference (BS=32) | 1 | INT8 | Images/sec | 23439 | 34588 | 1.5x |
| DL Inference | BERT Large Inference (BS=8, seq 128) | 1 | INT8 | Sequences/sec | 3011 | 4090 | 1.3x |
| DL Inference | BERT Large Inference (BS=8, seq 384) | 1 | INT8 | Sequences/sec | 1116 | 1598 | 1.4x |
| DL Inference | BERT Large Inference (BS=128, seq 128) | 1 | INT8 | Sequences/sec | 5065 | 5273 | 1.0x |
| DL Inference | BERT Large Inference (BS=128, seq 384) | 1 | INT8 | Sequences/sec | 1445 | 1558 | 1.1x |
| Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 512x512) | 1 | FP16 | Pipeline Latency (ms) | 827 | 743 | 1.1x |
| Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 1024x1024) | 1 | FP16 | Pipeline Latency (ms) | 4186 | 3582 | 1.2x |
| Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch native) | 1 | FP16 | Pipeline Latency (ms) | 10450 | 11194 | 0.9x |
| Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch optimized) | 1 | FP16 | Pipeline Latency (ms) | 7353 | 7382 | 1.0x |
| Stable Diffusion | Stable Diffusion XL (BS=1, TRT optimized) | 1 | FP16 | Pipeline Latency (ms) | 5251 | 5547 | 1.0x |
| DL Inference | GPT2 Inference (BS=1) | 1 | FP16 | Samples/sec | 1333 | 1828 | 1.4x |
| DL Inference | GPT2 Inference (BS=32) | 1 | FP16 | Samples/sec | 6502 | 7578 | 1.2x |
| DL Inference | GPT2 Inference (BS=128) | 1 | FP16 | Samples/sec | 6850 | 6701 | 1.0x |
| DL Inference | DLRM (BS=1) | 1 | TF32 | Records/sec | 6495 | 9458 | 1.5x |
| DL Inference | DLRM (BS=64) | 1 | TF32 | Records/sec | 319131 | 517072 | 1.6x |
| DL Inference | DLRM (BS=2048) | 1 | TF32 | Records/sec | 4668287 | 6980429 | 1.5x |
| DL Inference | ViT Inference (BS=32, seq 224) | 1 | FP16 | Samples/sec | 1556 | 1477 | 1.0x |
| DL Inference | ViT Inference (BS=32, seq 384) | 1 | FP16 | Samples/sec | 501 | 404 | 0.8x |
| DL Inference | HF Swin Base Inference (BS=1, seq 224) | 1 | INT8 | Samples/sec | 633 | 920 | 1.5x |
| DL Inference | HF Swin Base Inference (BS=32, seq 224) | 1 | INT8 | Samples/sec | 2998 | 3564 | 1.2x |
| DL Inference | HF Swin Large Inference (BS=1, seq 384) | 1 | INT8 | Samples/sec | 411 | 345 | 0.8x |
| DL Inference | HF Swin Large Inference (BS=32, seq 384) | 1 | INT8 | Samples/sec | 478 | 570 | 1.2x |
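As a rough illustration of how a samples-per-second metric like those above is obtained, the sketch below times batched FP16 inference in PyTorch. The model and batch size are stand-ins; NVIDIA's published numbers come from tuned TensorRT and similar pipelines, not from a naive loop like this one.

```python
import time

import torch
import torchvision.models as models

@torch.inference_mode()
def images_per_second(model, batch_size=32, iters=100, device="cuda"):
    """Throughput of batched FP16 inference (higher is better)."""
    model = model.half().to(device).eval()
    x = torch.randn(batch_size, 3, 224, 224, device=device, dtype=torch.half)
    for _ in range(10):  # warm-up iterations, excluded from timing
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()  # flush queued GPU work before stopping the clock
    return batch_size * iters / (time.perf_counter() - start)

# Example: FP16 ResNet-50 throughput at batch size 32.
print(f"{images_per_second(models.resnet50()):.0f} images/sec")
```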

Theoretical performance of NVIDIA B200 GPUs in DGX systems

LLM inference and training

// Still not sure which GPU is best for you? We are ready to assist you.

NEED A CONSULTATION?