Regularly updated

Always updated with the latest NVIDIA accelerators and information.

All info on one page

A one-pager with all the main parameters and AI use cases.

Benchmarks

We provide real application benchmarks of NVIDIA GPUs for comparison.

// choose your best accelerator

GPU selector

Quick guide

This guide will help you choose your AI accelerator quickly.

Expert guide

Performance benchmarks and technical parameters for AI experts.

Older GPUs

You can also find technical specifications and details of older NVIDIA GPUs.

// NVIDIA BLACKWELL - GRACE SUPERCHIP

The most powerful GPU - NVIDIA GB200

NVIDIA GB200 superchip
GPU memory per card
Up to 180 GB

Compute

AI training and inferencing,
data analytics, HPC

General Purpose

Visualization, rendering, AI,
virtual workstations

High-Density VDI

Virtual applications, virtual desktops,
virtual workstations

// GPU selector

Expert guide

You can scroll the table to the right to see all current NVIDIA GPUs, including the most powerful NVIDIA B200 and NVIDIA H200. You can also select the GPU cards you would like to display and compare by clicking the Column visibility button; some GPUs are hidden by default.

| GPU | A2 | A10 | A16 | A30 | A40 | L4 | L40 | L40S | V100 SXM2 / PCIe | A100 SXM4 / PCIe | H100 PCIe / H100 NVL | H100 / H200 SXM5 1) | GH200 | B100 1) | B200 1) | GB200 1) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Architecture | Ampere | Ampere | Ampere | Ampere | Ampere | Ada Lovelace | Ada Lovelace | Ada Lovelace | Volta | Ampere | Hopper | Hopper | Grace+Hopper | Blackwell | Blackwell | Grace+Blackwell |
| Card chip | GA107 | GA102 | GA107 | GA100 | GA102 | AD104 | AD102 | AD102 | GV100 | GA100 | GH100 | GH100 | 1x Grace + 1x H100 | B100 | B200 | 1x Grace + 2x B200 |
| # CUDA cores | 1 280 | 9 216 | 4x 1 280 | 6 912 | 10 752 | 7 680 | 18 176 | 18 176 | 5 120 | 6 912 | 14 592 | 16 896 | | | | |
| # Tensor cores | 40 | 288 | 4x 40 | 224 | 336 | 240 | 568 | 568 | 640 | 432 | 456 | 528 | | | | |
| FP64 (TFlops) | 0,07 | 0,97 | 0,271 | 5,2 | 1,179 | 0,49 | 1,41 | 1,41 | 7,8 / 7 | 9,69 | 26 / 34 | 34 | 34 | | | |
| FP64 Tensor (TFlops) | | | | 10,3 | | | | | | 19,5 | 51 / 57 | 67 | 67 | 30 | 40 | 90 |
| FP32 (TFlops) | 4,5 | 31,2 | 4x 4,5 | 10,3 | 37,4 | 30,3 | 90,52 | 91,6 | 15,7 / 14 | 19,5 | 51 / 57 | 67 | 67 | | | |
| TF32 Tensor (TFlops) | 18* | 125* | 4x 18* | 165* | 150* | 120* | 181* | 366* | 125 / 112 | 312* | 756* / 989* | 989* | 989* | 1 800* | 2 200* | 5 000* |
| FP16 Tensor (TFlops) | 35,9* | 250* | 4x 35,9* | 330* | 299* | 242* | 362* | 733* | | 624* | 1 513* / 1 979* | 1 979* | 1 979* | 3 500* | 4 500* | 10 000* |
| INT8 Tensor (TOPS) | 71,8* | 500* | 4x 71,8* | 661* | 599* | FP8: 485* | 724* | 1 466* | | 1 248* | 3 026* / 3 958* | 3 958* | 3 958* | 7 000* | 9 000* | 20 000* |
| FP8 (TFlops) | | | | | | | | 1 466* | | | | | | 7 000* | 9 000* | 20 000* |
| FP4 (TFlops) | | | | | | | | 1 466* | | | | | | 14 000* | 18 000* | 40 000* |
| GPU memory | 16 GB | 24 GB | 4x 16 GB | 24 GB | 48 GB | 24 GB | 48 GB | 48 GB | 16 or 32 GB | 40 or 80 GB | 80 GB / 94 GB | 80 / 141 GB | 96 GB or 144 GB | 180 GB | 180 GB | 360 GB |
| Memory technology | GDDR6 | GDDR6 | GDDR6 | HBM2 | GDDR6 | GDDR6 | GDDR6 | GDDR6 | HBM2 | HBM2 | HBM3 | HBM3 / HBM3e | HBM3 or HBM3e | HBM3e | HBM3e | HBM3e |
| Memory throughput | 200 GB/s | 600 GB/s | 4x 200 GB/s | 933 GB/s | 696 GB/s | 300 GB/s | 864 GB/s | 864 GB/s | 900 GB/s | 1 935 / 2 039 GB/s | 2 / 3,3 TB/s | 3,3 / 4,8 TB/s | 4 or 4,9 TB/s | 8 TB/s | 8 TB/s | 16 TB/s |
| Multi-Instance GPU | vGPU | vGPU | vGPU | 4 instances | vGPU | vGPU | vGPU | vGPU | vGPU | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 14 instances |
| NVENC / NVDEC / JPEG engines | 1 / 2 | 1 / 1 | 4 / 8 | 0 / 4 / 1 | 1 / 2 | 2 / 4 / 4 | 3 / 3 | 3 / 3 / 4 | Yes / Yes | 0 / 5 / 5 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 14 / 14 | 0 / 14 / 14 | |
| GPU link | PCIe 4 | PCIe 4 | PCIe 4 | NVLink 3 | NVLink 3 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 3 | NVLink 3 | NVLink 4 | NVLink 4 | NVLink 5 | NVLink 5 | NVLink 5 | NVLink 5 |
| Power consumption | 40-60 W | 150 W | 250 W | 165 W | 300 W | 40-72 W | 300 W | 350 W | 300 W / 250 W | 400 W / 300 W | 350 W / 400 W | 700 W | 1 000 W 2) | 700 W | 1 000 W | 2 700 W 2) |
| Form factor | PCIe gen4 1-slot LP | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 1-slot LP | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | SXM2 / PCIe gen3 2-slot FHFL | SXM4 / PCIe gen4 2-slot FHFL | PCIe gen5 2-slot FHFL | SXM5 card | Superchip | SXM5 card | SXM5 card | Superchip |
| Announcement | 2021 | 2021 | 2021 | 2021 | 2020 | 2023 | 2022 | 2023 | 2017 | 2020 | 2022 | 2022 / 2023 | 2023 | 2024 | 2024 | 2024 |

* theoretical performance with the Sparsity function
1) preliminary numbers
2) total power consumption of CPU, GPU, and memory on the superchip

Availability legend: good (in stock or 4-6 weeks), medium (around 10 weeks), bad (15+ weeks), not available.
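The starred values in the table are peak rates with NVIDIA's 2:4 structured-sparsity feature, which doubles the dense Tensor Core rate. A minimal sketch of that relationship, using FP16 Tensor figures taken from the table above (halving is the documented sparsity factor):

```python
# Sparse (2:4 structured sparsity) FP16 Tensor Core peaks from the table, TFLOPS.
sparse_fp16_tflops = {
    "A100": 624,
    "H100 SXM5": 1979,
    "B200": 4500,
}

def dense_rate(sparse_tflops: float) -> float:
    """Dense (non-sparse) peak implied by a starred figure: sparsity doubles it."""
    return sparse_tflops / 2

for gpu, sparse in sparse_fp16_tflops.items():
    print(f"{gpu}: {sparse} TFLOPS sparse -> {dense_rate(sparse)} TFLOPS dense")
```

Real workloads only reach the starred rate when weights are pruned to the 2:4 sparsity pattern; dense models should be compared on the halved figures.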

NVIDIA GPU accelerators - the core of AI and HPC in the modern data center.
They solve the world's most important scientific, industrial, and business challenges with AI and HPC, and visualize complex content to create cutting-edge products, tell immersive stories, and reimagine cities of the future. Designed for the age of elastic computing, NVIDIA GPUs rise to all these challenges, providing unmatched acceleration at every scale.

// GPU selector

Benchmarks

NVIDIA A16, A100, V100, and RTX 4000 Ada, benchmarked by CTU FEE in Prague

PyTorch training time GPU comparison

MnasNet training time (lower is better)

ResNet training time (lower is better)

DenseNet training time (lower is better)

NVIDIA A100 vs. NVIDIA L40S application benchmarks


| Category | Benchmark | # GPUs | Precision | Metric | A100 1) | L40S | L40S/A100 |
|---|---|---|---|---|---|---|---|
| DL Training | GPT 7B 2) (GBS=512) | 8 | FP16/FP8 | Samples/sec | 13.5 | 15.7 | 1.2x |
| DL Training | ResNet-50 V1.5 Training (BS=32) | 1 | FP16 | Images/sec | 2707 | 2748 | 1.0x |
| DL Training | BERT Large Pre-Training Phase 1 (BS=128, seq 512) | 1 | FP16 | Sequences/sec | 579 | 472 | 0.8x |
| DL Training | BERT Large Pre-Training Phase 2 (BS=8, seq 512) | 1 | FP16 | Sequences/sec | 152 | 161 | 1.1x |
| DL Inference | ResNet-50 V1.5 Inference (BS=32) | 1 | INT8 | Images/sec | 23439 | 34588 | 1.5x |
| DL Inference | BERT Large Inference (BS=8, seq 128) | 1 | INT8 | Sequences/sec | 3011 | 4090 | 1.3x |
| DL Inference | BERT Large Inference (BS=8, seq 384) | 1 | INT8 | Sequences/sec | 1116 | 1598 | 1.4x |
| DL Inference | BERT Large Inference (BS=128, seq 128) | 1 | INT8 | Sequences/sec | 5065 | 5273 | 1.0x |
| DL Inference | BERT Large Inference (BS=128, seq 384) | 1 | INT8 | Sequences/sec | 1445 | 1558 | 1.1x |
| Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 512x512) | 1 | FP16 | Pipeline latency (ms) | 827 | 743 | 1.1x |
| Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 1024x1024) | 1 | FP16 | Pipeline latency (ms) | 4186 | 3582 | 1.2x |
| Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch native) | 1 | FP16 | Pipeline latency (ms) | 10450 | 11194 | 0.9x |
| Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch optimized) | 1 | FP16 | Pipeline latency (ms) | 7353 | 7382 | 1.0x |
| Stable Diffusion | Stable Diffusion XL (BS=1, TRT optimized) | 1 | FP16 | Pipeline latency (ms) | 5251 | 5547 | 1.0x |
| DL Inference | GPT2 Inference (BS=1) | 1 | FP16 | Samples/sec | 1333 | 1828 | 1.4x |
| DL Inference | GPT2 Inference (BS=32) | 1 | FP16 | Samples/sec | 6502 | 7578 | 1.2x |
| DL Inference | GPT2 Inference (BS=128) | 1 | FP16 | Samples/sec | 6850 | 6701 | 1.0x |
| DL Inference | DLRM (BS=1) | 1 | TF32 | Records/sec | 6495 | 9458 | 1.5x |
| DL Inference | DLRM (BS=64) | 1 | TF32 | Records/sec | 319131 | 517072 | 1.6x |
| DL Inference | DLRM (BS=2048) | 1 | TF32 | Records/sec | 4668287 | 6980429 | 1.5x |
| DL Inference | ViT Inference (BS=32, seq 224) | 1 | FP16 | Samples/sec | 1556 | 1477 | 1.0x |
| DL Inference | ViT Inference (BS=32, seq 384) | 1 | FP16 | Samples/sec | 501 | 404 | 0.8x |
| DL Inference | HF Swin Base Inference (BS=1, seq 224) | 1 | INT8 | Samples/sec | 633 | 920 | 1.5x |
| DL Inference | HF Swin Base Inference (BS=32, seq 224) | 1 | INT8 | Samples/sec | 2998 | 3564 | 1.2x |
| DL Inference | HF Swin Large Inference (BS=1, seq 384) | 1 | INT8 | Samples/sec | 411 | 345 | 0.8x |
| DL Inference | HF Swin Large Inference (BS=32, seq 384) | 1 | INT8 | Samples/sec | 478 | 570 | 1.2x |

NVIDIA B200 GPU theoretical performance in DGX systems

LLM inference and training
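For single-stream LLM inference, the memory-throughput figures above are often the binding limit: each generated token must stream the full set of model weights from GPU memory. A rough back-of-envelope sketch (the 70B model size and bytes-per-parameter choices are illustrative assumptions, and the bound ignores KV-cache traffic and batching):

```python
def max_tokens_per_sec(params_billion: float, bytes_per_param: float,
                       mem_bw_tb_per_s: float) -> float:
    """Upper bound on batch-1 decode speed: memory bandwidth / model size.

    Real throughput is lower (KV-cache reads, compute, scheduling overhead).
    """
    model_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = mem_bw_tb_per_s * 1e12
    return bandwidth_bytes_per_s / model_bytes

# Illustrative 70B-parameter model stored in FP8 (1 byte per parameter)
print(round(max_tokens_per_sec(70, 1, 8.0)))   # 8 TB/s class: 114 tokens/s
print(round(max_tokens_per_sec(70, 1, 3.3)))   # 3.3 TB/s class: 47 tokens/s
```

This is why a bandwidth jump (e.g. 3.3 TB/s to 8 TB/s in the table) matters more for interactive inference than the raw TFLOPS figures suggest.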
// GPU selector

Older NVIDIA GPUs

| GPU | V100 SXM2 / PCIe | T4 | A2 | A10 | A30 | L40 |
|---|---|---|---|---|---|---|
| Architecture | Volta | Turing | Ampere | Ampere | Ampere | Ada Lovelace |
| Card chip | GV100 | TU104 | GA107 | GA102 | GA100 | AD102 |
| # CUDA cores | 5 120 | 2 560 | 1 280 | 9 216 | 6 912 | 18 176 |
| # Tensor cores | 640 | 320 | 40 | 288 | 224 | 568 |
| FP64 (TFlops) | 7,8 / 7 | | 0,07 | 0,97 | 5,2 | 1,413 |
| FP64 Tensor (TFlops) | | | | | 10,3 | |
| FP32 (TFlops) | 15,7 / 14 | 8,1 | 4,5 | 31,2 | 10,3 | 90,52 |
| TF32 Tensor (TFlops) | 125 / 112 | | 18* | 125* | 165* | 181* |
| FP16 Tensor (TFlops) | | 65 | 35,9* | 250* | 330* | 362* |
| INT8 Tensor (TOPS) | | 130 | 71,8* | 500* | 661* | 724* |
| GPU memory | 16 GB or 32 GB | 16 GB | 16 GB | 24 GB | 24 GB | 48 GB |
| Memory technology | HBM2 | GDDR6 | GDDR6 | GDDR6 | HBM2 | GDDR6 |
| Memory throughput | 900 GB/s | 300 GB/s | 200 GB/s | 600 GB/s | 933 GB/s | 864 GB/s |
| Multi-Instance GPU | vGPU | vGPU | vGPU | vGPU | 4 instances | vGPU |
| NVENC / NVDEC / JPEG engines | Yes / Yes | 1 / 2 | 1 / 1 | | 0 / 4 / 1 | 3 / 3 |
| GPU link | PCIe 3 | PCIe 3 | PCIe 4 | PCIe 4 | NVLink 3 | PCIe 4 |
| Power consumption | 300 W / 250 W | 70 W | 40-60 W | 150 W | 165 W | 300 W |
| Form factor | SXM2 / PCIe gen3 2-slot FHFL | PCIe gen3 1-slot LP | PCIe gen4 1-slot LP | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL |
| Announcement | 2017 | 2018 | 2021 | 2021 | 2021 | 2022 |
*theoretical performance with Sparsity function

Availability: not available.

// Still not sure which GPU is best for you? We are ready to assist you.

NEED A CONSULTATION?