Regularly updated

Always updated with the latest NVIDIA accelerators and information.

All info on one page

A one-page overview with all the main parameters and AI use cases.

Benchmarks

We provide real application benchmarks of NVIDIA GPUs for comparison.

// choose your best accelerator

GPU selector

Quick guide

This guide will help you choose your AI accelerator quickly.

Expert guide

Performance benchmarks and technical parameters for AI experts.

Older GPUs

You can also find technical specifications and details of older NVIDIA GPUs.

// NVIDIA BLACKWELL - GRACE SUPERCHIP

The most powerful GPU - NVIDIA GB200

NVIDIA GB200 superchip
GPU memory per card
Up to 192 GB
Compute

AI training and inferencing,
data analytics, HPC

General Purpose

Visualization, rendering, AI,
virtual workstations

High-Density VDI

Virtual applications, virtual desktops,
virtual workstations

// GPU selector

Expert guide

You can scroll the table to the right to see all current NVIDIA GPUs, including the most powerful NVIDIA B200 and NVIDIA H200. You can also select which GPU cards to display and compare by clicking the Column visibility button. Some of the GPUs are hidden by default.

| GPU | L4 | A16 | A40 | L40S | RTX PRO 6000 Blackwell Server Edition | A100 PCIe / SXM4 | H100 PCIe | H100 SXM5 | H100 NVL | H200 SXM5 | H200 NVL | B200 | B300 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Architecture | Ada Lovelace | Ampere | Ampere | Ada Lovelace | Blackwell | Ampere | Hopper | Hopper | Hopper | Hopper | Hopper | Blackwell | Blackwell |
| Card chip | AD104 | GA107 | GA102 | AD102 | GB202 | GA100 | GH100 | GH100 | GH100 | GH100 | GH100 | B200 | B300 |
| CUDA cores | 7 680 | 4x 1 280 | 10 752 | 18 176 | 24 064 | 6 912 | 14 592 | 16 896 | 16 896 | 16 896 | 16 896 | TBA | TBA |
| Tensor cores | 240 | 4x 40 | 336 | 568 | 752? | 432 | 456 | 528 | 528 | 528 | 528 | TBA | TBA |
| GPU memory | 24 GB | 4x 16 GB | 48 GB | 48 GB | 96 GB | 80 / 40 GB | 80 GB | 80 GB | 94 GB | 141 GB | 141 GB | 192 GB | 288 GB |
| Memory technology | GDDR6 | GDDR6 | GDDR6 | GDDR6 | GDDR7 | HBM2 | HBM3 | HBM3 | HBM3e | HBM3e | HBM3e | HBM3e | HBM3e |
| Memory throughput | 300 GB/s | 4x 200 GB/s | 696 GB/s | 864 GB/s | 1.6 TB/s | 1 935 / 2 039 GB/s | 2 TB/s | 3.3 TB/s | 3.9 TB/s | 4.8 TB/s | 4.8 TB/s | 8 TB/s | 10 TB/s |
| FP64 (TFLOPS) | 0.49 | 0.27 | 1.179 | 1.413 | ? | 9.7 | 26 | 30 | 30 | 34 | 30 | TBA | TBA |
| FP64 Tensor (TFLOPS) | – | – | – | – | – | 19.5 | 51 | 60 | 60 | 67 | 60 | 37 | 1.2 |
| FP32 (TFLOPS) | 30.3 | 4x 4.5 | 37.4 | 91.6 | 126 | 19.5 | 51 | 60 | 60 | 67 | 60 | 75 | 72 |
| TF32 Tensor (TFLOPS) | 120* | 4x 18* | 150* | 366* | 251 | 312* / 624* | 756* | 989* | 835* | 989* | 835* | 2 200* | 2 200* |
| FP16 Tensor (TFLOPS) | 242* | 4x 35.9* | 299* | 733* | 503.8 | 312* / 624* | 1 513* | 1 979* | 1 671* | 1 979* | 1 671* | 4 500* | 4 500* |
| INT8 Tensor (TOPS) | 485* | 4x 71.8* | 599* | 1 466* | 1 007.6 | 624* / 1 248* | 3 026* | 3 958* | 3 341* | 3 958* | 3 341* | 9 000* | 280* |
| FP8 Tensor (TFLOPS) | 485* | – | – | 1 466* | 2 015.2* | – | 3 026* | 3 958* | 3 341* | 3 958* | 3 341* | 9 000* | 9 000* |
| FP4 Tensor (TFLOPS) | – | – | – | – | 4 030.4* | – | – | – | – | – | – | 18 000* | 18 000* |
| Multi-Instance GPU | vGPU | vGPU | vGPU | vGPU | 4 instances | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | TBA | TBA |
| NVENC / NVDEC / JPEG engines | 2 / 4 / 4 | 4 / 8 | 1 / 2 | 3 / 3 / 4 | 4 / 4 / 4 | 0 / 5 / 5 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | TBA | TBA |
| GPU link | PCIe 4 | PCIe 4 | NVLink 3 | PCIe 4 | PCIe 5 | NVLink 3 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 5 | NVLink 5 |
| Power consumption | 40-72 W | 250 W | 300 W | 350 W | 600 W | 300 W / 400 W | 350 W | 700 W | 400 W | 700 W | 600 W | 1 000 W | 1 400 W |
| Form factor | PCIe gen4, 1-slot LP | PCIe gen4, 2-slot FHFL | PCIe gen4, 2-slot FHFL | PCIe gen4, 2-slot FHFL | PCIe gen5, 2-slot FHFL | SXM4 / PCIe gen4, 2-slot FHFL | PCIe gen5, 2-slot FHFL | SXM5 card | PCIe gen5, 2-slot FHFL | SXM5 card | PCIe gen5, 2-slot FHFL | SXM5 card | SXM5 card |
| Announcement | 2023 | 2021 | 2020 | 2023 | 2025 | 2020 | 2022 | 2022 | 2024 | 2023 | 2023 | 2024 | 2025 |
* Theoretical performance with the sparsity feature
1) Preliminary numbers
2) Total power consumption of CPU, GPU, and memory on the superchip

Availability: good (in stock or within 4-6 weeks), medium (around 10 weeks), bad (15+ weeks), or not available.
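All starred figures assume NVIDIA's 2:4 structured sparsity, which doubles the theoretical Tensor Core rate, so the dense peak is roughly half of each starred value. A minimal sketch of that conversion; the dictionary values are copied from the FP16 Tensor row of the table above:

```python
# Convert starred (sparsity-enabled) figures from the table above to the
# dense peak they imply. NVIDIA's 2:4 structured sparsity doubles the
# theoretical Tensor Core rate, so dense ~= sparse / 2.

SPARSE_FP16_TFLOPS = {   # starred values from the FP16 Tensor row
    "L40S": 733.0,
    "H100 SXM5": 1979.0,
    "H200 NVL": 1671.0,
    "B200": 4500.0,
}

def dense_peak(sparse_tflops: float) -> float:
    """Dense peak throughput implied by a sparsity-enabled figure."""
    return sparse_tflops / 2.0

for gpu, sparse in SPARSE_FP16_TFLOPS.items():
    print(f"{gpu}: {sparse:.0f} TFLOPS sparse -> ~{dense_peak(sparse):.0f} TFLOPS dense")
```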

NVIDIA GPU accelerators – the core of AI and HPC in the modern data center.
They solve the world’s most important scientific, industrial, and business challenges with AI and HPC, and visualize complex content to create cutting-edge products, tell immersive stories, and reimagine cities of the future. Designed for the age of elastic computing, NVIDIA GPUs rise to all these challenges, providing unmatched acceleration at every scale.

// GPU selector

Benchmarks

NVIDIA A16, A100, V100, and RTX 4000 Ada, benchmarked by CTU FEE in Prague

PyTorch training time GPU comparison

MnasNet training time (lower is better) [chart]

ResNet training time (lower is better) [chart]

DenseNet training time (lower is better) [chart]
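A per-model timing comparison like the charts above can be reproduced with a few lines of PyTorch. This is a minimal sketch, not the exact CTU FEE methodology; the batch size, step count, and torchvision model variants are illustrative assumptions:

```python
# Minimal PyTorch training-time benchmark similar to the charts above.
import time
import torch
import torchvision.models as models

def time_training(model_fn, steps=20, batch=32):
    device = torch.device("cuda")
    model = model_fn().to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    x = torch.randn(batch, 3, 224, 224, device=device)
    y = torch.randint(0, 1000, (batch,), device=device)
    # One warm-up step so one-time CUDA setup is not timed
    opt.zero_grad(); loss_fn(model(x), y).backward(); opt.step()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    torch.cuda.synchronize()  # wait for all queued GPU work before stopping the clock
    return (time.perf_counter() - start) / steps

for name, fn in [("MnasNet", models.mnasnet1_0),
                 ("ResNet-50", models.resnet50),
                 ("DenseNet-121", models.densenet121)]:
    print(f"{name}: {time_training(fn) * 1000:.1f} ms/step")
```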

NVIDIA A100 vs. NVIDIA L40s application benchmarks


| Category | Benchmark | # GPUs | Precision | Metric | A100¹ | L40S | L40S/A100 |
|---|---|---|---|---|---|---|---|
| DL Training | GPT 7B² (GBS=512) | 8 | FP16/FP8 | Samples/sec | 13.5 | 15.7 | 1.2x |
| DL Training | ResNet-50 V1.5 Training (BS=32) | 1 | FP16 | Images/sec | 2707 | 2748 | 1.0x |
| DL Training | BERT Large Pre-Training Phase 1 (BS=128, seq 128) | 1 | FP16 | Sequences/sec | 579 | 472 | 0.8x |
| DL Training | BERT Large Pre-Training Phase 2 (BS=8, seq 512) | 1 | FP16 | Sequences/sec | 152 | 161 | 1.1x |
| DL Inference | ResNet-50 V1.5 Inference (BS=32) | 1 | INT8 | Images/sec | 23439 | 34588 | 1.5x |
| DL Inference | BERT Large Inference (BS=8, seq 128) | 1 | INT8 | Sequences/sec | 3011 | 4090 | 1.3x |
| DL Inference | BERT Large Inference (BS=8, seq 384) | 1 | INT8 | Sequences/sec | 1116 | 1598 | 1.4x |
| DL Inference | BERT Large Inference (BS=128, seq 128) | 1 | INT8 | Sequences/sec | 5065 | 5273 | 1.0x |
| DL Inference | BERT Large Inference (BS=128, seq 384) | 1 | INT8 | Sequences/sec | 1445 | 1558 | 1.1x |
| Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 512x512) | 1 | FP16 | Pipeline latency (ms) | 827 | 743 | 1.1x |
| Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 1024x1024) | 1 | FP16 | Pipeline latency (ms) | 4186 | 3582 | 1.2x |
| Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch native) | 1 | FP16 | Pipeline latency (ms) | 10450 | 11194 | 0.9x |
| Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch optimized) | 1 | FP16 | Pipeline latency (ms) | 7353 | 7382 | 1.0x |
| Stable Diffusion | Stable Diffusion XL (BS=1, TRT optimized) | 1 | FP16 | Pipeline latency (ms) | 5251 | 5547 | 1.0x |
| DL Inference | GPT2 Inference (BS=1) | 1 | FP16 | Samples/sec | 1333 | 1828 | 1.4x |
| DL Inference | GPT2 Inference (BS=32) | 1 | FP16 | Samples/sec | 6502 | 7578 | 1.2x |
| DL Inference | GPT2 Inference (BS=128) | 1 | FP16 | Samples/sec | 6850 | 6701 | 1.0x |
| DL Inference | DLRM (BS=1) | 1 | TF32 | Records/sec | 6495 | 9458 | 1.5x |
| DL Inference | DLRM (BS=64) | 1 | TF32 | Records/sec | 319131 | 517072 | 1.6x |
| DL Inference | DLRM (BS=2048) | 1 | TF32 | Records/sec | 4668287 | 6980429 | 1.5x |
| DL Inference | ViT Inference (BS=32, seq 224) | 1 | FP16 | Samples/sec | 1556 | 1477 | 1.0x |
| DL Inference | ViT Inference (BS=32, seq 384) | 1 | FP16 | Samples/sec | 501 | 404 | 0.8x |
| DL Inference | HF Swin Base Inference (BS=1, seq 224) | 1 | INT8 | Samples/sec | 633 | 920 | 1.5x |
| DL Inference | HF Swin Base Inference (BS=32, seq 224) | 1 | INT8 | Samples/sec | 2998 | 3564 | 1.2x |
| DL Inference | HF Swin Large Inference (BS=1, seq 384) | 1 | INT8 | Samples/sec | 411 | 345 | 0.8x |
| DL Inference | HF Swin Large Inference (BS=32, seq 384) | 1 | INT8 | Samples/sec | 478 | 570 | 1.2x |

Theoretical performance of NVIDIA B200 GPUs in DGX systems

LLM inference and training [chart]
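As a rough sanity check on such inference figures, single-stream LLM decoding is usually bounded by memory bandwidth, since each generated token streams the full set of weights once. A back-of-envelope sketch under that assumption (ours, not a vendor figure), using the B200's 8 TB/s from the table and a hypothetical 70B-parameter FP8 model:

```python
# Back-of-envelope bound (an assumption, not a vendor number): single-stream
# LLM decoding reads all model weights per generated token, so
# tokens/s <= memory bandwidth / model size in bytes.
def max_tokens_per_sec(bandwidth_tb_s: float, params_billion: float,
                       bytes_per_param: float) -> float:
    bandwidth_bytes = bandwidth_tb_s * 1e12          # bytes/s
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_bytes / model_bytes

# B200 (8 TB/s) serving a hypothetical 70B model in FP8 (1 byte/param): ~114
print(f"upper bound: ~{max_tokens_per_sec(8.0, 70, 1):.0f} tokens/s")
```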

// Still not sure which GPU is best for you? We are ready to assist you.

NEED A CONSULTATION?