NVIDIA GPU selector
Regularly updated
Always updated with the latest NVIDIA accelerators and information.
All info on one page
A one-page overview with all main parameters and AI use cases.
Benchmarks
We provide real application benchmarks of NVIDIA GPUs for comparison.
GPU selector
Quick guide
This guide will help you choose your AI accelerator quickly.
Expert guide
Performance benchmarks and technical parameters for AI experts.
Older GPUs
You can also find technical specifications and details of older NVIDIA GPUs.
The most powerful GPU - NVIDIA GB200
Compute
AI training and inferencing, data analytics, HPC
General Purpose
Visualization, rendering, AI, virtual workstations
High-Density VDI
Virtual applications, virtual desktops, virtual workstations
Choose your GPU use case
Expert guide
You can scroll the table to the right to see all current NVIDIA GPUs, including the most powerful NVIDIA B200 and NVIDIA H200. You can also select which GPU cards to display and compare by clicking the Column visibility button; some GPUs are hidden by default.
GPU | A2 | A10 | A16 | A30 | A40 | L4 | L40 | L40S | V100 SXM2 / PCIe | A100 SXM4 / PCIe | H100 PCIe | H100 / H200 SXM5 1) | H100 NVL | GH200 | B200 SXM5 1) | GB200 1)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Architecture | Ampere | Ampere | Ampere | Ampere | Ampere | Ada Lovelace | Ada Lovelace | Ada Lovelace | Volta | Ampere | Hopper | Hopper | Hopper | Grace+Hopper | Blackwell | Grace+Blackwell
Card chip | GA107 | GA102 | GA107 | GA100 | GA102 | AD104 | AD102 | AD102 | GV100 | GA100 | GH100 | GH100 | GH100 | 1x Grace + 1x H100 | B200 | 1x Grace + 2x B200
# CUDA cores | 1 280 | 9 216 | 4x 1 280 | 6 912 | 10 752 | 7 680 | 18 176 | 18 176 | 5 120 | 6 912 | 14 592 | 16 896 | 16 896 | | |
# Tensor cores | 40 | 288 | 4x 40 | 224 | 336 | 240 | 568 | 568 | 640 | 432 | 456 | 528 | 528 | | |
FP64 (TFlops) | 0,07 | 0,97 | 0,271 | 5,2 | 1,179 | 0,49 | 1,41 | 1,41 | 7,8 / 7 | 9,69 | 26 | 34 | 30 | 34 | — | —
FP64 Tensor (TFlops) | — | — | — | 10,3 | — | — | — | — | — | 19,5 | 51 | 67 | 60 | 67 | 40 | 90
FP32 (TFlops) | 4,5 | 31,2 | 4x 4,5 | 10,3 | 37,4 | 30,3 | 90,52 | 91,6 | 15,7 / 14 | 19,5 | 51 | 67 | 60 | 67 | |
TF32 Tensor (TFlops) | 18* | 125* | 4x 18* | 165* | 150* | 120* | 181* | 366* | 125 / 112 | 312* | 756* | 989* | 835* | 989* | 2 200* | 5 000*
FP16 Tensor (TFlops) | 35,9* | 250* | 4x 35,9* | 330* | 299* | 242* | 362* | 733* | — | 624* | 1 513* | 1 979* | 1 671* | 1 979* | 4 500* | 10 000*
INT8 Tensor (TOPS) | 71,8* | 500* | 4x 71,8* | 661* | 599* | 485* (FP8) | 724* | 1 466* | — | 1 248* | 3 026* | 3 958* | 3 341* | 3 958* | 9 000* | 20 000*
FP8 (TFlops) | — | — | — | — | — | — | — | 1 466* | — | — | — | — | — | — | 9 000* | 20 000*
FP4 (TFlops) | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 18 000* | 40 000*
GPU memory | 16 GB | 24 GB | 4x 16 GB | 24 GB | 48 GB | 24 GB | 48 GB | 48 GB | 16 or 32 GB | 40 or 80 GB | 80 GB | 80 / 141 GB | 94 GB | 96 or 144 GB | 180 GB | 360 GB
Memory technology | GDDR6 | GDDR6 | GDDR6 | HBM2 | GDDR6 | GDDR6 | GDDR6 | GDDR6 | HBM2 | HBM2 | HBM3 | HBM3 / HBM3e | HBM3 | HBM3 or HBM3e | HBM3e | HBM3e
Memory throughput | 200 GB/s | 600 GB/s | 4x 200 GB/s | 933 GB/s | 696 GB/s | 300 GB/s | 864 GB/s | 864 GB/s | 900 GB/s | 1 935 / 2 039 GB/s | 2 TB/s | 3.3 / 4.8 TB/s | 3.9 TB/s | 4 or 4.9 TB/s | 8 TB/s | 16 TB/s
Multi-Instance GPU | vGPU | vGPU | vGPU | 4 instances | vGPU | vGPU | vGPU | vGPU | vGPU | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 14 instances
NVENC / NVDEC / JPEG engines | 1 / 2 | 1 / 1 | 4 / 8 | 0 / 4 / 1 | 1 / 2 | 2 / 4 / 4 | 3 / 3 | 3 / 3 / 4 | Yes / Yes | 0 / 5 / 5 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | | 0 / 14 / 14
GPU link | PCIe 4 | PCIe 4 | PCIe 4 | NVLink 3 | NVLink 3 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 3 | NVLink 3 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 5 | NVLink 5 | NVLink 5
Power consumption | 40-60 W | 150 W | 250 W | 165 W | 300 W | 40-72 W | 300 W | 350 W | 300 / 250 W | 400 / 300 W | 350 W | 700 W | 400 W | 1 000 W 2) | 1 000 W | 
Form factor | PCIe gen4 1-slot LP | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 1-slot LP | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | SXM2 / PCIe gen3 2-slot FHFL | SXM4 / PCIe gen4 2-slot FHFL | PCIe gen5 2-slot FHFL | SXM5 card | PCIe gen5 2-slot FHFL | Superchip | SXM5 card | Superchip
Spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | 
Announcement | 2021 | 2021 | 2021 | 2021 | 2020 | 2023 | 2022 | 2023 | 2017 | 2020 | 2022 | 2022 / 2023 | 2024 | 2023 | 2024 | 2024
Availability | | | | | | | | | | | | | | | |
1) preliminary numbers
2) the total power consumption of CPU, GPU and memory on the superchip
Availability: – good (in stock or 4-6 weeks), – medium (around 10 weeks), – bad (15+ weeks), – not available
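The Tensor Core figures marked with an asterisk are peak numbers, presumably quoted with structured sparsity enabled; dense workloads typically reach about half of them. Before relying on a card listed above, it is worth verifying what a server actually exposes. A minimal PyTorch sketch (assuming only that the `torch` package and a CUDA driver are installed) could look like this:

```python
import torch

# Quick inventory of the NVIDIA GPUs visible to PyTorch. Useful for
# checking that a server really contains the card and memory size
# promised by the table above.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA-capable GPU detected.")

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}")
    print(f"  memory:             {props.total_memory / 1024**3:.1f} GB")
    print(f"  compute capability: {props.major}.{props.minor}")
    print(f"  multiprocessors:    {props.multi_processor_count}")

# bfloat16 support is a quick proxy for an Ampere-or-newer card.
print("bfloat16 supported:", torch.cuda.is_bf16_supported())
```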
NVIDIA GPU accelerators – the core of AI and HPC in the modern data center.
Solving the world’s most important scientific, industrial, and business challenges with AI and HPC. Visualizing complex content to create cutting-edge products, tell immersive stories, and reimagine cities of the future. Designed for the age of elastic computing, NVIDIA GPUs rise to all these challenges, providing unmatched acceleration at every scale.
Benchmarks
- A16, A100, V100, RTX 4000 Ada (CTU Prague)
- A100, L40S (NVIDIA)
- DGX H100, DGX B200 (NVIDIA)
NVIDIA A16, A100, V100, RTX4000 Ada by CTU FEE in Prague
PyTorch training time GPU comparison — charts for MnasNet, ResNet, and DenseNet training time (lower is better).
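For context, a training-time comparison of this kind can be reproduced with a few lines of PyTorch. The sketch below is our illustration, not the CTU benchmark code; the step count, batch size, and optimizer settings are assumptions. It times forward+backward passes over synthetic ImageNet-shaped data for the three torchvision models shown in the charts:

```python
import time
import torch
import torchvision.models as models

def train_time(model_fn, steps=50, batch=32, lr=0.01):
    """Time `steps` forward+backward passes on synthetic data."""
    device = torch.device("cuda")
    model = model_fn(num_classes=1000).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    x = torch.randn(batch, 3, 224, 224, device=device)
    y = torch.randint(0, 1000, (batch,), device=device)
    # Warm-up so cuDNN autotuning does not skew the measurement.
    for _ in range(5):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return time.perf_counter() - start

for name, fn in [("MnasNet", models.mnasnet1_0),
                 ("ResNet-50", models.resnet50),
                 ("DenseNet-121", models.densenet121)]:
    print(f"{name}: {train_time(fn):.2f} s")
```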
NVIDIA A100 vs. NVIDIA L40s application benchmarks
Category | Benchmark | # GPUs | Precision | Metric | A100 | L40S | L40S/A100
---|---|---|---|---|---|---|---
DL Training | GPT 7B (GBS=512) | 8 | FP16/FP8 | Samples/sec | 13.5 | 15.7 | 1.2x
DL Training | ResNet-50 V1.5 Training (BS=32) | 1 | FP16 | Images/sec | 2707 | 2748 | 1.0x
DL Training | BERT Large Pre-Training Phase 1 (BS=128, seq 512) | 1 | FP16 | Sequences/sec | 579 | 472 | 0.8x
DL Training | BERT Large Pre-Training Phase 2 (BS=8, seq 512) | 1 | FP16 | Sequences/sec | 152 | 161 | 1.1x
DL Inference | ResNet-50 V1.5 Inference (BS=32) | 1 | INT8 | Images/sec | 23439 | 34588 | 1.5x
DL Inference | BERT Large Inference (BS=8, seq 128) | 1 | INT8 | Sequences/sec | 3011 | 4090 | 1.3x
DL Inference | BERT Large Inference (BS=8, seq 384) | 1 | INT8 | Sequences/sec | 1116 | 1598 | 1.4x
DL Inference | BERT Large Inference (BS=128, seq 128) | 1 | INT8 | Sequences/sec | 5065 | 5273 | 1.0x
DL Inference | BERT Large Inference (BS=128, seq 384) | 1 | INT8 | Sequences/sec | 1445 | 1558 | 1.1x
Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 512x512) | 1 | FP16 | Pipeline Latency (ms) | 827 | 743 | 1.1x
Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 1024x1024) | 1 | FP16 | Pipeline Latency (ms) | 4186 | 3582 | 1.2x
Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch native) | 1 | FP16 | Pipeline Latency (ms) | 10450 | 11194 | 0.9x
Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch optimized) | 1 | FP16 | Pipeline Latency (ms) | 7353 | 7382 | 1.0x
Stable Diffusion | Stable Diffusion XL (BS=1, TRT optimized) | 1 | FP16 | Pipeline Latency (ms) | 5251 | 5547 | 1.0x
DL Inference | GPT2 Inference (BS=1) | 1 | FP16 | Samples/sec | 1333 | 1828 | 1.4x
DL Inference | GPT2 Inference (BS=32) | 1 | FP16 | Samples/sec | 6502 | 7578 | 1.2x
DL Inference | GPT2 Inference (BS=128) | 1 | FP16 | Samples/sec | 6850 | 6701 | 1.0x
DL Inference | DLRM (BS=1) | 1 | TF32 | Records/sec | 6495 | 9458 | 1.5x
DL Inference | DLRM (BS=64) | 1 | TF32 | Records/sec | 319131 | 517072 | 1.6x
DL Inference | DLRM (BS=2048) | 1 | TF32 | Records/sec | 4668287 | 6980429 | 1.5x
DL Inference | ViT Inference (BS=32, seq 224) | 1 | FP16 | Samples/sec | 1556 | 1477 | 1.0x
DL Inference | ViT Inference (BS=32, seq 384) | 1 | FP16 | Samples/sec | 501 | 404 | 0.8x
DL Inference | HF Swin Base Inference (BS=1, seq 224) | 1 | INT8 | Samples/sec | 633 | 920 | 1.5x
DL Inference | HF Swin Base Inference (BS=32, seq 224) | 1 | INT8 | Samples/sec | 2998 | 3564 | 1.2x
DL Inference | HF Swin Large Inference (BS=1, seq 384) | 1 | INT8 | Samples/sec | 411 | 345 | 0.8x
DL Inference | HF Swin Large Inference (BS=32, seq 384) | 1 | INT8 | Samples/sec | 478 | 570 | 1.2x
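The pipeline-latency rows above are wall-clock measurements. When comparing cards yourself, time GPU work with CUDA events rather than a plain Python timer, so that asynchronously queued kernels are fully accounted for. A minimal sketch follows; the small FP16 model is a stand-in workload of our own, not NVIDIA's benchmark harness:

```python
import torch

device = torch.device("cuda")
# Stand-in workload: substitute any model or pipeline here.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).half().to(device)
x = torch.randn(8, 1024, device=device, dtype=torch.half)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

with torch.no_grad():
    for _ in range(10):          # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start.record()
    for _ in range(100):
        model(x)
    end.record()
    torch.cuda.synchronize()     # ensure the `end` event has been reached

print(f"mean latency: {start.elapsed_time(end) / 100:.3f} ms")
```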
Theoretical performance of NVIDIA B200 GPUs in DGX systems
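The node-level headline numbers follow directly from the per-GPU figures in the table above: a DGX B200 combines eight B200 GPUs, so multiplying the sparse peak values (marked with *) by eight gives the system totals. A quick arithmetic sketch:

```python
# Node-level peak throughput for a DGX B200 (8x B200), derived from
# the per-GPU sparse peak figures in the expert-guide table above.
B200_TFLOPS = {"FP16": 4_500, "FP8": 9_000, "FP4": 18_000}  # per GPU, with sparsity
GPUS_PER_DGX = 8

for precision, tflops in B200_TFLOPS.items():
    total_pflops = tflops * GPUS_PER_DGX / 1_000  # TFLOPS -> PFLOPS
    print(f"DGX B200 {precision}: {total_pflops:.0f} PFLOPS (sparse peak)")
# -> FP16: 36 PFLOPS, FP8: 72 PFLOPS, FP4: 144 PFLOPS
```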
Older NVIDIA GPUs
GPU | V100 SXM2 / PCIe | T4 | A2 | A10 | A30 | L40
---|---|---|---|---|---|---
Architecture | Volta | Turing | Ampere | Ampere | Ampere | Ada Lovelace
Card chip | GV100 | TU104 | GA107 | GA102 | GA100 | AD102
# CUDA cores | 5 120 | 2 560 | 1 280 | 9 216 | 6 912 | 18 176
# Tensor cores | 640 | 320 | 40 | 288 | 224 | 568
FP64 (TFlops) | 7,8 / 7 | — | 0,07 | 0,97 | 5,2 | 1,413
FP64 Tensor (TFlops) | — | — | — | — | 10,3 | —
FP32 (TFlops) | 15,7 / 14 | 8,1 | 4,5 | 31,2 | 10,3 | 90,52
TF32 Tensor (TFlops) | 125 / 112 | — | 18* | 125* | 165* | 181*
FP16 Tensor (TFlops) | — | 65 | 35,9* | 250* | 330* | 362*
INT8 Tensor (TOPS) | — | 130 | 71,8* | 500* | 661* | 724*
GPU memory | 16 or 32 GB | 16 GB | 16 GB | 24 GB | 24 GB | 48 GB
Memory technology | HBM2 | GDDR6 | GDDR6 | GDDR6 | HBM2 | GDDR6
Memory throughput | 900 GB/s | 300 GB/s | 200 GB/s | 600 GB/s | 933 GB/s | 864 GB/s
Multi-Instance GPU | vGPU | vGPU | vGPU | vGPU | 4 instances | vGPU
NVENC / NVDEC / JPEG engines | Yes / Yes | | 1 / 2 | 1 / 1 | 0 / 4 / 1 | 3 / 3
GPU link | PCIe 3 | PCIe 3 | PCIe 4 | PCIe 4 | NVLink 3 | PCIe 4
Power consumption | 300 / 250 W | 70 W | 40-60 W | 150 W | 165 W | 300 W
Form factor | SXM2 / PCIe gen3 2-slot FHFL | PCIe gen3 1-slot LP | PCIe gen4 1-slot LP | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL
Spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet | spec sheet
Announcement | 2017 | 2018 | 2021 | 2021 | 2021 | 2022
Availability | | | | | |
Availability: – not available
Still not sure which GPU is the best for you? We are ready to assist you.