Regularly updated

Always updated with the latest NVIDIA accelerators and information.

All info on one page

A one-page overview with all the main parameters and AI use cases.

Benchmarks

We provide real application benchmarks of NVIDIA GPUs for comparison.

// choose your best accelerator

GPU selector

Quick guide

This guide will help you choose your AI accelerator quickly.

Expert guide

Performance benchmarks and technical parameters for AI experts.

Older GPUs

You can also find technical specifications and details of older NVIDIA GPUs.

// NVIDIA BLACKWELL - GRACE SUPERCHIP

The most powerful GPU - NVIDIA GB200

NVIDIA GB200 superchip
GPU memory per card
Up to 180 GB
Compute

AI training and inferencing, data analytics, HPC

General Purpose

Visualization, rendering, AI, virtual workstations

High-Density VDI

Virtual applications, virtual desktops, virtual workstations

// GPU selector

Expert guide

You can scroll the table to the right to see all current NVIDIA GPUs, including the most powerful NVIDIA B200 and NVIDIA H200. You can also choose which GPU cards to display and compare by clicking the Column visibility button; some GPUs are hidden by default.

| GPU | A2 | A10 | A16 | A30 | A40 | L4 | L40 | L40S | V100 SXM2 / PCIe | A100 SXM4 / PCIe | H100 PCIe | H100 NVL | H200 NVL | H100 / H200 SXM5 1) | GH200 | B200 SXM5 1) | GB200 1) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Architecture | Ampere | Ampere | Ampere | Ampere | Ampere | Ada Lovelace | Ada Lovelace | Ada Lovelace | Volta | Ampere | Hopper | Hopper | Hopper | Hopper | Grace+Hopper | Blackwell | Grace+Blackwell |
| Card chip | GA107 | GA102 | GA107 | GA100 | GA102 | AD104 | AD102 | AD102 | GV100 | GA100 | GH100 | GH100 | GH100 | GH100 | 1x Grace + 1x H100 | B200 | 1x Grace + 2x B200 |
| # CUDA cores | 1 280 | 9 216 | 4x 1 280 | 6 912 | 10 752 | 7 680 | 18 176 | 18 176 | 5 120 | 6 912 | 14 592 | 16 896 | 16 896 | | | | |
| # Tensor cores | 40 | 288 | 4x 40 | 224 | 336 | 240 | 568 | 568 | 640 | 432 | 456 | 528 | 528 | | | | |
| FP64 (TFlops) | 0.07 | 0.97 | 0.271 | 5.2 | 1.179 | 0.49 | 1.41 | 1.41 | 7.8 / 7 | 9.69 | 25.6 | 30 | 34 | 34 | 34 | | |
| FP64 Tensor (TFlops) | | | | 10.3 | | | | | | 19.5 | 51 | 60 | 67 | 67 | 67 | 40 | 90 |
| FP32 (TFlops) | 4.5 | 31.2 | 4x 4.5 | 10.3 | 37.4 | 30.3 | 90.5 | 91.6 | 15.7 / 14 | 19.5 | 51 | 60 | 67 | 67 | 67 | | |
| TF32 Tensor (TFlops) | 18* | 125* | 4x 18* | 165* | 150* | 120* | 181* | 366* | 125 / 112 | 312* | 756* | 835* | 989* | 989* | 989* | 2 200* | 5 000* |
| FP16 Tensor (TFlops) | 35.9* | 250* | 4x 35.9* | 330* | 299* | 242* | 362* | 733* | | 624* | 1 513* | 1 671* | 1 979* | 1 979* | 1 979* | 4 500* | 10 000* |
| INT8 Tensor (TOPS) | 71.8* | 500* | 4x 71.8* | 661* | 599* | 485* (FP8) | 724* | 1 466* | | 1 248* | 3 026* | 3 341* | 3 958* | 3 958* | 3 958* | 9 000* | 20 000* |
| FP8 (TFlops) | | | | | | | | 1 466* | | | | | | | | 9 000* | 20 000* |
| FP4 (TFlops) | | | | | | | | 1 466* | | | | | | | | 18 000* | 40 000* |
| GPU memory | 16 GB | 24 GB | 4x 16 GB | 24 GB | 48 GB | 24 GB | 48 GB | 48 GB | 16 or 32 GB | 40 or 80 GB | 80 GB | 94 GB | 141 GB | 80 / 141 GB | 96 or 144 GB | 180 GB | 360 GB |
| Memory technology | GDDR6 | GDDR6 | GDDR6 | HBM2 | GDDR6 | GDDR6 | GDDR6 | GDDR6 | HBM2 | HBM2 | HBM3 | HBM3 | HBM3e | HBM3 / HBM3e | HBM3 or HBM3e | HBM3e | HBM3e |
| Memory throughput | 200 GB/s | 600 GB/s | 4x 200 GB/s | 933 GB/s | 696 GB/s | 300 GB/s | 864 GB/s | 864 GB/s | 900 GB/s | 1 935 / 2 039 GB/s | 2 TB/s | 3.9 TB/s | 4.8 TB/s | 3.3 / 4.8 TB/s | 4 or 4.9 TB/s | 8 TB/s | 16 TB/s |
| Multi-Instance GPU | vGPU | vGPU | vGPU | 4 instances | vGPU | vGPU | vGPU | vGPU | vGPU | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 7 instances | 14 instances |
| NVENC / NVDEC / JPEG engines | 1 / 2 | 1 / 1 | 4 / 8 | 0 / 4 / 1 | 1 / 2 | 2 / 4 / 4 | 3 / 3 | 3 / 3 / 4 | Yes / Yes | 0 / 5 / 5 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | | 0 / 14 / 14 |
| GPU link | PCIe 4 | PCIe 4 | PCIe 4 | NVLink 3 | NVLink 3 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 3 | NVLink 3 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 5 | NVLink 5 | NVLink 5 |
| Power consumption | 40-60 W | 150 W | 250 W | 165 W | 300 W | 40-72 W | 300 W | 350 W | 300 W / 250 W | 400 W / 300 W | 350 W | 400 W | 600 W | 700 W | 1 000 W 2) | | 1 000 W 2) |
| Form factor | PCIe gen4, 1-slot LP | PCIe gen4, 2-slot FHFL | PCIe gen4, 2-slot FHFL | PCIe gen4, 2-slot FHFL | PCIe gen4, 2-slot FHFL | PCIe gen4, 1-slot LP | PCIe gen4, 2-slot FHFL | PCIe gen4, 2-slot FHFL | SXM2 / PCIe gen4, 2-slot FHFL | SXM4 / PCIe gen4, 2-slot FHFL | PCIe gen5, 2-slot FHFL | PCIe gen5, 2-slot FHFL | PCIe gen5, 2-slot FHFL | SXM5 card | Superchip | SXM5 card | Superchip |
| Announcement | 2021 | 2021 | 2021 | 2021 | 2020 | 2023 | 2022 | 2023 | 2017 | 2020 | 2022 | 2023 | 2024 | 2022 / 2023 | 2023 | 2024 | 2024 |
* Theoretical performance with the Sparsity feature
1) Preliminary numbers
2) Total power consumption of CPU, GPU, and memory on the superchip

Availability categories: good (in stock or 4-6 weeks), medium (around 10 weeks), bad (15+ weeks), not available.
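If you prefer to shortlist cards programmatically, here is a minimal sketch of the selector logic in Python. The spec values are copied from the table above (the FP16 Tensor figures are the sparsity-accelerated peaks marked with *); the `GPUS` subset and the thresholds in the example are hypothetical illustrations, not recommendations.

```python
# Minimal GPU-selector sketch. Spec values are copied from the table above;
# FP16 Tensor TFlops are the sparsity-accelerated peak numbers (marked *).
GPUS = {
    # name:      (memory_gb, fp16_tensor_tflops, max_power_w)
    "L4":        (24,   242,  72),
    "L40S":      (48,   733, 350),
    "A100 SXM4": (80,   624, 400),
    "H100 NVL":  (94,  1671, 400),
    "H200 NVL":  (141, 1979, 600),
}

def shortlist(min_memory_gb: int, max_power_w: int):
    """Return cards meeting the memory and power budget, fastest first."""
    fits = [
        (name, mem, tflops, watts)
        for name, (mem, tflops, watts) in GPUS.items()
        if mem >= min_memory_gb and watts <= max_power_w
    ]
    return sorted(fits, key=lambda card: card[2], reverse=True)

# Hypothetical requirement: at least 40 GB and at most 400 W per card.
for name, mem, tflops, watts in shortlist(40, 400):
    print(f"{name}: {mem} GB, {tflops} TFlops FP16, {watts} W")
```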

NVIDIA GPU accelerators – the core of AI and HPC in the modern data center.
Solving the world’s most important scientific, industrial, and business challenges with AI and HPC. Visualizing complex content to create cutting-edge products, tell immersive stories, and reimagine cities of the future. Designed for the age of elastic computing, NVIDIA GPUs rise to all these challenges, providing unmatched acceleration at every scale.

// GPU selector

Benchmarks

NVIDIA A16, A100, V100, and RTX 4000 Ada, benchmarked by CTU FEE in Prague

PyTorch training time GPU comparison

MnasNet: training time (lower is better)

ResNet: training time (lower is better)

DenseNet: training time (lower is better)
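For context, a training-time comparison like the one above can be produced with a loop of timed training steps. The sketch below uses the torchvision implementations of these models on synthetic data; the batch size, step count, and warm-up are arbitrary choices for illustration, and the CTU FEE setup may have differed.

```python
import time

import torch
import torchvision.models as models

def seconds_per_step(model_fn, steps=50, batch_size=32, device="cuda"):
    """Average seconds per training step on synthetic ImageNet-shaped data
    (lower is better). Assumes a CUDA device is available."""
    model = model_fn().to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    y = torch.randint(0, 1000, (batch_size,), device=device)

    def step():
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

    for _ in range(5):  # warm-up iterations, excluded from timing
        step()
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(steps):
        step()
    torch.cuda.synchronize()  # flush queued GPU work before stopping the clock
    return (time.perf_counter() - start) / steps

for name, fn in [("MnasNet", models.mnasnet1_0),
                 ("ResNet-50", models.resnet50),
                 ("DenseNet-121", models.densenet121)]:
    print(f"{name}: {seconds_per_step(fn):.3f} s/step")
```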

NVIDIA A100 vs. NVIDIA L40S application benchmarks


| Category | Benchmark | # GPUs | Precision | Metric | A100 1) | L40S | L40S/A100 |
|---|---|---|---|---|---|---|---|
| DL Training | GPT 7B 2) (GBS=512) | 8 | FP16/FP8 | Samples/sec | 13.5 | 15.7 | 1.2x |
| DL Training | ResNet-50 V1.5 Training (BS=32) | 1 | FP16 | Images/sec | 2707 | 2748 | 1.0x |
| DL Training | BERT Large Pre-Training Phase 1 (BS=128, seq 512) | 1 | FP16 | Sequences/sec | 579 | 472 | 0.8x |
| DL Training | BERT Large Pre-Training Phase 2 (BS=8, seq 512) | 1 | FP16 | Sequences/sec | 152 | 161 | 1.1x |
| DL Inference | ResNet-50 V1.5 Inference (BS=32) | 1 | INT8 | Images/sec | 23439 | 34588 | 1.5x |
| DL Inference | BERT Large Inference (BS=8, seq 128) | 1 | INT8 | Sequences/sec | 3011 | 4090 | 1.3x |
| DL Inference | BERT Large Inference (BS=8, seq 384) | 1 | INT8 | Sequences/sec | 1116 | 1598 | 1.4x |
| DL Inference | BERT Large Inference (BS=128, seq 128) | 1 | INT8 | Sequences/sec | 5065 | 5273 | 1.0x |
| DL Inference | BERT Large Inference (BS=128, seq 384) | 1 | INT8 | Sequences/sec | 1445 | 1558 | 1.1x |
| Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 512x512) | 1 | FP16 | Pipeline Latency (ms) | 827 | 743 | 1.1x |
| Stable Diffusion | Demo Diffusion 2.1 Inference (BS=1, 1024x1024) | 1 | FP16 | Pipeline Latency (ms) | 4186 | 3582 | 1.2x |
| Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch native) | 1 | FP16 | Pipeline Latency (ms) | 10450 | 11194 | 0.9x |
| Stable Diffusion | Stable Diffusion XL (BS=1, PyTorch optimized) | 1 | FP16 | Pipeline Latency (ms) | 7353 | 7382 | 1.0x |
| Stable Diffusion | Stable Diffusion XL (BS=1, TRT optimized) | 1 | FP16 | Pipeline Latency (ms) | 5251 | 5547 | 1.0x |
| DL Inference | GPT2 Inference (BS=1) | 1 | FP16 | Samples/sec | 1333 | 1828 | 1.4x |
| DL Inference | GPT2 Inference (BS=32) | 1 | FP16 | Samples/sec | 6502 | 7578 | 1.2x |
| DL Inference | GPT2 Inference (BS=128) | 1 | FP16 | Samples/sec | 6850 | 6701 | 1.0x |
| DL Inference | DLRM (BS=1) | 1 | TF32 | Records/sec | 6495 | 9458 | 1.5x |
| DL Inference | DLRM (BS=64) | 1 | TF32 | Records/sec | 319131 | 517072 | 1.6x |
| DL Inference | DLRM (BS=2048) | 1 | TF32 | Records/sec | 4668287 | 6980429 | 1.5x |
| DL Inference | ViT Inference (BS=32, seq 224) | 1 | FP16 | Samples/sec | 1556 | 1477 | 1.0x |
| DL Inference | ViT Inference (BS=32, seq 384) | 1 | FP16 | Samples/sec | 501 | 404 | 0.8x |
| DL Inference | HF Swin Base Inference (BS=1, seq 224) | 1 | INT8 | Samples/sec | 633 | 920 | 1.5x |
| DL Inference | HF Swin Base Inference (BS=32, seq 224) | 1 | INT8 | Samples/sec | 2998 | 3564 | 1.2x |
| DL Inference | HF Swin Large Inference (BS=1, seq 384) | 1 | INT8 | Samples/sec | 411 | 345 | 0.8x |
| DL Inference | HF Swin Large Inference (BS=32, seq 384) | 1 | INT8 | Samples/sec | 478 | 570 | 1.2x |
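As a rough illustration of how a samples-per-second metric like those above is obtained, the sketch below times batched FP16 inference in PyTorch. The model and batch size are stand-ins; NVIDIA's published numbers come from tuned TensorRT and similar pipelines, not from a naive loop like this one.

```python
import time

import torch
import torchvision.models as models

@torch.inference_mode()
def images_per_second(model, batch_size=32, iters=100, device="cuda"):
    """Throughput of batched FP16 inference (higher is better)."""
    model = model.half().to(device).eval()
    x = torch.randn(batch_size, 3, 224, 224, device=device, dtype=torch.half)
    for _ in range(10):  # warm-up iterations, excluded from timing
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()  # flush queued GPU work before stopping the clock
    return batch_size * iters / (time.perf_counter() - start)

# Example: FP16 ResNet-50 throughput at batch size 32.
print(f"{images_per_second(models.resnet50()):.0f} images/sec")
```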

Theoretical performance of NVIDIA B200 GPUs in DGX systems

LLM inference and training

// Still not sure which GPU is best for you? We are ready to assist you.

NEED A CONSULTATION?