Accelerating the Most Important Work of Our Time
NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale to power the world’s highest-performing elastic data centers for AI, data analytics, and HPC. Powered by the NVIDIA Ampere Architecture, A100 is the engine of the NVIDIA data center platform. A100 provides up to 20X higher performance over the prior generation and can be partitioned into seven GPU instances to dynamically adjust to shifting demands. The A100 80GB debuts the world’s fastest memory bandwidth at over 2 terabytes per second (TB/s) to run the largest models and datasets.
The Most Powerful End-to-End AI and HPC Data Center Platform
A100 is part of the complete NVIDIA data center solution that incorporates building blocks across hardware, networking, software, libraries, and optimized AI models and applications from NGC™. Representing the most powerful end-to-end AI and HPC platform for data centers, it allows researchers to rapidly deliver real-world results and deploy solutions into production at scale.
Deep Learning Training
Up to 3X Higher AI Training on Largest Models
DLRM on HugeCTR framework, precision = FP16 | NVIDIA A100 80GB batch size = 48 | NVIDIA A100 40GB batch size = 32 | NVIDIA V100 32GB batch size = 32.
AI models are exploding in complexity as they take on next-level challenges such as conversational AI. Training them requires massive compute power and scalability.
NVIDIA A100 Tensor Cores with Tensor Float 32 (TF32) provide up to 20X higher performance over NVIDIA Volta with zero code changes, plus an additional 2X boost with automatic mixed precision and FP16. When combined with NVIDIA® NVLink®, NVIDIA NVSwitch™, PCIe Gen4, NVIDIA® InfiniBand®, and the NVIDIA Magnum IO™ SDK, it’s possible to scale to thousands of A100 GPUs.
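To make the TF32 format concrete: TF32 keeps FP32’s 8-bit exponent, and therefore its numeric range, but carries only 10 explicit mantissa bits, the same as FP16. The Python sketch below illustrates that trade-off; it is not NVIDIA code, and `to_tf32` is a hypothetical helper that simply truncates the low 13 of FP32’s 23 mantissa bits. It does not model the rounding mode the Tensor Cores actually apply.

```python
import struct

def to_tf32(x: float) -> float:
    """Reduce a value to TF32 precision (1 sign, 8 exponent, 10 mantissa bits).

    Illustrative sketch only: it truncates rather than rounds, so results can
    differ in the last bit from what real hardware produces.
    """
    # Reinterpret the value as an IEEE-754 single-precision bit pattern.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Keep sign, exponent, and the top 10 mantissa bits; clear the low 13.
    bits &= 0xFFFFE000
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y

print(to_tf32(1.0 + 2**-10))  # 1.0009765625 (fits in 10 mantissa bits)
print(to_tf32(1.0 + 2**-11))  # 1.0 (the 11th mantissa bit is dropped)
```

Because the exponent field is untouched, a value such as 1e30, far beyond FP16’s maximum of 65504, stays finite after conversion; only precision is reduced.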
A training workload like BERT can be solved at scale in under a minute by 2,048 A100 GPUs, a world record for time to solution.
For the largest models with massive data tables, like deep learning recommendation models (DLRM), A100 80GB reaches up to 1.3 TB of unified memory per node and delivers up to a 3X throughput increase over A100 40GB.
NVIDIA has demonstrated leadership in MLPerf, setting multiple performance records in the industry-wide benchmark for AI training.
Deep Learning Inference
A100 introduces groundbreaking features to optimize inference workloads. It accelerates a full range of precisions, from FP32 to INT4. Multi-Instance GPU (MIG) technology lets multiple networks operate simultaneously on a single A100 for optimal utilization of compute resources. And structural sparsity support delivers up to 2X more performance on top of A100’s other inference performance gains.
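The structural sparsity that A100 accelerates is a fine-grained 2:4 pattern: in every group of four consecutive weights, at most two are nonzero. The Python sketch below prunes a flat list of weights to that pattern by magnitude. It is a simplified illustration under stated assumptions: `prune_2_4` is a hypothetical helper, the input length is assumed to be a multiple of four, and in practice a network is typically fine-tuned after pruning to recover accuracy rather than pruned in a single pass like this.

```python
def prune_2_4(weights):
    """Zero the two smallest-magnitude values in each group of four weights.

    Hypothetical helper for illustration; len(weights) must be a multiple of 4.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = list(weights)
    for start in range(0, len(pruned), 4):
        group = range(start, start + 4)
        # Indices of the two smallest |w| within this group of four.
        smallest_two = sorted(group, key=lambda i: abs(pruned[i]))[:2]
        for i in smallest_two:
            pruned[i] = 0.0
    return pruned

print(prune_2_4([0.1, -0.9, 0.5, 0.05, 1.0, 2.0, 3.0, 4.0]))
# [0.0, -0.9, 0.5, 0.0, 0.0, 0.0, 3.0, 4.0]
```

Keeping exactly two values per group of four is what makes the pattern regular enough for hardware to skip the zeros, which is the source of the "up to 2X" figure above.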
On state-of-the-art conversational AI models like BERT, A100 accelerates inference throughput up to 249X over CPUs.
On the most complex models that are batch-size constrained, like RNN-T for automatic speech recognition, A100 80GB’s increased memory capacity doubles the size of each MIG instance and delivers up to 1.25X higher throughput over A100 40GB.
NVIDIA’s market-leading performance was demonstrated in MLPerf Inference. A100 brings 20X more performance to further extend that leadership.
Up to 249X Higher AI Inference Performance
BERT-Large Inference | CPU only: Xeon Gold 6240 @ 2.60 GHz, precision = FP32, batch size = 128 | V100: NVIDIA TensorRT™ (TRT) 7.2, precision = INT8, batch size = 256 | A100 40GB and 80GB, batch size = 256, precision = INT8 with sparsity.
Up to 1.25X Higher AI Inference Performance Over A100 40GB
RNN-T Inference: Single Stream
MLPerf 0.7 RNN-T measured with (1/7) MIG slices. Framework: TensorRT 7.2, dataset = LibriSpeech, precision = FP16.
High-Performance Computing
To unlock next-generation discoveries, scientists look to simulations to better understand the world around us.
NVIDIA A100 introduces double-precision Tensor Cores to deliver the biggest leap in HPC performance since the introduction of GPUs. Combined with 80GB of the fastest GPU memory, researchers can reduce a 10-hour, double-precision simulation to under four hours on A100. HPC applications can also leverage TF32 to achieve up to 11X higher throughput for single-precision, dense matrix-multiply operations.
For HPC applications with the largest datasets, A100 80GB’s additional memory delivers up to a 2X throughput increase with Quantum Espresso, a materials simulation. This massive memory and unprecedented memory bandwidth make the A100 80GB the ideal platform for next-generation workloads.
11X More HPC Performance in Four Years
Top HPC Apps
Geometric mean of application speedups vs. P100: Benchmark application: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec], MILC [Apex Medium], NAMD [stmv_nve_cuda], PyTorch [BERT-Large Fine Tuner], Quantum Espresso [AUSURF112-jR], Random Forest FP32 [make_blobs (160000 x 64 : 10)], TensorFlow [ResNet-50], VASP 6 [Si Huge] | GPU node with dual-socket CPUs with 4x NVIDIA P100, V100, or A100 GPUs.
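The chart condenses many applications into a single number by taking the geometric mean of per-application speedups. A minimal Python sketch of that calculation follows; `geometric_mean` is written here for illustration, and the speedup values in the example are hypothetical placeholders, not the measured results behind the chart.

```python
import math

def geometric_mean(speedups):
    """Geometric mean of positive speedup ratios: (x1 * x2 * ... * xn) ** (1/n)."""
    if not speedups:
        raise ValueError("need at least one speedup")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedups must be positive ratios")
    # Averaging logarithms avoids overflow when multiplying many ratios.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# Hypothetical per-application speedups vs. a baseline GPU (not real data).
example_speedups = [2.0, 8.0]
print(geometric_mean(example_speedups))  # approximately 4.0
```

The geometric mean is the usual choice for ratios because it treats gains and losses symmetrically: one application running 2X faster and another running 2X slower average out to 1X, whereas an arithmetic mean would report 1.25X.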
Up to 1.8X Higher Performance for HPC Applications
Quantum Espresso measured using CNT10POR8 dataset, precision = FP64.
High-Performance Data Analytics
2X Faster than A100 40GB on Big Data Analytics Benchmark
Big data analytics benchmark | 30 analytical retail queries, ETL, ML, NLP on 10TB dataset | V100 32GB, RAPIDS/Dask | A100 40GB and A100 80GB, RAPIDS/Dask/BlazingSQL
Data scientists need to be able to analyze, visualize, and turn massive datasets into insights. But scale-out solutions are often bogged down by datasets scattered across multiple servers.
Accelerated servers with A100 provide the needed compute power, along with massive memory, over 2 TB/s of memory bandwidth, and scalability with NVIDIA® NVLink® and NVSwitch™, to tackle these workloads. Combined with InfiniBand, NVIDIA Magnum IO™, and the RAPIDS™ suite of open-source libraries, including the RAPIDS Accelerator for Apache Spark for GPU-accelerated data analytics, the NVIDIA data center platform accelerates these huge workloads at unprecedented levels of performance and efficiency.
On a big data analytics benchmark, A100 80GB delivered insights 2X faster than A100 40GB, making it ideally suited for emerging workloads with exploding dataset sizes.
7X Higher Inference Throughput with Multi-Instance GPU (MIG)
BERT Large Inference
BERT Large Inference | NVIDIA TensorRT™ (TRT) 7.1 | NVIDIA T4 Tensor Core GPU: TRT 7.1, precision = INT8, batch size = 256 | V100: TRT 7.1, precision = FP16, batch size = 256 | A100 with 1 or 7 MIG instances of 1g.5gb: batch size = 94, precision = INT8 with sparsity.
A100 with MIG maximizes the utilization of GPU-accelerated infrastructure. With MIG, an A100 GPU can be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration. With A100 40GB, each MIG instance can be allocated up to 5GB, and with A100 80GB’s increased memory capacity, that size is doubled to 10GB.
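The capacity figures above reduce to simple arithmetic. The Python sketch below checks how much memory a set of identical MIG instances claims, using the limits stated in the text (up to seven instances; 5GB each on A100 40GB, 10GB each on A100 80GB). `mig_memory_plan` is a hypothetical helper for illustration only: it does not talk to the GPU, real partitioning is done with NVIDIA’s tools such as `nvidia-smi`, and actual placement follows fixed GPU-instance profiles (such as 1g.5gb) rather than arbitrary sizes.

```python
MAX_MIG_INSTANCES = 7  # per the text: up to seven instances per A100

def mig_memory_plan(total_gb, per_instance_gb, instances):
    """Return (allocated_gb, remaining_gb) for `instances` identical MIG slices.

    Hypothetical capacity check only; it ignores real MIG profile placement rules.
    """
    if not 1 <= instances <= MAX_MIG_INSTANCES:
        raise ValueError(f"instances must be between 1 and {MAX_MIG_INSTANCES}")
    allocated = per_instance_gb * instances
    if allocated > total_gb:
        raise ValueError("requested instances exceed GPU memory")
    # Memory not assigned to any of these instances.
    return allocated, total_gb - allocated

print(mig_memory_plan(40, 5, 7))   # (35, 5)  A100 40GB, seven 5GB instances
print(mig_memory_plan(80, 10, 7))  # (70, 10) A100 80GB, seven 10GB instances
```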
MIG works with Kubernetes, containers, and hypervisor-based server virtualization. MIG lets infrastructure managers offer a right-sized GPU with guaranteed quality of service (QoS) for every job, extending the reach of accelerated computing resources to every user.
Get the Most From Your Systems
An NVIDIA-Certified System, comprising A100 and NVIDIA Mellanox SmartNICs and DPUs, is validated for performance, functionality, scalability, and security, allowing enterprises to easily deploy complete solutions for AI workloads from the NVIDIA NGC catalog.
NVIDIA A100 for HGX
Ultimate performance for all workloads.
NVIDIA A100 for PCIe
Highest versatility for all workloads.
See the Latest MLPerf Benchmark Data
Inside the NVIDIA Ampere Architecture
Learn what’s new with the NVIDIA Ampere architecture and its implementation in the NVIDIA A100 GPU.