Five technological breakthroughs brought on by the new NVIDIA Pascal architecture

Almost a decade ago, NVIDIA introduced the G80 GPU and the NVIDIA CUDA parallel computing platform, which made it possible to use GPUs to accelerate computationally intensive workloads.

Today, NVIDIA Tesla GPUs are used in data centres to speed up HPC and big data applications, while also enabling leading-edge artificial intelligence (AI) and deep learning systems.

After months of speculation and anticipation, NVIDIA finally announced the next generation of GeForce video cards, the GeForce 10 series. NVIDIA claims that the 10 series, powered by the new Pascal architecture, will set a new high bar for performance. The cards will offer a collection of new features that set them apart from their predecessors. The first two high-end cards, the GTX 1080 and GTX 1070, are the direct successors of the 900 series cards GTX 980 and GTX 970, respectively.

The GTX 1080 and the GTX 1070 are both powered by the Pascal architecture. According to NVIDIA's website, the Pascal architecture brings five technological breakthroughs.

THE 16 NANOMETER FinFET

According to NVIDIA, the new Pascal GPU is the world’s largest FinFET chip ever built, with 150 billion transistors (a figure that counts the whole package, including the stacked memory) built on bleeding-edge 16-nanometre FinFET fabrication technology. It is engineered to give the best energy efficiency while delivering the fastest performance for workloads with near-infinite computing needs.

EXPONENTIAL PERFORMANCE LEAP

The Pascal compute architecture transforms a computer into a supercomputer that delivers unprecedented performance. It delivers over 5 TFLOPS of double-precision performance for HPC workloads. Pascal-powered systems provide over a 12x leap in neural network training performance compared to current GPU architectures.

MAXIMUM APPLICATION SCALABILITY

The Pascal architecture integrates NVIDIA NVLink, a high-speed bidirectional interconnect. More multi-GPU systems are being deployed at every level, from workstations to servers and supercomputers, to solve bigger and more complex problems. Groups of multi-GPU systems are in turn being interconnected using InfiniBand and 100 Gb Ethernet to form much larger and more powerful systems.

NVLink is designed to scale applications across multiple GPUs while delivering a 5x increase in interconnect bandwidth compared to today’s best-in-class solution.
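The 5x figure can be sanity-checked with a little arithmetic. The sketch below is only an illustration: it assumes the comparison is against a PCIe 3.0 x16 slot at roughly 16 GB/s per direction, and that a Pascal GPU exposes four NVLink links at 20 GB/s per direction each. Neither number comes from this article, and all names in the code are made up for the example.

```python
# Rough interconnect bandwidth comparison.
# ASSUMPTIONS (not stated in the article): link counts and per-direction
# bandwidths below are approximate published figures.
NVLINK_LINKS_PER_GPU = 4             # assumed number of NVLink links per GPU
NVLINK_GB_S_PER_DIRECTION = 20.0     # assumed GB/s per link, per direction
PCIE3_X16_GB_S_PER_DIRECTION = 16.0  # assumed GB/s for PCIe 3.0 x16, per direction


def bidirectional_bandwidth(gb_s_per_direction, links=1):
    """Total bandwidth in GB/s, counting both directions across all links."""
    return 2 * gb_s_per_direction * links


nvlink_total = bidirectional_bandwidth(NVLINK_GB_S_PER_DIRECTION, NVLINK_LINKS_PER_GPU)
pcie_total = bidirectional_bandwidth(PCIE3_X16_GB_S_PER_DIRECTION)
speedup = nvlink_total / pcie_total

if __name__ == "__main__":
    print(f"NVLink: {nvlink_total:.0f} GB/s, PCIe 3.0 x16: {pcie_total:.0f} GB/s")
    print(f"Ratio: {speedup:.1f}x")
```

Under these assumptions the ratio works out to 5x, which matches the claim. A different baseline or a different number of links would give a different ratio.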

CoWoS AND HBM2

The Pascal architecture places the processor and its memory in a single package, which helps to deliver unprecedented computing efficiency. The Tesla P100 is the world’s first GPU accelerator to support HBM2 memory. HBM2 provides considerable space savings compared to traditional GDDR5 because it is stacked memory located on the same physical package as the GPU. CoWoS (Chip-on-Wafer-on-Substrate) packaging, together with HBM2, provides a 3x boost in memory bandwidth over the Maxwell GM200 GPU (GTX 900 series).

ARTIFICIAL INTELLIGENCE ALGORITHMS

The ability of a machine to learn and make decisions on its own is the holy grail of computing. Highly sophisticated deep neural networks are needed to process massive amounts of data. In 2012, it took 2,000 CPUs (16,000 CPU cores) for Google Brain (Google’s deep learning project) to learn to recognize cats by watching videos on YouTube. Around the same time, however, a study showed that 12 NVIDIA GPUs could deliver the deep learning performance of 2,000 CPUs.

It is now widely recognized within academia and industry that GPUs are the state of the art in training deep neural networks and NVIDIA is at the forefront.

The new Tesla P100 also includes features that increase performance for deep learning. Like the Maxwell GPU architecture, the Pascal GP100 GPU includes support for 16-bit storage and arithmetic. The Tesla P100, with its 3584 processing cores, delivers over 21 TFLOPS of FP16 processing power for deep learning applications. In multi-GPU systems, the high-speed NVLink interconnect raises the available performance to 170 TFLOPS for training highly complex multilayered DNNs.
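The throughput figures above follow from a simple peak-rate formula: cores times clock times floating-point operations per core per cycle. The sketch below is a back-of-the-envelope check, not NVIDIA's own calculation. It assumes a boost clock of about 1.48 GHz, one fused multiply-add (2 operations) per core per cycle in FP32, twice that rate for packed FP16, and an eight-GPU system for the 170 TFLOPS figure. Only the core count comes from this article.

```python
# Back-of-the-envelope peak throughput estimate for the Tesla P100.
P100_CORES = 3584                # from the article
ASSUMED_BOOST_CLOCK_GHZ = 1.48   # ASSUMPTION: not stated in the article
FP32_FLOPS_PER_CORE_CYCLE = 2    # ASSUMPTION: one fused multiply-add = 2 FLOPs
FP16_FLOPS_PER_CORE_CYCLE = 4    # ASSUMPTION: two packed FP16 values per FMA
ASSUMED_GPUS_PER_SYSTEM = 8      # ASSUMPTION: GPU count behind the 170 TFLOPS figure


def peak_tflops(cores, clock_ghz, flops_per_core_cycle):
    """Theoretical peak in TFLOPS: cores * GHz gives GFLOPS per FLOP/cycle."""
    return cores * clock_ghz * flops_per_core_cycle / 1000.0


fp32_tflops = peak_tflops(P100_CORES, ASSUMED_BOOST_CLOCK_GHZ, FP32_FLOPS_PER_CORE_CYCLE)
fp16_tflops = peak_tflops(P100_CORES, ASSUMED_BOOST_CLOCK_GHZ, FP16_FLOPS_PER_CORE_CYCLE)
system_fp16_tflops = ASSUMED_GPUS_PER_SYSTEM * fp16_tflops

if __name__ == "__main__":
    print(f"FP32 peak per GPU: {fp32_tflops:.1f} TFLOPS")
    print(f"FP16 peak per GPU: {fp16_tflops:.1f} TFLOPS")
    print(f"FP16 peak, {ASSUMED_GPUS_PER_SYSTEM} GPUs: {system_fp16_tflops:.0f} TFLOPS")
```

With these assumed values the per-GPU FP16 estimate lands just above 21 TFLOPS and the eight-GPU total rounds to 170 TFLOPS. These are theoretical peaks, and real training workloads will see less.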