What is a GPU? Are GPUs Needed for Deep Learning?

Diving into a technical explanation on what is a GPU

Towards AI Team

Author(s): Buse Yaren Tekin, Roberto Iriondo

In the age of intelligent machines, more and more innovative concepts emerge daily. One of them is undoubtedly the field of “artificial intelligence.” In particular, “deep learning,” a sub-branch of AI, comes into play when we need to dive deeper into complex problems, and it is used in many areas today. We develop deep learning models to accomplish specific tasks and, in some cases, to surpass humans at repetitive ones. In this article, we will explore how the GPU is used for deep learning, with code examples.

This tutorial’s code is available on GitHub, and its full implementation is also available on Google Colab.

“In the era where artificial intelligence and algorithms make more decisions in our lives and in organizations, the time has come for people to tap into their intuition as an adjunct to today’s technical capabilities. Our inner wisdom can embed empirical data with humanity.”

– Abhishek Ratna [12]

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device [1].

The graphics processing unit is a computer chip that performs rapid mathematical calculations to render images. It may be a dedicated (discrete) chip, typically on a graphics card, or integrated alongside other hardware.

GPUs appear in many places, from embedded systems to personal computers and workstations. Thanks to their parallel processing structure, GPUs have advantages over general-purpose processors in image and video processing. These days, GPUs are also becoming more popular and more necessary for artificial intelligence (AI).

Source: Photo by Christian Wiediger on Unsplash

The GPU comes up constantly in discussions of deep learning because it is the hardware quietly doing the heavy lifting in the background, so the idea is not as abstract as it may sound. Having covered what a GPU is in general above, we will now discuss what using a GPU means for deep learning.

In deep learning, speed and performance matter most when training models. To be more specific, think about the complex structure of artificial neural networks. Because these networks typically work with large datasets, the time spent iterating over the training set grows quickly, and as the dataset gets bigger, training can stretch into very long periods.

Suppose we have a sample dataset of images. Algorithmically, the ANN must perform a feed-forward and a backpropagation pass for every example in this dataset. If we do not have a GPU, the CPU takes on the entire processing load, and it will take a very long time to return the result.
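To make that concrete, here is a minimal, hypothetical sketch of one feed-forward and backpropagation step on a single batch. It assumes TensorFlow 2.x, and the shapes and layer sizes are illustrative only:

import tensorflow as tf

# Minimal sketch, assuming TensorFlow 2.x: one feed-forward and one backpropagation pass
# over a single batch of toy data. Shapes and layer sizes are illustrative assumptions.
x = tf.random.normal((64, 784))                          # a batch of 64 flattened "images"
y = tf.random.uniform((64,), maxval=10, dtype=tf.int32)  # fake integer labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

with tf.GradientTape() as tape:
    logits = model(x)          # feed-forward pass
    loss = loss_fn(y, logits)
grads = tape.gradient(loss, model.trainable_variables)   # backpropagation
print("loss:", float(loss), "- gradient tensors:", len(grads))

With a GPU available, TensorFlow places these matrix operations on the GPU automatically, which is exactly where the parallel hardware pays off.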

Therefore, one of the most essential pieces of hardware we will need when developing a deep learning model is the GPU.

CPU vs. GPU Architecture

A central processing unit (CPU), also called a central processor, microprocessor, main processor, or just processor, executes instructions comprising a computer program.

The traditional CPU (central processing unit) is built on an architecture that is not the best fit for parallel computing tasks because of cost and scalability issues. As seen in the image above, the CPU performs the basic arithmetic, logic, control, and input/output (I/O) operations specified by a program’s instructions [6].

Photo by Oleg Gospodarec on Unsplash

In contrast, a GPU (graphics processing unit) is a specialized type of microprocessor, primarily designed for quick image rendering. GPUs emerged in response to graphically intensive applications that burdened the CPU and degraded computer performance.

CPUs and GPUs are not simply interchangeable (it depends on the task at hand); they accomplish their jobs in different ways. The pair is often described as brain and brawn. Because the CPU is the central processing unit, it is called the computer’s brain, and of course a GPU would make no sense without a CPU. As the brain, the CPU deals with many different types of computations, while the GPU focuses on one specific kind of task.

The CPU works through its tasks largely one after the other, while the GPU can efficiently work on many tasks at the same time. In a way, the two complement each other.

Beyond switching to a single GPU, multiple GPUs can be used together to increase a deep learning model’s performance. This hardware setup is called multi-GPU. Multiple GPUs can be run either in parallel or not; when they are not run in parallel, each GPU works on its own, so parallel multi-GPU training is recommended to improve performance.

The use of multiple GPUs is very flexible in PyTorch and TensorFlow, which are popular deep learning frameworks.

PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook’s AI Research lab [8].

TensorFlow is an open-source framework that flexibly enables model parallelism.

TensorFlow is a free and open-source software library for machine learning. It can be used across a range of tasks but focuses on training and inference of deep neural networks [10].
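For example, in TensorFlow a Keras model can be replicated across all visible GPUs with tf.distribute.MirroredStrategy. The snippet below is a minimal sketch, assuming TensorFlow 2.x and at least one GPU; the model itself is only a placeholder:

import tensorflow as tf

# Minimal multi-GPU sketch, assuming TensorFlow 2.x.
# MirroredStrategy copies the model to every visible GPU and splits each batch among them.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Any Keras model built inside this scope is mirrored across the GPUs.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
# model.fit(...) then trains on all replicas in parallel.

PyTorch offers similar facilities with torch.nn.DataParallel and DistributedDataParallel.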

🎥 Let’s examine how an image is processed in Adobe Illustrator to see the differences between the GPU’s working mechanism and the CPU’s. 🎥

Adobe Illustrator CC: NVIDIA GPUs vs. CPU

In his GPU Technology Conference (GTC) keynote, “Computing for the Age of AI,” NVIDIA CEO Jensen Huang describes a three-pronged effort to bring accelerated and AI computing to Arm CPU platforms, and summarizes what NVIDIA is doing to advance the AI era. In a MythBusters demo, Adam Savage and Jamie Hyneman demonstrate the power of GPU computing.

Photo by Lars Kienle on Unsplash

As additional information, besides the CPU and GPU there is also a data processing unit, the DPU. For many years the CPU was arguably the only processing element that mattered in a computer; the GPU has since taken on a major role of its own, and more recently DPUs have appeared in data centers, built around multi-core CPUs that are programmable with software [5].

When developing deep learning models, we must first make sure that our computer has a GPU and that it is available. The setup used in this tutorial is as follows:

⚙️OS: Windows 10 Pro
⚙️CUDA Toolkit: 10
⚙️cuDNN: 7.4
⚙️TensorFlow GPU: 1.14.0
⚙️Keras: 2.2.5

There are many software options. We use one of our favorite machine learning frameworks, TensorFlow, which has extensive documentation.

Determine whether the GPU in your machine is supported by visiting NVIDIA’s CUDA GPUs page. For example, in the image below, we looked up the compute capability of the GPU in the machine we actively work with.

Checking the compute capability of the Quadro RTX 5000
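As an alternative to the website, the GPU’s name and compute capability can also be queried from Python. This is a hypothetical check that assumes TensorFlow 2.3 or later:

import tensorflow as tf

# Hypothetical check, assuming TensorFlow 2.3+: list GPUs and print their compute capability.
for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    print(gpu.name, details.get("device_name"), details.get("compute_capability"))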

To use the GPU for deep learning tasks, we need to install the cuDNN library and CUDA Toolkit appropriate for our machine; otherwise, we will not be able to use the GPU. To use the GPU with TensorFlow, we install the TensorFlow-GPU package; if we install it with Conda, the matching CUDA and cuDNN versions will also be displayed (and installed) during the process.

We will need to install the CUDA and cuDNN versions that match the version of TensorFlow we will be using. As a warning, mismatched versions lead to many errors; with TensorFlow 2.x versions, for example, we ran into logging or shape errors. After multiple installations, we found that TensorFlow-GPU works fine with 1.14.0 or 1.15.0.

Available TensorFlow Versions for CUDA [Res] [4]

The tested build configurations show that tensorflow-gpu==1.14.0 requires cuDNN 7.4 and CUDA Toolkit 10.

When installing with Conda, we will be asked to confirm the installation. While giving this confirmation, we can also see which CUDA and cuDNN versions will be installed, which tells us we are on the right track.

Image by the author

In addition to TensorFlow, the Keras library must also be installed. Keras is an open-source software library that provides a Python interface for neural networks and acts as an interface for the TensorFlow library [11].

📚 Deep Learning with Python, written by François Chollet, the creator of Keras, is an excellent resource for those who want to work in this field. To set up a GPU environment for deep learning, visit the article Creating a Deep Learning Environment with TensorFlow GPU.

Virtual environments are often used to avoid polluting the machine’s base environment with a faulty installation. In this step, we create and activate a virtual environment with a specific Python version.

conda create -n virtualenv python=3.6
conda activate virtualenv

Conda or pip commands are used to install TensorFlow GPU. In this step, TensorFlow GPU will be installed.

pip install tensorflow-gpu

If no version is specified when installing TensorFlow GPU, the latest version will be installed. To install a specific version, append it to the package name.

pip install tensorflow-gpu==1.15.0

After TensorFlow GPU is installed, we should run the following lines of code to verify the installation.

import tensorflow as tf
tf.test.gpu_device_name()

To check the TensorFlow GPU version we have installed, run the pip show command.

pip show tensorflow-gpu

To check the availability of the GPU, the following snippet is run.

%tensorflow_version 2.x
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Since we are using TensorFlow 2.4.1, the 2.x form of the magic command was shown above. For 1.x versions, it is used as follows.

%tensorflow_version 1.x
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

A computer may have more than one GPU. By default, our machine (or, in our case, Google Colab) uses the first GPU. Another device can be selected instead of this default; the following snippet targets /device:GPU:1.

import tensorflow as tf
tf.device('/device:GPU:1')

This is one way to spread work across more than one GPU. We should also mention that it is possible to select the GPU from the terminal (for example, via the CUDA_VISIBLE_DEVICES environment variable) before even opening the notebook. The following snippet lists the local devices available to TensorFlow.

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

In a Colab notebook, four devices are listed: CPU, GPU, XLA_CPU, and XLA_GPU. Two of them involve a concept beyond the plain CPU and GPU.

As mentioned in TensorFlow’s documentation, XLA stands for “accelerated linear algebra.” TensorFlow’s relatively new optimizing compiler can further speed up our ML models’ GPU operations by combining what used to be multiple CUDA kernels into one [13].
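As a small illustration of what XLA does, the sketch below asks TensorFlow to JIT-compile a function so its operations can be fused into a single kernel. It assumes TensorFlow 2.5 or later (where the jit_compile argument is available); the function itself is made up for the example:

import tensorflow as tf

# Hypothetical XLA sketch, assuming TensorFlow 2.5+ where jit_compile is available.
@tf.function(jit_compile=True)
def dense_step(x, w, b):
    # The matmul, bias add, and ReLU can be fused by XLA into one GPU kernel.
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal((64, 128))
w = tf.random.normal((128, 10))
b = tf.zeros((10,))
print(dense_step(x, w, b).shape)   # (64, 10)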

import tensorflow as tf

try:
    # Request placement on the second GPU, if it exists.
    tf.device('/job:localhost/replica:0/task:0/device:GPU:1')
except RuntimeError as e:
    print(e)

If a physical GPU does not exist at index 1, we suggest trying the code below: it returns the name of the GPU device in use, provided a GPU is present.

import tensorflow as tf
tf.test.gpu_device_name()

Training a model is a common way to test the GPU’s speed. Below, a small neural network is trained on data loaded from the Fashion-MNIST dataset.

import tensorflow as tf

mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images = training_images / 255.0
test_images = test_images / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
test_loss = model.evaluate(test_images, test_labels)

Here, as the amount of data increases and the problem becomes more complex, the gap between training times widens. Since the dataset in this example is not very large, there is no big difference between the CPU and the GPU; with big data, however, the difference becomes significant.
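To see that gap directly, one rough way is to time the same large matrix multiplication on both devices. The sketch below is only an illustration and assumes TensorFlow 2.x with a visible GPU; the matrix size is an arbitrary choice:

import time
import tensorflow as tf

# Rough illustrative benchmark, assuming TensorFlow 2.x and at least one GPU.
def time_matmul(device_name, size=4000):
    with tf.device(device_name):
        a = tf.random.normal((size, size))
        b = tf.random.normal((size, size))
        start = time.time()
        c = tf.linalg.matmul(a, b)
        _ = c.numpy()   # force the computation to finish before stopping the timer
        return time.time() - start

print("CPU:", time_matmul("/device:CPU:0"), "seconds")
if tf.config.list_physical_devices("GPU"):
    print("GPU:", time_matmul("/device:GPU:0"), "seconds")

On small matrices the two times are close; as the matrices grow, the GPU pulls far ahead, mirroring what happens with larger datasets.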

import tensorflow as tf

# TensorFlow 1.x: log which device (CPU or GPU) each operation is placed on.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

The Keras framework, mentioned frequently above, now ships with TensorFlow and is used as tf.keras; it is the interface typically used in deep learning environments. The standalone version matching our setup can be installed as follows.

pip install keras==2.2.5

Bandwidth refers to the external bandwidth between a GPU and its associated system. It is a measure of the data transfer speed across the bus that connects the two (for example, PCIe or Thunderbolt). Bandwidth does not refer to the internal bandwidth of a GPU, which is a measure of the data transfer speed between components within the GPU [7] [9].

When a deep learning model is trained on the CPU, it consumes a great deal of system memory, especially when processing large datasets. When training on the GPU, on the other hand, the work runs in the GPU’s own memory (VRAM), leaving the CPU’s remaining memory free for other tasks. Thus, even complex problems can be completed in comparatively few cycles.
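Because training happens in VRAM, it is often useful to control how TensorFlow claims it. The snippet below is a minimal sketch, assuming TensorFlow 2.x, that tells TensorFlow to allocate VRAM on demand rather than reserving all of it up front:

import tensorflow as tf

# Minimal sketch, assuming TensorFlow 2.x; must run before any GPU has been initialized.
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    # Grow VRAM usage on demand instead of grabbing all of it at start-up.
    tf.config.experimental.set_memory_growth(gpu, True)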

GPU vs. CPU Performance [Res] [2]

The high-end NVIDIA GPUs have much, much wider buses and higher memory clock rates than any CPU. The maximum memory bandwidth is the maximum rate at which data can be read from or stored into a processor’s semiconductor memory (in GB/s).

The theoretical maximum memory bandwidth for Intel Core X-series processors can be calculated by multiplying the memory frequency (half the transfer rate, times 2 for double data rate), by the bus width in bytes, and by the number of memory channels supported by the processor [2].

For example:
For DDR4-2933 memory, as supported on some Core X-series processors: (1466.67 MHz × 2) × 8 bytes of width × 4 channels = 93,866.88 MB/s of bandwidth, or about 94 GB/s [Res] [2][3].
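The same arithmetic can be written as a few lines of Python as a sanity check (the values are simply the DDR4-2933 example above):

# Sanity check of the bandwidth formula, using the DDR4-2933 example above.
memory_frequency_mhz = 1466.67   # half of the 2933 MT/s transfer rate
bus_width_bytes = 8              # 64-bit memory bus = 8 bytes
channels = 4                     # quad-channel configuration

bandwidth_mb_s = (memory_frequency_mhz * 2) * bus_width_bytes * channels
print(round(bandwidth_mb_s, 2), "MB/s, or about", round(bandwidth_mb_s / 1000), "GB/s")
# 93866.88 MB/s, or about 94 GB/s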

A lower-than-expected memory bandwidth may be seen due to many system variables, such as software workloads and system power states.

For comparison, the Core i7 with the highest memory bandwidth at the time of that comparison had a 192-bit-wide memory bus and an effective memory speed of up to 800 MHz, while the fastest NVIDIA GPU of that era was the GTX 285 [Res] [3].
