- What is a Neural Network?
- What Is Data Normalization?
- How do forward propagation and backpropagation work in deep learning?
- What is the role of Activation Functions in a Neural Network?
- What is a Cost Function?
- What is Gradient Descent?
- Why should we use Batch Normalization?
- Why does a Convolutional Neural Network (CNN) work better with image data?
- Why do RNNs work better with text data?
- What are exploding and vanishing gradients?
- What does Backpropagation mean?
- What are the Softmax and ReLU functions?
- What are Hyperparameters?
- What is Dropout?
- What is overfitting?
- What is underfitting?
- How are weights initialized in a Neural Network?
- What is Pooling with respect to CNN?
- What is an LSTM?
- What is an Epoch?
- What is a Generative Adversarial Network?
- What is an auto-encoder?
- What is Bagging?
- What is Boosting?
- What is transfer learning?
- What is Data Augmentation in Deep Learning?
- Why is a deep neural network better than a shallow neural network?
- Is it possible to calculate the learning rate for a model a priori?

# Deep Learning Interview Questions and Answers (2023)

In this post, we have put together the top Deep Learning interview questions and answers for beginner, intermediate, and experienced candidates. The questions are organized for quick browsing before an interview, or to serve as a more detailed guide to the Deep Learning topics interviewers look for.

### Deep Learning Interview Questions

#### What is a Neural Network?

Artificial neural networks, usually simply called neural networks, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain.

#### What Is Data Normalization?

Normalization in deep learning refers to the practice of transforming your data so that all features are on a similar scale, usually ranging from 0 to 1. This is especially useful when the features in a dataset are on very different scales.
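
As a minimal sketch of the idea, the snippet below applies min-max scaling, one common way to bring features into the 0 to 1 range. The function name `min_max_normalize` and the sample `ages` and `incomes` values are made up for illustration.

```python
def min_max_normalize(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread to rescale; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two hypothetical features on very different scales.
ages = [20, 30, 40, 60]
incomes = [20_000, 50_000, 80_000, 120_000]

scaled_ages = min_max_normalize(ages)
scaled_incomes = min_max_normalize(incomes)
print(scaled_ages)
print(scaled_incomes)
```

After scaling, both features span the same range, so neither dominates simply because its raw numbers are larger.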

#### How do forward propagation and backpropagation work in deep learning?

Forward propagation (or forward pass) refers to the calculation and storage of intermediate variables (including outputs) for a neural network in order from the input layer to the output layer.

Backpropagation refers to the method of calculating the gradient of neural network parameters. In short, the method traverses the network in reverse order, from the output to the input layer, according to the chain rule from calculus. The algorithm stores any intermediate variables (partial derivatives) required while calculating the gradient with respect to some parameters.
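
To make the two passes concrete, here is a rough sketch for a deliberately tiny network: one scalar input, one tanh hidden unit, and one linear output with a squared-error loss. The architecture, the names `forward`, `backward`, and `loss`, and the numeric values are all assumptions chosen for illustration. The analytic gradients are compared against finite-difference estimates as a sanity check.

```python
import math


def forward(x, w1, w2):
    """Forward pass: compute and keep intermediate values from input to output."""
    z = w1 * x            # hidden pre-activation
    h = math.tanh(z)      # hidden activation
    y = w2 * h            # network output
    return z, h, y


def loss(x, target, w1, w2):
    """Squared error between the network output and the target."""
    return (forward(x, w1, w2)[2] - target) ** 2


def backward(x, target, w1, w2):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    z, h, y = forward(x, w1, w2)
    dloss_dy = 2.0 * (y - target)          # loss = (y - target) ** 2
    dloss_dw2 = dloss_dy * h               # y = w2 * h
    dloss_dh = dloss_dy * w2
    dloss_dz = dloss_dh * (1.0 - h * h)    # d tanh(z) / dz = 1 - tanh(z) ** 2
    dloss_dw1 = dloss_dz * x               # z = w1 * x
    return dloss_dw1, dloss_dw2


x, target, w1, w2 = 0.5, 1.0, 0.8, -0.3
grad_w1, grad_w2 = backward(x, target, w1, w2)

# Sanity check against central finite-difference estimates.
eps = 1e-6
numeric_w1 = (loss(x, target, w1 + eps, w2) - loss(x, target, w1 - eps, w2)) / (2 * eps)
numeric_w2 = (loss(x, target, w1, w2 + eps) - loss(x, target, w1, w2 - eps)) / (2 * eps)
print(grad_w1, numeric_w1)
print(grad_w2, numeric_w2)
```

Note how `backward` reuses the intermediate values `h` and `y` produced by the forward pass, which is why those values are stored.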

#### What is the role of Activation Functions in a Neural Network?

An activation function is a function applied to the output of an artificial neuron so that the network can learn complex, non-linear patterns in the data; without it, stacked layers would collapse into a single linear transformation. By analogy with a neuron in the brain, the activation function sits at the end of the unit and decides what is fired on to the next neuron. It takes the output signal from the previous cell and converts it into a form that can be used as input to the next cell.

#### What is a Cost Function?

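The cost function of a neural network measures how far the network's predictions are from the target values, aggregated over the training examples (for instance, as the mean squared error or the cross-entropy). The error is computed at the output of the network, and training aims to make this total error as small as possible.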
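
As one illustration, the sketch below computes mean squared error, a common cost function for regression. The helper name `mean_squared_error` and the sample predictions are hypothetical; other tasks typically use other costs, such as cross-entropy for classification.

```python
def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


targets = [1.0, 0.0, 1.0]
good_predictions = [0.9, 0.1, 0.8]   # close to the targets
poor_predictions = [0.2, 0.9, 0.1]   # far from the targets

good_cost = mean_squared_error(good_predictions, targets)
poor_cost = mean_squared_error(poor_predictions, targets)
print(good_cost, poor_cost)
```

The predictions that sit closer to the targets yield the lower cost, which is the signal training tries to minimize.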

#### What is Gradient Descent?

Gradient descent is an optimization algorithm that is commonly used to train machine learning models and neural networks. It repeatedly adjusts the model's parameters in the direction that reduces the cost function. Training data helps these models learn over time, and the cost function acts as a barometer, gauging the model's accuracy with each iteration of parameter updates.
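
A minimal sketch of the update loop, assuming a one-parameter model `y = w * x` and a mean-squared-error cost. The function name, learning rate, step count, and data are illustrative choices, not recommended settings.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=200):
    """Fit y = w * x by repeatedly stepping against the gradient of the MSE cost."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # d/dw of mean((w * x - y) ** 2) is mean(2 * (w * x - y) * x).
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by y = 2 * x

w = gradient_descent(xs, ys)
print(w)
```

Each iteration moves `w` a little closer to the value that minimizes the cost; with this data the estimate approaches 2.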

#### Why should we use Batch Normalization?

Batch normalization was introduced to reduce a problem called internal covariate shift. It helps by making the data flowing between intermediate layers of the neural network look more normalized, which means you can use a higher learning rate. It also has a regularizing effect, which means you can often reduce or remove dropout.
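
The core computation can be sketched as follows for a single feature across one mini-batch. This is a simplified, training-mode-only version: the name `batch_norm` and its defaults are assumptions, and real layers also learn `gamma` and `beta` and track running statistics for inference.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift it."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n   # biased (population) variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


activations = [10.0, 12.0, 14.0, 16.0]
normalized = batch_norm(activations)

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
print(normalized)
print(norm_mean, norm_var)
```

The normalized activations have a mean near 0 and a variance near 1, regardless of the scale of the incoming values.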

#### Why does a Convolutional Neural Network (CNN) work better with image data?

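CNNs are feed-forward neural networks, but unlike fully connected networks, their convolutional layers connect each neuron only to a small local region of the input and share the same weights across all positions. This makes CNNs very effective at reducing the number of parameters without sacrificing model quality. Images have high dimensionality (each pixel is treated as a feature) and strong local structure, which suits these properties of CNNs.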
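
To give a feel for the parameter savings, the sketch below counts weights and biases for a fully connected layer and for a convolutional layer applied to the same image. The 224x224 RGB input, the 64 units or filters, and the 3x3 kernel are assumed values, and the two layers do not produce the same output shape, so this is only a rough comparison.

```python
def dense_layer_params(input_height, input_width, channels, units):
    """Weights plus biases when every input value connects to every unit."""
    return input_height * input_width * channels * units + units


def conv_layer_params(kernel_size, in_channels, filters):
    """Weights plus biases for a convolution; independent of the image size."""
    return kernel_size * kernel_size * in_channels * filters + filters


dense = dense_layer_params(224, 224, 3, 64)
conv = conv_layer_params(3, 3, 64)
print(dense, conv)
```

Because the convolution reuses the same small kernel at every position, its parameter count does not grow with the image resolution.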

#### Why do RNNs work better with text data?

An RNN is a class of artificial neural network where connections between nodes form a directed graph along a sequence. It is essentially a chain of neural network blocks linked to one another, each passing a message to its successor.

This architecture allows an RNN to exhibit temporal behavior and capture sequential data, which makes it a more ‘natural’ approach when dealing with textual data, since text is naturally sequential.
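
The chain-like behavior can be sketched with a scalar recurrent cell, where each step combines the current input with the state passed on from the previous step. The names `rnn_step` and `run_rnn` and the weight values are illustrative assumptions; real RNNs use weight matrices and learned parameters.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One step of a scalar RNN: mix the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.9, b=0.0):
    """Process a sequence in order, passing the hidden state along the chain."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# The same values in a different order lead to different final states.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 1.0])
print(states_a[-1], states_b[-1])
```

Since the final state depends on the order of the inputs, the network is sensitive to sequence, which is what makes it a fit for text.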

#### What are exploding and vanishing gradients?

The vanishing gradient problem describes a situation encountered in the training of neural networks where the gradients used to update the weights shrink exponentially as they are propagated back through the layers. As a consequence, the weights of the early layers are barely updated, and learning stalls.

The exploding gradient problem describes the opposite situation, where the gradients used to update the weights grow exponentially as they are propagated back. This prevents the backpropagation algorithm from making reasonable updates to the weights, and learning becomes unstable.
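
A simplified way to see both effects is to multiply a gradient by the same factor at every layer, as the chain rule does in a deep network. The sketch below assumes an idealized constant per-layer factor and a depth of 50; the function name is made up for illustration.

```python
def gradient_magnitude(per_layer_factor, num_layers):
    """Magnitude of a gradient after being scaled by the same factor at every layer."""
    magnitude = 1.0
    for _ in range(num_layers):
        magnitude *= per_layer_factor
    return magnitude


# 0.25 is the largest possible slope of the sigmoid function.
vanishing = gradient_magnitude(0.25, 50)
# Any factor above 1 grows without bound as depth increases.
exploding = gradient_magnitude(1.5, 50)
print(vanishing, exploding)
```

Factors below 1 drive the gradient toward zero, while factors above 1 blow it up, and both effects get worse as the network gets deeper.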

#### What does Backpropagation mean?

Backpropagation is the algorithm used to work out how much each weight in a neural network contributes to the error on given inputs. It is used together with gradient descent: it calculates the gradient of the loss function at the output and distributes it back through the layers of a deep neural network, and gradient descent then uses those gradients to update the weights.

#### What are the Softmax and ReLU functions?

Softmax is an activation function that not only maps each output to the [0, 1] range but also scales the outputs so that they sum to 1. The output of Softmax can therefore be interpreted as a probability distribution.

The ReLU function is another non-linear activation function that has gained popularity in the deep learning domain. ReLU stands for Rectified Linear Unit. The main advantage of using the ReLU function over other activation functions is that it does not activate all the neurons at the same time: it outputs zero for any negative input, so only neurons with a positive input are active.
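
Both functions are short enough to sketch directly. The implementations below use only the standard library, and the input values are arbitrary examples; subtracting the maximum in `softmax` is a common numerical-stability trick that does not change the result.

```python
import math


def softmax(logits):
    """Turn a list of scores into values in [0, 1] that sum to 1."""
    shift = max(logits)   # subtract the max so exp() cannot overflow
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relu(x):
    """Rectified Linear Unit: pass positive inputs through, output zero otherwise."""
    return max(0.0, x)


probs = softmax([2.0, 1.0, 0.1])
activations = [relu(v) for v in [-2.0, -0.5, 0.0, 1.5]]
print(probs, sum(probs))
print(activations)
```

Larger scores receive larger probabilities under softmax, while ReLU silences every neuron whose input is not positive.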

#### What are Hyperparameters?

Hyperparameters are parameters whose values control the learning process and determine the values of model parameters that a learning algorithm ends up learning; common examples include the learning rate, the batch size, and the number of layers. The prefix 'hyper-' suggests that they are 'top-level' parameters that control the learning process and the model parameters that result from it.

#### What is Dropout?

Dropout is a regularization method that approximates training a large number of neural networks with different architectures in parallel.

During training, some number of layer outputs are randomly ignored or “dropped out.” This has the effect of making the layer look like, and be treated like, a layer with a different number of nodes and different connectivity to the prior layer. In effect, each update to a layer during training is performed with a different “view” of the configured layer.
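
Here is a minimal sketch of the mechanism, using the "inverted dropout" variant in which surviving outputs are rescaled during training so that nothing needs to change at inference time. The function signature, the seeded `random.Random` instance, and the 0.5 drop probability are assumptions for illustration.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each output with probability drop_prob, rescale the rest."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At inference time the layer passes its inputs through unchanged.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)   # seeded so the run is repeatable
layer_output = [1.0] * 10

dropped = dropout(layer_output, drop_prob=0.5, rng=rng)
unchanged = dropout(layer_output, drop_prob=0.5, rng=rng, training=False)
print(dropped)
print(unchanged)
```

Each training call produces a different random mask, so every update sees a slightly different version of the layer.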

#### What is overfitting?

Overfitting occurs when a statistical model fits its training data too closely, capturing noise along with the underlying pattern. When this happens, the algorithm cannot perform accurately against unseen data, defeating its purpose. Generalization of a model to new data is ultimately what allows us to use machine learning algorithms every day to make predictions and classify data.

#### What is underfitting?

Underfitting refers to a model that can neither model the training data nor generalize to new data. An underfit machine learning model is not a suitable model, and it is easy to spot because it performs poorly even on the training data.

#### How are weights initialized in a Neural Network?

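Neural network models are fit using an optimization algorithm called stochastic gradient descent that incrementally changes the network weights to minimize a loss function, hopefully resulting in a set of weights for the model that is capable of making useful predictions. This algorithm needs a starting point, so the weights are typically initialized to small random values rather than to zeros or a single constant, which would make every neuron in a layer learn the same thing.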
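
As one example of an initialization scheme, the sketch below draws weights using Xavier (Glorot) uniform initialization, which picks values from a range that depends on the layer's input and output sizes. The function name, the layer sizes, and the seed are illustrative assumptions; other schemes, such as He initialization, are also widely used.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Draw a fan_in x fan_out weight matrix from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


rng = random.Random(42)   # seeded so the run is repeatable
weights = xavier_uniform(fan_in=4, fan_out=3, rng=rng)
limit = math.sqrt(6.0 / (4 + 3))

for row in weights:
    print(row)
```

Scaling the range by the layer size keeps the typical magnitude of activations roughly comparable from layer to layer at the start of training.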

#### What is Pooling with respect to CNN?

Pooling layers are used to reduce the dimensions of the feature maps. This reduces the number of parameters to learn in later layers and the amount of computation performed in the network. The pooling layer summarises the features present in a region of the feature map generated by a convolution layer.
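
The summarising step can be sketched with 2x2 max pooling, one common pooling variant, applied to a small feature map. The function name and the sample values are made up for illustration, and a stride of 2 is assumed; any trailing odd row or column is simply dropped.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 over a 2D list of numbers."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled.append([
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ])
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [1, 0, 3, 5],
]

pooled = max_pool_2x2(feature_map)
print(pooled)   # [[6, 2], [7, 9]]
```

The 4x4 map shrinks to 2x2, with each output keeping only the strongest response from its region.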

#### What is an LSTM?

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems. This is a behavior required in complex problem domains like machine translation, speech recognition, and more.

#### What is an Epoch?

The number of epochs is a hyperparameter that defines the number of times that the learning algorithm will work through the entire training dataset. One epoch means that each sample in the training dataset has had an opportunity to update the internal model parameters.

#### What is a Generative Adversarial Network?

A generative adversarial network (GAN) is a machine learning (ML) model in which two neural networks compete with each other to become more accurate in their predictions: a generator that creates samples and a discriminator that tries to tell them apart from real data. GANs typically run unsupervised and use an adversarial zero-sum game framework to learn.

#### What is an auto-encoder?

An autoencoder is an unsupervised learning technique for neural networks that learns efficient data representations (encoding) by training the network to ignore signal “noise.” Autoencoders can be used for image denoising, image compression, and, in some cases, even generation of image data.

#### What is Bagging?

Bagging, also known as Bootstrap aggregating, is an ensemble learning technique that helps to improve the performance and accuracy of machine learning algorithms. It is used to deal with bias-variance trade-offs and reduces the variance of a prediction model. Bagging helps reduce overfitting and is used for both regression and classification models, most commonly with decision tree algorithms.

#### What is Boosting?

Boosting is an ensemble learning method that combines a set of weak learners into a strong learner to minimize training errors. In boosting, a random sample of data is selected and fitted with a model, and further models are then trained sequentially; that is, each model tries to compensate for the weaknesses of its predecessor. With each iteration, the weak rules from each individual classifier are combined to form one strong prediction rule.

#### What is transfer learning?

The reuse of a previously learned model on a new problem is known as transfer learning.

It is a popular approach in deep learning, where pre-trained models are used as the starting point for computer vision and natural language processing tasks, given the vast compute and time resources required to develop neural network models for these problems and the large gains in performance that pre-trained models provide on related problems.

#### What is Data Augmentation in Deep Learning?

Data augmentation is a set of techniques to artificially increase the amount of data by generating new data points from existing data. This includes making small changes to data or using deep learning models to generate new data points.
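
A small change of this kind can be sketched with a horizontal flip, one of the simplest image augmentations. The image here is a toy 2D list of pixel values and the function name is hypothetical; in practice, libraries apply many such transforms (flips, crops, rotations, noise) at random during training.

```python
def horizontal_flip(image):
    """Mirror a 2D image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


image = [
    [1, 2, 3],
    [4, 5, 6],
]

flipped = horizontal_flip(image)
augmented_dataset = [image, flipped]   # the original plus one new data point
print(flipped)   # [[3, 2, 1], [6, 5, 4]]
```

The flipped copy keeps the same label as the original, so the dataset grows without any new data being collected.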

#### Why is a deep neural network better than a shallow neural network?

For the same level of accuracy, deeper networks can be much more efficient in terms of computation and number of parameters. Deeper networks are able to create deep representations: at every layer, the network learns a new, more abstract representation of the input. A shallow network has fewer hidden layers and therefore fewer levels of abstraction.

#### Is it possible to calculate the learning rate for a model a priori?

For simple models, it may be possible to set the best learning rate a priori. For complex models, however, the best learning rate cannot be reliably derived through theoretical deduction alone. Observation and experience play a vital role in finding a good learning rate, which in practice is usually tuned empirically.