Neural Networks: How Do They Learn? Simply Explained

Hey guys! Have you ever wondered how neural networks actually learn to make predictions? Those predictions can be correct or incorrect, and the way a network improves them relies on a superpower of mathematics: calculus. Once you understand the math behind this mechanism, it will be a piece of cake to grasp how neural networks work.

As we saw in the previous post, a neural network consists of multiple layers of neurons that enable it to identify features in an image. If you want to read the previous post on what a neural network is, follow the link below.

https://aidiariesofyoda.blogspot.com/2023/03/neural-networks-simply-explained.html

Each neuron assigns a score (weight), initially random, to each pixel in the image, multiplies each pixel value by its weight, and adds up these products (plus a bias) to get a weighted sum. The activation function applied to this weighted sum then decides whether, and how strongly, the neuron is activated.
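
Here's a minimal sketch of what a single neuron computes. It assumes a sigmoid activation function and a made-up 3-pixel input; the weights and bias are hypothetical values (in practice they would start out random).

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(pixels, weights, bias):
    # Weighted sum: multiply each pixel by its weight, add the products, add the bias
    z = sum(p * w for p, w in zip(pixels, weights)) + bias
    # The activation function decides how strongly the neuron fires
    return sigmoid(z)

pixels = [0.0, 0.5, 1.0]      # hypothetical 3-pixel "image", intensities in [0, 1]
weights = [0.2, -0.4, 0.9]    # hypothetical weights (random at the start of training)
bias = 0.1                    # hypothetical bias
print(neuron(pixels, weights, bias))
```

Here the weighted sum is 0.8, so the neuron outputs sigmoid(0.8), a value a bit above 0.5.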


The outputs of each layer become the inputs of the next, and the most activated neuron in the output layer gives us the result (for example, it decides whether the image is baby Yoda or not). This whole process is known as forward propagation.
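
Here's a tiny forward-propagation sketch in plain Python. The 3-input, 2-hidden, 2-output layout, the hand-picked weights and the sigmoid activation are all assumptions for illustration; a real image network has far more inputs, and its weights would start out random.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    # Each row of `weights` holds one neuron's weights; returns every neuron's activation
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def forward(pixels, layers):
    # Forward propagation: the output of each layer is the input of the next
    activations = pixels
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical 3-input -> 2-hidden -> 2-output network with hand-picked weights
layers = [
    ([[0.5, -0.2, 0.8], [-0.6, 0.4, 0.1]], [0.0, 0.1]),  # hidden layer
    ([[1.2, -0.7], [-1.0, 0.9]], [0.0, 0.0]),            # output layer
]
labels = ["baby Yoda", "not baby Yoda"]

outputs = forward([0.0, 0.5, 1.0], layers)
# The most activated output neuron gives the prediction
prediction = labels[outputs.index(max(outputs))]
print(outputs, prediction)
```

With these particular weights the first output neuron is the more activated one, so the prediction is "baby Yoda".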

But how do they learn?

Neural networks are loosely inspired by the human brain. Humans learn by trial and error. Similarly, neural networks learn by making predictions and, based on the loss value, updating their weights and biases during training. The loss value tells us how much our predicted value deviates from the actual value. It acts as feedback, and this feedback is passed backwards through the network. Let's use the mean squared error (MSE) as our loss function.
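
For one training example with n output neurons, where y_j is the expected value and ŷ_j the predicted value of output neuron j, the mean squared error can be written as:

```latex
L = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2
```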

This loss value is calculated for a single training example (one input image). Summing the loss values over all training examples gives us the cost. If our network predicts correctly, the cost will be small; if it is large, our network hasn't learned enough yet to predict correctly.
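
A quick sketch of the difference between the loss of one example and the cost over the whole training set. It assumes two output neurons ("baby Yoda" and "not baby Yoda") with one-hot targets; the predictions are made-up numbers for illustration.

```python
def mse_loss(predicted, expected):
    # Mean squared error for ONE training example, averaged over the output neurons
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(expected)

def cost(predictions, targets):
    # Cost: sum of the per-example losses over the whole training set
    return sum(mse_loss(p, t) for p, t in zip(predictions, targets))

targets = [[1.0, 0.0], [0.0, 1.0]]             # image 1 is baby Yoda, image 2 is not
good_predictions = [[0.9, 0.1], [0.2, 0.8]]    # hypothetical well-trained network
bad_predictions = [[0.3, 0.7], [0.6, 0.4]]     # hypothetical untrained network

print(f"{cost(good_predictions, targets):.2f}")
print(f"{cost(bad_predictions, targets):.2f}")
```

The well-trained predictions give a cost of about 0.05, while the untrained ones give about 0.85.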

To modify the weights and minimize the loss function efficiently, so that the predicted value matches the actual value, we use optimizers such as Gradient Descent, Stochastic Gradient Descent and Adam.

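We will explore the concept using Gradient Descent. It is an iterative algorithm that repeatedly adjusts parameters to minimize a function; a classic example is finding the best-fit line for a given set of data points.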
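
Here's a small sketch of gradient descent fitting a straight line y = w·x + b to a few hypothetical points that lie exactly on y = 2x + 1, using the mean squared error as the cost. The learning rate 0.05 and the 2000 epochs are illustrative choices, not tuned values.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    # Fit y ≈ w * x + b by gradient descent on the mean squared error
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b
        dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient
        w -= lr * dw
        b -= lr * db
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # hypothetical points on the line y = 2x + 1
w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Starting from w = 0 and b = 0, the updates walk both parameters to the true line, w = 2 and b = 1.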

Since we want to reduce the error, we adjust each weight (increasing or decreasing it) by a small amount in the direction that lowers the loss, so that the predicted value moves closer to the expected output. Our aim is to find the weight and bias values that minimize the cost function.

It’s just like a ball rolling down a hill towards the valley!

Derivatives tell us how much the loss changes when we change a weight. The gradient of a function points in the direction in which the function increases fastest, so we move in the opposite direction (the negative of the gradient) to head downhill toward the global minimum as quickly as possible. The sign of the derivative tells us which way to go: when it is negative, we move the weight to the right (increase it), and when it is positive, we move it to the left (decrease it). The learning rate sets the size of each step and therefore how fast the loss converges to the minimum. It must be chosen carefully: a learning rate that is too high takes big leaps that can overshoot the minimum (or even diverge), while one that is too low makes training very slow.
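
To see the sign rule and the effect of the learning rate in action, here's a minimal sketch on a hypothetical one-weight loss L(w) = (w − 3)², whose minimum is at w = 3. The function names and the learning rates 0.25, 1.1 and 0.01 are just illustrative choices.

```python
def dloss_dw(w):
    # Derivative of the toy loss L(w) = (w - 3)^2, whose minimum is at w = 3
    return 2.0 * (w - 3.0)

def step(w, lr):
    # One gradient-descent update: move against the derivative, scaled by the learning rate
    return w - lr * dloss_dw(w)

def run(w, lr, n_steps=50):
    # Repeat the update n_steps times and return the final weight
    for _ in range(n_steps):
        w = step(w, lr)
    return w

print(step(0.0, 0.25))   # → 1.5 (derivative is negative left of the minimum, so w moves right)
print(step(6.0, 0.25))   # → 4.5 (derivative is positive right of the minimum, so w moves left)
print(run(0.0, 0.25))    # well-chosen rate: ends very close to 3
print(run(0.0, 1.1))     # too high: every step overshoots further and w diverges
print(run(0.0, 0.01))    # too low: after 50 steps w is still far from 3
```

With a rate of 0.25 each step halves the distance to the minimum; with 1.1 each step jumps to the other side and lands 1.2 times farther away.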

We know that the derivative gives the slope of a function at a given point. To reduce the loss value, we adjust our weights so that we move down the slope toward the global minimum, where the slope becomes zero. We can achieve this by following the steps below.

Step 1: Start at a random point on the cost function

Step 2: Take a small step in the direction in which the cost decreases (opposite to the slope)

Step 3: Repeat until we reach the minimum (the slope is close to zero and the cost stops decreasing)
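
The three steps above can be sketched as a loop. This assumes a hypothetical toy cost with two parameters, C(w, b) = (w − 2)² + (b + 1)², whose minimum is at w = 2, b = −1, and a stopping rule based on how flat the slope is; the names, learning rate and tolerance are illustrative.

```python
import math
import random

def cost(w, b):
    # Hypothetical toy cost whose minimum is at w = 2, b = -1
    return (w - 2.0) ** 2 + (b + 1.0) ** 2

def gradient(w, b):
    # Partial derivatives of the toy cost with respect to w and b
    return 2.0 * (w - 2.0), 2.0 * (b + 1.0)

def gradient_descent(lr=0.1, tol=1e-6, max_iters=10_000, seed=0):
    rng = random.Random(seed)
    # Step 1: start at a random point on the cost function
    w, b = rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0)
    for i in range(max_iters):
        dw, db = gradient(w, b)
        # Step 3's stopping rule: the slope is (almost) flat, so we are at the minimum
        if math.hypot(dw, db) < tol:
            return w, b, i
        # Step 2: take a small step downhill, opposite to the gradient
        w -= lr * dw
        b -= lr * db
    return w, b, max_iters

w, b, iters = gradient_descent()
print(f"w = {w:.4f}, b = {b:.4f} after {iters} iterations, cost = {cost(w, b):.2e}")
```

On this bowl-shaped toy cost the minimum found is the global one; on a real network's cost surface, gradient descent may settle in a local minimum instead.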


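Training repeats this cycle of predicting, measuring the loss and adjusting the weights and biases until the predicted values match the actual values as closely as possible. The backward part of the cycle, where the loss is passed back through the network to work out how much each weight and bias contributed to it, is known as backpropagation. Let's see how the weights are adjusted.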

Calculus to the rescue: the chain ⛓ rule
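
Here's the chain rule at work on the smallest possible case: one neuron with one input x, weighted sum z = w·x + b, sigmoid activation a = σ(z) and squared-error loss L = (a − y)². The loss depends on w only through z and a, so the chain rule gives ∂L/∂w = ∂L/∂a · ∂a/∂z · ∂z/∂w = 2(a − y) · a(1 − a) · x, and the same with ∂z/∂b = 1 for the bias. The sigmoid, the single training example and the starting values are assumptions for illustration. The sketch checks the chain-rule result against a numerical estimate and then runs a few updates.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    z = w * x + b            # weighted sum
    return sigmoid(z)        # activation a = sigmoid(z)

def loss(x, y, w, b):
    return (predict(x, w, b) - y) ** 2   # squared error for one example

def gradients(x, y, w, b):
    a = predict(x, w, b)
    dL_da = 2.0 * (a - y)      # how the loss changes with the activation
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    dz_dw, dz_db = x, 1.0      # how the weighted sum changes with w and b
    # Chain rule: multiply the local derivatives along the path from w (or b) to L
    return dL_da * da_dz * dz_dw, dL_da * da_dz * dz_db

# Hypothetical training example and starting parameters
x, y = 1.5, 1.0
w, b = -0.5, 0.0

# Sanity check: compare the chain-rule gradient with a numerical estimate
eps = 1e-6
numeric_dw = (loss(x, y, w + eps, b) - loss(x, y, w - eps, b)) / (2 * eps)
analytic_dw, _ = gradients(x, y, w, b)
print(f"chain rule: {analytic_dw:.6f}, numerical: {numeric_dw:.6f}")

# A few gradient-descent updates using those gradients
lr = 1.0
losses = [loss(x, y, w, b)]
for _ in range(5):
    dw, db = gradients(x, y, w, b)
    w -= lr * dw
    b -= lr * db
    losses.append(loss(x, y, w, b))
print(losses)   # the loss shrinks after every update
```

In a full network, backpropagation applies this same rule layer by layer, starting from the output and reusing the derivatives already computed for the later layers.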

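The weights are adjusted with the help of derivatives and the chain rule. A weight affects the loss only indirectly (through the neuron's weighted sum, then its activation, then the following layers), so the chain rule multiplies these small local derivatives together to tell us how the loss changes with each weight. This process of adjusting the weights repeats in every iteration until the loss reaches a minimum and the predicted values match the actual values as closely as possible. I hope you now understand how neural networks learn with the help of calculus. If you have any queries, let me know in the comment section.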

May the force be with you! ✨
