CNN Part 1: Understanding How a Convolutional Neural Network Works

Today, if you want to analyze an image or a video, a Convolutional Neural Network (CNN) is one of the most popular choices. In this series of lessons, we will cover what a CNN is all about, along with the TensorBoard and Keras libraries. For a better understanding and hands-on experience with these libraries and with CNNs, we will build a fully functional model from scratch, and I will guide you at every step. So let’s start the journey to Convolutional Neural Networks.

You will learn:

  • Convolutional Neural Network
  • Preprocessing the data
  • Pickling
  • Google Colab
  • TensorFlow
  • Keras
  • TensorBoard

So let’s get started

What is a CNN?

A neural network consists of a numerical input layer, some hidden layers, and an output layer. If these hidden layers are convolutional layers, the neural network is said to be a Convolutional Neural Network.

I know this sounds technical or, more precisely, mathematical, but bear with me. I will explain everything in a simple way.

First of all, let’s see what this model looks like.

CNN 2 layer NN

Here there is an input layer, two hidden layers, and one output layer.

To understand it better I would like you to imagine a scenario. This is kind of a story but trust me this is the best and simplest way I can teach you about CNN.

One beautiful day, you and your friends plan to go to a jungle. For some reason you are not able to make it, so you wait at home for your friends to come by and tell you stories of this huge, mighty jungle.

Now the tour is over and they come to you, one by one. As the jungle is huge, they got scattered and went to different places, and now each of them describes to you what the jungle looks like.

One friend said that there were many lions.

The second one said I saw a rhino.

The third one said I saw nothing except trees.

And this process goes on. Some told you about furious cheetahs, and some showed their excitement because they saw an elephant.

You listened to each and every person, and now you have a whole image of the jungle. Note that individually they don’t have the whole picture as you do, but because they told you everything and you compiled it all, you have a more diverse picture of that place.

Now, if it were an unfamiliar place and your friends told you the story in the same way, you could build an image of that place too, and when someone showed you that place, you would most probably be able to recognize it.

And that’s what a CNN is all about. The only thing left is relating the above story to CNN models.

In a Convolutional Neural Network, the input layer is the image data in the form of numbers. We pass it to the hidden layer, which consists of filters.

Here the filters are analogous to your friends: each one draws a conclusion about a different part of the image. They all send their conclusions to the final layer, which is you, and you form the final conclusion about the look and feel of the image. This is an example of a CNN with one hidden layer.

CNN 1 layer NN

Now imagine that all of your friends told the same stories, individually, to a second group of people the same way they told you, and that the members of that group then come to you one by one and describe the place.

Note that now each one of them has a full picture of the place, as you did in the example above, and they describe it to you in great detail. This time you will surely have a far better understanding of what it looks like. This is an example of a CNN with two hidden layers.

CNN 2 layer NN

I hope this makes sense so far.

We read above that a neural network processes data as numbers, so let’s see how that works.

Every pixel in an image consists of three colors: red, green, and blue (RGB). The mixing of these colors defines the color of that pixel. Each color in RGB ranges from 0 to 255, so the color of an individual pixel looks like this: (145, 124, 213).
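To make the numbers concrete, here is a minimal sketch in plain Python of how a tiny image can be stored as pixel triples and split into its three color matrices. The 2×2 image and the helper name `split_channels` are made up for illustration; real code would normally load an image with a library instead.

```python
# A hypothetical 2x2 image: each pixel is a (red, green, blue) triple, 0-255.
image = [
    [(145, 124, 213), (255, 0, 0)],
    [(0, 255, 0), (0, 0, 255)],
]

def split_channels(img):
    """Return three matrices (red, green, blue) with the same height and width."""
    red = [[pixel[0] for pixel in row] for row in img]
    green = [[pixel[1] for pixel in row] for row in img]
    blue = [[pixel[2] for pixel in row] for row in img]
    return red, green, blue

red, green, blue = split_channels(image)
print(red)    # red values of every pixel
print(green)  # green values of every pixel
print(blue)   # blue values of every pixel
```

Each of these three matrices is what the network actually sees as one input channel.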

But when color is not a factor in determining what’s in the image, we choose to convert the image to grayscale, where each pixel has a single value between 0 and 255. Here 0 is black and 255 is white, and between them there are 254 shades of gray. So the color of an individual pixel looks like this: (152).
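As a rough sketch of the grayscale conversion, the snippet below turns one RGB pixel into a single gray value. The weights used are the common luma weights (0.299, 0.587, 0.114), which is an assumption on my part: the lesson does not say which formula is used, and libraries may differ. The function name `to_gray` is also just illustrative.

```python
def to_gray(pixel):
    """Convert an (R, G, B) pixel to one gray value in the range 0-255.

    Assumes the widely used luma weights; other weightings exist.
    """
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

gray = to_gray((145, 124, 213))
print(gray)
```

Green gets the largest weight because the human eye is most sensitive to it, which is why this weighted sum is usually preferred over a plain average of the three values.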

Now let’s understand how convolution works.

Once the image has been converted to numerical data, a window is opened over it to analyze the data. In the example below, the window size is 3×3. The values in the window are multiplied element by element with a filter matrix of the same size (initialized with random values), and the products are summed. The result is saved in another matrix, and as the window slides across the image this new matrix fills up and then serves as the input for the next convolutional layer.

This example shows how the R-G-B matrices of an image are convolved to form a new matrix.

R-G-B matrices convolved with filters to form a new matrix for the next convolutional layer.
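The sliding-window operation described above can be sketched in plain Python. This is a minimal illustration on a single grayscale channel, assuming no padding and a step of one pixel at a time. The function name `convolve2d`, the image values, and the hand-picked filter are all made up for the example; in a real layer the filter values are learned rather than chosen by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the new matrix."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the window and the filter element by element, then sum.
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x4 grayscale image.
image = [
    [3, 0, 1, 2],
    [1, 5, 2, 0],
    [2, 1, 0, 4],
    [4, 3, 1, 1],
]
# Hand-picked 3x3 filter that compares the left column with the right column.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # a 2x2 matrix, since 4 - 3 + 1 = 2
```

For an RGB image, the same operation is typically applied to each of the three channels with its own 3×3 slice of the filter, and the three results are added together to give one output matrix.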

But before passing it to the next layer, we generally pass it through a MaxPooling2D layer, which keeps only the maximum value in each window. Here the window size used is 2×2, and again it is up to you to choose it. Let’s understand this with an example.

MaxPooling2D Function
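The same idea can be written as a short plain-Python sketch. It assumes the window moves in steps equal to its own size, so windows do not overlap; the function name `max_pool2d` and the input values are invented for illustration and are not the Keras implementation.

```python
def max_pool2d(matrix, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(matrix) - size + 1, size):
        row = []
        for c in range(0, len(matrix[0]) - size + 1, size):
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 matrix produced by a convolutional layer.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]

pooled = max_pool2d(feature_map)
print(pooled)  # 2x2 result: one maximum per 2x2 window
```

Pooling halves the height and width here, so the next layer has a quarter as many values to process while the strongest responses are kept.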

Once the pooling is done, the resulting data is fed to another convolutional layer, and this process goes on until the last layer.

Let’s look at a visualization of a Convolutional Neural Network and try to understand how the network picks up different shapes and continuously combines them to get the output.
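To see how the data shrinks as it passes through repeated convolution and pooling stages, here is a small sketch that tracks only the width of a square input. It assumes 3×3 filters with no padding and 2×2 pooling, and the 28×28 input size is a hypothetical choice for the example.

```python
def conv_output_size(size, kernel=3):
    """Width after a convolution with no padding and a step of 1."""
    return size - kernel + 1

def pool_output_size(size, window=2):
    """Width after non-overlapping pooling with the given window."""
    return size // window

size = 28          # hypothetical 28x28 input image
trace = [size]
for _ in range(2):  # two convolution + pooling stages
    size = conv_output_size(size)
    trace.append(size)
    size = pool_output_size(size)
    trace.append(size)

print(trace)  # width after each step
```

The width goes from 28 to 26 to 13 to 11 to 5, which is why only a limited number of such stages can be stacked before the matrix becomes too small to convolve further.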

Visualization of CNN

Here we can see that in the first layer it finds some highlights and horizontal and vertical lines. In the next layer, it starts detecting the features of a face, like a nose, right eye, left eye, and so on. We can clearly see that it is making progress: in the third layer, the CNN starts detecting a whole face, which is quite impressive.

So now we have a fair idea of how this model works. In the next lessons we will learn more about CNN layers and activation functions, and we will also train a model to detect a cat or a dog.

If you have any doubts or suggestions, please comment below.

Thanks for reading 😀

Here is the link for CNN Part 2: Downloading and Preprocessing the car dataset.
