Today, if you want to analyze an image or a video, a Convolutional Neural Network (CNN) is one of the most popular choices. In this series of lessons, we will cover what a CNN is all about, along with the TensorBoard and Keras libraries. For a better understanding and hands-on experience with these libraries and CNNs, we will build a fully functional model from scratch, and I will guide you at every step.
You will learn:
- Convolutional Neural Network
- Preprocessing the data
- Google Colab
So let’s get started.
What is a CNN?
A neural network consists of a numerical input layer, some hidden layers, and an output layer. If those hidden layers are convolutional layers, the network is called a Convolutional Neural Network.
I know this sounds technical, or more precisely mathematical, but bear with me. I will explain everything in a simple way.
First of all, let’s see what this model looks like.
Here there is an input layer, two hidden layers, and an output layer.
To understand it better, I would like you to imagine a scenario. It is a bit of a story, but trust me, this is the best and simplest way I can teach you about CNNs.
One beautiful day, you and your friends plan to go to a jungle. For some reason you are not able to make it, so you wait at home for your friends to come by and tell you stories of this huge, mighty jungle.
Now the tour is over and they come to you one by one. Because the jungle is huge, they got scattered and went to different places, and now each of them describes to you what the jungle looks like.
One friend said that there were many lions.
The second one said they saw a rhino.
The third one said they saw nothing except trees.
And this process goes on: some told you about furious cheetahs, and some were excited because they saw an elephant.
You listened to every one of them, and now you have a whole image of the jungle. Note that individually they don’t have the whole picture as you do, but because each of them told you what they saw and you compiled it all, you have a more diverse picture of that place.
If it had been an unfamiliar place and your friends had told you the story in the same way, you could build an image of that place too, and if someone later showed you that place, you would most probably be able to recognize it.
And that’s what a CNN is all about. The only thing left is relating the above story to CNN models.
In a Convolutional Neural Network, the input layer is the image data in the form of numbers. We pass it to the hidden layer, which consists of filters.
Here the filters are analogous to your friends: each one draws a conclusion about a different part of the image. They all send their conclusions to the final layer, which is you, and you form a final conclusion about the look and feel of the image. This is an example of a CNN with one hidden layer.
Now imagine that each of your friends told their story to the members of another group, individually, the same way they told you, and that the members of that group then come to you one by one and describe the place.
Note that now each of them has a full picture of that place, like you did in the example above, and they are telling you about it in great detail. This time you will surely have a far better understanding of what it looks like. This is an example of a CNN with two hidden layers.
I hope this is making sense so far.
We read above that a neural network processes data as numbers, so let’s see how that works.
Every pixel in a color image consists of three color channels: red, green, and blue (RGB). The mix of these channels defines the color of the pixel. Each channel ranges from 0 to 255, so the color of an individual pixel looks like this: (145, 124, 213).
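To make the numbers concrete, here is a minimal sketch in plain Python of a tiny 2×2 color image stored as rows of (R, G, B) tuples, and of how it can be split into one matrix per channel. The pixel values other than (145, 124, 213) and the helper name split_channels are made up for illustration.

```python
# A hypothetical 2x2 color image: each pixel is an (R, G, B) tuple of values 0-255.
image = [
    [(145, 124, 213), (255, 0, 0)],
    [(0, 255, 0), (0, 0, 255)],
]

def split_channels(img):
    """Split an image of (R, G, B) tuples into three separate 2-D matrices."""
    red = [[pixel[0] for pixel in row] for row in img]
    green = [[pixel[1] for pixel in row] for row in img]
    blue = [[pixel[2] for pixel in row] for row in img]
    return red, green, blue

red, green, blue = split_channels(image)
print(red)    # [[145, 255], [0, 0]]
print(green)  # [[124, 0], [255, 0]]
print(blue)   # [[213, 0], [0, 255]]
```

These three matrices are the "numbers" that a network actually receives for a color image.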
But when color is not a factor in determining what’s in the image, we can convert the image to grayscale. Each pixel is then a single value between 0 and 255, where 0 is black, 255 is white, and the 254 values in between are shades of gray. So, an individual pixel looks like this: (152).
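How the conversion to grayscale is done is not specified here. One common choice, assumed in this sketch, is a weighted sum of the three channels using the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). The function name to_gray is hypothetical.

```python
def to_gray(pixel):
    """Convert one (R, G, B) pixel to a single gray value in 0-255.

    Assumption: uses the BT.601 luma weights; other formulas (such as a
    plain average of the three channels) are also used in practice.
    """
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(to_gray((145, 124, 213)))  # 140
print(to_gray((0, 0, 0)))        # 0 (black)
print(to_gray((255, 255, 255)))  # 255 (white)
```

Green gets the largest weight because the eye is most sensitive to it, so this usually looks more natural than a plain average.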
Now let’s understand how the convolution works.
Once the image has been converted to numerical data, a window is opened over it to analyse the data. In the example below, the window size is 3×3. A dot product is computed between the window matrix and a randomly initialized filter matrix (each window value is multiplied by the matching filter value and the products are added up), and the result is saved in another matrix.
This example shows how the R, G, and B matrices of an image get convolved to form a new matrix.
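The same operation can be sketched in plain Python. This is a simplified illustration under a few assumptions: a single channel, no padding, a step of one pixel at a time, and a fixed filter instead of a random one so that the result is reproducible. The function name convolve2d and all the values are made up for the example.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the result."""
    k = len(kernel)                 # assumes a square k x k kernel
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply each window value by the matching filter value and add them up.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k)
                for dj in range(k)
            )
            row.append(total)
        output.append(row)
    return output

# One channel of a made-up 4x4 image.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
# A fixed 3x3 filter that responds to vertical edges (left column minus right column).
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[-6, 12], [-6, 8]]
```

A 3×3 window fits into a 4×4 image at only 2×2 positions, which is why the output matrix is smaller than the input. For a color image, the same thing is done on each of the three channel matrices and the results are added together.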
But before passing the result to the next layer, we generally pass it through a MaxPooling2D layer, which keeps only the maximum value in each window. Here the window size used is 2×2, and again it’s up to you to choose it. Let’s understand this with an example.
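Here is a minimal sketch of 2×2 max pooling in plain Python, assuming the windows do not overlap (the window moves by its own size each step, which is also how Keras's MaxPooling2D behaves by default). The function name max_pool2d and the input values are made up for illustration.

```python
def max_pool2d(matrix, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(matrix) - size + 1, size):
        row = []
        for j in range(0, len(matrix[0]) - size + 1, size):
            window = [
                matrix[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

# A made-up 4x4 matrix, as it might come out of a convolutional layer.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]

print(max_pool2d(feature_map))  # [[6, 4], [8, 9]]
```

Each 2×2 block is reduced to a single number, so the 4×4 matrix shrinks to 2×2 while the strongest responses are kept.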
Once the pooling is done, the resulting data is fed to another convolutional layer, and this process goes on until the last layer.
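To get a feel for how the data shrinks as it passes through alternating convolution and pooling layers, the sketch below traces the width of a square image through a small stack of layers. The 28×28 input and the particular stack are made-up examples, and the arithmetic assumes no padding, a convolution step of one pixel, and non-overlapping pooling windows.

```python
def conv_output_size(n, kernel_size):
    """Output width of a convolution with no padding and a step of 1."""
    return n - kernel_size + 1

def pool_output_size(n, pool_size):
    """Output width of pooling with non-overlapping windows."""
    return n // pool_size

size = 28  # hypothetical 28x28 input image
layers = [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2)]

sizes = [size]
for kind, window in layers:
    if kind == "conv":
        size = conv_output_size(size, window)
    else:
        size = pool_output_size(size, window)
    sizes.append(size)

print(sizes)  # [28, 26, 13, 11, 5]
```

Each convolution trims the border slightly and each pooling step halves the width, so deeper layers work on a smaller, more condensed summary of the image.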
Let’s look at a Convolutional Neural Network visualisation and try to understand how the network picks up different shapes and continuously combines them to produce the output.
Here we can see that in the first layer it finds some highlights and horizontal and vertical lines. In the next layer, it starts detecting the features of a face, like a nose, right eye, left eye, and so on. We can clearly see that it is making progress: in the third layer, the CNN starts detecting whole faces, which is quite impressive.
So now we have a fair idea of how this model works. In the next lessons we will learn more about the CNN layers and activation functions, and we will also train a model to tell a cat from a dog.
If you have any doubts or suggestions, please comment below.
Thanks for reading 😀
Here is the link for CNN Part 2: Downloading and Preprocessing the car dataset.