The previous article covered the process of convolution and its implementation. This article covers padding, stride and the parameters involved in a CNN.
We have seen that convolution reduces the dimensions of the output vector. A technique known as padding is used to preserve the original dimensions in the output vector. The only change in the process is that we add a border of zeros around the input vector and then do the convolution.
Procedure to implement padding
- To get an n*n output with a 3*3 filter, use an (n+2)*(n+2) input
- For example, to get a 7*7 output use a 9*9 input
- In that 9*9 input, fill the first row, first column, last row and last column with zeros.
- Now do the convolution operation on it using a filter.
- Observe that the output has the same dimensions as the original 7*7 input.
Zeros are used because they contribute nothing to the sums computed by the filter, so the output dimensions are kept without affecting the results.
With padding, every element of the input vector, including those on the edges, falls under the center of the filter once, so no part of the input is lost in the output. Padding is denoted by P. If P=1 then one layer of zeros is added around the input, if P=2 then two layers are added, and so on.
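The padding procedure above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `zero_pad` and `convolve2d`, the 7*7 input and the all-ones 3*3 kernel are assumptions made for the example, and the kernel is applied without flipping, as is usual in CNNs.

```python
def zero_pad(image, p):
    """Return a copy of a 2-D list with p layers of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right border
    padded.extend([0] * width for _ in range(p))      # bottom border
    return padded


def convolve2d(image, kernel):
    """Slide a square kernel over the image with a stride of 1."""
    f = len(kernel)
    out_h = len(image) - f + 1
    out_w = len(image[0]) - f + 1
    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            total = 0
            for a in range(f):
                for b in range(f):
                    total += image[i + a][j + b] * kernel[a][b]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 7*7 input and a 3*3 kernel of ones (illustrative values only).
image = [[r * 7 + c for c in range(7)] for r in range(7)]
kernel = [[1] * 3 for _ in range(3)]

unpadded = convolve2d(image, kernel)             # no padding: 5*5 output
padded = convolve2d(zero_pad(image, 1), kernel)  # P=1: 7*7 output

print(len(unpadded), len(unpadded[0]))
print(len(padded), len(padded[0]))
```

Without padding the 7*7 input shrinks to 5*5, while the padded 9*9 input gives back a 7*7 output.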
It is not necessary to apply the filter or kernel at every cell. The pattern in which the kernel moves over the input vector is determined by the stride. It sets the shift, or gap, between successive cells where the filter is applied.
S=1 means no gap is created. The filter is applied to all the cells.
S=2 means a gap of 1. The filter is applied to alternate cells. This roughly halves the dimensions of the output vector.
This diagram shows the movement of the filter on a vector with a stride of 1 and of 2. With a stride of 2, alternate columns are accessed and hence the number of computations per row decreases by a factor of about 2. Hence the output dimensions reduce when a stride greater than 1 is used.
Padding and stride are two of the settings used in a CNN.
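As a rough sketch of how the stride changes the filter positions, the snippet below lists the left-edge positions of a filter along one row. The helper name `window_starts` and the sizes (7 cells, a 3-wide filter) are assumptions chosen for illustration.

```python
def window_starts(n, f, s):
    """Left-edge positions of an f-wide filter moved across n cells with stride s."""
    return list(range(0, n - f + 1, s))


starts_s1 = window_starts(7, 3, 1)  # filter placed at every cell
starts_s2 = window_starts(7, 3, 2)  # filter placed at alternate cells

print(starts_s1, len(starts_s1))
print(starts_s2, len(starts_s2))
```

The number of positions is the output size along that row: 5 for a stride of 1 and 3 for a stride of 2, so the reduction is close to, but not exactly, one half.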
Parameters in a convolution layer
The following terms are needed for calculating the parameters of a convolution layer.
Width Wi – width of input image
Height Hi – height of input image
Depth Di – depth of input image, which is 3 for RGB images
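We saw that a 7*7 input without padding, with a stride of 1 and a 3*3 kernel, gave a 5*5 output. It can be verified using the standard output-size calculation, Wo = (Wi - f + 2P)/S + 1 (and likewise for the height Ho): (7 - 3 + 0)/1 + 1 = 5.
The role of padding can also be verified using this calculation: with P=1, (7 - 3 + 2)/1 + 1 = 7, so the output keeps the 7*7 size of the input.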
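The check can also be done in code. The sketch below assumes the commonly used output-size relation (W - f + 2P)/S + 1, rounded down when the stride does not divide evenly; the function name `output_size` is made up for this example.

```python
def output_size(w, f, p=0, s=1):
    """Output width (or height) for input size w, filter size f, padding p, stride s."""
    return (w - f + 2 * p) // s + 1


no_padding = output_size(7, 3)         # 7*7 input, 3*3 kernel, P=0, S=1
with_padding = output_size(7, 3, p=1)  # same input with one layer of zeros
with_stride = output_size(7, 3, s=2)   # same input with a stride of 2

print(no_padding, with_padding, with_stride)
```

With these values the function gives 5 without padding, 7 with P=1, and 3 with a stride of 2.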
f is known as the filter size. The filter can be 1*1, 3*3 and so on. Since the filter is square, f is a single value, so f=3 for a 3*3 filter. There is another term, K, which refers to the number of kernels used. This value is fixed by the user.
The values inside the kernels play a role similar to that of w and b. The machine learns the ideal values of these parameters during training. The significance of partial connection in a CNN can be easily understood through the parameters.
Consider the same example of a (30*30*3) vector. Counted with the product shown below, the parameters for a CNN using 10 kernels come to 81,000. This is a large number. But if the same is done using an FNN then the parameters will be at least 100 million. This is more than a thousand times the CNN figure. The reason for this large number is the full connectivity.
Parameter = 30*30*3*3*10 = 81,000
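For comparison, here is a minimal sketch under a different and widely used counting convention, which is an assumption and not the product used above: with weight sharing, each kernel has f*f*Di weights plus one bias, and a fully connected layer has one weight per input per unit plus one bias per unit. The function names and the choice of 100 units for the fully connected layer are hypothetical.

```python
def conv_params(f, depth, k):
    """Learnable values in a convolution layer, assuming shared weights and one bias per kernel."""
    return (f * f * depth + 1) * k


def dense_params(n_inputs, n_units):
    """Learnable values in a fully connected layer with one bias per unit."""
    return (n_inputs + 1) * n_units


cnn_count = conv_params(f=3, depth=3, k=10)          # ten 3*3*3 kernels
fnn_count = dense_params(30 * 30 * 3, n_units=100)   # hypothetical 100-unit layer

print(cnn_count, fnn_count)
```

Under these assumptions the convolution layer has 280 learnable values and the fully connected layer has 270,100, which points to the same conclusion: full connectivity is what makes the FNN count so large.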