This notebook is meant to introduce convolutional layers, with special emphasis on the relation between the dimension of the input tensor, the kernel size, the stride, the number of filters and the dimension of the output tensor.

In [ ]:
import tensorflow as tf
In [ ]:
from tensorflow.keras.layers import Input, Conv2D, ZeroPadding2D, Dense, Flatten, Layer
from tensorflow.keras.models import Model
from tensorflow.keras import metrics
from tensorflow.keras.datasets import mnist

We run the example on the MNIST dataset. Keras provides convenient access to several well-known datasets, including MNIST, CIFAR10, CIFAR100, IMDB and many others. See https://keras.io/api/datasets/ for documentation.

In [ ]:
import numpy as np
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
11501568/11490434 [==============================] - 0s 0us/step

MNIST images are grayscale, with integer pixel values in the range [0,255]. We convert them to floats and normalize them to the range [0,1].

In [ ]:
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

Two-dimensional convolutions expect inputs with three dimensions (plus a leading batch dimension): height, width, channels. Since MNIST digits are grayscale and therefore have only two dimensions, we need to extend them with a channel dimension of size 1.

In [ ]:
(n,w,h) = x_train.shape
x_train = x_train.reshape(n,w,h,1)
(n,w,h) = x_test.shape
x_test = x_test.reshape(n,w,h,1)
print(x_train.shape)
print(x_test.shape)
(60000, 28, 28, 1)
(10000, 28, 28, 1)
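
As a side note, the channel axis can be added in several equivalent ways. The sketch below uses a small made-up array (not the real MNIST data) purely to compare the resulting shapes; the variable names are illustrative.

```python
import numpy as np

# Hypothetical stand-in for a batch of 4 grayscale 28x28 images.
batch = np.zeros((4, 28, 28), dtype="float32")

# Three equivalent ways to append a trailing channel axis of size 1.
a = batch.reshape(4, 28, 28, 1)
b = np.expand_dims(batch, axis=-1)
c = batch[..., np.newaxis]

print(a.shape, b.shape, c.shape)
```

The last two forms do not require knowing the batch size or image size in advance.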

MNIST labels are integers in the range [0,9]. The network produces a probability for each of these ten categories. To compare its output with the ground truth using categorical crossentropy, which is the traditional choice, we would have to convert each integer label into its categorical (one-hot) representation, e.g. with the "to_categorical" function in utils.

Alternatively, we can use the so-called "sparse categorical crossentropy" loss function https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy, which allows us to compare predictions directly with integer labels.

In [ ]:
# Only needed for the non-sparse 'categorical_crossentropy' loss:
#y_train = tf.keras.utils.to_categorical(y_train)
#y_test = tf.keras.utils.to_categorical(y_test)
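
To see why the two losses are interchangeable, here is a minimal NumPy sketch of both. It is not the Keras implementation (which, for instance, also guards against log(0)); the labels and probabilities are made up, and the helper names are illustrative.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels into one-hot rows."""
    return np.eye(num_classes, dtype="float32")[labels]

def categorical_crossentropy(y_true_one_hot, y_pred):
    """Batch mean of -sum_c y_true[c] * log(y_pred[c])."""
    return float(np.mean(-np.sum(y_true_one_hot * np.log(y_pred), axis=1)))

def sparse_categorical_crossentropy(labels, y_pred):
    """Batch mean of -log(y_pred[label]): same quantity, no one-hot needed."""
    picked = y_pred[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))

# Made-up labels and predicted probabilities for a 3-class toy problem.
labels = np.array([0, 2, 1])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.5, 0.3]])

dense_loss = categorical_crossentropy(one_hot(labels, 3), y_pred)
sparse_loss = sparse_categorical_crossentropy(labels, y_pred)
print(dense_loss, sparse_loss)
```

Since a one-hot vector is zero everywhere except at the true class, the sum in the first function keeps only the term that the second function picks directly.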

Let us come to the convolutional network. We define a simple network composed of three convolutional layers, followed by two Dense layers.

In [ ]:
xin = Input(shape=(28,28,1))  # one channel, matching the reshaped MNIST images
# Three 3x3 convolutions with stride 2 and no padding
x = Conv2D(16,(3,3),strides=(2,2),padding='valid')(xin)
x = Conv2D(32,(3,3),strides=(2,2),padding='valid')(x)
x = Conv2D(64,(3,3),strides=(2,2),padding='valid')(x)
x = Flatten()(x)
x = Dense(64, activation ='relu')(x)
res = Dense(10,activation = 'softmax')(x)  # one probability per digit class

mynet = Model(inputs=xin,outputs=res)

Let's have a look at the summary.

In [ ]:
mynet.summary()
Model: "model_14"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_17 (InputLayer)       [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_42 (Conv2D)          (None, 13, 13, 16)        160       
                                                                 
 conv2d_43 (Conv2D)          (None, 6, 6, 32)          4640      
                                                                 
 conv2d_44 (Conv2D)          (None, 2, 2, 64)          18496     
                                                                 
 flatten_14 (Flatten)        (None, 256)               0         
                                                                 
 dense_28 (Dense)            (None, 64)                16448     
                                                                 
 dense_29 (Dense)            (None, 10)                650       
                                                                 
=================================================================
Total params: 40,394
Trainable params: 40,394
Non-trainable params: 0
_________________________________________________________________

In valid mode, no padding is applied. Along each axis, the output dimension O is computed from the input dimension I using the formula O = floor((I-K)/S) + 1, where K is the kernel dimension and S is the stride. The division is rounded down because a kernel position that would extend past the border of the input is discarded.

For all layers, K=3 and S=2. So the first convolution takes us from dimension 28 to dimension floor((28-3)/2)+1 = 13, the second to dimension floor((13-3)/2)+1 = 6, and the third to dimension floor((6-3)/2)+1 = 2.
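
The chain of dimensions can be checked with a few lines of plain Python. This is a sketch of the arithmetic only, with the division rounded down; the function name is illustrative and not part of Keras.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Output length along one axis for an unpadded ('valid') convolution."""
    return (input_size - kernel_size) // stride + 1

# Apply the three convolutions of the network (K=3, S=2) to a 28-pixel axis.
size = 28
sizes = [size]
for _ in range(3):
    size = conv_output_size(size, kernel_size=3, stride=2)
    sizes.append(size)

print(sizes)  # [28, 13, 6, 2]
```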

Exercise: modify "valid" to "same" and see what happens.

The second important point is about the number of parameters. Keep in mind that a kernel of nominal dimension K1 x K2 has an actual dimension K1 x K2 x CI, where CI is the number of input channels: in other words, the kernel computes spatial and cross-channel correlations at the same time.
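
To make the role of the channel axis concrete, here is a deliberately naive NumPy sketch of an unpadded convolution on a single image. It is written for readability, not speed, and is not how Keras computes it internally; the function name and the random test data are illustrative.

```python
import numpy as np

def conv2d_valid(image, kernels, bias, stride):
    """Naive unpadded convolution of one image.

    image:   (H, W, CI)
    kernels: (K1, K2, CI, CO), one K1 x K2 x CI kernel per output filter
    bias:    (CO,)
    """
    h, w, ci = image.shape
    k1, k2, kci, co = kernels.shape
    assert ci == kci, "kernel depth must match the number of input channels"
    out_h = (h - k1) // stride + 1
    out_w = (w - k2) // stride + 1
    out = np.zeros((out_h, out_w, co), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + k1, c:c + k2, :]
            # Sum over both spatial axes and the channel axis, once per filter.
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((13, 13, 16)).astype("float32")    # 16 input channels
kernels = rng.standard_normal((3, 3, 16, 32)).astype("float32")  # 32 filters
bias = np.zeros(32, dtype="float32")

out = conv2d_valid(image, kernels, bias, stride=2)
print(out.shape)  # (6, 6, 32)
```

Each output value is a weighted sum over a 3 x 3 x 16 block of the input, so the number of output channels depends only on the number of filters, not on the number of input channels.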

So, for the first convolution, where the input has a single channel, we have 3 x 3 x 1 + 1 = 10 parameters for each filter (the extra 1 is the bias), and since we are computing 16 filters, the number of parameters is 10 x 16 = 160.

For the second convolution, each filter has 3 x 3 x 16 + 1 = 145 parameters, and since we have 32 filters, the total number of parameters is 145 x 32 = 4640.
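
The same counting rule can be written as a small helper and applied to every layer. This sketch assumes the single-channel 28 x 28 input described in the text; the function names are illustrative, not Keras APIs.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights of one K1 x K2 x CI kernel plus one bias, times the number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """One weight per input unit plus one bias, for each output unit."""
    return (in_units + 1) * out_units

conv_counts = [
    conv2d_params(3, 3, 1, 16),
    conv2d_params(3, 3, 16, 32),
    conv2d_params(3, 3, 32, 64),
]
# Flatten turns the final 2 x 2 x 64 feature map into 256 units.
dense_counts = [dense_params(2 * 2 * 64, 64), dense_params(64, 10)]

print(conv_counts)   # [160, 4640, 18496]
print(dense_counts)  # [16448, 650]
```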

Let us come to training.

In addition to the optimizer and the loss, we also pass a "metrics" argument. Metrics are additional functions that are not used for training, but allow us to monitor its progress. Here we use accuracy, in its sparse categorical variant: sparse because the targets are integer labels, and categorical because there are multiple categories.
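
Conceptually, this metric is the fraction of samples whose most probable predicted class equals the integer label. A minimal NumPy sketch of that idea, on made-up data and not the Keras implementation, looks like this:

```python
import numpy as np

def sparse_categorical_accuracy(labels, y_pred):
    """Fraction of rows whose highest-probability class equals the integer label."""
    return float(np.mean(np.argmax(y_pred, axis=1) == labels))

# Made-up labels and predicted probabilities for a 3-class toy problem.
labels = np.array([0, 2, 1, 1])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.5, 0.3],
                   [0.6, 0.3, 0.1]])

acc = sparse_categorical_accuracy(labels, y_pred)
print(acc)  # 0.75
```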

In [ ]:
mynet.compile(optimizer='adam',loss='sparse_categorical_crossentropy', metrics=[metrics.SparseCategoricalAccuracy()])
In [ ]:
mynet.fit(x_train,y_train, shuffle=True, epochs=10, batch_size=32,validation_data=(x_test,y_test))
Epoch 1/10
1875/1875 [==============================] - 19s 5ms/step - loss: 0.2662 - sparse_categorical_accuracy: 0.9207 - val_loss: 0.1366 - val_sparse_categorical_accuracy: 0.9582
Epoch 2/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.1377 - sparse_categorical_accuracy: 0.9582 - val_loss: 0.1102 - val_sparse_categorical_accuracy: 0.9653
Epoch 3/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.1119 - sparse_categorical_accuracy: 0.9657 - val_loss: 0.1032 - val_sparse_categorical_accuracy: 0.9680
Epoch 4/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0981 - sparse_categorical_accuracy: 0.9704 - val_loss: 0.1010 - val_sparse_categorical_accuracy: 0.9683
Epoch 5/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0898 - sparse_categorical_accuracy: 0.9724 - val_loss: 0.0908 - val_sparse_categorical_accuracy: 0.9722
Epoch 6/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0824 - sparse_categorical_accuracy: 0.9746 - val_loss: 0.1063 - val_sparse_categorical_accuracy: 0.9665
Epoch 7/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0756 - sparse_categorical_accuracy: 0.9761 - val_loss: 0.0949 - val_sparse_categorical_accuracy: 0.9707
Epoch 8/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0717 - sparse_categorical_accuracy: 0.9772 - val_loss: 0.0959 - val_sparse_categorical_accuracy: 0.9723
Epoch 9/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0650 - sparse_categorical_accuracy: 0.9794 - val_loss: 0.0915 - val_sparse_categorical_accuracy: 0.9735
Epoch 10/10
1875/1875 [==============================] - 8s 5ms/step - loss: 0.0621 - sparse_categorical_accuracy: 0.9804 - val_loss: 0.0956 - val_sparse_categorical_accuracy: 0.9733
Out[ ]:
<keras.callbacks.History at 0x7fa3f06ff5d0>