[MNIST1] - Simple classification with DNN and CNN

Example of classification with a fully connected neural network.

AUTHOR : Jean-Luc Parouty (CNRS/SIMaP), adaped to PyTorch by Laurent Risser (CNRS/IMT)

Objectives :

  • Recognizing handwritten numbers
  • Understanding the principle of a classifier DNN network
  • Implementation with PyTorch

The MNIST dataset (Modified National Institute of Standards and Technology) is a must for Deep Learning.
It consists of 60,000 small images of handwritten numbers for learning and 10,000 for testing.

What we're going to do :

  • Retrieve data
  • Preparing the data
  • Create a model
  • Train the model
  • Evaluate the result

Step 1 - Init python stuff

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torchvision  #to get the MNIST dataset


import numpy as np
import matplotlib.pyplot as plt
import sys,os

sys.path.append('./MISC/fidle/')
import fidle_pwk_reduced as ooo
from fidle_pwk_additional import convergence_history_CrossEntropyLoss

Step 2 - Retrieve data

MNIST is one of the most famous historic dataset.
Include in torchvision datasets

In [2]:
#get and format the training set
mnist_trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=None)
x_train=mnist_trainset.data.type(torch.DoubleTensor)
y_train=mnist_trainset.targets


#get and format the test set
mnist_testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=None)
x_test=mnist_testset.data.type(torch.DoubleTensor)
y_test=mnist_testset.targets

#check data shape and format
print("Size of the train and test observations")
print(" -> x_train : ",x_train.shape)
print(" -> y_train : ",y_train.shape)
print(" -> x_test  : ",x_test.shape)
print(" -> y_test  : ",y_test.shape)

print("\nRemark that we work with torch tensors and not numpy arrays:")
print(" -> x_train.dtype = ",x_train.dtype)
print(" -> y_train.dtype = ",y_train.dtype)
Size of the train and test observations
 -> x_train :  torch.Size([60000, 28, 28])
 -> y_train :  torch.Size([60000])
 -> x_test  :  torch.Size([10000, 28, 28])
 -> y_test  :  torch.Size([10000])

Remark that we work with torch tensors and not numpy arrays:
 -> x_train.dtype =  torch.float64
 -> y_train.dtype =  torch.int64

Question :

Que représentent les différentes dimensions dans la taille de x_train (les 60000, 28 et 28) ?

Step 3 - Preparing the data

In [3]:
print('Before normalization : Min={}, max={}'.format(x_train.min(),x_train.max()))

xmax=x_train.max()
x_train = x_train / xmax
x_test  = x_test  / xmax

print('After normalization  : Min={}, max={}'.format(x_train.min(),x_train.max()))
Before normalization : Min=0.0, max=255.0
After normalization  : Min=0.0, max=1.0

Have a look

In [4]:
np_x_train=x_train.numpy().astype(np.float64)
np_y_train=y_train.numpy().astype(np.uint8)   #convert the images into numpy arrays, as ooo.plot_images uses input 'matrices' at this format as input

ooo.plot_images(np_x_train,np_y_train , [27],  x_size=5,y_size=5, colorbar=True)
ooo.plot_images(np_x_train,np_y_train, range(5,41), columns=12)

Step 4 - Create model

About informations about :

In [5]:
class MyModel(nn.Module):
    """
    Basic fully connected neural-network
    """
    def __init__(self):
        hidden1     = 100
        hidden2     = 100
        super(MyModel, self).__init__()
        self.hidden1 = nn.Linear(784, hidden1)
        self.hidden2 = nn.Linear(hidden1, hidden2)
        self.hidden3 = nn.Linear(hidden2, 10)

    def forward(self, x):
        x = x.view(-1,784)
        x = self.hidden1(x)
        x = F.relu(x)
        x = self.hidden2(x)
        x = F.relu(x)
        x = self.hidden3(x)
        x = F.softmax(x, dim=0)
        return x

    
    
model = MyModel()
print(model)
MyModel(
  (hidden1): Linear(in_features=784, out_features=100, bias=True)
  (hidden2): Linear(in_features=100, out_features=100, bias=True)
  (hidden3): Linear(in_features=100, out_features=10, bias=True)
)

Questions :

  • La fonction *view* permet de changer la forme d'un tenseur pytorch. Quel est alors l'interet de la ligne *x = x.view(-1,784)* ? On se souviendra que les images en entrée du réseau de neurone sont de taille 28*28.
  • Pourquoi la sortie du réseau est en dimension 10 ? On se souviendra que l'on souhaite estimer le label le plus probable parmis les labels 0, 1, 2, ... 9 ? Vous pouvez vous renseigner sur le *one hot encoding* ou les fonctions *softmax* pour répondre.

</font>

Step 5 - Train the model

5.1 - stochastic gradient descent strategy to fit the model

In [6]:
def fit(model,X_train,Y_train,X_test,Y_test, EPOCHS = 5, BATCH_SIZE = 32):
    loss = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(),lr=1e-3) #lr is the learning rate
    model.train()
        
    history=convergence_history_CrossEntropyLoss()
    history.update(model,X_train,Y_train,X_test,Y_test)
    
    n=X_train.shape[0] #number of observations in the training data
    
    #stochastic gradient descent
    for epoch in range(EPOCHS):
        batch_start=0
        epoch_shuffler=np.arange(n) 
        np.random.shuffle(epoch_shuffler) #remark that 'utilsData.DataLoader' could be used instead
        
        while batch_start+BATCH_SIZE < n:
            #get mini-batch observation
            mini_batch_observations = epoch_shuffler[batch_start:batch_start+BATCH_SIZE]
            var_X_batch = Variable(X_train[mini_batch_observations,:,:]).float() #the input image is flattened
            var_Y_batch = Variable(Y_train[mini_batch_observations])
            
            #gradient descent step
            optimizer.zero_grad()               #set the parameters gradients to 0
            Y_pred_batch = model(var_X_batch)   #predict y with the current NN parameters
            curr_loss = loss(Y_pred_batch, var_Y_batch)  #compute the current loss
            curr_loss.backward()                         #compute the loss gradient w.r.t. all NN parameters
            optimizer.step()                             #update the NN parameters
            
            #prepare the next mini-batch of the epoch
            batch_start+=BATCH_SIZE
            
        history.update(model,X_train,Y_train,X_test,Y_test)
    
    return history
5.2 - fit the model
In [7]:
model = MyModel()

batch_size  = 512
epochs      = 32

history=fit(model,x_train,y_train,x_test,y_test,EPOCHS=epochs,BATCH_SIZE = batch_size)

Step 6 - Evaluate

6.1 - Final loss and accuracy

In [8]:
var_x_test = Variable(x_test[:,:,:]).float()
var_y_test = Variable(y_test[:])
y_pred = model(var_x_test)

loss = nn.CrossEntropyLoss()
curr_loss = loss(y_pred, var_y_test)

val_loss = curr_loss.item()
val_accuracy  = float( (torch.argmax(y_pred, dim= 1) == var_y_test).float().mean() )


print('Test loss     :', val_loss)
print('Test accuracy :', val_accuracy)
Test loss     : 2.3017261028289795
Test accuracy : 0.901199996471405

6.2 - Plot history

In [9]:
ooo.plot_history(history, figsize=(6,4))

6.3 - Plot results

In [10]:
y_pred = model(var_x_test)
np_y_pred_label = torch.argmax(y_pred, dim= 1).numpy().astype(np.uint8)

np_x_test=x_test.numpy().astype(np.float64)
np_y_test=y_test.numpy().astype(np.uint8)

ooo.plot_images(np_x_test, np_y_test, range(0,60), columns=12, x_size=1, y_size=1, y_pred=np_y_pred_label)

Question :

Quel est l'intéret de *torch.argmax(y_pred, dim= 1)* ?

6.4 - Plot some errors

In [11]:
errors=[ i for i in range(len(np_y_test)) if np_y_pred_label[i]!=np_y_test[i] ]
errors=errors[:min(24,len(errors))]
ooo.plot_images(np_x_test, np_y_test, errors[:15], columns=6, x_size=2, y_size=2, y_pred=np_y_pred_label)
In [12]:
ooo.display_confusion_matrix(np_y_test,np_y_pred_label, range(10))

Confusion matrix is :

0 1 2 3 4 5 6 7 8 9
0 951.00 0.00 1.00 1.00 1.00 1.00 17.00 2.00 6.00 0.00
1 0.00 1059.00 4.00 28.00 0.00 0.00 7.00 3.00 34.00 0.00
2 19.00 0.00 910.00 24.00 7.00 1.00 28.00 15.00 27.00 1.00
3 2.00 2.00 19.00 913.00 1.00 27.00 2.00 10.00 28.00 6.00
4 0.00 7.00 5.00 1.00 861.00 1.00 44.00 1.00 11.00 51.00
5 26.00 4.00 13.00 31.00 15.00 722.00 18.00 13.00 48.00 2.00
6 16.00 2.00 7.00 1.00 7.00 14.00 906.00 0.00 5.00 0.00
7 3.00 26.00 27.00 12.00 15.00 0.00 3.00 917.00 8.00 17.00
8 10.00 5.00 7.00 24.00 8.00 14.00 15.00 8.00 875.00 8.00
9 6.00 2.00 1.00 10.00 27.00 23.00 13.00 6.00 23.00 898.00

Exercice :

Vous trouverez ci-dessous deux architectures de réseaux convolutionels. Vérifiez si ils vous permettent d'améliorer la précision des résultats.

  • On comparera les scores obtenus à ceux de *val_loss* et *val_accuracy* obtenu section 6.1
  • Attention : il sera nécessaire de changer la forme des données en entrée des réseaux CNN et CNN2 pour tenir compte du fait qu'on a des images sur un canal et de taille 28x28, i.e. chaque obsrvation a une taille (1,28,28). Pour cela il faudra décommenter et renseigner les commandes *x = x.view( TO DO )* dans les fonctions forward des reseaux CNN et CNN2.
  • </font>

In [13]:
class CNN(nn.Module):
    """
    Basic convolutional neural network
    """
    
    def __init__(self):
        super(CNN, self).__init__()
        
        #Input channels = 1, output channels = 6
        self.conv1 = nn.Conv2d(1, 6, kernel_size=3, stride=1, padding=1)
        
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        
        #1536 input features, 64 output features (see sizing flow below)
        self.fc1 = nn.Linear(6 * 14 * 14, 64)
        
        #64 input features, 10 output features for our 10 defined classes
        self.fc2 = nn.Linear(64, 10)
        
    def forward(self, x):
        #Make sure that the batch shape for input x is (nbBatchObs, 1, 28, 28)
        x = x.view(-1,1,28,28)
        
        #Computes the activation of the first convolution
        #Size changes from (1, 28, 28) to (6, 28, 28)
        x = F.relu(self.conv1(x))
        
        #Size changes from (6, 28, 28) to (6, 14, 14)
        x = self.pool(x)
        
        #Reshape data to input to the input layer of the neural net
        #Size changes from (6, 14, 14) to (1, 1176)
        #Recall that the -1 infers this dimension from the other given dimension
        x = x.view(-1, 6 * 14 *14)
        
        #Computes the activation of the first fully connected layer
        #Size changes from (1, 1176) to (1, 64)
        x = F.relu(self.fc1(x))
        
        #Computes the second fully connected layer (activation applied later)
        #Size changes from (1, 64) to (1, 10)
        x = self.fc2(x)
        
        return(x)


cnn = CNN()
print(cnn)


class CNN2(nn.Module):
    """
    Deeper convolutional neural network than CNN
    """
    
    #Our batch shape for input x is (1, 28, 28)
    
    def __init__(self):
        super(CNN2, self).__init__()
        
        #Input channels = 1, output channels = 6
        self.conv1 = nn.Conv2d(1, 6, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(6, 6, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(6, 6, kernel_size=3, stride=1, padding=1)
        
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        
        #1536 input features, 64 output features (see sizing flow below)
        self.fc1 = nn.Linear(6 * 7 * 7, 64)
        
        #64 input features, 10 output features for our 10 defined classes
        self.fc2 = nn.Linear(64, 10)
        
    def forward(self, x):
        
        #Make sure that the batch shape for input x is (nbBatchObs, 1, 28, 28)
        x = x.view(-1,1,28,28)
        
        #Computes the activation of the first convolution
        #Size changes from (1, 28, 28) to (6, 28, 28)
        x = F.relu(self.conv1(x))
        
        #Size changes from (6, 28, 28) to (6, 14, 14)
        x = self.pool(x)
        
        #convolution on the 6x14x14 image
        x = F.relu(self.conv2(x))
        
        #Size changes from (6, 14, 14) to (6, 7, 7)
        x = self.pool(x)

        #convolution on the 6x7x7 image
        x = F.relu(self.conv3(x))
        
        #Reshape data to input to the input layer of the neural net
        #Size changes from (6, 7, 7) to (1, 294)
        #Recall that the -1 infers this dimension from the other given dimension
        x = x.view(-1, 6 * 7 *7)
        
        #Computes the activation of the first fully connected layer
        #Size changes from (1, 294) to (1, 64)
        x = F.relu(self.fc1(x))
        
        #Computes the second fully connected layer (activation applied later)
        #Size changes from (1, 64) to (1, 10)
        x = self.fc2(x)
        
        
        return(x)


cnn2 = CNN2()
print(cnn2)
CNN(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=1176, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=10, bias=True)
)
CNN2(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(6, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv3): Conv2d(6, 6, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=294, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=10, bias=True)
)

CODE AVEC LE RESEAU CNN

In [14]:
cnn = CNN()

batch_size  = 512
epochs      = 32

history=fit(cnn,x_train,y_train,x_test,y_test,EPOCHS=epochs,BATCH_SIZE = batch_size)


var_x_test = Variable(x_test[:,:,:]).float()
var_y_test = Variable(y_test[:])
y_pred = cnn(var_x_test)

loss = nn.CrossEntropyLoss()
curr_loss = loss(y_pred, var_y_test)

val_loss = curr_loss.item()
val_accuracy  = float( (torch.argmax(y_pred, dim= 1) == var_y_test).float().mean() )


print('Test loss     :', val_loss)
print('Test accuracy :', val_accuracy)
Test loss     : 0.059179969131946564
Test accuracy : 0.9815000295639038

CODE AVEC LE RESEAU CNN2

In [15]:
cnn2 = CNN2()

batch_size  = 512
epochs      = 32

history=fit(cnn2,x_train,y_train,x_test,y_test,EPOCHS=epochs,BATCH_SIZE = batch_size)


var_x_test = Variable(x_test[:,:,:]).float()
var_y_test = Variable(y_test[:])
y_pred = cnn2(var_x_test)

loss = nn.CrossEntropyLoss()
curr_loss = loss(y_pred, var_y_test)

val_loss = curr_loss.item()
val_accuracy  = float( (torch.argmax(y_pred, dim= 1) == var_y_test).float().mean() )


print('Test loss     :', val_loss)
print('Test accuracy :', val_accuracy)
Test loss     : 0.04696321859955788
Test accuracy : 0.9847999811172485

Conclusion :

La qualité des prédictions a largement été augmentée en utilisant une architecture de réseau adaptée à la classification d'images.

In [ ]: