Multi-Tasking PyTorch Model Building

We are going to develop a model that can classify a digit from MNIST and also predict the sum of the predicted digit and a random number. We will do this task using PyTorch:

PyTorch Model Building

We have to build a neural network that can:

  1. Take two inputs: an image from the MNIST dataset and a random number between 0 and 9.

  2. Give two outputs: the “number” represented by the MNIST image, and the “sum” of this number and the random number that was sent as input to the network.

We can mix fully connected layers and convolution layers. We can use one-hot encoding to represent the random number input as well as the “summed” output.

1> Data representation

This problem statement needs two types of input data: (1) an image and (2) a random number. The outputs are: (1) the classification of the digit shown in the image and (2) the sum of that digit and the random number.

For this problem we are using the MNIST handwritten digit classification dataset. It consists of 60,000 small square 28×28-pixel grayscale images of handwritten single digits between 0 and 9 for training, and 10,000 images for testing.

We use NumPy to generate 60,000 random numbers between 0 and 9, and 19 classes (0…18) to represent the sum.
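The random-number generation described above can be sketched as follows (the seed and variable names are assumptions, not taken from the repository):

```python
import numpy as np

# One random digit (0-9) is paired with each of the 60,000 MNIST
# training images.
rng = np.random.default_rng(0)
random_digits = rng.integers(0, 10, size=60000)  # values in [0, 9]

# The sum of an MNIST label (0-9) and a random digit (0-9) falls in
# 0..18, hence 19 output classes for the sum head.
num_sum_classes = 19
```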

2> Data generation strategy

To create the data we have a function called data_generator, which returns the image, label, random number, and sum value that we use to build a DataLoader.

MNISTRandom_loader is the data loader that supplies the data the model needs for training and testing. We one-hot encode the random numbers and the sum outputs inside the loader function.
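A minimal sketch of the MNISTRandom_loader idea, assuming a dataset that wraps a base (image, label) dataset such as torchvision's MNIST and attaches a random digit plus one-hot encodings (the class name and structure here are illustrative, not the repository's exact code):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset


class MNISTRandomDataset(Dataset):
    """Wraps an (image, label) dataset; each sample also carries a
    random digit (one-hot, 10 classes) and the sum target (one-hot, 19 classes)."""

    def __init__(self, base_dataset):
        # base_dataset yields (image_tensor, label) pairs, e.g. torchvision MNIST.
        self.base = base_dataset
        # One fixed random digit 0-9 per image.
        self.rand = torch.randint(0, 10, (len(base_dataset),))

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        image, label = self.base[idx]
        num = self.rand[idx]
        # One-hot encode the random number and the sum (0..18).
        num_onehot = F.one_hot(num, num_classes=10).float()
        sum_onehot = F.one_hot(torch.tensor(label + num.item()), num_classes=19).float()
        return image, label, num_onehot, sum_onehot
```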

3> How do we combine the two inputs?

Now we have to combine the two inputs before passing them through the model. Once we have a batch from the data loader, the first part of the network (the MNIST CNN) processes the image and returns features of shape [1, 10]. Then we have MNISTadder, which is trained for the sum function. For that we concatenate the output of the first part with the one-hot random number using torch.cat([mnist_d, Num], dim=-1), where mnist_d has shape [1, 10] and Num has shape [1, 10], so the output of the concatenation has shape [1, 20].

To understand this better, let’s visualise the computation graph.


  • Blue boxes: these correspond to the tensors we use as parameters, the ones we’re asking PyTorch to compute gradients for;
  • Gray box: a Python operation that involves a gradient-computing tensor or its dependencies;
  • Green box: the same as the Gray box, except it is the starting point for the computation of gradients (assuming the backward() method is called from the variable used to visualize the graph) — they are computed from the bottom-up in a graph.

Layers used

For the image model, we have calculated the receptive field.
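The two-input, two-output architecture described above can be sketched roughly as follows. The layer sizes are assumptions for illustration; the repository's actual layers (and calculated receptive field) differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MNISTAdder(nn.Module):
    """CNN produces 10 digit logits; those are concatenated with the
    one-hot random number and fed to FC layers predicting the 19-class sum."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.digit_head = nn.Linear(32 * 7 * 7, 10)
        self.sum_head = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 19))

    def forward(self, image, num_onehot):
        x = self.conv(image).flatten(1)
        mnist_d = self.digit_head(x)                         # shape [N, 10]
        combined = torch.cat([mnist_d, num_onehot], dim=-1)  # shape [N, 20]
        # Digit head returns log-probabilities (for NLL loss);
        # sum head returns raw logits (for CrossEntropyLoss).
        return F.log_softmax(mnist_d, dim=-1), self.sum_head(combined)
```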

4> What results did we finally get, and how did we evaluate them?

Above is the loss graph for the model. The total loss is the combination of Loss 1 and Loss 2, where Loss 1 is F.nll_loss(y_pred1, target1) and Loss 2 is nn.CrossEntropyLoss(). To do this calculation of loss we created a function called total_loss.
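A sketch of what the total_loss helper might look like (the exact signature in the repository may differ): NLL loss on the digit head's log-probabilities, plus cross-entropy on the sum head's raw logits.

```python
import torch
import torch.nn.functional as F


def total_loss(y_pred1, target1, y_pred2, target2):
    """Combined multi-task loss: digit NLL + sum cross-entropy."""
    loss1 = F.nll_loss(y_pred1, target1)       # digit head: expects log-probs
    loss2 = F.cross_entropy(y_pred2, target2)  # sum head: expects raw logits
    return loss1 + loss2, loss1, loss2
```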

Training the model

Epochs: 25

Batch size: 128

Learning rate: 0.01 (no tool used yet to find the best learning rate)

Optimizer: SGD

Accuracy1, Loss1 = MNIST model accuracy and loss

Accuracy2, Loss2 = sum model accuracy and loss
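A minimal training-loop sketch matching the hyperparameters listed above (SGD, lr = 0.01, 25 epochs); the model interface and variable names are assumptions based on the earlier sections, not the repository's exact code:

```python
import torch
import torch.nn.functional as F


def train(model, loader, device, epochs=25, lr=0.01):
    """Trains both heads with one optimizer, summing the two losses."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for image, label, num_onehot, sum_onehot in loader:
            image, label = image.to(device), label.to(device)
            num_onehot = num_onehot.to(device)
            # Convert the one-hot sum target back to a class index for the loss.
            sum_target = sum_onehot.argmax(dim=-1).to(device)
            optimizer.zero_grad()
            y_pred1, y_pred2 = model(image, num_onehot)  # log-probs, logits
            loss = F.nll_loss(y_pred1, label) + F.cross_entropy(y_pred2, sum_target)
            loss.backward()
            optimizer.step()
```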

Testing the model

5> What loss function did we pick, and why?

For the MNIST (CNN) model we use the negative log-likelihood loss.

The negative log-likelihood is large when the predicted probability of the correct class is small (it approaches infinity as that probability approaches zero) and becomes small at larger probabilities. Because we sum the loss over all the correct classes, whenever the network assigns high confidence to the correct class the loss is low, and vice versa. The input given through a forward call is expected to contain log-probabilities for each class, which means the network needs a LogSoftmax layer before this loss.

Formula: $$ H_p(q) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(y_i) + (1 - y_i) \log\big(1 - p(y_i)\big) \right] $$

  • For the numeric (sum) model I tried CrossEntropyLoss.

From the PyTorch documentation I learned that obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer as the last layer of the network, and that you may use CrossEntropyLoss instead if you prefer not to add that extra layer.

Since we don’t have a LogSoftmax layer in the MNISTadder (summing) model, we use CrossEntropyLoss.

Formula: $$ \text{loss}(x, \text{class}) = -\log\left(\frac{\exp(x[\text{class}])}{\sum_j \exp(x[j])}\right) = -x[\text{class}] + \log\left(\sum_j \exp(x[j])\right) $$
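The equivalence described above is easy to verify numerically: CrossEntropyLoss applied to raw logits gives the same result as NLLLoss applied after LogSoftmax.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 19)            # raw scores from the sum head
targets = torch.randint(0, 19, (8,))

ce = F.cross_entropy(logits, targets)  # LogSoftmax + NLL in one step
nll = F.nll_loss(F.log_softmax(logits, dim=-1), targets)
assert torch.allclose(ce, nll)
```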

6> MUST use the GPU.

We created a GPU-checker function to return the GPU information.

To push the model to the GPU, we send the model and all the data and targets to device = 'cuda' using the code below.
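A sketch of the standard device-selection and data-movement pattern (the placeholder model here is illustrative; it falls back to CPU when no GPU is available):

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(20, 19).to(device)    # placeholder model
data = torch.randn(4, 20).to(device)          # move inputs to the same device
target = torch.randint(0, 19, (4,)).to(device)

output = model(data)                          # runs on `device`
```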

Please refer to my GitHub link for the code.