Key Objectives:

  • Building a neural network that differentiates between the two handwritten digits 3 and 8.
  • Comparing the results of this Neural Network (NN) to those of a Logistic Regression (LR) model.

Requirements:

  • 'Kudzu': a neural network library designed during our course at Univ.AI.
  • MNIST Database

If the mnist package is not installed, install it with the pip install mnist command shown below. Prefix the command with ! to run it from a Jupyter notebook cell, or drop the ! to run it from the command line.

!pip install mnist 
Collecting mnist
  Downloading mnist-0.2.2-py2.py3-none-any.whl (3.5 kB)
Requirement already satisfied: numpy in /opt/hostedtoolcache/Python/3.6.14/x64/lib/python3.6/site-packages (from mnist) (1.19.5)
Installing collected packages: mnist
Successfully installed mnist-0.2.2

Importing necessary libraries

%load_ext autoreload
%autoreload 2

%matplotlib inline
import matplotlib.pyplot as plt

import numpy as np
import pandas as pd

Preparing the Data

import mnist
train_images = mnist.train_images()
train_labels = mnist.train_labels()
train_images.shape, train_labels.shape
((60000, 28, 28), (60000,))
test_images = mnist.test_images()
test_labels = mnist.test_labels()
test_images.shape, test_labels.shape
((10000, 28, 28), (10000,))
image_index = 7776 # you may select any index from 0 to 59,999
print(train_labels[image_index]) 
plt.imshow(train_images[image_index], cmap='Greys')
2
<matplotlib.image.AxesImage at 0x7fe34d79e1d0>

Filter the data to keep only the digits 3 and 8

train_filter = np.where((train_labels == 3 ) | (train_labels == 8))
test_filter = np.where((test_labels == 3) | (test_labels == 8))
X_train, y_train = train_images[train_filter], train_labels[train_filter]
X_test, y_test = test_images[test_filter], test_labels[test_filter]

We normalize the pixel values to the 0 to 1 range.

X_train = X_train/255.
X_test = X_test/255.

Set up the labels as 1 (when the digit is 3) and 0 (when the digit is 8).

y_train = 1*(y_train==3)
y_test = 1*(y_test==3)
X_train.shape, X_test.shape
((11982, 28, 28), (1984, 28, 28))
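
As an optional sanity check (a small sketch, not part of the original pipeline), we can confirm that the labels are now binary and see how many of each digit remain in each split.

# optional sanity check: labels should now contain only 0s (for the 8s) and 1s (for the 3s)
print(np.unique(y_train, return_counts=True))
print(np.unique(y_test, return_counts=True))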

Reshape the input data so that each 28 x 28 image becomes a flat 784-element vector

X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)
X_train.shape, X_test.shape
((11982, 784), (1984, 784))

Importing the required classes from 'Kudzu'

from kudzu.layer import Affine, Relu, Sigmoid

from kudzu.model import Model
from kudzu.train import Learner
from kudzu.optim import GD
from kudzu.data import Data, Dataloader, Sampler

from kudzu.callbacks import AccCallback, ClfCallback

from kudzu.loss import MSE

Let us create a Config class to store important parameters.

This class essentially plays the role of a dictionary.

class Config:
    pass

config = Config()
config.lr = 0.001        # learning rate
config.num_epochs = 251  # number of training epochs
config.bs = 50           # batch size

Initializing the data, sampler, dataloader, optimizer, and loss

data = Data(X_train, y_train.reshape(-1,1))
sampler = Sampler(data, config.bs, shuffle=True)

dl = Dataloader(data, sampler)

opt = GD(config.lr)
loss = MSE()
training_xdata = X_train
testing_xdata = X_test
training_ydata = y_train.reshape(-1,1)
testing_ydata = y_test.reshape(-1,1)

Running the models on the training data

Details about the network layers:

  • The first affine layer takes the 784 inputs and produces 100 affine transforms, followed by a ReLU.
  • The second affine layer takes the 100 activations of the previous layer and produces 100 affine transforms, followed by a ReLU.
  • The third affine layer takes those 100 activations and produces 2 affine transforms, creating a 2-dimensional embedding for visualization. There is no non-linearity here.
  • A final "logistic regression" layer applies an affine transform from the 2 embedding dimensions to 1 output, which is squeezed through a sigmoid.
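
For a rough sense of the model size these layers imply, here is a simple parameter count (weights plus biases for each affine layer). This is only an illustrative sketch and does not use Kudzu itself.

# rough parameter count for the architecture described above:
# an affine layer with n_in inputs and n_out outputs has n_in * n_out weights and n_out biases
layer_dims = [(784, 100), (100, 100), (100, 2), (2, 1)]
print(sum(n_in * n_out + n_out for n_in, n_out in layer_dims))  # 88805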

Help taken from Anshuman's Notebook.

# layers for the Neural Network
layers = [Affine("first", 784, 100), Relu("first"),
          Affine("second", 100, 100), Relu("second"),
          Affine("third", 100, 2),
          Affine("final", 2, 1), Sigmoid("final")]
model_nn = Model(layers)

# layers for the Logistic Regression
layers_lr = [Affine("logits", 784, 1), Sigmoid("sigmoid")]
model_lr = Model(layers_lr)
# suffix _nn stands for Neural Network.
learner_nn = Learner(loss, model_nn, opt, config.num_epochs)
acc_nn = ClfCallback(learner_nn, config.bs, training_xdata , testing_xdata, training_ydata, testing_ydata)
learner_nn.set_callbacks([acc_nn])
print("====== Neural Network ======")
learner_nn.train_loop(dl)
====== Neural Network ======
Epoch 0, Loss 0.2278
Training Accuracy: 0.7556, Testing Accuracy: 0.7424

Epoch 10, Loss 0.0845
Training Accuracy: 0.9109, Testing Accuracy: 0.9148

Epoch 20, Loss 0.0558
Training Accuracy: 0.9384, Testing Accuracy: 0.9466

Epoch 30, Loss 0.0442
Training Accuracy: 0.9503, Testing Accuracy: 0.9577

Epoch 40, Loss 0.038
Training Accuracy: 0.9573, Testing Accuracy: 0.9647

Epoch 50, Loss 0.0341
Training Accuracy: 0.9611, Testing Accuracy: 0.9682

Epoch 60, Loss 0.0314
Training Accuracy: 0.9638, Testing Accuracy: 0.9698

Epoch 70, Loss 0.0293
Training Accuracy: 0.9660, Testing Accuracy: 0.9708

Epoch 80, Loss 0.0277
Training Accuracy: 0.9685, Testing Accuracy: 0.9713

Epoch 90, Loss 0.0263
Training Accuracy: 0.9696, Testing Accuracy: 0.9713

Epoch 100, Loss 0.0251
Training Accuracy: 0.9705, Testing Accuracy: 0.9713

Epoch 110, Loss 0.0241
Training Accuracy: 0.9722, Testing Accuracy: 0.9718

Epoch 120, Loss 0.0232
Training Accuracy: 0.9735, Testing Accuracy: 0.9733

Epoch 130, Loss 0.0224
Training Accuracy: 0.9750, Testing Accuracy: 0.9733

Epoch 140, Loss 0.0216
Training Accuracy: 0.9756, Testing Accuracy: 0.9733

Epoch 150, Loss 0.021
Training Accuracy: 0.9768, Testing Accuracy: 0.9733

Epoch 160, Loss 0.0203
Training Accuracy: 0.9777, Testing Accuracy: 0.9733

Epoch 170, Loss 0.0198
Training Accuracy: 0.9785, Testing Accuracy: 0.9733

Epoch 180, Loss 0.0192
Training Accuracy: 0.9794, Testing Accuracy: 0.9738

Epoch 190, Loss 0.0187
Training Accuracy: 0.9804, Testing Accuracy: 0.9738

Epoch 200, Loss 0.0183
Training Accuracy: 0.9810, Testing Accuracy: 0.9738

Epoch 210, Loss 0.0178
Training Accuracy: 0.9813, Testing Accuracy: 0.9743

Epoch 220, Loss 0.0174
Training Accuracy: 0.9820, Testing Accuracy: 0.9748

Epoch 230, Loss 0.017
Training Accuracy: 0.9826, Testing Accuracy: 0.9753

Epoch 240, Loss 0.0166
Training Accuracy: 0.9831, Testing Accuracy: 0.9753

Epoch 250, Loss 0.0163
Training Accuracy: 0.9836, Testing Accuracy: 0.9753

0.05062484031787654

Logistic Regression-based implementation.

learner_lr = Learner(loss, model_lr, opt, config.num_epochs)
acc_lr = ClfCallback(learner_lr, config.bs, training_xdata , testing_xdata, training_ydata, testing_ydata)
learner_lr.set_callbacks([acc_lr])
print("====== Logistic Regression ======")
learner_lr.train_loop(dl)
====== Logistic Regression ======
Epoch 0, Loss 0.2511
Training Accuracy: 0.6647, Testing Accuracy: 0.6739

Epoch 10, Loss 0.1039
Training Accuracy: 0.9107, Testing Accuracy: 0.9249

Epoch 20, Loss 0.0796
Training Accuracy: 0.9286, Testing Accuracy: 0.9405

Epoch 30, Loss 0.0685
Training Accuracy: 0.9361, Testing Accuracy: 0.9531

Epoch 40, Loss 0.0618
Training Accuracy: 0.9406, Testing Accuracy: 0.9561

Epoch 50, Loss 0.0573
Training Accuracy: 0.9444, Testing Accuracy: 0.9597

Epoch 60, Loss 0.054
Training Accuracy: 0.9469, Testing Accuracy: 0.9607

Epoch 70, Loss 0.0514
Training Accuracy: 0.9496, Testing Accuracy: 0.9612

Epoch 80, Loss 0.0494
Training Accuracy: 0.9514, Testing Accuracy: 0.9622

Epoch 90, Loss 0.0477
Training Accuracy: 0.9528, Testing Accuracy: 0.9627

Epoch 100, Loss 0.0462
Training Accuracy: 0.9535, Testing Accuracy: 0.9637

Epoch 110, Loss 0.045
Training Accuracy: 0.9542, Testing Accuracy: 0.9637

Epoch 120, Loss 0.044
Training Accuracy: 0.9552, Testing Accuracy: 0.9647

Epoch 130, Loss 0.043
Training Accuracy: 0.9556, Testing Accuracy: 0.9667

Epoch 140, Loss 0.0422
Training Accuracy: 0.9565, Testing Accuracy: 0.9667

Epoch 150, Loss 0.0415
Training Accuracy: 0.9570, Testing Accuracy: 0.9672

Epoch 160, Loss 0.0408
Training Accuracy: 0.9577, Testing Accuracy: 0.9672

Epoch 170, Loss 0.0402
Training Accuracy: 0.9582, Testing Accuracy: 0.9672

Epoch 180, Loss 0.0396
Training Accuracy: 0.9585, Testing Accuracy: 0.9672

Epoch 190, Loss 0.0391
Training Accuracy: 0.9589, Testing Accuracy: 0.9672

Epoch 200, Loss 0.0386
Training Accuracy: 0.9591, Testing Accuracy: 0.9677

Epoch 210, Loss 0.0381
Training Accuracy: 0.9597, Testing Accuracy: 0.9682

Epoch 220, Loss 0.0377
Training Accuracy: 0.9601, Testing Accuracy: 0.9682

Epoch 230, Loss 0.0373
Training Accuracy: 0.9605, Testing Accuracy: 0.9693

Epoch 240, Loss 0.037
Training Accuracy: 0.9607, Testing Accuracy: 0.9688

Epoch 250, Loss 0.0366
Training Accuracy: 0.9609, Testing Accuracy: 0.9693

0.04126065747352803

Comparing the results of the NN and LR models

plt.figure(figsize=(15,10))

# Neural Network plots
plt.plot(acc_nn.accuracies, 'r-', label = "Training Accuracies - NN")
plt.plot(acc_nn.test_accuracies, 'g-', label = "Testing Accuracies - NN")

# Logistic Regression plots
plt.plot(acc_lr.accuracies, 'k-', label = "Training Accuracies - LR")
plt.plot(acc_lr.test_accuracies, 'b-', label = "Testing Accuracies - LR")

plt.ylim(0.8, 1)

plt.legend()
<matplotlib.legend.Legend at 0x7fe34537b128>

From the plot, we can observe the following:

  • The Neural Network achieves higher accuracy than the Logistic Regression model.
  • Part of the NN's extra training accuracy appears to come from over-fitting, i.e. the NN also captures noise in the training data.
  • At higher epochs the testing accuracy of the NN falls below its training accuracy, which is the signature of over-fitting on the training data.
  • Logistic Regression gives a reliable accuracy without the above-mentioned problem.
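
As a quick numerical check of this gap (assuming, as in the plotting code above, that the ClfCallback objects record per-epoch accuracies in their accuracies and test_accuracies lists), we can compare the final values directly.

# final train/test accuracies and the train-test gap for each model
print("NN : train = %.4f, test = %.4f, gap = %.4f" %
      (acc_nn.accuracies[-1], acc_nn.test_accuracies[-1],
       acc_nn.accuracies[-1] - acc_nn.test_accuracies[-1]))
print("LR : train = %.4f, test = %.4f, gap = %.4f" %
      (acc_lr.accuracies[-1], acc_lr.test_accuracies[-1],
       acc_lr.accuracies[-1] - acc_lr.test_accuracies[-1]))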

Truncating the model just after the third affine layer, i.e. excluding the final affine transform and the sigmoid.

Plotting the 2-dimensional outputs (the embedding) of this truncated NN.

model_new = Model(layers[:-2]) # all layers up to and including the 2-D embedding layer
plot_testing = model_new(testing_xdata)
plt.figure(figsize=(8,7))
plt.scatter(plot_testing[:,0], plot_testing[:,1], alpha = 0.1, c = y_test.ravel());
plt.title('Outputs')
Text(0.5, 1.0, 'Outputs')

Plotting probability contours

model_prob = Model(layers[-2:]) # the final affine transform + sigmoid, mapping the 2-D embedding to a probability
# Adjust the x and y ranges according to the above generated plot.
x_range = np.linspace(-4, 1, 100) 
y_range = np.linspace(-6, 6, 100) 
x_grid, y_grid = np.meshgrid(x_range, y_range) # x_grid and y_grid are of size 100 x 100

# flattening x_grid and y_grid into 1-D arrays
x_grid_flat = np.ravel(x_grid)
y_grid_flat = np.ravel(y_grid)

# model_prob takes two columns (the two embedding dimensions) as input, hence the transpose of np.vstack().
X = np.vstack((x_grid_flat, y_grid_flat)).T

# reshape the predicted probabilities back onto the 100 x 100 grid
probability_contour = model_prob(X).reshape(100,100)
plt.figure(figsize=(10,9))
plt.scatter(plot_testing[:,0], plot_testing[:,1], alpha = 0.1, c = y_test.ravel())
contours = plt.contour(x_grid,y_grid,probability_contour)
plt.title('Probability Contours')
plt.clabel(contours, inline = True );