
Fountain Filler

My family has a fountain outside our house that always runs out of water quickly. Whether it's from evaporation or birds coming to drink, within two days we have to refill the fountain with a garden hose. And the task usually falls upon me.

Recently, I received an AdaBox 015 to tinker around with. This time, they sent an Adafruit CLUE, a really cool microcontroller board with the same form factor as the BBC micro:bit, but packed full of sensors ranging from temperature all the way to magnetic field and gesture sensors! Along with the CLUE, Adafruit also shipped their Bonsai Buckaroo and a water pump with PVC tubing. They also sent a makeshift water moisture sensor, which is essentially alligator clips and stainless steel nails.

This box is perfect for creating a plant watering system! With its sensors and the included pump, you can automatically water a potted plant without even having to be there! But we don't have to stay loyal to the plant. We can water whatever we feel like. Such as, perhaps… a fountain?

I got to work setting up the whole gizmo and configuring the code. I used some of the tutorial code, but naturally I modified it to my needs. Since this board runs CircuitPython, I whipped up my code in a breeze, and it came out like this:

import time 
import board
import digitalio
import analogio
from adafruit_clue import clue

clue.pixel.fill(0)

motor = digitalio.DigitalInOut(board.P2)
motor.direction = digitalio.Direction.OUTPUT

analog = analogio.AnalogIn(board.P1)

def read_and_average(analog_in, times, wait):
    analog_sum = 0
    for _ in range(times):
        analog_sum += analog_in.value
        time.sleep(wait)
    return analog_sum/times
    
clue_display = clue.simple_text_display(title="  FOUNTAIN SENSOR", title_scale=2)
clue_display.show()


while True:
    analog_value = read_and_average(analog, 100, 0.01)
    percentage = analog_value / 65535 * 100
    clue_display[0].text = "Moisture Sensor: {}%".format(int(percentage))
    clue_display[1].text = "Pressure: {:.1f} hPa".format(clue.pressure)
    clue_display[2].text = "Temperature: {:.1f}°C".format(clue.temperature)
    clue_display[3].text = "Humidity: {:.1f}%".format(clue.humidity)

    if percentage < 50:
        motor.value = True
        clue_display[6].text = "Motor ON"
        clue_display[6].color = (0, 255, 0)
        clue.play_tone(880, 10)  # blocks for 10 seconds, so the pump runs that long

    motor.value = False
    clue_display[6].text = "Motor OFF"
    clue_display[6].color = (255, 0, 0)

Most of the code explanation can be found on Adafruit's tutorial page, but here's a quick rundown:

Lines 1-5: Import the libraries the CLUE will use (these aren't installed with a simple pip install; CircuitPython libraries are copied from Adafruit's library bundle onto the board)

Lines 7-12: Set up all the relevant variables and pinouts

Lines 14-19: Create a function that quickly provides me with an average moisture level so I can use it later

Lines 21-22: Initialize the CLUE's display so I can print any readings I want on the board itself. I also turn on the screen so that I won't just have a black screen showing all the time.

Line 25: We start an infinite loop that keeps turning the motor on and off depending on the water moisture percentage

Lines 26-27: We get the water moisture percentage, as described above.

Lines 28-31: Add some readings to the screen of the CLUE, because I want to get more use out of the board than as a glorified motor controller.

Lines 33-37: This controls the motor: if the water moisture percentage is less than 50%, we turn on the motor and play a 10-second note at 880 Hz (play_tone blocks, so the motor stays on for those 10 seconds). We also display that the motor is on, so we don't have to rely only on an auditory cue.

Lines 39-41: Turn off the motor (and update the display accordingly)

With these simple lines of code, I have essentially created an automatic water pump that pumps water from a reservoir to my fountain. I found a couple of boxes large enough to house the pump, my electronics, and a lot of water, and I cut holes in them to route the wires and tubing through it all. I used the included battery pack and achieved this:

I noticed that the CLUE board actually runs on a Nordic nRF52840, a Bluetooth LE-capable processor, which means I can connect to it over Bluetooth and control it or read its sensors remotely. But I have yet to figure out how to do that reliably with CircuitPython. CircuitPython doesn't support asynchronous execution, so if I enabled Bluetooth connectivity naively, it would halt all other processes (like pumping water or reading sensor input) until a device connects over Bluetooth first.

In the future I plan to build a better water moisture monitor and maybe upgrade my pump so it can fill my fountain more efficiently than it does now. I also plan to transmit sensor info over Bluetooth so that I can sit in my living room and check the humidity near my fountain. Why? Because I can, that's why.
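For reference, here's a rough sketch of how that broadcast could look using the adafruit_ble library's UART service. I haven't verified this on the board, so treat it as an outline of the idea rather than working code:

# Untested sketch: advertise a BLE UART service and push readings
# to any connected client without blocking the pump loop
from adafruit_clue import clue
from adafruit_ble import BLERadio
from adafruit_ble.advertising.standard import ProvideServicesAdvertisement
from adafruit_ble.services.nordic import UARTService

ble = BLERadio()
uart = UARTService()
ble.start_advertising(ProvideServicesAdvertisement(uart))

while True:
    # ... read the moisture sensor and drive the pump as before ...
    if ble.connected:
        uart.write("humidity: {:.1f}%\n".format(clue.humidity).encode())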

That's all I have to show you today. If you have any questions about my steps, I will be happy to answer them. Otherwise, I hope to see you next time!


Interactive MNIST

Welcome back! This marks the first deep learning blog post on this developer blog!

Today I'll teach you, step by step, how to create this site. There are several steps we need to go through to create this classifier:

  1. Pre-process the MNIST dataset.
  2. Build and train the classifier model.
  3. Convert our model to Tensorflow.js format.
  4. Create an area for users to write numbers.
  5. Load the model into the browser.
  6. Run predictions off of user input.

The project source code can be found here.

Before we start, let's install some requirements for the backend. Run this in your terminal to install all the required libraries:

$ pip install tensorflow tensorflowjs scikit-learn

Pre-processing the MNIST dataset

The Modified National Institute of Standards and Technology (MNIST) database is a large database of handwritten digits that is essentially the "Hello World" of machine learning. Training a Convolutional Neural Network (CNN) on this dataset is simple and can yield great results.

100 images from the MNIST dataset

Let's explore the dataset. The training set has 60,000 images of size 28×28 pixels. They are grayscale, so they have only one color channel. The dimensions of our input are thus (28, 28, 1).

If you're curious about how neural networks work, check out this amazing video series by 3Blue1Brown:

Thankfully, this dataset comes packaged with Tensorflow by default, so we don't need to install it from anywhere else. Let's start writing our model.py file.

import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder  # For later

Next we’ll load the MNIST dataset from Tensorflow.

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

Here, we are using Python's tuple unpacking. Calling tf.keras.datasets.mnist.load_data() returns two tuples (one for training, one for testing), each containing an array of images and an array of labels; for convenience, we unpack all of this in one statement. What we get back are NumPy arrays representing the images of handwritten digits and their labels.

If we print the shape of X_train we get (60000, 28, 28). We want the inputs to be of shape (60000, 28, 28, 1) for the CNN later.

input_shape = [28, 28, 1]

X_train = tf.reshape(X_train, [X_train.shape[0]] + input_shape)
X_test = tf.reshape(X_test, [X_test.shape[0]] + input_shape)

X_train = tf.cast(X_train, dtype=tf.float32)
X_test = tf.cast(X_test, dtype=tf.float32)

X_train /= 255
X_test /= 255

This block converts our shape into the (60000, 28, 28, 1) we are looking for by using tf.reshape().

Line 1: We start by defining our target input shape. We store it in a list rather than a tuple so that we can concatenate it with another list in lines 3 and 4.

Lines 3-4: We reshape our X_train and X_test arrays using tf.reshape(array, shape). For flexibility we won't hardcode 60,000 into the shape, because this value can change; instead, we use the first value of the shape of X_train and X_test respectively. By concatenating the two lists, the reshape call produces our desired (60000, 28, 28, 1). This returns a Tensor object, which is computationally different from a NumPy array.

Lines 6-7: We want to make sure the array values aren't whole numbers, because our model would have a hard time learning from them. To remedy this, we cast the reshaped tensors to tf.float32, which is just a 32-bit float type.

Lines 9-10: We normalize the tensors so the values lie between 0 (a black pixel) and 1 (a white pixel), with all shades in between.

With this, we’ve prepared our inputs. Now to prepare our outputs!

y_train = tf.reshape(y_train, [-1, 1])
y_test = tf.reshape(y_test, [-1, 1])

encoder = OneHotEncoder(sparse=False)

y_train = tf.convert_to_tensor(encoder.fit_transform(y_train))
y_test = tf.convert_to_tensor(encoder.transform(y_test))

Lines 1-2: We reshape the label arrays into column vectors, since the OneHotEncoder expects 2D input.

Line 4: We initialize the OneHotEncoder provided to us by sklearn.preprocessing and assign it to a local variable, encoder.

Lines 6-7: We fit the encoder on the training labels, one-hot encode both label sets, and convert the results to tensors. For example, the label 3 becomes a vector with a 1 in position 3 and 0s everywhere else.
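To make the encoding concrete, here's a toy example, separate from the MNIST arrays above (pinning categories to 0-9 guarantees ten columns even when only one label is present):

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Note: newer scikit-learn versions rename sparse= to sparse_output=
toy = OneHotEncoder(categories=[list(range(10))], sparse=False)
print(toy.fit_transform(np.array([[3]])))
# -> [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]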

With this done, we are ready to create the model and train it!

Build and Train the Classifier Model

We've prepared our data; now let's create a model that can actually use it efficiently. With the model architecture I'm providing, I get 98.84% validation accuracy after training for 10 epochs with a batch size of 32. If you want to use your own model here, that is completely fine, but make sure the input shape is (28, 28, 1).

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(32, (3,3), activation="relu", kernel_initializer="he_uniform", input_shape=input_shape))
model.add(tf.keras.layers.MaxPooling2D((2,2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(125, activation="relu", kernel_initializer="he_uniform"))
model.add(tf.keras.layers.Dense(10, activation="softmax"))

Line 1: We use a Sequential() model from Tensorflow.

Line 2: Our input layer is a Conv2D layer with 32 filters of shape (3, 3), a relu activation function, and he_uniform kernel initialization. Since this is the input layer, we pass the argument input_shape=input_shape, with input_shape defined above as [28, 28, 1].

Lines 3-5: We build the hidden layers of the model, ending with a Dense layer of 125 neurons that uses the same activation and initializer as the input layer.

Line 6: We add the output layer, a Dense layer with 10 neurons (representing our 10 classes, 0-9) with softmax activation.

Now that we built the model, let’s compile it!

opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["accuracy"])

Line 1: We create a Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.01 and momentum of 0.9. I don't use the "ol' reliable" Adam() optimizer because Adam can occasionally fail to converge; that shouldn't be an issue on this dataset, but I'd still rather use SGD. The choice is up to you, the reader, and if you're interested you can check out this article.

Line 2: We compile our model with the SGD optimizer and categorical_crossentropy as the loss. Since all classes are roughly equally represented in this dataset (there are about the same number of images for each digit), plain accuracy is a reasonable metric to track.
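If you'd rather experiment with Adam despite the caveat above, the swap is a single line before compiling:

# Alternative optimizer; Keras' Adam defaults to a learning rate of 0.001
opt = tf.keras.optimizers.Adam()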

Now, with a compiled model, comes the fun part… training the model! Don't worry, this model doesn't take much time to train. On my 2018 Mac Mini (6-core Intel i5) it takes about 100 seconds, a little over a minute and a half.

h = model.fit(x=X_train, y=y_train, epochs=10, validation_data=(X_test, y_test), batch_size=32)
model.save("model.h5")

Line 1: We combine everything we’ve done above to train the model with 10 epochs and a batch size of 32.

Line 2: We save the trained model as model.h5.
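As an optional sanity check, you can evaluate the trained model on the test set and confirm the saved file reloads to the same numbers. A minimal sketch, assuming the variables from earlier are still in scope:

# Accuracy on the held-out test set should land near the ~98.8% quoted above
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print("test accuracy: {:.4f}".format(acc))

# Reloading the .h5 file should reproduce the same metrics
reloaded = tf.keras.models.load_model("model.h5")
print(reloaded.evaluate(X_test, y_test, verbose=0))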

This marks the end of our need for Python in this project. Yes, we could have done all this in Tensorflow.js, but I'm more familiar with Python, and I want to leverage the converter that Tensorflow.js provides.

Converting our model to Tensorflow.js format

We have a model saved as model.h5, but Tensorflow.js can't use this format directly, so we will convert our model.h5 to Tensorflow.js format. Assuming you installed tensorflowjs as instructed at the start of this article, you can run this command in the terminal:

$ tensorflowjs_converter --input_format=keras /path/to/model.h5 /path/to/destination 

Since we used Keras to create our model, we have to tell the converter with the --input_format=keras flag.
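As an aside, the tensorflowjs package also exposes the converter as a Python API, so you could export straight from model.py instead of using the CLI. A minimal sketch (the output folder name here is arbitrary):

import tensorflowjs as tfjs

# Writes model.json (architecture + weight manifest) plus binary
# weight shard files into the target directory
tfjs.converters.save_keras_model(model, "tfjs_model")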

With the model converted, we can move on to the frontend.

Creating a writeable area for users

Let's take a break from all these machine learning tasks and focus on creating the frontend for our users.

First, we start in index.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Interactive MNIST</title>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.0.0/dist/tf.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/1.0.0/p5.min.js" integrity="sha512-At/xiUTqCg8jxnCMRNDhDVDm4qxlyPt1K+GrDhUvRvR8MjlBq0RH65OVVaCXn3OuvyVWK8CdlQpxgrc/5YspHw==" crossorigin="anonymous"></script>
</head>
<body>
    <button id="guess">Guess</button>
    <button id="clear">Clear</button>
    <p id="out"></p>
    <script type="text/javascript" src="index.js"></script>
</body>
</html>

We import the tensorflow.js and p5.js libraries in the head of our document, then create two buttons: one to send the drawing off to the model to guess what the user drew, and one to clear the drawing area. After that, we add an empty <p> tag that will host our output. Lastly, we import the script that we are about to create.

Speaking of which, let's work on index.js:

function setup(){
    let canvas = createCanvas(280, 280);
    background(0);
}

function draw(){
    strokeWeight(24);
    stroke(255);
    if (mouseIsPressed){
        line(pmouseX, pmouseY, mouseX, mouseY);
    }
}

If you're familiar with p5.js, you'll know why the functions must be named setup and draw: p5.js calls them by those names, so if they were named anything else they would never run. Now let's break down what's happening in each function.

setup(): We create a p5.js canvas of 280 by 280 pixels. We then set the background color to black (to match the dataset; we will be drawing in white).

draw(): This is what lets us draw. We set the strokeWeight, which is essentially the thickness of what we draw. For the photos in this tutorial I used a strokeWeight of 8, but when I revisited the project I found I needed to change it to 24: if the stroke is too thin, the input loses detail when the image is resized down. Then we set the stroke color, which as mentioned is white. Finally, we add the condition that if the mouse is pressed, we draw a line between the previous and current mouse coordinates.

This is all you need in the file to get drawing working. You might be confused because a lot of these variables and functions aren't defined anywhere in our files, but that's all taken care of by p5.js. If your IDE shows an error, keep writing and check whether it's a real error in your browser's Developer Tools console.

At this stage, your site will look something like this:

Our barebones site!

And you can draw on it by clicking and dragging within the black box.

Say hello back 🙂

Load the model into the browser

Now, for this step you must understand one consequence of using Tensorflow.js to load a model: the model must be hosted somewhere and served over HTTP; Tensorflow.js can't load the files straight off your local disk. The options for loading your model are defined in the docs. For my file hosting, I used GitHub Pages, and I suggest you do as well.

For this article you can load the model however you see fit, but if you want to host it on GitHub Pages, you can take a look at my repository to see how I did it (and while you're there, be sure to star the repo and follow me :D). Also make sure you have at least one commit where your model (in .json format) is in the repo.

Now with that out of the way, let us get our model that we worked so hard on into the browser. Editing our index.js file, we will add this to the start of the file:

let model = null;

let loadNeuralNet = async () => {
	model = await tf.loadLayersModel('type://your/storage');
}
loadNeuralNet().then(r => console.log("Model Loaded."));

// setup() and draw() ...

We start by making model a global variable so we don't need to pass it around as an argument. We use an asynchronous function, loadNeuralNet(), because tf.loadLayersModel returns a promise, so we await it to get the actual loaded model. After defining the function, we simply call it and chain .then to log a confirmation that the model loaded.

Congratulations, our model is loaded in the website! If you open your console you should see the Model Loaded. message. Now let's get to using the model.

Run predictions off user input

Let's start by making the buttons we added a while back actually useful. First, we'll give the Clear button some use.

//...

let out = document.getElementById("out");

document.getElementById("clear").addEventListener("click", () =>{
    background(0);
    out.innerHTML = "";
})

We start by grabbing our output area, another global variable we will use later. Next we add an event listener to the Clear button: when it's clicked, we set the background of our p5.js canvas to black (which in effect clears the canvas) and wipe any output from previous runs. Now let's address the elephant in the room, the Guess button:

//...

document.getElementById("guess").addEventListener("click",() =>{
    const input_length = 28 * 28;
    let inputs = [];
    let img = get();
    img.resize(28, 28);
    img.loadPixels();
    for (let i = 0; i < input_length; i++){
        let bright = img.pixels[i * 4];
        inputs[i] = bright/255.0;
    }
    let predictions = predict(inputs);
    console.log(predictions);
    out.innerHTML = "The computer thinks you've drawn a " + predictions[0];
});

This is significantly larger, but still understandable.

Line 3: We add another event listener, similar to the Clear button's, but with a different function to run when clicked.

Line 4: We create a constant that stores the product of our input dimensions (28 × 28 = 784).

Line 5: We initialize an empty array that we will run the prediction on.

Lines 6-8: p5.js provides a get() function that grabs the canvas image at the instant the user clicks Guess. We then resize the image from 280 × 280 pixels down to 28 × 28 pixels (sounds familiar, doesn't it? ;)), and call loadPixels() to populate the pixel array of the image the user drew. All of these functions operate in place, so we don't need to write img = img.resize(28, 28); and such.

Lines 9-12: We loop over every pixel to fill our inputs array with brightness values. Since p5.js stores its pixels in a flat RGBA array, each pixel occupies four slots; we read the red channel at index i * 4, which for our white-on-black drawing equals the pixel's brightness. We then normalize each value to lie between 0 and 1 before adding it to inputs.

Lines 13-15: We get the predictions by calling predict(), a function introduced next. We then log the predictions to the console and write the result into our out area. We use predictions[0] for reasons that will become clear when we introduce the predict() function.

Now let’s write the predict() function we’ve heard so much about.

//...

let predict = (inputs) => {
    let tensor = tf.tensor(inputs, [28, 28, 1], "float32");
    tensor = tf.expandDims(tensor, 0);
    let outputs = model.predict(tensor).dataSync();
    let predictions = Array.from(outputs).map(n => parseFloat(n.toPrecision(5)))
    return [predictions.indexOf(max(predictions)), predictions];
}

Line 3: We declare our predict() function, which takes one argument (the inputs array from the previous code).

Line 4: We create a tensor using the tensorflow.js library, since that's what the model requires for predictions. We give tf.tensor() the array we want turned into a tensor, the shape our model expects for one image, and the data type (good ol' 32-bit floats).

Line 5: We expand the dimensions of our tensor, because the model expects a tensor of shape (1, 28, 28, 1), i.e. a batch of one image. Since we don't want to mangle the data, we can't simply create a tensor of that shape in line 4, but expandDims() adds the extra dimension we need.

Line 6: This is the best part: we finally predict with our model! After running model.predict(tensor), we call .dataSync() to get the data in a format we can use.

Line 7: We convert the output into an array of 10 values (the probabilities of each class), rounded to 5 significant digits. Since it's a regular array, we can use its built-in functions.

Line 8: We return an array with 2 elements: the index of the highest probability (the predicted digit) and the full predictions array. I've done it this way because I want to add a bar graph in the future showing the relative probability of each class, but that part is optional.

Final Thoughts

You're done! You've created a way to interact with machine learning that works on desktop, phone, or tablet! All that's left is to clean it up and make it look presentable. That part is up to your own taste, so I'll leave it to you; if you want inspiration, you can check out my code. There are a few things to keep in mind when making a pretty frontend, however:

  • The ids of both buttons matter, so if you change them in index.html you need to change them in the event listeners in index.js as well.
  • If you want to change the size of the writing window, ensure that it can be resized to 28 x 28.
  • Be careful if you change the colors of the stroke or the background; that may lead to prediction problems later on.
  • Be creative and unique!

NOTE: The model will tend to be inaccurate. This is not because of the training; rather, it is likely because of the input. When inspecting the 28 × 28 images, I noticed many imperfections that the model has to deal with. Do try changing the model architecture and how you train it if you are unsatisfied. Just remember the input shape has to be (28, 28, 1) if you decide to use this frontend code.

If there are any questions leave them in the comments and I’ll try my best to answer them.

UPDATE: Upon revisiting the site after noticing the model was inaccurate, I realized that the model wasn't the issue; rather, it was the inputs we give the model. Let's compare two images: one is the training data for the model, while the other is what we give the model (assuming strokeWeight(8); in our draw() function).

Training Data
User Input

Look closely at the two images. Our training data uses a thicker brush, while our input is like drawing with a pencil. This small difference may not seem like much to us as humans, but remember that a computer runs on numbers: every shade of white on each of the 784 pixels matters if the model is to perform well. To remedy this, I tried upping the strokeWeight, and when I used strokeWeight(24); instead of the old strokeWeight(8); I got much more accurate responses.

Training Data
strokeWeight(8);
strokeWeight(24);

With this small adjustment, the model behaves a lot better on user input. Though the result is still not that close to the training data, we can simply raise the strokeWeight further; I've noticed that 32 behaves even better than 24.