A must-read tutorial when you are starting your journey with Deep Learning
Are you taking your first steps in deep learning and do not know where to start? Log out from Facebook, Instagram or any other social networking site. Turn on your brain. Keep reading. We will show you how to recognize images in 4 simple steps.
But first things first. Let’s start with an introduction.
This is the number eight. Handwritten. It was presented in a very low resolution of 28x28 pixels. But your brain has no trouble recognizing it as an eight. How crazy is that, right? If I asked you to write a program that does the same thing, it would be a bit more challenging. The solution to this problem is deep learning. Deep learning, as a new area of machine learning research, is a process which allows the computer to learn to perform tasks which are natural for the brain, like digit recognition. All deep learning algorithms are machine learning algorithms, but not every machine learning algorithm is a deep learning algorithm. A little bit messy? Let’s see what it looks like on a Venn diagram:
Deep learning focuses on a specific category of machine learning model called a neural network, which is inspired by the workings of the human brain. Neurons receive signals from other neurons and, after some transformation, pass them on. At the end of the journey an output signal is produced which (hopefully) represents the correct prediction. But do computers have neurons or any other biological structure? Not really. To model a neural network we use graph theory. The network turns out to be a series of layers of connected nodes. Every node in a layer has an edge to every node in the next layer and every edge is given its own weight. A neural network consists of three basic types of layers: input, hidden and output. And a whole collection of layers can be considered a “brain”.
Layers? What are they?
The input layer is composed of input neurons and brings the initial data to the hidden layers for further processing.
The hidden layer sits between the input and output layers: the output of one layer is the input of the next. This is where most of the computation is performed. How many hidden layers should a neural network contain? It depends on the scale of the problem. Usually you can use one or two layers for simple tasks, but research into deep neural network architectures shows that many hidden layers can be fruitful for difficult object, handwritten character, and face recognition problems.
The output layer is the last layer in the neural network and produces the outputs of the program.
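To make the layer picture concrete, here is a minimal NumPy sketch of a signal travelling from an input layer through one hidden layer to an output layer. The layer sizes (4, 3 and 2 nodes), the random weights and the ReLU transformation are illustrative assumptions, not values taken from this article.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical tiny network: 4 input nodes -> 3 hidden nodes -> 2 output nodes
x = np.array([0.2, 0.7, 0.0, 1.0])   # input layer: the raw data
w_hidden = rng.normal(size=(4, 3))   # one weight per edge, input -> hidden
w_output = rng.normal(size=(3, 2))   # one weight per edge, hidden -> output

# Each hidden node sums its weighted inputs, then applies a transformation
# (here ReLU, which replaces negative values with 0)
hidden = np.maximum(0.0, x @ w_hidden)

# The output layer computes the same kind of weighted sum on the hidden signals
output = hidden @ w_output

print(hidden.shape, output.shape)
```

A real network would also add a bias to every node and learn the weights during training instead of drawing them at random.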
Step 1. Don’t panic! Just get to know the dataset
This article will tell you how to build a simple neural network (a graph of connected nodes) in Python using the Keras library. The goal is to take an input image (grayscale, 28x28 pixels) of a single handwritten digit (0–9) and classify it as the correct digit. The dataset consists of 60,000 training images and 10,000 test images.
In the presented code we’ll set some constant values:
- batch_size is the size of a batch: how many pictures are processed at once, before the weights are updated
- epochs defines how many times the entire dataset is passed forward and backward through the neural network
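As a rough illustration of how these two constants interact, the sketch below counts how many weight updates happen during training. The values 500 and 20 are the ones used later in this article. The assumption here is that all 60,000 training images are used for training, with none held out for validation.

```python
import math

n_samples = 60_000   # size of the MNIST training set
batch_size = 500     # images processed before each weight update
epochs = 20          # full passes over the training data

batches_per_epoch = math.ceil(n_samples / batch_size)
total_updates = batches_per_epoch * epochs

print(batches_per_epoch)  # 120
print(total_updates)      # 2400
```

A smaller batch_size means more, noisier updates per epoch. A larger one means fewer, smoother updates.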
Start having fun with it! Upload your data … it’s quite simple, really
Okay, let’s start with preparing your data, the most important step during data analysis. Fortunately, the hardest part has already been done thanks to the Keras library and the National Institute of Standards and Technology. MNIST data is well-formatted and easy to load into our system separately as training and test data:
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test variables contain the grayscale pixel intensities of the images (values from 0 to 255)
y_train, y_test variables contain labels from 0 to 9 which tell us which digit each image actually shows.
It is always good to get familiar with the processed data, so let’s see what a small part of MNIST looks like. To visualise these numbers we can use the matplotlib library:
from random import randint
import matplotlib.pyplot as plt

figure = plt.figure()
num_of_images = 100
for index in range(1, num_of_images + 1):
    image = randint(0, 55000)  # pick a new random training image for each cell
    plt.subplot(10, 10, index)
    plt.axis('off')
    plt.imshow(x_train[image], cmap=plt.get_cmap('gray_r'))
plt.show()
A little bit of fun with the data?
We need to know the shape of the dataset. To obtain this we can use the shape attribute of a NumPy array:
x_train.shape
As you can see, the result is (60000, 28, 28). As you can guess, 60,000 is the number of images in the training dataset and (28, 28) is the size of each image (in pixels). Recall that this article is focused on fully connected neural networks. That means that each input must be a vector, so instead of 28x28 images we want vectors of length 784 (28*28), one entry per input neuron. NumPy’s reshape method allows us to flatten each image into a vector with the following code:
vector = 784
x_train = x_train.reshape(60000, vector)
x_test = x_test.reshape(10000, vector)
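The flattening step can be tried on a small stand-in array without downloading MNIST. The sketch below uses five random 28x28 "images" (an assumption made purely for illustration) and passes -1 to reshape so that the number of images does not have to be hard-coded. The final scaling to the 0–1 range is an optional step that is not part of this article’s code.

```python
import numpy as np

# Stand-in for a few MNIST images: 5 images of 28x28 pixels, values 0-255
images = np.random.default_rng(0).integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# -1 lets NumPy work out the number of rows from the array size
flat = images.reshape(-1, 28 * 28)

# Optional, not in this article's code: scale pixel values to the range 0-1
scaled = flat.astype("float32") / 255.0

print(flat.shape)  # (5, 784)
```

Each row of flat holds the pixels of one image, read row by row from top to bottom.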
The last thing that we should handle is the labels. For this dataset they are numerical values from 0 to 9, but we don’t want them to be treated as integers but as a set of items. For us none of these labels is smaller or greater than any other. They are just correctly or incorrectly predicted items. To resolve this issue, we’ll create a vector as long as the number of labels we have and set the value to 1 at the position corresponding to the specific label, while the rest is 0.
For example: label 5 will be transformed to [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.] Let’s see the code:
import keras

num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
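To see what this encoding produces without Keras, here is a small NumPy sketch of the same idea. It illustrates the transformation and is not how to_categorical is implemented internally. The three example labels are made up.

```python
import numpy as np

labels = np.array([5, 0, 9])   # example digit labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

Calling argmax on a row recovers the original integer label, which is handy when you want to display it later.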
Step 2. Build the model
To build the model we’ll use the high-level Keras API, which uses TensorFlow as the backend. It is worth mentioning that Keras is one of the most straightforward APIs for creating a neural network. Let’s build our model:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(784, activation='relu', input_shape=(vector,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()
# compile model with training parameters
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
With the model defined, we need a way to evaluate it. Most often this is called the loss function. It is a function which tells us whether the model is good or bad, and an optimisation algorithm uses it to improve the model. In other words, the loss function measures how far the model’s predictions are from the correct labels: if the value is high, our model needs to be trained some more. The optimisation algorithm controls how the weights of the graph are adjusted during training. In this article I’ll use categorical cross entropy, the most common loss function for this kind of classification, and the simplest optimisation algorithm, called Stochastic Gradient Descent (SGD).
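To give a feeling for what the loss value means, here is a minimal NumPy sketch of categorical cross entropy for one-hot labels. The function name, the clipping constant and the example predictions are assumptions made for illustration. The Keras implementation may differ in its details.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean cross entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid taking log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0.0, 1.0, 0.0]])         # the correct class is 1
good_pred = np.array([[0.05, 0.90, 0.05]])   # confident and right
bad_pred = np.array([[0.70, 0.20, 0.10]])    # confident and wrong

loss_good = categorical_cross_entropy(y_true, good_pred)
loss_bad = categorical_cross_entropy(y_true, bad_pred)
print(loss_good < loss_bad)  # True
```

The loss only looks at the probability the model gave to the correct class: the closer that probability is to 1, the closer the loss is to 0.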
Defined layers:
- input layer as a vector with 784 entries (which correspond to the 784 nodes in this layer)
- output layer as a vector with 10 labels (which correspond to the 10 nodes in this layer). This layer uses an activation function called softmax which normalises the values from the output layer such that all the values are between 0 and 1 and the sum of all ten values gives 1. Thanks to that those ten values are treated as probabilities and the largest one is selected as the final prediction.
- two hidden layers with 784 and 32 nodes using the relu activation function.
The resulting model has 640,890 trainable parameters. Where did this number come from? Let’s calculate it. Note that there are 784 connections to each neuron in the first hidden layer, each connection having an associated weight (a real number). By multiplying the 784 values from the input by the 784 weights connected to the selected node of the hidden layer, and then adding them together with the node’s bias (one more trainable number per node), we get one number. This value is passed through a non-linear filter (the activation function), so that at the end we receive a number telling us how strongly this node responds to the input. The first hidden layer therefore has 784 * 784 + 784 = 615,440 parameters. We need to do a similar thing with the nodes from the first to the second hidden layer and from the second to the output layer. Keras has a nice method called summary() that calculates the parameters for you.
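The parameter count and the softmax step from the list above can both be checked by hand. The sketch below assumes that every node has one weight per incoming edge plus one bias, which is how Keras Dense layers behave by default. The three raw output values fed to softmax are made up for illustration.

```python
import numpy as np

# Parameter count: one weight per incoming edge plus one bias per node
layer_sizes = [784, 784, 32, 10]   # input, hidden 1, hidden 2, output
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)
print(params_per_layer)  # [615440, 25120, 330]
print(total_params)      # 640890

def softmax(z):
    """Turn raw output values into probabilities that sum to 1."""
    shifted = z - np.max(z)   # subtracting the max keeps exp() from overflowing
    exp = np.exp(shifted)
    return exp / exp.sum()

raw_output = np.array([1.0, 3.0, 0.5])    # hypothetical raw values from 3 output nodes
probs = softmax(raw_output)
predicted_class = int(np.argmax(probs))   # the largest probability wins
```

The total matches the 640,890 reported for this model, and the softmax output always sums to 1 regardless of the raw values.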
Step 3. Train and evaluate the model
In the last step we need to train and evaluate the model.
Train?
In other words, feed the data to the algorithm. When training a neural network, training data is put into the first layer of the network and passed through the weighted connections until the output layer produces a prediction. The loss function then compares this prediction with the correct label. The network is not simply handed the right answer. Instead, the error is propagated back through the network’s layers and the weights are adjusted slightly so that the next guess is a bit better. And again. And again. Until the predictions are correct most of the time.
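The guess, measure, adjust loop can be shown on a toy problem. The following sketch performs a single gradient descent step for a one-layer softmax classifier in NumPy. The input, the label, the zero initial weights and the learning rate are all made-up assumptions. Keras does the same kind of update for every layer and every batch.

```python
import numpy as np

def softmax(z):
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()

x = np.array([0.5, -1.0, 2.0])   # one made-up input with 3 features
y = np.array([0.0, 0.0, 1.0])    # one-hot label: the correct class is 2
weights = np.zeros((3, 3))       # 3 inputs x 3 output classes
learning_rate = 0.1

# Forward pass: prediction and loss before the update
probs = softmax(x @ weights)
loss_before = -np.log(probs[2])

# Backward pass: gradient of the cross-entropy loss with respect to the weights
grad = np.outer(x, probs - y)

# SGD update: move the weights a small step against the gradient
weights -= learning_rate * grad

loss_after = -np.log(softmax(x @ weights)[2])
print(loss_after < loss_before)  # True
```

One step only lowers the loss a little. Training repeats this step for every batch in every epoch.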
Evaluate?
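In our case we use a validation dataset to estimate how good or bad our model is: validation_split=.1 holds out 10% of the training data, which the model does not train on. We repeat training on the rest of the dataset for a predetermined number of epochs, and at the end we measure the loss and accuracy on the test data.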
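To see what holding out a validation set does to the numbers, here is a small sketch of the split arithmetic. It uses a validation fraction of 0.1 and a batch size of 500, the values passed to model.fit in this article. Keras takes the validation samples from the end of the arrays before any shuffling, and the index array below is only a stand-in for the real data.

```python
import math
import numpy as np

n_samples = 60_000
validation_split = 0.1
batch_size = 500

n_val = round(n_samples * validation_split)   # 6,000 images held out
n_train = n_samples - n_val                   # 54,000 images used for training

# Stand-in for the sample indices; the validation part comes from the end
indices = np.arange(n_samples)
train_idx, val_idx = indices[:n_train], indices[n_train:]

batches_per_epoch = math.ceil(n_train / batch_size)
print(len(train_idx), len(val_idx), batches_per_epoch)  # 54000 6000 108
```

The 10,000 test images are separate from both parts and are only used for the final evaluation.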
Check out how you can handle this in Keras:
batch_size = 500
epochs = 20
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=.1)
test_loss, test_acc = model.evaluate(x_test, y_test)
To visualise the accuracy of the created model you can use the following code:
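# recent Keras versions store these under 'accuracy' / 'val_accuracy';
# older versions used 'acc' / 'val_acc'
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['training', 'validation'], loc='best')
plt.show()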
As you can see the obtained accuracy is around 95–98% which is an incredible result! Almost ten times better than guessing randomly. Great! Here you can see the expected and received values:
from random import randint

def display_output(num, x_train, y_train):
    x_train_ex = x_train[num, :].reshape(1, 784)  # one training example image
    y_train_ex = y_train[num, :]
    label = y_train_ex.argmax()  # get label as integer
    # get prediction as integer; predict_classes() is not available in recent Keras versions
    prediction = int(model.predict(x_train_ex).argmax(axis=-1)[0])
    plt.title(f'Predicted: {prediction} Label: {label}')
    plt.imshow(x_train_ex.reshape([28, 28]), cmap=plt.get_cmap('gray_r'))

figure = plt.figure()
for x in range(1, 10):
    plt.subplot(3, 3, x)
    plt.axis('off')
    display_output(randint(0, 55000), x_train, y_train)
plt.show()
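The comparison of predictions and labels shown above is also what the accuracy metric boils down to: the fraction of images for which the most probable class matches the label. Here is a small NumPy sketch with made-up predictions for four images and three classes. These are not results from the model in this article.

```python
import numpy as np

# Hypothetical model outputs for 4 images over 3 classes (each row sums to 1)
predictions = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
# One-hot labels for the same 4 images
labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

predicted_classes = predictions.argmax(axis=1)   # most probable class per image
true_classes = labels.argmax(axis=1)             # undo the one-hot encoding
accuracy = float(np.mean(predicted_classes == true_classes))
print(accuracy)  # 0.75
```

Three of the four made-up predictions match their labels, so the accuracy is 0.75.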
Step 4. Be proud of yourself and enjoy your work!
If you would like to challenge yourself to create a neural network, we encourage you to recreate the experiment presented in this article using another dataset built into the Keras library, like CIFAR10 small image classification or the Fashion-MNIST database of fashion articles.
If you have any comments or questions, feel free to contact us! In our GitHub repository you can find the complete code from this article.
Fingers crossed.