I made a little progress on the Neural Networks project. After reading a bit about Convolutional Neural Networks, I read a research paper on Face Detection from Stanford University and wrote a short summary of it for my own understanding.

After that, I set out to do the real technical part. I first read in detail about Theano, a Python library for defining and evaluating expressions involving multi-dimensional arrays, and Lasagne, a machine learning library for building and training neural networks on top of Theano, along with its documentation.

I installed and checked all the prerequisites for Lasagne and Theano, like scipy, numpy, scikit-learn, matplotlib, etc. Thereafter, I installed Theano's latest version, and Lasagne from GitHub, as it is still awaiting its first official release.

One problem I ran into: if I had a CUDA-capable GPU, I could have configured it for Theano, which would have made my computations even faster. Only IF. Sadly, my machine has a Radeon graphics card, and CUDA is supported only on Nvidia graphics cards.

After I was done installing all the dependencies, I set out to analyse the dataset. It consists of 7000 grayscale images in the form of pixel values, with attributes associated with each image. However, the dataset wasn't perfect (as if any dataset really is): it had missing values for some of the features corresponding to the images. To resolve that, for my first prototype I decided to consider only the records which have all the features present for an image. Later on, I plan to replace the missing fields with the medians of the features.

So, after reading the data from the CSV file (using the pandas library, which will prove extremely helpful in all data-preparation tasks), I segregated the data into the pixel values (X) and the target co-ordinates (y). Both had to be properly scaled for my algorithm to work. Since pixel values lie between 0 and 255, I simply divided them by 255 to scale them to [0, 1]; the target variables all lie between 0 and 96, and I scaled them to [-1, 1] by computing y = (y - 48) / 48.
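The loading-and-scaling step might look like this. This is a sketch, with a tiny hypothetical DataFrame standing in for the real CSV; I'm assuming each image is stored as a space-separated string of pixel values in an 'Image' column, and the column names here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: one row per image, with target
# coordinates as columns and pixels as a space-separated string.
df = pd.DataFrame({
    'left_eye_x': [66.0, np.nan, 37.0],
    'left_eye_y': [39.0, 38.0, np.nan],
    'Image': ['128 255 0 64', '10 20 30 40', '200 100 50 25'],
})

# First prototype: keep only rows where all features are present.
df = df.dropna()

# Parse each pixel string into an array, stack into X, and scale to [0, 1].
X = np.vstack(df['Image'].apply(lambda s: np.array(s.split(), dtype=np.float32)))
X = (X / 255.0).astype(np.float32)

# Scale the target coordinates from [0, 96] to [-1, 1].
y = df[['left_eye_x', 'left_eye_y']].values
y = ((y - 48) / 48.0).astype(np.float32)
```

After `dropna()`, only rows with every feature survive; the median-imputation variant would replace the `dropna()` call with `df.fillna(df.median(numeric_only=True))`.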

I put both X and y into numpy arrays and cast them to float32. Finally, I shuffled X and y randomly (in a corresponding manner, obviously) to generate my final data set.
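The paired shuffle can be done by applying a single random permutation to both arrays (a minimal sketch with stand-in data; the seed is an arbitrary choice for reproducibility):

```python
import numpy as np

# Stand-in data: X row k is [2k, 2k+1] and y[k] is k, so the pairing
# between rows of X and entries of y is easy to verify after shuffling.
X = np.arange(12, dtype=np.float32).reshape(6, 2)
y = np.arange(6, dtype=np.float32)

rng = np.random.RandomState(0)   # fixed seed, chosen arbitrarily
perm = rng.permutation(len(X))   # one permutation shared by both arrays
X, y = X[perm], y[perm]          # rows of X stay aligned with entries of y
```

Shuffling each array independently would destroy the image-to-target correspondence, which is why a shared permutation matters here.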

Next, I'll run a simulation on this data set by creating one hidden layer (initially) and training my neural network on the data generated.

To build my first neural network, I had to start making use of Lasagne. Since I don't have a CUDA-capable GPU, the setup for training my neural network was a bit simpler.

The layers module and nesterov_momentum were imported from lasagne, and NeuralNet was imported from nolearn.lasagne. nesterov_momentum is the gradient descent optimization method I am using; it works well for a large number of problems.

After specifying the layers and the layer types, we list the layer parameters: the shape of the layers, the number of units/neurons, the type of activation function of each layer, and so on. Specifying these parameters gives us great flexibility and compatibility with other modules and programs. The non-linearity of a layer defines the kind of activation function used; the default is a rectifier, and by specifying None we get output values that depend linearly on the previous hidden layer. A validation set is automatically carved out as 20% of the training samples; that is, eval_size=0.2 by default, which the user can change. I decided to run the training of the neural network for about 400 epochs. I also have to specify that I am performing a regression task, by setting a flag to True. Now, I just pass the vectors of pixel values and target values to fit the neural network (train).

When I train on my data using this neural network, I can see the step-by-step reduction of the training error and of the error on the validation set. After 400 epochs, I get a training loss of 0.00226 and a validation loss of 0.00306, which is quite good.

Till now I have built a simple, regular neural network with some default parameters. I will start venturing into convolutional neural networks in the coming days. I shall get into experimentation after I am done with the tutorial implementation, but I am collecting ideas to improve the accuracy of my network on the side.

I shall analyse the results obtained in the next blog post, and brainstorm methods to make this algorithm better.

~jigsaw