After my previous work on the Facial Keypoint Detection problem, I made a submission to Kaggle with some changes.

Please do note that the whole dataset had not been made use of in the earlier implementation: I had dropped every sample with a missing value for any of the features. This time, I replaced each missing value with the median of the feature (column) it belongs to, which let me make use of the whole dataset.
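The median imputation step can be sketched in NumPy like this (a minimal illustration on a toy array, not the exact code I used):

```python
import numpy as np

def impute_median(X):
    """Replace each NaN with the median of its column,
    so no training samples need to be dropped."""
    X = X.copy()
    for col in range(X.shape[1]):
        median = np.nanmedian(X[:, col])   # median ignoring NaNs
        mask = np.isnan(X[:, col])
        X[mask, col] = median
    return X

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
print(impute_median(X))   # [[1. 6.] [3. 4.] [2. 8.]]
```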

With 400 epochs, this gave me a rank of 25 on the Kaggle leaderboard.

Although this is largely the approach followed by Daniel Nouri, I have thought of some improvements and experiments to try on this model before moving on to implementing Convolutional Neural Networks.

1. We can use PCA (Principal Component Analysis) to reduce the dimensionality of the problem, reduce overfitting, and try to improve the accuracy.
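A rough sketch of what that would look like with scikit-learn's PCA (the component count of 100 is an assumption I would still need to tune, and the random data here is just a stand-in for the 9216-pixel image vectors):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(200, 9216).astype(np.float32)  # stand-in for 200 flattened 96x96 images

# Project onto the first 100 principal components
pca = PCA(n_components=100)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)   # (200, 100)
```

The reduced matrix would then be fed to the network in place of the raw pixels (with `input_shape` changed to match).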

2. I also want to implement histogram stretching for pre-processing the data. This can be used to improve image contrast: it basically refers to stretching the range of pixel intensity values of each image. I don't think we would need normalization after this.
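A minimal sketch of the idea: linearly map each image's own intensity range onto the full [0, 1] interval (shown on a tiny 1-D toy "image"):

```python
import numpy as np

def stretch(img):
    """Histogram (contrast) stretching: linearly map the
    pixel range [min, max] of this image onto [0, 1]."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo)

img = np.array([0.2, 0.5, 0.7])   # toy "image" with low contrast
print(stretch(img))               # [0.  0.6 1. ]
```

A real implementation would also have to guard against flat images (where `hi == lo`), which this sketch omits.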

3. Currently, my implementation of neural nets in Lasagne uses hold-out cross validation. I want to replace it with K-fold cross validation, which makes better use of the training data and gives a more reliable validation estimate, although it will take significantly more computational time.
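The splitting itself could be handled by scikit-learn's `KFold` (a sketch on 10 toy samples; wiring the folds into nolearn's training loop is a separate step):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(10, 1)   # stand-in for 10 training samples

# 5 folds: every sample serves as validation data exactly once,
# instead of a single fixed 20% hold-out set.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    print(fold, len(train_idx), len(val_idx))   # 8 training, 2 validation per fold
```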

My code for the neural net in Lasagne:

from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet

net1 = NeuralNet(
    layers = [  # 3 layers, including 1 hidden layer
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],

    # layer parameters for every layer;
    # refer to each layer's parameters using its prefix

    # 96x96 input pixels per batch;
    # None gives us variable batch sizes
    input_shape = (None, 9216),

    # number of hidden units in the layer
    hidden_num_units = 100,

    # The output layer uses the identity function, so each output
    # unit's activation is a linear combination of the activations
    # in the hidden layer. Since we haven't chosen anything for the
    # hidden layer, the default nonlinearity (rectifier) is used as
    # its activation function.
    output_nonlinearity = None,

    # 30 target values
    output_num_units = 30,

    # Optimization method:
    # the following parameterize the update function, which updates
    # the weights of our network after each batch has been processed
    update = nesterov_momentum,
    update_learning_rate = 0.01,  # step size of the gradient descent
    update_momentum = 0.9,

    # Regression flag set to True, as this is a regression
    # problem and not a classification problem
    regression = True,

    max_epochs = 400,

    # specifies that we wish to output information during training
    verbose = 1,

    # NOTE: The validation set is automatically chosen as 20% of the
    # training samples, i.e. eval_size=0.2 by default, which the user
    # can change accordingly.
    )

4. Also, I am using the rectifier as the hidden layer's activation function in this script. I want to try other activation functions, like the sigmoid, softmax and tanh functions. I used a rectifier here because rectified linear units have been found to perform consistently better than other activation functions. Currently there is no clear proof of why that happens; it's just empirical.
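For reference, the candidate activation functions can be written out in NumPy (Lasagne provides these ready-made in `lasagne.nonlinearities`; these toy definitions are just to make the comparison concrete):

```python
import numpy as np

def rectify(x):
    """Rectifier (ReLU): max(0, x)."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Logistic sigmoid: squashes to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Softmax: exponentiate and normalize so outputs sum to 1."""
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

x = np.array([-1.0, 0.0, 1.0])
print(rectify(x))    # [0. 0. 1.]
print(np.tanh(x))    # tanh squashes to (-1, 1)
print(sigmoid(x))
print(softmax(x))    # sums to 1
```

Swapping them in should just be a matter of passing a `hidden_nonlinearity` parameter to `NeuralNet`, following the same prefix convention as the other layer parameters.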

http://neuralnetworksanddeeplearning.com/chap6.html

5. Like Daniel Nouri, I am using Nesterov's Accelerated Gradient Descent (NAG) right now. I hope to try other techniques like plain stochastic gradient descent without momentum (although it converges more slowly).
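The difference between the two update rules can be sketched on a toy quadratic loss f(w) = w², whose gradient is 2w (a hand-rolled illustration, not Lasagne's actual implementation):

```python
def grad(w):
    return 2.0 * w   # gradient of f(w) = w**2

lr, mu = 0.01, 0.9   # learning rate and momentum, as in the script

# Plain SGD step: move straight down the gradient
w = 1.0
w_sgd = w - lr * grad(w)

# Nesterov momentum step: evaluate the gradient at the
# "looked-ahead" point w + mu*v, then update the velocity
v = 0.0
v = mu * v - lr * grad(w + mu * v)
w_nag = w + v

print(w_sgd, w_nag)   # both 0.98 here, since v starts at 0
```

On the very first step the two coincide (the velocity is zero); they diverge on later steps, where the accumulated velocity lets NAG take larger, better-aimed steps.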

When I apply convolutional nets to this problem, every epoch will take a lot of time on my machine (currently every epoch takes around 2-3 seconds). I don't have a CUDA-capable GPU on my own machine, therefore I would need to use cloud servers. So I'll have to set up Theano and Lasagne on EC2 now.

I'll also make a GitHub repo for this project so that everyone can see the code. I make it a habit to use a lot of comments in my code, so that no one has trouble understanding it.

~jigsaw