Machine Learning project documentation post #1

Finally started work on the Machine Learning project that had been stalled for the whole of June.
The initial working of the protocol is the same as edMultiRouter's.

But since this protocol uses an implementation of the K-Means ML algorithm, we will need to extract many more features than just Encounters and Distance.

A zero threshold has been set to decide when the transfer of messages can begin. By default it corresponds to the encounter table being 75% filled, but it can also be changed from the settings file. We added this feature while we were working on edMultiRouter itself. The significance of this threshold is that it allows the encounter table to be sufficiently filled before the transfer of messages starts. A zeroThreshold of 0.25 signifies that until the number of zeroes in the encounter table drops below 25% (that is, until the table is more than 75% filled), the checkStart function doesn't initialise start to 1, and thus the transfer doesn't start.
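For clarity, maxPossibleZeroes (used by checkStart below) is just the threshold fraction applied to the full table size. A minimal sketch, assuming field names that mirror the router's:

    double zeroThreshold = 0.25;   // default value; configurable from the settings file
    // largest number of zero entries the nodeCount x nodeCount encounter table
    // may still contain before transfers are allowed to start
    int maxPossibleZeroes = (int) (zeroThreshold * nodeCount * nodeCount);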

Kunal’s suggestion: the time required to cross the zero threshold should be the actual Report.warmup time. Right now our Report.warmup time has been arbitrarily set to 1000 s, and we have no way of confirming that our zero threshold has been crossed within that time, i.e. that our encounter matrix has been sufficiently filled up.
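For reference, the warm-up time is just one line in the ONE settings file (1000 being the current arbitrary value):

    # excerpt from the settings file
    # warm-up time in seconds; currently an arbitrary placeholder
    Report.warmup = 1000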

    void checkStart() {
        int countZeroes = 0;

        for (int i = 0; i < nodeCount; i++) {
            for (int j = 0; j < nodeCount; j++) {
                if (encounters[i][j] == 0) {
                    countZeroes++;
                }
            }
        }

        if (countZeroes < maxPossibleZeroes) {
            start = 1; //set start to 1 if the encounter matrix satisfies the threshold on the number of zero values
        }
    }   //end of checkStart

tryOtherMessages acts as the driver function of the protocol. It is called from the update method in the code when start equals 1:

    if (start == 1) {
        tryOtherMessages();
    }
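Putting the two pieces together, the update method would look roughly like this; a sketch assuming the usual ONE router structure, not the exact code:

    @Override
    public void update() {
        super.update();
        if (start == 0) {
            checkStart();          // keep checking until the encounter table is filled enough
        }
        if (start == 1) {
            tryOtherMessages();    // driver function of the protocol
        }
    }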

We need to create feature vectors to give to Shogun's K-Means function so that it can cluster the nodes of the network.

We do that by creating a 2D matrix called featureMatrix for every message (source-destination pair). The first column holds the encounters and the second column holds the distance from the destination (for now) of the various neighbours.
Then we convert the Java 2D matrix into a jblas DoubleMatrix. jblas is a Java library used to perform fast calculations on matrices. The Shogun K-Means implementation generates clusters by performing operations on the feature matrix, which is why the feature matrix needs to be converted into a jblas-compatible matrix first.
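A minimal sketch of that step, with the neighbour/encounter/distance names assumed for illustration (the conversion itself is just org.jblas.DoubleMatrix's 2D-array constructor):

    // Row i = neighbour i; column 0 = encounters, column 1 = distance from the destination.
    // neighbourCount, encountersWith() and distanceFromDestination() are placeholder names.
    double[][] featureMatrix = new double[neighbourCount][2];
    for (int i = 0; i < neighbourCount; i++) {
        featureMatrix[i][0] = encountersWith(i, destination);
        featureMatrix[i][1] = distanceFromDestination(i);
    }
    DoubleMatrix features = new DoubleMatrix(featureMatrix);   // jblas copy of the Java 2D array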

Next, we load the Shogun library and initialise Shogun with default values, which are Shogun's own standard set of values.
Then we choose the number of clusters and convert the jblas features into feats_train, which is compatible with Shogun.
Then we just train K-Means on our data (feats_train) and get the cluster centres and cluster radii.
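A sketch of that pipeline, modelled on Shogun's java_modular K-Means example (class and method names may differ slightly between Shogun versions):

    import org.shogun.*;
    import org.jblas.DoubleMatrix;

    public class KMeansSketch {
        static {
            System.loadLibrary("modshogun");                  // load Shogun's native library
        }

        static DoubleMatrix[] cluster(DoubleMatrix features, int k) {
            modshogun.init_shogun_with_defaults();            // Shogun's own standard defaults

            RealFeatures feats_train = new RealFeatures(features);   // jblas matrix -> Shogun features
            EuclideanDistance distance = new EuclideanDistance(feats_train, feats_train);
            KMeans kmeans = new KMeans(k, distance);          // k = chosen number of clusters
            kmeans.train();                                   // run K-Means on feats_train

            DoubleMatrix centers = kmeans.get_cluster_centers();    // cluster centres
            DoubleMatrix radiuses = kmeans.get_radiuses();           // cluster radii
            return new DoubleMatrix[] { centers, radiuses };
        }
    }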
One issue we encountered was that, while trying to display the cluster centroids, we were only getting one centre. This left us perplexed, as it was a really straightforward implementation.
Ultimately, we examined Shogun's KMeans.cpp code (the C++ implementation) and learnt that it takes the transpose of the feature matrix before training on it. Hence, we tried transposing the feature matrix before giving it to the RealFeatures constructor, as done in the KMeans.cpp code, and to our pleasant surprise it worked!
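In code, the fix was just a jblas transpose before handing the matrix to RealFeatures (Shogun stores examples as columns, which lines up with what KMeans.cpp expects):

    // transpose so that each column of the Shogun feature matrix is one neighbour's feature vector
    RealFeatures feats_train = new RealFeatures(features.transpose());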
kMeansRouter is moving ahead now.

After getting the cluster centres, we find the optimal cluster of neighbours (the ones we will forward copies of the message to), based on the distance component of the cluster centres. Only if the distances are equal do we check which of the two cluster centres has the greater number of encounters.

    int optimal_cluster;

    //find out the optimal cluster
    if (cluster_centers.get(0, 1) < cluster_centers.get(1, 1)) {
        //distance of cluster 0 < distance of cluster 1
        optimal_cluster = 0;
    }
    else if (cluster_centers.get(0, 1) > cluster_centers.get(1, 1)) {
        //distance of cluster 0 > distance of cluster 1
        optimal_cluster = 1;
    }
    else {
        //distances are equal, so compare by encounters
        if (cluster_centers.get(0, 0) > cluster_centers.get(1, 0)) {
            optimal_cluster = 0;
        }
        else {
            optimal_cluster = 1;
        }
    }

Now the things we have to take care of are:

  1. What to do when the source node has only a single neighbour. Applying clustering in that case is meaningless. We will probably just pass the message to that single neighbour, or create some threshold formula. We love thresholds.
  2. We need to normalize the feature matrix as well. We will probably use some inbuilt Shogun function for that (a rough manual sketch follows below the list).
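For the normalization item, here is a rough sketch of manual per-column min-max scaling of featureMatrix, as a fallback in case a suitable built-in Shogun preprocessor doesn't work out (this is not Shogun's function, just an illustrative stand-in):

    // scale every column of the feature matrix to the [0, 1] range
    static void normalizeColumns(double[][] featureMatrix) {
        if (featureMatrix.length == 0) return;
        for (int col = 0; col < featureMatrix[0].length; col++) {
            double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
            for (double[] row : featureMatrix) {
                min = Math.min(min, row[col]);
                max = Math.max(max, row[col]);
            }
            if (max == min) continue;             // constant column: leave unchanged
            for (double[] row : featureMatrix) {
                row[col] = (row[col] - min) / (max - min);
            }
        }
    }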

~jigsaw
