Nearest Neighbour Classifier

In this post, I am explaining a very simple and basic classification algorithm called the KNN algorithm, i.e. the K-nearest neighbor algorithm.
Suppose you are trying to make predictions for a binary classification problem, and you have a pre-defined training data set. The Nearest Neighbour algorithm is different from all the other classification algorithms: it doesn't involve building a model or fitting a function from the given training data, as the other classification algorithms do. Instead, it memorizes the training data.
It will become clear with an example.
Suppose I have to classify the sentiment of a given sentence, i.e. decide whether the sentence is positive or negative.

We have our initial data set (or training set) :

Text1: I love this movie.
Category: Positive

Text2: I hate this phone.
Category: Negative.

After removing the stop words (words that carry no useful information for classification), I am left with:
Text1: love, movie

Text2: hate, phone
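
To make this step concrete, here is a minimal Python sketch of stop-word removal; the STOP_WORDS set and the remove_stop_words helper are just illustrative names and a toy stop-word list I'm assuming, not something fixed by the algorithm:

```python
# Toy stop-word list for illustration only.
STOP_WORDS = {"i", "this", "the", "a", "an", "is"}

def remove_stop_words(sentence):
    # Lowercase, strip the trailing period, split into words,
    # and keep only the words that are not stop words.
    tokens = sentence.lower().replace(".", "").split()
    return [word for word in tokens if word not in STOP_WORDS]

print(remove_stop_words("I love this movie."))   # ['love', 'movie']
print(remove_stop_words("I hate this phone."))   # ['hate', 'phone']
```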

After removing stop words, we convert our data set into an M x N matrix, where M is the number of documents (or sentences) and N is the number of words in our dictionary.

In our case, it’ll be a 2 x 4 matrix. Our matrix would look like this:

      | love | movie | hate | phone
------|------|-------|------|------
Text1 |  1   |   1   |  0   |   0
Text2 |  0   |   0   |  1   |   1

The (i,j) entry signifies the number of occurrences of the j-th word from the dictionary in the i-th sentence.
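
Here is a rough sketch, in plain Python, of how such a count matrix could be built for our two training texts; the to_vector helper and the fixed ordering of the vocabulary are my own assumptions for illustration:

```python
# Training documents after stop-word removal (from the example above).
docs = [["love", "movie"], ["hate", "phone"]]

# Our dictionary, in a fixed order.
vocabulary = ["love", "movie", "hate", "phone"]

def to_vector(tokens, vocabulary):
    # The j-th entry counts how often the j-th dictionary word appears.
    return [tokens.count(word) for word in vocabulary]

matrix = [to_vector(doc, vocabulary) for doc in docs]
print(matrix)  # [[1, 1, 0, 0], [0, 0, 1, 1]]
```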

Now, each text can be viewed as a set of coordinates (a point) in a 4-dimensional space.

Coordinates of Text1 are (1,1,0,0) and coordinates of Text2 are (0,0,1,1).

This was our training part.
Now comes the testing part. We take a test sentence:
Test Sentence: I love this game.

Now, we’ll convert the test sentence into coordinates using our dictionary (the word "game" is not in our dictionary, so it contributes nothing). So we’ll have: (1, 0, 0, 0)
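
As a tiny sketch of this encoding step, again assuming the same fixed vocabulary ordering as above:

```python
vocabulary = ["love", "movie", "hate", "phone"]
test_tokens = ["love", "game"]   # "I love this game." after stop-word removal
test_vector = [test_tokens.count(word) for word in vocabulary]
print(test_vector)               # [1, 0, 0, 0]; "game" is outside the dictionary
```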

Once we have the coordinates of the test sentence and the training sentences, we can use any similarity or distance measure to get the result, such as Euclidean distance, Cosine Similarity, or Manhattan distance.
Let’s say we use Cosine Similarity.
a . b = |a| |b| cos(theta)

i.e. cos(theta) = (a . b) / (|a| |b|)

Here, a and b are our vectors. We compute the cosine between each training sentence and our test sentence. The one with the highest similarity value tells us the category of our test sentence.

In our example, you can clearly see that the angle between Text1 and the test sentence is the smallest, and thus it gives the highest value of cosine. Hence, its category becomes our test sentence's category, i.e. positive.
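
Here is a minimal sketch of that comparison in plain Python, using the vectors from our example; the cosine_similarity helper is just an illustrative name:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

text1 = [1, 1, 0, 0]   # "love movie" -> Positive
text2 = [0, 0, 1, 1]   # "hate phone" -> Negative
test  = [1, 0, 0, 0]   # "love game"

print(cosine_similarity(test, text1))  # ~0.707 (highest, so predict Positive)
print(cosine_similarity(test, text2))  # 0.0
```

As expected, Text1 scores about 0.707 while Text2 scores 0, so the test sentence is labelled positive.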

So this is how the nearest neighbor algorithm works. So what is the KNN algorithm all about?
The problem with the nearest neighbor algorithm is that if we look at only a very small number of training points closest to our new test sample, the prediction can be biased. Suppose you check only the single nearest neighbor and accept its label as the prediction; there is a chance that this is not the true label. Now suppose we check the two nearest neighbors and find that they predict two different classes for our current sample, leaving us with a tie.

Therefore we cannot look at only a very small number of neighbors, because the prediction becomes noisy and tends to overfit, nor at a very large number, because the prediction gets overly smoothed and tends to underfit. So, ideally, something like 5-10 nearest neighbors are checked.
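
As a rough sketch of the full k-nearest-neighbor step with a majority vote (the knn_predict helper, the choice of cosine similarity, and k=1 for this tiny two-sentence data set are all illustrative assumptions):

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train_vectors, train_labels, test_vector, k=5):
    # Sort training points by similarity to the test point (most similar first),
    # keep the k nearest, and take a majority vote over their labels.
    scored = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine_similarity(test_vector, pair[0]),
        reverse=True,
    )
    top_k_labels = [label for _, label in scored[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]

# Toy usage with the two training sentences from this post.
train_vectors = [[1, 1, 0, 0], [0, 0, 1, 1]]
train_labels = ["Positive", "Negative"]
print(knn_predict(train_vectors, train_labels, [1, 0, 0, 0], k=1))  # Positive
```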
