a. With k=1, we find the single nearest neighbor of the new point x and assign it that neighbor's label, so y is classified as 1 or -1. This choice has both advantages and disadvantages.
Advantages:
1. The prediction is determined by a single neighbor, so it is unique (barring exact ties in distance). No voting or ensemble is involved, so the method is simple and runs relatively fast.
Disadvantages:
1. It is sensitive to noise and to variation within a class. In an image classification problem, for example, the same object can be represented in different ways (e.g. two different people can write "2" differently). Deciding whether an image is a "2" from a single nearest neighbor is unreliable; a vote over several neighbors is more robust.
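To make part (a) concrete, here is a minimal sketch of the k=1 rule in plain Python. It assumes Euclidean distance and uses made-up toy points; the function name `nearest_neighbor_predict` and the data are illustrative, not part of the original question.

```python
import math

def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (Euclidean distance)."""
    # Index of the training point with the smallest distance to x.
    # On an exact distance tie, min() keeps the earliest point.
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]

# Hypothetical toy data: two clusters labeled -1 and 1.
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [-1, -1, 1, 1]

pred_low = nearest_neighbor_predict(train_x, train_y, (0.5, 0.5))
pred_high = nearest_neighbor_predict(train_x, train_y, (5.5, 4.0))
print(pred_low, pred_high)
```

There is no vote here: the answer depends entirely on one training point, which is why a single mislabeled or unusual neighbor can flip the prediction.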
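b. This extends easily to k=2 or k=3. If all of the k nearest neighbors have the same y, we can be confident about the outcome for the new data point x. If they disagree, we take a majority vote over the neighbors' y values and choose y accordingly. With an even k such as k=2 the vote can tie, so a tie-breaking rule is needed (for example, taking the label of the closest neighbor).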
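The voting rule in part (b) can be sketched as follows. This is one possible implementation, assuming Euclidean distance and a tie-break in favor of the label that appears closest to x; the tie-break, the function name `knn_predict`, and the toy data (including one deliberately mislabeled point) are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote over the k nearest neighbors; returns (label, vote share)."""
    # Sort training indices by distance to x and keep the k closest.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    neighbor_labels = [train_y[i] for i in order[:k]]
    votes = Counter(neighbor_labels)
    top = max(votes.values())
    # Tie-break (an assumption): among labels with the most votes,
    # pick the one that occurs first, i.e. closest to x.
    label = next(lbl for lbl in neighbor_labels if votes[lbl] == top)
    return label, top / k

# Hypothetical toy data; the last point is a "noisy" example labeled 1
# that sits inside the -1 cluster.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (0.6, 0.6)]
train_y = [-1, -1, -1, 1, 1, 1]

query = (0.4, 0.3)
label_k1, share_k1 = knn_predict(train_x, train_y, query, k=1)
label_k2, share_k2 = knn_predict(train_x, train_y, query, k=2)
label_k3, share_k3 = knn_predict(train_x, train_y, query, k=3)
print(label_k1, label_k2, label_k3)
```

The vote share acts as a rough confidence: a share of 1.0 means the neighbors agree, while a lower share means the prediction rests on a split vote.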
c. The algorithm still works for k=1 with three classes, because the single nearest neighbor has exactly one label (A, B, or C), so we always get an output. However, we cannot be very confident in that output, since it depends on only one data point. The same problem arises for k=2 or k=3: if the two nearest neighbors have y equal to A and B, we cannot be confident about the new data point x. The ideal approach is to choose a fairly large k >> 3, say k=10, and then take a vote. However, if the 3 or 4 nearest neighbors all give the same output, we need not go that far and can choose y at that point.
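The two-stage idea in part (c) can be sketched like this: check whether the few nearest neighbors are unanimous, and only fall back to a larger vote if they are not. The function name `knn_predict_adaptive`, the parameters `k_small` and `k_large`, and the one-dimensional toy data are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict_adaptive(train_x, train_y, x, k_small=3, k_large=10):
    """Return (label, number of neighbors used) for a multi-class vote."""
    # Sort training indices by distance to x and keep the k_large closest.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    labels = [train_y[i] for i in order[:k_large]]
    first = labels[:k_small]
    # Early exit: the k_small nearest neighbors all agree.
    if len(set(first)) == 1:
        return first[0], k_small
    # Otherwise take a plurality vote over up to k_large neighbors.
    # most_common() lists equal counts in order of first appearance,
    # so a tie goes to the label seen closest to x.
    label, _ = Counter(labels).most_common(1)[0]
    return label, len(labels)

# Hypothetical 1-D toy data with three classes along a line.
train_x = [(float(i),) for i in range(12)]
train_y = ["A"] * 4 + ["B"] * 5 + ["C"] * 3

clear_case = knn_predict_adaptive(train_x, train_y, (1.2,))
border_case = knn_predict_adaptive(train_x, train_y, (3.6,))
print(clear_case, border_case)
```

A point deep inside one class is decided by its three nearest neighbors, while a point near the boundary between A and B falls through to the larger vote.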
This question investigates KNN for classification. Suppose that (x_1, y_1), ..., (x_n, y_n) is a training set wi...
Classification in Python: In this assignment, you will practice using the kNN (k-Nearest Neighbors) algorithm to solve a classification problem. kNN is a simple and robust classifier that is used in many applications. The goal is to train a kNN classifier to distinguish the species from one another. The dataset can be downloaded from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/machine-learning-databases/iris/ (download the `iris.data` file from the Data Folder). The Data Set description...