Question

Please write full justification for (a) and (b). Will uprate/vote!

4. K-means The goal of K-means clustering is to divide a set of n points into k< n subgroups of points that are close to ea

0 0
Add a comment Improve this question Transcribed image text
Answer #1

a) True. We have k clusters. So each of the n points can belong to any one of the k clusters, thus having k options per item. So , total number of possible clusters for 2 items = k for the first item multiplied by k for the second item=k^2. Likewise for n items, we have k^n ways of assigning colors to each point.

b) Naively searching through all possible ways of coloring a set of n points will take k^n, in this case 5^100 units of time = 25^50 units of time = 25^48 sec. This is greater than 10^17 sec. The naive approach hence will fail in this case. We should remember that only 1 of the 5^100 arrangements is the correct answer.

K means works as follows.

Input : number of clusters we want (k) and number of data-points we have(n)

  • initially all points are randomly scattered and unassigned. w
  • we randomly assign m points with m distinct colors ( initial centers)
  • for all remaining points:
  • we calculate distance from the k centers and assign the point the color

of the center closest to it.

  • Then we update the value of the centers of every cluster with the average coordinate value and update the center to that point closest to the average center.
  • repeat entire process till no point updates its color from the previous setting

TIME COMPLEXITY: O(t*k*n*d) where t is number of iterations, k=number of clusters, n=number of points and d=number of dimensions each point has ( for 2D grid coordinates, its value is 2 ). Thus this approach is way better than the naive approach.

Add a comment
Know the answer?
Add Answer to:
Please write full justification for (a) and (b). Will uprate/vote! 4. K-means The goal of K-means clustering is to divide a set of n points into k< n subgroups of points that are "close" t...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • K-means clustering K-means clustering is a very well-known method of clustering unlabeled data. The simplicity of...

    K-means clustering K-means clustering is a very well-known method of clustering unlabeled data. The simplicity of the process made it popular to data analysts. The task is to form clusters of similar data objects (points, properties etc.). When the dataset given is unlabeled, we try to make some conclusion about the data by forming clusters. Now, the number of clusters can be pre-determined and number of points can have any range. The main idea behind the process is finding nearest...

  • K-means clustering Problem 1. (10 pts) Suppose that we have the gene expression values for 5...

    K-means clustering Problem 1. (10 pts) Suppose that we have the gene expression values for 5 genes (G1 to G5) under 4 time points (t1 to t4) as shown in the following table. Please use K-Means clustering to group 5 genes into 2 clusters based on Euclidean distance. Find out the final centroids and their affiliated genes. The initial centroids are c1=(1,2,3,4) and c2=c(9,8,7,6). Please write down your algorithm step by step. Result without steps won't get points. t1 t2...

  • Data clustering and the k means algorithm. However, I'm not able to list all of the...

    Data clustering and the k means algorithm. However, I'm not able to list all of the data sets but they include: ecoli.txt, glass.txt, ionoshpere.txt, iris_bezdek.txt, landsat.txt, letter_recognition.txt, segmentation.txt vehicle.txt, wine.txt and yeast.txt. Input: Your program should be non-interactive (that is, the program should not interact with the user by asking him/her explicit questions) and take the following command-line arguments: <F<K><I><T> <R>, where F: name of the data file K: number of clusters (positive integer greater than one) I: maximum number...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT