Question


K-means clustering
K-means clustering is a very well-known method for clustering unlabeled data, and the simplicity of the process has made it popular with data analysts. The task is to form clusters of similar data objects (points, properties, etc.). When the given dataset is unlabeled, we try to draw conclusions about the data by forming clusters. The number of clusters is pre-determined, while the number of points can be arbitrary.
The main idea behind the process is to find the nearest cluster mean for each point and assign the point to that cluster. Initially, we pick some random centroids (the mean values of the required clusters) and assign every point to one of the clusters. After all points have been assigned, we recompute each cluster mean (centroid) and update it. Now we have new centroids for our clusters and every point labelled with some cluster. We then iterate over all points again and re-assign the clusters using the newly updated centroids. This goes on until no point changes its cluster, or until a large (predetermined) number of assignment passes has been performed.
The complete algorithm can be summarized as follows (a compact code sketch of one iteration is given after the steps):
1. Randomly pick n centroids (n being the number of clusters to be created)
2. At each iteration calculate the distance between each point and the centroids
3. Assign each point to its closest cluster (the cluster with the closest centroid)
4. After assigning all the points to some cluster, update the centroids
5. Repeat steps 2-4 until the labelling no longer changes or a fixed number of iterations has been completed
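As a compact illustration of steps 2-4 only (this is just a sketch with generic variable names, not the helper-function structure the assignment asks for, and it assumes a MATLAB version with implicit expansion, R2016b or later):

dists = zeros( size( points, 1 ), size( centroids, 1 ) );
for j = 1:size( centroids, 1 )
    % Euclidean distance from every point to centroid j
    dists( :, j ) = sqrt( sum( ( points - centroids( j, : ) ) .^ 2, 2 ) );
end
[ ~, labels ] = min( dists, [], 2 );      % step 3: index of the closest centroid
for j = 1:size( centroids, 1 )
    % step 4: update each centroid (assumes every cluster is non-empty)
    centroids( j, : ) = mean( points( labels == j, : ), 1 );
end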
You are given the kmeansCluster function:
function [] = kmeansCluster()
max_range = 10;
min_range = -10;
num_points = 10000;
vector = ( max_range - min_range ) .* rand( 1, num_points ) + min_range;
points = makePoints( vector );
centroids = initialCentroid( points );
init_labels = false( size( points, 1 ), 1 );
[ labels, c1, c2 ] = makeClusters( points, centroids );
trial = 0;
while ~isequal( labels, init_labels ) && trial < 1000
    init_labels = labels;
    centroids = [ mean( c1 ); mean( c2 ) ];
    [ labels, c1, c2 ] = makeClusters( points, centroids );
    trial = trial + 1;
end
hold on
plot( c1( :, 1 ), c1( :, 2 ), 'r.' );
plot( c2( :, 1 ), c2( :, 2 ), 'b.' );
plot( centroids(1), centroids(3), 'ro', 'MarkerSize', 10, 'MarkerFaceColor', 'r' );
plot( centroids(2), centroids(4), 'bo', 'MarkerSize', 10, 'MarkerFaceColor', 'b' );
hold off
end
Now, kmeansCluster relies on several helper functions that you have to define yourself. All of these functions
are described below:
• makePoints
o parameter:
▪ vector - row vector with m points
o returns:
▪ points - m/2 x 2 matrix in which each row represents a point
o description: A row vector with m points will be passed to the function as a parameter; you have
to reshape the vector into a matrix with m/2 rows and 2 columns, then return the newly
modified matrix.
o example:
▪ >> vector = [ 2 4 5 10 5 9 21 23 10 22 ];
▪ >> points = makePoints( vector );
points =
2 4
5 10
5 9
21 23
10 22
• initialCentroid
o parameter:
▪ point - n x 2 matrix with n point coordinates
o returns:
▪ centroids - 2 x 2 matrix with one random centroid in each row
o description: The function takes in the points and generates two random centroids lying within the
range of the input points. For this, you need to find the maximum and minimum values
of x and y across all the points and generate random integers in that range. The
points input will contain floating-point numbers, so you have to round them to the nearest
integers before using them as the range for the randi function.
o example:
▪ >> centroids = initialCentroid( points );
centroids =
9 17
15 9
• makeClusters
o parameter:
▪ points - matrix (n x 2 matrix)
▪ means - matrix (2 x 2 matrix)
o returns:
▪ label – n x 1 logical matrix containing 0's and 1's as the label of the ith point
▪ cluster1, cluster2 – matrices with the points assigned to that cluster, each row representing
a point
o description: This function takes each point from the points matrix and computes its distance
from the two centroids. If the distance to centroid1 is less than the distance to centroid2,
the point is labelled 0 and concatenated vertically with cluster1. Otherwise, it is labelled 1 and
concatenated vertically with cluster2. To compute the distance between two points, you also
have to define the euclidDist function, which is described later.
o example:
▪ >> [ labels, cluster1, cluster2 ] = makeClusters( points, centroids );
▪ labels =
5×1 logical array
1
0
0
0
0
▪ cluster1 =
5 10
5 9
21 23
10 22
▪ cluster2 =
2 4
• euclidDist
o parameter:
▪ point1, point2 – 1 x 2 matrices containing the coordinates of a point, x in column 1, y in
column 2
o returns:
▪ distance – Euclidean distance between the points
o description: This function computes the Euclidean distance between two points using the
age-old formula distance = sqrt( (x1 - x2)^2 + (y1 - y2)^2 ).
You can do this using the MATLAB shorthand command:
distance = sqrt( sum( ( point1 - point2 ) .^ 2 ))
o example:
▪ >> point1 = [ 2, 2 ];
▪ >> point2 = [ 5, 5 ];
▪ >> distance = euclidDist( point1, point2 );
▪ distance =
4.2426
You do not have to worry about the rest of the kmeansCluster function. What we did first was create a
label vector filled with zeros (logical false):
init_labels = false( size( points, 1 ), 1 );
Then we call the makeClusters function once, outside the while loop, to give every point an initial
label.
[ labels, c1, c2 ] = makeClusters( points, centroids );
Now, the while loop keeps running as long as the labels changed in the previous iteration and fewer than
1000 trials have been made. In other words, there will be at most 1000 iterations if points keep getting
reassigned; otherwise the loop stops as soon as the labels no longer change.
while ~isequal( labels, init_labels ) && trial < 1000
Inside the while loop, we calculate the new centroids and reassign each point based on those newly
updated centroids. The trial counter is also incremented.
init_labels = labels;
centroids = [ mean( c1 ); mean( c2 ) ];
[ labels, c1, c2 ] = makeClusters( points, centroids );
trial = trial + 1;
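One caveat worth knowing (an observation only; the assignment does not ask you to handle it): if one cluster ever becomes empty, taking the mean of an empty matrix returns NaN and the corresponding centroid becomes invalid. A defensive sketch could keep the previous centroid in that case:

% hypothetical guard, not part of the given scaffold
if ~isempty( c1 ), centroids( 1, : ) = mean( c1, 1 ); end
if ~isempty( c2 ), centroids( 2, : ) = mean( c2, 1 ); end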
The next lines are for plotting, as you can probably guess. If you complete the functions correctly, you
should see a plot of the points split into two clusters, red and blue, with the centroids drawn as solid circles.
For different values of num_points you will see something like the graphs below, though they may not be
exactly the same, since random functions were used to generate the points.
[Example output plots for num_points = 100, 1000, 5000, and 10000]
Answer #1

Please find the required MATLAB code below:

--------------------------------------------------- makePoints.m

function data=makePoints(vector)
% A row vector with m points will be passed to the function as parameter, you have
% to reshape the vector to a matrix with m/2 rows and 2 columns, then return the newly
% modified matrix.

% example:
% vector = [ 2 4 5 10 5 9 21 23 10 22 ];
% points = makePoints( vector );
% points =
% 2 4
% 5 10
% 5 9
% 21 23
% 10 22

m = length( vector );                % total number of values; must be even
% reshape fills column-first, so build a 2 x m/2 matrix and transpose it
% to pair consecutive elements of the vector as (x, y) rows
data = reshape( vector, 2, m/2 ).';
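Note that reshape fills matrices column-first in MATLAB, so reshape( vector, m/2, 2 ) would interleave the x and y coordinates incorrectly; reshaping to 2 rows and transposing pairs consecutive elements as intended. A quick check against the example from the problem statement:

>> vector = [ 2 4 5 10 5 9 21 23 10 22 ];
>> points = makePoints( vector )
points =
     2     4
     5    10
     5     9
    21    23
    10    22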

--------------------------------------------------------------- initialCentroid.m

function centroids=initialCentroid(point)
% description: The function takes in points and generates two random centroids ranging within
% range of the input points. For this, you need to find out the maximum and minimum values
% for x, y across all the points and generate random integer numbers using that range. The
% points input will be floating numbers, so you have to round them to nearest integers to be
% used as range for randi function.

% example:
% centroids = initialCentroid( points );
% centroids =
% 9 17
% 15 9

x=point(:,1);
y=point(:,2);

randX = randi([round(min(x)) round(max(x))],2,1);
randY = randi([round(min(y)) round(max(y))],2,1);

centroids=[randX randY];
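A sample run on the example points (your output will differ, since randi draws random integers each time):

>> centroids = initialCentroid( points )
centroids =
     9    17
    15     9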

-----------------------------------------------  makeClusters.m

function [label,cluster1,cluster2]=makeClusters(points,means)

% description: This function takes each point from points matrix and computes its distance
% from the two centroids. If the distance to centroid1 is lesser than the distance to centroid2,
% it assigned 0 and concatenated vertically with cluster1. Otherwise, it is assigned as 1 and
% concatenated vertically with cluster2. To compute distance between two points, you also
% have to define euclidDist function which is described later.

% Example:
% >> [ labels, cluster1, cluster2 ] = makeClusters( points, centroids );
% labels =
%   5x1 logical array
%    1
%    0
%    0
%    0
%    0
% cluster1 =
%    5 10
%    5 9
%   21 23
%   10 22
% cluster2 =
%    2 4

n = size( points, 1 );
label = false( n, 1 );      % logical labels: 0 -> cluster1, 1 -> cluster2
cluster1 = [];
cluster2 = [];

for k = 1:n
    % distance of the k-th point to each of the two centroids
    d1 = euclidDist( points( k, : ), means( 1, : ) );
    d2 = euclidDist( points( k, : ), means( 2, : ) );
    if d1 < d2
        label( k ) = false;
        cluster1 = [ cluster1; points( k, : ) ];   %#ok<AGROW>
    else
        label( k ) = true;
        cluster2 = [ cluster2; points( k, : ) ];   %#ok<AGROW>
    end
end
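For large num_points the loop above is slow because the cluster matrices grow inside the loop. As a sketch of an alternative (using implicit expansion, available in R2016b and later, rather than the point-by-point loop the description asks for), the assignment step could also be vectorized:

% hypothetical vectorized version of the assignment step
d1 = sqrt( sum( ( points - means( 1, : ) ) .^ 2, 2 ) );   % distances to centroid 1
d2 = sqrt( sum( ( points - means( 2, : ) ) .^ 2, 2 ) );   % distances to centroid 2
label = d1 >= d2;                     % false -> cluster1, true -> cluster2
cluster1 = points( ~label, : );
cluster2 = points( label, : );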

------------------------------------------------  euclidDist.m

function distance=euclidDist(point1, point2)

% This function computes the Euclidean distance between two points

% Example:
% point1 = [ 2, 2 ];
% point2 = [ 5, 5 ];
% distance = euclidDist( point1, point2 )
% distance =
%
% 4.2426
  
distance = sqrt( sum( ( point1 - point2 ) .^ 2 ));
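As a side note (not required by the assignment), the built-in norm function gives the same result for these row-vector inputs:

>> norm( [ 2, 2 ] - [ 5, 5 ] )
ans =
    4.2426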


-------------------------------------- SAMPLE OUTPUT

[Output plot: the generated points split into a red and a blue cluster, with the two centroids drawn as filled circles]
