Question


K-means clustering
K-means clustering is a very well-known method for clustering unlabeled data, and the simplicity of the process has made it popular with data analysts. The task is to form clusters of similar data objects (points, properties, etc.). When the given dataset is unlabeled, we try to draw conclusions about the data by forming clusters. The number of clusters is pre-determined, while the number of points can be arbitrary.
The main idea behind the process is to find the nearest cluster mean for each point and assign the point to that cluster. Initially, we pick some random centroids (the mean values of the required clusters) and assign every point to one of the clusters. After all points have been assigned, we recompute each cluster mean (centroid) and update it. Now we have new centroids for our clusters and every point labelled with some cluster. We then iterate over all points again and re-assign the clusters using the newly updated centroids. This goes on until no point changes its cluster, or until a large (predetermined) number of assignment passes has been performed.
The complete algorithm can be summarized as follows (a compact code sketch of one iteration is given after the steps):
1. Randomly pick n centroids (n being the number of clusters to be created)
2. At each iteration calculate the distance between each point and the centroids
3. Assign each point to its closest cluster (the cluster with the closest centroid)
4. After assigning all the points to some cluster, update the centroids
5. Repeat steps 2-4 until the labelling no longer changes or a fixed number of iterations has been completed
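As a compact illustration of steps 2-4 only (this is just a sketch with generic variable names, not the helper-function structure the assignment asks for, and it assumes a MATLAB version with implicit expansion, R2016b or later):

dists = zeros( size( points, 1 ), size( centroids, 1 ) );
for j = 1:size( centroids, 1 )
    % Euclidean distance from every point to centroid j
    dists( :, j ) = sqrt( sum( ( points - centroids( j, : ) ) .^ 2, 2 ) );
end
[ ~, labels ] = min( dists, [], 2 );      % step 3: index of the closest centroid
for j = 1:size( centroids, 1 )
    % step 4: update each centroid (assumes every cluster is non-empty)
    centroids( j, : ) = mean( points( labels == j, : ), 1 );
end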
You are given the kmeansCluster function:
function [] = kmeansCluster()
max_range = 10;
min_range = -10;
num_points = 10000;
vector = ( max_range - min_range ) .* rand( 1, num_points ) + min_range;
points = makePoints( vector );
centroids = initialCentroid( points );
init_labels = false( size( points, 1 ), 1 );
[ labels, c1, c2 ] = makeClusters( points, centroids );
trial = 0;
while ~isequal( labels, init_labels ) && trial < 1000
    init_labels = labels;
    centroids = [ mean( c1 ); mean( c2 ) ];
    [ labels, c1, c2 ] = makeClusters( points, centroids );
    trial = trial + 1;
end
hold on
plot( c1( :, 1 ), c1( :, 2 ), 'r.' );
plot( c2( :, 1 ), c2( :, 2 ), 'b.' );
plot( centroids(1), centroids(3), 'ro', 'MarkerSize', 10, 'MarkerFaceColor', 'r' );
plot( centroids(2), centroids(4), 'bo', 'MarkerSize', 10, 'MarkerFaceColor', 'b' );
hold off
end
Now, kmeansCluster relies on several helper functions that you have to define yourself. All of these functions
are described below:
• makePoints
o parameter:
▪ vector - row vector with m points
o returns:
▪ points - m/2 x 2 matrix in which each row represents a point
o description: A row vector with m points will be passed to the function as a parameter; you have
to reshape the vector into a matrix with m/2 rows and 2 columns, then return the newly
modified matrix.
o example:
▪ >> vector = [ 2 4 5 10 5 9 21 23 10 22 ];
▪ >> points = makePoints( vector );
points =
2 4
5 10
5 9
21 23
10 22
• initialCentroid
o parameter:
▪ point - n x 2 matrix with n point coordinates
o returns:
▪ centroids - 2 x 2 matrix with one random centroid in each row
o description: The function takes in the points and generates two random centroids lying within the
range of the input points. For this, you need to find the maximum and minimum values
of x and y across all the points and generate random integers in that range. The
points input will contain floating-point numbers, so you have to round them to the nearest
integers before using them as the range for the randi function.
o example:
▪ >> centroids = initialCentroid( points );
centroids =
9 17
15 9
• makeClusters
o parameter:
▪ points - matrix (n x 2 matrix)
▪ means - matrix (2 x 2 matrix)
o returns:
▪ label – n x 1 logical matrix containing 0's and 1's as the label of the ith point
▪ cluster1, cluster2 – matrices with the points assigned to that cluster, each row representing
a point
o description: This function takes each point from the points matrix and computes its distance
from the two centroids. If the distance to centroid1 is less than the distance to centroid2,
the point is labelled 0 and concatenated vertically with cluster1. Otherwise, it is labelled 1 and
concatenated vertically with cluster2. To compute the distance between two points, you also
have to define the euclidDist function, which is described later.
o example:
▪ >> [ labels, cluster1, cluster2 ] = makeClusters( points, centroids );
▪ labels =
5×1 logical array
1
0
0
0
0
▪ cluster1 =
5 10
5 9
21 23
10 22
▪ cluster2 =
2 4
• euclidDist
o parameter:
▪ point1, point2 – 1 x 2 matrices containing the coordinates of a point, x in column 1, y in
column 2
o returns:
▪ distance – Euclidean distance between the points
o description: This function computes the Euclidean distance between two points using the
age-old formula distance = sqrt( (x1 - x2)^2 + (y1 - y2)^2 ).
You can do this using the MATLAB shorthand command:
distance = sqrt( sum( ( point1 - point2 ) .^ 2 ))
o example:
▪ >> point1 = [ 2, 2 ];
▪ >> point2 = [ 5, 5 ];
▪ >> distance = euclidDist( point1, point2 );
▪ distance =
4.2426
You do not have to worry about the rest of the kmeansCluster function. What we did first was create a
label vector filled with zeros (logical false):
init_labels = false( size( points, 1 ), 1 );
Then we call the makeClusters function once, outside the while loop, to give every point an initial
label.
[ labels, c1, c2 ] = makeClusters( points, centroids );
Now, the while loop keeps running as long as the labels changed in the previous iteration and fewer than
1000 trials have been made. In other words, there will be at most 1000 iterations if points keep getting
reassigned; otherwise the loop stops as soon as the labels no longer change.
while ~isequal( labels, init_labels ) && trial < 1000
Inside the while loop, we calculate the new centroids and reassign each point based on those newly
updated centroids. The trial counter is also incremented.
init_labels = labels;
centroids = [ mean( c1 ); mean( c2 ) ];
[ labels, c1, c2 ] = makeClusters( points, centroids );
trial = trial + 1;
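One caveat worth knowing (an observation only; the assignment does not ask you to handle it): if one cluster ever becomes empty, taking the mean of an empty matrix returns NaN and the corresponding centroid becomes invalid. A defensive sketch could keep the previous centroid in that case:

% hypothetical guard, not part of the given scaffold
if ~isempty( c1 ), centroids( 1, : ) = mean( c1, 1 ); end
if ~isempty( c2 ), centroids( 2, : ) = mean( c2, 1 ); end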
The next lines are for plotting, as you can probably guess. If you complete the functions correctly, you
should see a plot of the points split into two clusters, red and blue, with the centroids drawn as solid circles.
For different values of num_points you will see something like the graphs below, though they may not be
exactly the same, since random functions were used to generate the points.
[Example output plots for num_points = 100, 1000, 5000, and 10000]
Answer #1

Please find the required MATLAB code below:

--------------------------------------------------- makePoints.m

function data=makePoints(vector)
% A row vector with m points will be passed to the function as parameter, you have
% to reshape the vector to a matrix with m/2 rows and 2 columns, then return the newly
% modified matrix.

% example:
% vector = [ 2 4 5 10 5 9 21 23 10 22 ];
% points = makePoints( vector );
% points =
% 2 4
% 5 10
% 5 9
% 21 23
% 10 22

m = length( vector );                % total number of values; must be even
% reshape fills column-first, so build a 2 x m/2 matrix and transpose it
% to pair consecutive elements of the vector as (x, y) rows
data = reshape( vector, 2, m/2 ).';
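Note that reshape fills matrices column-first in MATLAB, so reshape( vector, m/2, 2 ) would interleave the x and y coordinates incorrectly; reshaping to 2 rows and transposing pairs consecutive elements as intended. A quick check against the example from the problem statement:

>> vector = [ 2 4 5 10 5 9 21 23 10 22 ];
>> points = makePoints( vector )
points =
     2     4
     5    10
     5     9
    21    23
    10    22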

--------------------------------------------------------------- initialCentroid.m

function centroids=initialCentroid(point)
% description: The function takes in points and generates two random centroids ranging within
% range of the input points. For this, you need to find out the maximum and minimum values
% for x, y across all the points and generate random integer numbers using that range. The
% points input will be floating numbers, so you have to round them to nearest integers to be
% used as range for randi function.

% example:
% centroids = initialCentroid( points );
% centroids =
% 9 17
% 15 9

x=point(:,1);
y=point(:,2);

randX = randi([round(min(x)) round(max(x))],2,1);
randY = randi([round(min(y)) round(max(y))],2,1);

centroids=[randX randY];
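A sample run on the example points (your output will differ, since randi draws random integers each time):

>> centroids = initialCentroid( points )
centroids =
     9    17
    15     9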

-----------------------------------------------  makeClusters.m

function [label,cluster1,cluster2]=makeClusters(points,means)

% description: This function takes each point from points matrix and computes its distance
% from the two centroids. If the distance to centroid1 is lesser than the distance to centroid2,
% it assigned 0 and concatenated vertically with cluster1. Otherwise, it is assigned as 1 and
% concatenated vertically with cluster2. To compute distance between two points, you also
% have to define euclidDist function which is described later.

% Example:
% >> [ labels, cluster1, cluster2 ] = makeClusters( points, centroids );
% labels =
%   5x1 logical array
%    1
%    0
%    0
%    0
%    0
% cluster1 =
%    5 10
%    5 9
%   21 23
%   10 22
% cluster2 =
%    2 4

n = size( points, 1 );
label = false( n, 1 );      % logical labels: 0 -> cluster1, 1 -> cluster2
cluster1 = [];
cluster2 = [];

for k = 1:n
    % distance of the k-th point to each of the two centroids
    d1 = euclidDist( points( k, : ), means( 1, : ) );
    d2 = euclidDist( points( k, : ), means( 2, : ) );
    if d1 < d2
        label( k ) = false;
        cluster1 = [ cluster1; points( k, : ) ];   %#ok<AGROW>
    else
        label( k ) = true;
        cluster2 = [ cluster2; points( k, : ) ];   %#ok<AGROW>
    end
end
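For large num_points the loop above is slow because the cluster matrices grow inside the loop. As a sketch of an alternative (using implicit expansion, available in R2016b and later, rather than the point-by-point loop the description asks for), the assignment step could also be vectorized:

% hypothetical vectorized version of the assignment step
d1 = sqrt( sum( ( points - means( 1, : ) ) .^ 2, 2 ) );   % distances to centroid 1
d2 = sqrt( sum( ( points - means( 2, : ) ) .^ 2, 2 ) );   % distances to centroid 2
label = d1 >= d2;                     % false -> cluster1, true -> cluster2
cluster1 = points( ~label, : );
cluster2 = points( label, : );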

------------------------------------------------  euclidDist.m

function distance=euclidDist(point1, point2)

% This function computes the Euclidean distance between two points

% Example:
% point1 = [ 2, 2 ];
% point2 = [ 5, 5 ];
% distance = euclidDist( point1, point2 )
% distance =
%
% 4.2426
  
distance = sqrt( sum( ( point1 - point2 ) .^ 2 ));
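As a side note (not required by the assignment), the built-in norm function gives the same result for these row-vector inputs:

>> norm( [ 2, 2 ] - [ 5, 5 ] )
ans =
    4.2426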


-------------------------------------- SAMPLE OUTPUT

[Output plot: the generated points split into a red and a blue cluster, with the two centroids drawn as filled circles]
