Explain the five methods used to measure distance between clusters.
Euclidean distance
This is the most common and intuitive way of computing a distance between two samples. It measures the difference between two samples directly, based on the magnitude of the changes in the sample values. This distance is usually used for data sets that are suitably normalized or that have no particular distribution problems.
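As a minimal sketch of the description above, the Euclidean distance can be computed as the square root of the summed squared differences. The function name `euclidean_distance` and the sample vectors are illustrative assumptions, not part of the original answer.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two samples of equal length."""
    if len(x) != len(y):
        raise ValueError("samples must have the same length")
    # Square the difference in every dimension, sum, then take the root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical two-dimensional samples.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```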
Manhattan distance
Also known as city-block distance, this distance measurement is especially relevant for discrete data sets. While the Euclidean distance corresponds to the length of the shortest path between two samples (i.e. “as the crow flies”), the Manhattan distance refers to the sum of distances along each dimension (i.e. “walking round the block”).
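The "walking round the block" idea can be sketched as a sum of absolute differences along each dimension. The function name `manhattan_distance` and the sample vectors are assumptions made for illustration.

```python
def manhattan_distance(x, y):
    """City-block distance: sum of per-dimension absolute differences."""
    if len(x) != len(y):
        raise ValueError("samples must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))

# Same hypothetical samples as a 3-4-5 triangle: the crow flies 5,
# but walking round the block costs 3 + 4.
print(manhattan_distance([0.0, 0.0], [3.0, 4.0]))  # 7.0
```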
Pearson Correlation distance
This distance is based on the Pearson correlation coefficient, which is calculated from the sample values and their standard deviations. The correlation coefficient r takes values from –1 (perfect negative correlation) to +1 (perfect positive correlation). The Pearson distance dp is computed as dp = 1 - r and lies between 0 (when the correlation coefficient is +1, i.e. the two samples are most similar) and 2 (when the correlation coefficient is -1).
Note that the data are centered by subtracting the mean and scaled by dividing by the standard deviation.
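A minimal sketch of dp = 1 - r, assuming the usual sample formula for the Pearson coefficient (centered products divided by the product of the centered norms). The helper names `pearson_r` and `pearson_distance` are illustrative, and the sketch simply raises an error for a constant sample, where r is undefined.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Center each sample by subtracting its mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sum(a * b for a, b in zip(dx, dy)) / denom

def pearson_distance(x, y):
    """dp = 1 - r, ranging from 0 (r = +1) to 2 (r = -1)."""
    return 1.0 - pearson_r(x, y)

# Hypothetical samples: one perfectly correlated pair, one anti-correlated.
same_trend = pearson_distance([1, 2, 3], [2, 4, 6])
opposite_trend = pearson_distance([1, 2, 3], [3, 2, 1])
```

With these inputs `same_trend` is at the low end of the range and `opposite_trend` at the high end, matching the 0-to-2 interval described above.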
Absolute Pearson Correlation distance
In this distance, the absolute value of the Pearson correlation coefficient is used; the absolute value lies between 0 and 1, and so does the corresponding distance.
The equation for the absolute Pearson distance da is:
da = 1 - |r|
Taking the absolute value gives equal meaning to positive and negative correlations, so anti-correlated samples are treated as similar and get clustered together.
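To illustrate how anti-correlated samples end up close together, here is a small sketch of da = 1 - |r|. It assumes the standard sample Pearson coefficient; the names `pearson_r` and `absolute_pearson_distance` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient (assumes non-constant samples)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom

def absolute_pearson_distance(x, y):
    """da = 1 - |r|, ranging from 0 (|r| = 1) to 1 (r = 0)."""
    return 1.0 - abs(pearson_r(x, y))

# Hypothetical anti-correlated samples: r = -1, so the distance is 0
# and the two samples would be treated as maximally similar.
d_anti = absolute_pearson_distance([1, 2, 3], [3, 2, 1])
```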
Un-centered Correlation distance
This is the same as the Pearson correlation distance, except that the sample means are set to zero in the expression for the un-centered correlation. The un-centered correlation coefficient lies between –1 and +1; hence the distance lies between 0 and 2.
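A sketch of the un-centered variant, assuming the coefficient is the sum of products divided by the product of the raw (un-centered) norms, which is what setting the means to zero in the Pearson formula gives. The names `uncentered_r` and `uncentered_distance` are illustrative assumptions.

```python
import math

def uncentered_r(x, y):
    """Correlation computed without subtracting the sample means."""
    if len(x) != len(y):
        raise ValueError("samples must have the same length")
    denom = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denom == 0:
        raise ValueError("undefined for an all-zero sample")
    return sum(a * b for a, b in zip(x, y)) / denom

def uncentered_distance(x, y):
    """du = 1 - r_uncentered, ranging from 0 to 2."""
    return 1.0 - uncentered_r(x, y)

# Hypothetical samples. A pure rescaling keeps the distance at 0, but a
# constant offset does not, unlike the centered Pearson distance.
d_scaled = uncentered_distance([1, 2, 3], [2, 4, 6])
d_offset = uncentered_distance([1, 2, 3], [11, 12, 13])
```

Because the means are not removed, `d_offset` is greater than zero even though the two samples follow exactly the same trend.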