Question

Explain the 5 methods used to measure distance between clusters.

Answer #1

Euclidean distance

This is the most common and intuitive way of computing a distance between two samples: the straight-line distance, i.e. the square root of the sum of squared differences across all dimensions. It reflects the magnitude of the differences in the sample levels directly, so it is usually used for data sets that are suitably normalized and have no special distribution problems.
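
As a rough illustration of the description above, here is a minimal Python sketch. The function name euclidean_distance is made up for this example, and samples are assumed to be plain lists of numbers of equal length.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two samples (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of dimensions")
    # Square root of the sum of squared per-dimension differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Classic 3-4-5 right triangle.
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```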

Manhattan distance

Also known as city-block distance, this distance measurement is especially relevant for discrete data sets. While the Euclidean distance corresponds to the length of the shortest path between two samples (i.e. “as the crow flies”), the Manhattan distance is the sum of the absolute differences along each dimension (i.e. “walking round the block”).
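
A small sketch of the same idea in Python, assuming samples are equal-length lists of numbers. The name manhattan_distance is chosen only for illustration.

```python
def manhattan_distance(a, b):
    """City-block distance between two samples (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of dimensions")
    # Sum of absolute per-dimension differences.
    return sum(abs(x - y) for x, y in zip(a, b))

# Walking 3 blocks east and 4 blocks north covers 7 blocks,
# whereas the straight-line (Euclidean) distance would be 5.
print(manhattan_distance([0, 0], [3, 4]))  # 7
```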

Pearson Correlation distance

This distance is based on the Pearson correlation coefficient, which is calculated from the sample values and their standard deviations. The correlation coefficient r takes values from -1 (perfect negative correlation) to +1 (perfect positive correlation). The Pearson distance dp is computed as dp = 1 - r and lies between 0 (when the correlation coefficient is +1, i.e. the two samples are most similar) and 2 (when the correlation coefficient is -1).
Note that the data are centered by subtracting the mean, and scaled by dividing by the standard deviation.
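
The following is a sketch of how dp = 1 - r could be computed by hand in Python. The names pearson_r and pearson_distance are illustrative, and the handling of constant samples (raising an error) is an assumption of this example, not something the answer specifies.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Center each sample by subtracting its mean.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("correlation is undefined for a constant sample")
    # Dividing by the root of the sums of squares performs the scaling.
    return cov / math.sqrt(ss_a * ss_b)

def pearson_distance(a, b):
    """dp = 1 - r, ranging from 0 (r = +1) to 2 (r = -1)."""
    return 1.0 - pearson_r(a, b)

print(pearson_distance([1, 2, 3], [2, 4, 6]))  # perfectly correlated
print(pearson_distance([1, 2, 3], [3, 2, 1]))  # perfectly anti-correlated
```

The first call should give a distance of about 0 and the second a distance of about 2, matching the two ends of the range described above.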

Absolute Pearson Correlation distance

In this distance, the absolute value of the Pearson correlation coefficient is used; hence the corresponding distance lies between 0 and 1, just like the absolute value of the coefficient itself. The equation for the Absolute Pearson distance da is:
da = 1 - |r|

Taking the absolute value gives equal meaning to positive and negative correlations, so anti-correlated samples are treated as close to each other and get clustered together.
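
A short Python sketch of the absolute Pearson distance, to show that perfectly anti-correlated samples end up at distance 0. The function names are hypothetical, and raising an error for constant samples is an assumption made for this example.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_a * ss_b)

def absolute_pearson_distance(a, b):
    """da = 1 - |r|, ranging from 0 (|r| = 1) to 1 (r = 0)."""
    return 1.0 - abs(pearson_r(a, b))

# Both calls give a distance of about 0: the sign of r is ignored.
print(absolute_pearson_distance([1, 2, 3], [2, 4, 6]))
print(absolute_pearson_distance([1, 2, 3], [3, 2, 1]))
```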

Un-centered Correlation distance

This is the same as the Pearson correlation distance, except that the sample means are set to zero in the expression for the un-centered correlation, i.e. the data are not centered before the coefficient is computed. The un-centered correlation coefficient lies between -1 and +1; hence the distance, computed as 1 minus the coefficient, lies between 0 and 2.
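
A minimal Python sketch of this variant, assuming the distance is 1 minus the un-centered coefficient. The function names are illustrative, and rejecting all-zero samples is an assumption of this example.

```python
import math

def uncentered_r(a, b):
    """Un-centered correlation: the Pearson formula with means taken as zero."""
    dot = sum(x * y for x, y in zip(a, b))
    ss_a = sum(x * x for x in a)
    ss_b = sum(y * y for y in b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("coefficient is undefined for an all-zero sample")
    return dot / math.sqrt(ss_a * ss_b)

def uncentered_distance(a, b):
    """1 - un-centered r, ranging from 0 to 2."""
    return 1.0 - uncentered_r(a, b)

# Same direction from the origin: distance of about 0.
print(uncentered_distance([1, 2, 3], [2, 4, 6]))
# A constant offset changes the un-centered result, although the
# ordinary Pearson correlation of these two samples is exactly +1.
print(uncentered_distance([1, 2, 3], [11, 12, 13]))
```

The second call returns a small positive distance rather than 0, which shows the practical difference from the centered Pearson distance: an offset between samples is not removed.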
