In R load the tidyverse package Consider the `USArrests` dataset, which contains statistics, in a...

Question


In R, load the tidyverse package.

Consider the `USArrests` dataset, which contains arrest statistics (arrests per 100,000 residents) for assault, murder, and rape in each of the 50 US states in 1973. It also gives the percentage of the population living in urban areas.

(a) Perform k-means clustering using all numerical variables in this dataset, scaling the variables before running the clustering algorithm.

(b) Try two different values of $k$ and comment on your results.

(c) Visualize the results of the clustering using the variables `Murder` and `UrbanPop`.


Answer 1

data("USArrests")
mydata <- USArrests
mydata <- na.omit(mydata)
mydata <- scale(mydata)
head(mydata, n=10)
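
The lines above scale the data; part (a) then calls for k-means on the scaled matrix. A minimal sketch follows, assuming k = 4 as an illustrative choice. The names `arrests_scaled` and `km4` are introduced here and are not part of the original answer.

```r
# Part (a): k-means on all four numeric variables, scaled first.
data("USArrests")
arrests_scaled <- scale(USArrests)   # centre each column to mean 0, sd 1

set.seed(123)                        # k-means starts from random centres
# centers = 4 is an assumed value of k; nstart = 25 keeps the best of
# 25 random starts to reduce the risk of a poor local optimum.
km4 <- kmeans(arrests_scaled, centers = 4, nstart = 25)

km4$size                             # number of states in each cluster
head(km4$cluster)                    # cluster labels for the first few states
km4$centers                          # cluster means, in scaled units
```

Because the variables are on very different scales (`Assault` ranges far wider than `Murder`), clustering the unscaled data would let `Assault` dominate the Euclidean distances.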

set.seed(124)
ss <- sample(1:50,10)
df <- USArrests[ss, ]
df <- na.omit(df)
head(df,n=6)
df.scaled <- scale(df)
head(round(df.scaled, 2))
desc_stats <- data.frame(
  Min = apply(USArrests, 2, min),
  Max = apply(USArrests, 2, max),
  Med = apply(USArrests, 2, median),
  SD = apply(USArrests, 2, sd),
  Mean = apply(USArrests, 2, mean))
desc_stats <- round(desc_stats,1)
head(desc_stats)
library(stats)
eucl <- dist(df.scaled, method = "euclidean" )
round(as.matrix(eucl)[1:6,1:6],1)
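# Correlation-based distance between states (rows), hence the transpose.
# The result is stored as res.cor so it does not mask the cor() function.
res.cor <- cor(t(df.scaled), method = "pearson")
dist_cor <- as.dist(1 - res.cor)
round(as.matrix(dist_cor)[1:6,1:6],1)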

# daisy() computes dissimilarity matrices between observations
library(cluster)
library(factoextra)
daisy(df.scaled, metric = "euclidean", stand = FALSE)
data("flower")
head(flower)
str(flower)
daisy_dist <- as.matrix(daisy(flower))
head(round(daisy_dist[1:6,1:6], 2))
library(corrplot)
corrplot(as.matrix(eucl), is.corr = FALSE, method = "color")
corrplot(as.matrix(eucl), is.corr = FALSE, method = "color", order = "hclust", type = "upper")
plot(hclust(eucl, method = "ward.D2"))
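
Part (b) of the question asks for two values of k. One way to compare them, sketched here under the assumption that k = 2 and k = 4 are the values tried, is to look at the total within-cluster sum of squares and at how the two partitions nest. The names `km2`, `km4`, and `comparison` are illustrative only.

```r
# Part (b): compare two assumed values of k on the full scaled data.
data("USArrests")
arrests_scaled <- scale(USArrests)

set.seed(123)
km2 <- kmeans(arrests_scaled, centers = 2, nstart = 25)
km4 <- kmeans(arrests_scaled, centers = 4, nstart = 25)

# Lower tot_withinss means tighter clusters; it normally falls as k grows,
# so the size of the drop matters more than the raw value.
comparison <- data.frame(
  k = c(2, 4),
  tot_withinss = c(km2$tot.withinss, km4$tot.withinss),
  between_over_total = c(km2$betweenss / km2$totss,
                         km4$betweenss / km4$totss)
)
print(comparison)

# Cross-tabulate the labels to see how the k = 2 groups split at k = 4.
table(k2 = km2$cluster, k4 = km4$cluster)
```

When commenting on the results, it helps to inspect `km2$centers` and `km4$centers` as well: the signs of the scaled means indicate whether a cluster is above or below average on each variable.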

heatmap(as.matrix(eucl), symm = TRUE, distfun = function(x) as.dist(x))
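
For part (c), a possible visualization is a scatter plot of `Murder` against `UrbanPop`, coloured by cluster. This is a sketch that assumes ggplot2 (part of the tidyverse) is installed and again uses k = 4 as an illustrative choice; `plot_df` and `p` are hypothetical names.

```r
# Part (c): plot the clusters on the original Murder and UrbanPop scales.
library(ggplot2)

data("USArrests")
set.seed(123)
# Cluster on all four scaled variables, as in part (a); k = 4 is assumed.
km4 <- kmeans(scale(USArrests), centers = 4, nstart = 25)

# Keep the unscaled values for plotting so the axes stay interpretable.
plot_df <- data.frame(
  state = rownames(USArrests),
  Murder = USArrests$Murder,
  UrbanPop = USArrests$UrbanPop,
  cluster = factor(km4$cluster)
)

p <- ggplot(plot_df, aes(x = UrbanPop, y = Murder, colour = cluster)) +
  geom_point(size = 2) +
  geom_text(aes(label = state), size = 2.5, vjust = -0.8,
            show.legend = FALSE) +
  labs(x = "Urban population (%)",
       y = "Murder arrests per 100,000",
       colour = "Cluster")
print(p)
```

The clusters may overlap in this two-dimensional view, because the algorithm also used `Assault` and `Rape` when assigning states.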
