This homework assignment should help you understand one-way analysis of variance (ANOVA) statistical tests and become familiar with R Markdown and some basic LaTeX programming. With R Markdown, you can include your R code, its output, and any written explanations in a single output file.
You should complete this entire assignment with R Markdown.
species | length
red | 53.2
red | 44.6
red | 60.2
red | 50.7
red | 44.8
red | 54.0
red | 51.4
red | 51.5
red | 53.3
red | 47.5
red | 44.5
red | 49.0
red | 50.2
blue | 59.1
blue | 60.3
blue | 60.4
blue | 60.1
blue | 58.5
blue | 59.4
blue | 61.3
blue | 60.4
blue | 60.0
blue | 61.6
blue | 59.9
green | 63.6
green | 85.1
green | 79.7
green | 67.5
green | 90.6
green | 78.9
green | 71.9
green | 66.9
green | 81.3
green | 46.1
green | 87.1
green | 90.6
##First make a ".txt" file of data named "data.txt"
d=read.table("data.txt",header=TRUE)
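The read.table call above assumes that "data.txt" already exists. As a minimal sketch, assuming the 36 values from the table at the top of this assignment, the file could be written from R itself. The names `d_new` and `path` are hypothetical, and a temporary directory stands in for the working directory.

```r
# Lengths (cm) copied from the species/length table, grouped by species.
red   <- c(53.2, 44.6, 60.2, 50.7, 44.8, 54.0, 51.4, 51.5, 53.3, 47.5, 44.5, 49.0, 50.2)
blue  <- c(59.1, 60.3, 60.4, 60.1, 58.5, 59.4, 61.3, 60.4, 60.0, 61.6, 59.9)
green <- c(63.6, 85.1, 79.7, 67.5, 90.6, 78.9, 71.9, 66.9, 81.3, 46.1, 87.1, 90.6)

# One row per fish: species label plus measured length.
d_new <- data.frame(
  species = rep(c("red", "blue", "green"),
                times = c(length(red), length(blue), length(green))),
  length  = c(red, blue, green)
)

# Hypothetical location: a temporary directory instead of the working directory.
path <- file.path(tempdir(), "data.txt")
write.table(d_new, path, row.names = FALSE, quote = FALSE)

# Read the file back exactly as the assignment does.
d <- read.table(path, header = TRUE)
table(d$species)
```

Writing the file with a header row and without quotes keeps it compatible with `read.table(..., header=TRUE)`.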
##Question of interest: The scientists are trying to figure out
##if the fish species differ significantly in mean length (cm).
##Null hypothesis H0: the mean length (cm) is the same for all
##three species.
##Alternative hypothesis H1: at least one species differs in mean
##length (cm) from the others.
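Since the assignment also asks for some basic LaTeX, the hypotheses can be typeset as a math block in the R Markdown file. This is only a sketch: the symbols $\mu_{\text{red}}$, $\mu_{\text{blue}}$ and $\mu_{\text{green}}$ for the population mean lengths are notation assumed here, and the `align*` environment assumes the amsmath package is available (it normally is for PDF output from R Markdown).

```latex
\begin{align*}
H_0 &: \mu_{\text{red}} = \mu_{\text{blue}} = \mu_{\text{green}} \\
H_1 &: \mu_i \neq \mu_j \text{ for at least one pair } i \neq j
\end{align*}
```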
summary(d)
species length
blue :11 Min. :44.50
green:12 1st Qu.:51.48
red :13 Median :60.05
Mean :61.81
3rd Qu.:67.05
Max. :90.60
var(d[,2])
[1] 177.697
y=d$length
##species is not defined as a separate object yet, so refer to d$species
d1=subset(y,d$species=="red")
d2=subset(y,d$species=="blue")
d3=subset(y,d$species=="green")
> summary(d1) ##red
Min. 1st Qu. Median Mean 3rd Qu. Max.
44.50 47.50 50.70 50.38 53.20 60.20
> summary(d2) ##blue
Min. 1st Qu. Median Mean 3rd Qu. Max.
58.50 59.65 60.10 60.09 60.40 61.60
> summary(d3) ##green
Min. 1st Qu. Median Mean 3rd Qu. Max.
46.10 67.35 79.30 75.78 85.60 90.60
> var(d1)
[1] 19.81359
> var(d2)
[1] 0.8009091
> var(d3)
[1] 172.8693
##The sample variances differ widely across species (0.80 for blue,
##19.81 for red, 172.87 for green), so they do not appear to be equal.
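The comparison of variances above is made by eye. As a hedged sketch that is not part of the original solution, Bartlett's test (`bartlett.test`) can check equality of variances formally, and Welch's one-way test (`oneway.test` with `var.equal = FALSE`) compares the means without assuming equal variances. The names `length_cm`, `group_var`, `bt` and `welch` are chosen here for illustration.

```r
# Data re-entered so the example runs on its own (values from the table).
length_cm <- c(
  53.2, 44.6, 60.2, 50.7, 44.8, 54.0, 51.4, 51.5, 53.3, 47.5, 44.5, 49.0, 50.2,  # red
  59.1, 60.3, 60.4, 60.1, 58.5, 59.4, 61.3, 60.4, 60.0, 61.6, 59.9,              # blue
  63.6, 85.1, 79.7, 67.5, 90.6, 78.9, 71.9, 66.9, 81.3, 46.1, 87.1, 90.6         # green
)
species <- factor(rep(c("red", "blue", "green"), times = c(13, 11, 12)))

# Sample variance within each species.
group_var <- tapply(length_cm, species, var)
print(group_var)

# Bartlett's test: H0 is that all group variances are equal.
bt <- bartlett.test(length_cm ~ species)
print(bt)

# Welch's one-way test: compares group means without assuming equal variances.
welch <- oneway.test(length_cm ~ species, var.equal = FALSE)
print(welch)
```

Bartlett's test is itself sensitive to non-normality, so it is best read alongside the normality checks rather than on its own.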
##Shapiro-Wilk test for normality
##H00: Data are normally distributed.
shapiro.test(d1)
Shapiro-Wilk normality test
data: d1
W = 0.9338, p-value = 0.3818
shapiro.test(d2)
Shapiro-Wilk normality test
data: d2
W = 0.96771, p-value = 0.8625
shapiro.test(d3)
Shapiro-Wilk normality test
data: d3
W = 0.9174, p-value = 0.2651
##The p-value for each of the three species is greater than alpha=0.05.
##Thus, we fail to reject H00.
##Therefore, there is no evidence that the data in any group
##depart from a normal distribution.
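The three separate shapiro.test calls above can also be run in one step. The following is a minimal sketch, assuming the same data; `length_cm` and `p_values` are names introduced here for illustration.

```r
# Data re-entered so the example runs on its own (values from the table).
length_cm <- c(
  53.2, 44.6, 60.2, 50.7, 44.8, 54.0, 51.4, 51.5, 53.3, 47.5, 44.5, 49.0, 50.2,  # red
  59.1, 60.3, 60.4, 60.1, 58.5, 59.4, 61.3, 60.4, 60.0, 61.6, 59.9,              # blue
  63.6, 85.1, 79.7, 67.5, 90.6, 78.9, 71.9, 66.9, 81.3, 46.1, 87.1, 90.6         # green
)
species <- factor(rep(c("red", "blue", "green"), times = c(13, 11, 12)))

# Split the lengths by species and keep only the Shapiro-Wilk p-value per group.
p_values <- sapply(split(length_cm, species),
                   function(x) shapiro.test(x)$p.value)
print(round(p_values, 4))

# TRUE for each group where normality is not rejected at alpha = 0.05.
print(p_values > 0.05)
```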
species=factor(d$species)
species
[1] red red red red red red red red red red red red red blue blue
blue blue blue blue blue blue blue blue blue green green green
green green green green green green green green green
Levels: blue green red
ANOVA=aov(y~species)
ANOVA
Call:
aov(formula = y ~ species)
Terms:
species Residuals
Sum of Squares 4072.061 2147.335
Deg. of Freedom 2 33
Residual standard error: 8.066644
Estimated effects may be unbalanced
summary(ANOVA)
Df Sum Sq Mean Sq F value Pr(>F)
species 2 4072 2036.0 31.29 2.4e-08 ***
Residuals 33 2147 65.1
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##From the output of summary, p-value=2.4e-08, which is far
##less than alpha=0.05. We reject H0.
##Thus, at least one species of fish differs significantly in
##mean length (cm) from the others.
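To see where the numbers in the ANOVA table come from, the sums of squares and the F statistic can be recomputed by hand. This is a sketch under the same data, with variable names (`ss_between`, `ss_within`, `f_stat`, and so on) chosen here for illustration.

```r
# Data re-entered so the example runs on its own (values from the table).
length_cm <- c(
  53.2, 44.6, 60.2, 50.7, 44.8, 54.0, 51.4, 51.5, 53.3, 47.5, 44.5, 49.0, 50.2,  # red
  59.1, 60.3, 60.4, 60.1, 58.5, 59.4, 61.3, 60.4, 60.0, 61.6, 59.9,              # blue
  63.6, 85.1, 79.7, 67.5, 90.6, 78.9, 71.9, 66.9, 81.3, 46.1, 87.1, 90.6         # green
)
species <- factor(rep(c("red", "blue", "green"), times = c(13, 11, 12)))

grand_mean <- mean(length_cm)
group_mean <- tapply(length_cm, species, mean)
group_n    <- tapply(length_cm, species, length)

# Between-group sum of squares: group size times squared distance
# of each group mean from the grand mean.
ss_between <- sum(group_n * (group_mean - grand_mean)^2)

# Within-group (residual) sum of squares: squared distance of each
# observation from its own group mean (ave() returns that mean per row).
ss_within <- sum((length_cm - ave(length_cm, species))^2)

df_between <- nlevels(species) - 1                   # 3 - 1 = 2
df_within  <- length(length_cm) - nlevels(species)   # 36 - 3 = 33

f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

print(c(SS_between = ss_between, SS_within = ss_within, F = f_stat, p = p_value))
```

The two sums of squares should agree with the "Sum of Squares" row printed by aov, and the F statistic with the "F value" column of summary(ANOVA).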
###Finding which species is different
TukeyHSD(ANOVA,'species',conf.level=0.95)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ species)
$species
                 diff        lwr       upr     p adj
green-blue  15.684091   7.421656  23.94653 0.0001459
red-blue    -9.713986 -17.823012  -1.60496 0.0160087
red-green  -25.398077 -33.321973 -17.47418 0.0000000
##From the output of the TukeyHSD test, the adjusted p-value for all
##3 pairs is less than alpha=0.05.
##Therefore, at the 5% level of significance, every pair of
##species differs significantly in mean length (cm).
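Rather than reading the adjusted p-values off the printed table, they can be pulled out of the TukeyHSD result directly. A minimal sketch, assuming the same data and model; `fit`, `tk`, `pairs_tbl` and `significant` are names introduced here.

```r
# Data re-entered so the example runs on its own (values from the table).
length_cm <- c(
  53.2, 44.6, 60.2, 50.7, 44.8, 54.0, 51.4, 51.5, 53.3, 47.5, 44.5, 49.0, 50.2,  # red
  59.1, 60.3, 60.4, 60.1, 58.5, 59.4, 61.3, 60.4, 60.0, 61.6, 59.9,              # blue
  63.6, 85.1, 79.7, 67.5, 90.6, 78.9, 71.9, 66.9, 81.3, 46.1, 87.1, 90.6         # green
)
species <- factor(rep(c("red", "blue", "green"), times = c(13, 11, 12)))

fit <- aov(length_cm ~ species)
tk  <- TukeyHSD(fit, "species", conf.level = 0.95)

# tk$species is a matrix with columns diff, lwr, upr and "p adj",
# one row per pair of species.
pairs_tbl   <- tk$species
significant <- pairs_tbl[, "p adj"] < 0.05

print(round(pairs_tbl, 4))
print(significant)
```

A pair is also significant at the 5% level exactly when its interval from lwr to upr excludes zero, which gives a quick cross-check against the p adj column.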