Find genes in ALL with the most and the least significant effect on age: plot the distribution of p-values for the gene effect on age for all 12K genes, generate diagnostic plots, and present model summaries and ANOVA results for the linear models of the two extreme cases, i.e. the gene with the most and the gene with the least significant effect on age as assessed by ANOVA. Hint 1: quantify the effect strength by significance, not by the magnitude of the slope. Hint 2: anova is slow and running it 12K times may take some time, so debug your code on a subset of the data, e.g. the first 100 rows.
To access the ALL dataset, run the following in R:
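# On current R (3.5 or later), biocLite is retired; use instead:
#   install.packages("BiocManager"); BiocManager::install("ALL")
source("http://www.bioconductor.org/biocLite.R")
biocLite("ALL")      # the package name must be quoted
library(ALL)
data(ALL)            # loads the ExpressionSet object named ALL
exprs(ALL)           # expression matrix: genes in rows, samples in columns
pData(ALL)$age       # patient ages, the response in this exercise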
ANalysis Of VAriance (ANOVA) is a statistical test of whether the
mean of a variable differs between two or more groups. ANOVA is
considered a sound test when the variable is normally distributed
and the samples are independent. R2 can be used to find which genes
exhibit differential expression between groups of samples in a
dataset, and a statistical test is useful for this process.
All (advanced) parameters can be adapted to specific needs. In the
single-dataset view, view a gene in groups, select additional
conditions and proceed. For the current dataset, the track called
alive consists of survival data for the patients from whom the
tumor samples were taken.
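
In R itself, the per-gene ANOVA that the question asks for could be sketched roughly as below. This is a minimal sketch, not a definitive solution: the data are simulated stand-ins (an assumption), and the names `expr`, `age` and `gene_pvalue` are hypothetical. Each gene gets the linear model age ~ expression, and the ANOVA p-value of that model measures the significance of the gene's effect.

```r
# Simulated stand-in for the ALL data (an assumption, not the real dataset):
# 100 "genes" in rows, 60 "patients" in columns, plus a made-up age vector.
set.seed(1)
n_genes <- 100
n_samples <- 60
age <- round(runif(n_samples, 5, 60))
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Make one gene depend on age so there is a real effect to find.
expr["gene7", ] <- 0.1 * age + rnorm(n_samples, sd = 0.5)

# ANOVA p-value for the model age ~ expression of a single gene.
gene_pvalue <- function(x, age) {
  fit <- lm(age ~ x)
  anova(fit)[["Pr(>F)"]][1]
}

# One p-value per gene (row).
pvals <- apply(expr, 1, gene_pvalue, age = age)

# Distribution of the p-values across genes.
hist(pvals, breaks = 20, main = "ANOVA p-values, age ~ gene", xlab = "p-value")

# The two extreme cases, judged by significance and not by slope.
most_sig  <- names(which.min(pvals))
least_sig <- names(which.max(pvals))

# Model summary, ANOVA table and diagnostic plots for the most significant gene;
# the same three steps apply to least_sig.
x_most <- expr[most_sig, ]
fit_most <- lm(age ~ x_most)
print(summary(fit_most))
print(anova(fit_most))
par(mfrow = c(2, 2))
plot(fit_most)
```

With the real data, `expr` would be `exprs(ALL)` (or its first 100 rows while debugging) and `age` would be `pData(ALL)$age`; `lm` drops samples with a missing age by default.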
In the gene selector, choose the dataset and adjust the settings.
Tracks such as gender can be used for filtering and for building
sample groups, and the filter procedure can be repeated. ANOVA is
then run on the data of the selected groups.
R2 performs one-way ANOVA. R2 shows whether mRNA levels differ in
the samples between two groups. After choosing single or multiple
dataset analysis, select a dataset for analysis and select the type
of analysis to find differential expression between groups. R2
determines P-values for the different genes by one-way ANOVA or,
alternatively, by a brute-force t-test on combinations of groups.
These statistical tests yield a top-X list of the genes for the
specific test.
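
The same idea can be sketched in plain R rather than in R2. The example below is only an illustration on made-up data: the group labels, the matrix `expr` and the helper `oneway_p` are all hypothetical. It computes a one-way ANOVA p-value per gene and keeps the X genes with the smallest p-values.

```r
# Hypothetical data: 50 genes x 30 samples in three made-up groups.
set.seed(2)
group <- factor(rep(c("A", "B", "C"), each = 10))
expr <- matrix(rnorm(50 * 30), nrow = 50,
               dimnames = list(paste0("gene", 1:50), NULL))
# Shift one gene in group C so it is truly differentially expressed.
expr["gene3", group == "C"] <- expr["gene3", group == "C"] + 3

# One-way ANOVA p-value for expression ~ group, one gene at a time.
oneway_p <- function(x, group) {
  anova(lm(x ~ group))[["Pr(>F)"]][1]
}
pvals <- apply(expr, 1, oneway_p, group = group)

# Top-X list: the X genes with the smallest p-values, in increasing order.
top_x <- 5
top_genes <- head(sort(pvals), top_x)
print(top_genes)
```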
We are testing a lot of genes here, so we have to correct for
multiple testing. The multiple-comparison problem arises whenever
this test is applied to many genes at once. An FDR threshold
determines which P-values in the distribution count as significant.
Specific chromosomes can be chosen for this, and the positive
results can be defined as well.
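
A short sketch of such a correction in R, using the base function `p.adjust`. The p-values here are simulated (an assumption), standing in for the per-gene ANOVA p-values; the 950/50 split between null and real effects is made up for illustration.

```r
set.seed(3)
# Hypothetical p-values: 950 genes with no effect, 50 genes with a real effect.
p_null <- runif(950)
p_real <- rbeta(50, 0.5, 50)   # concentrated near zero
pvals <- c(p_null, p_real)

# Benjamini-Hochberg controls the false discovery rate (FDR);
# Bonferroni controls the family-wise error rate and is stricter.
p_bh   <- p.adjust(pvals, method = "BH")
p_bonf <- p.adjust(pvals, method = "bonferroni")

# Number of genes called significant at 0.05 under each scheme.
n_raw  <- sum(pvals  < 0.05)
n_bh   <- sum(p_bh   < 0.05)
n_bonf <- sum(p_bonf < 0.05)
print(c(raw = n_raw, BH = n_bh, bonferroni = n_bonf))
```

Uncorrected, about 5% of the genes with no effect would pass the 0.05 cutoff by chance alone, which is why the raw count is the largest of the three.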
R is the correlation coefficient. It ranges from -1 to +1. If
R > 0, the values of the two variables tend to increase or decrease
together. If R < 0, the value of X increases as that of Y
decreases. If R ~ 0, there is no linear relation. If R^2 = 0.59,
then 59% of the variance in Y can be explained by variance in X.
The P-value for this calculation estimates how likely such a
correlation would be if there were no real relation: a P-value of
0.01 means a correlation at least this strong would arise by chance
only about 1% of the time.
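
These quantities can be checked numerically in R. The sketch below uses simulated `x` and `y` (an assumption) and shows that, for a linear model with one predictor, R^2 from the model summary equals the squared correlation coefficient, and the ANOVA p-value equals the p-value of the correlation test.

```r
set.seed(4)
# Hypothetical data with a positive linear relation plus noise.
x <- rnorm(40)
y <- 0.8 * x + rnorm(40)

r   <- cor(x, y)                  # Pearson correlation coefficient
fit <- lm(y ~ x)
r2  <- summary(fit)$r.squared     # fraction of variance in y explained by x

p_cor   <- cor.test(x, y)$p.value       # p-value of the correlation test
p_anova <- anova(fit)[["Pr(>F)"]][1]    # p-value of the ANOVA F test

print(c(R = r, R2 = r2, p_cor = p_cor, p_anova = p_anova))
```

This is also why ranking genes by ANOVA p-value, as the hint suggests, is equivalent to ranking them by the significance of their correlation with age.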