Note: We are allowed to solve only 4 questions in one post.
Please post the remaining questions in a new post.
a. We see that the number of stations and the number of vehicles in New
York are far greater than in the remaining cities. Hence New York could
be an outlier.
b.
Box plot for Stations
Box plot for Vehicles
Box plot for Track
We see that New York is an outlier in all three plots; hence it
must be removed.
c. Scatterplot
d. Correlation coefficient
Yes, there seems to be a positive linear relationship between the two
variables.
The correlation coefficient is given below.
Stations data in ascending order:
14 16 18 22 38 43 53 53 86 144 468
Q1 = 18 (3rd value), Q2 = 43 (6th value), Q3 = 86 (9th value)
Arrange the data in ascending order.
To find the median: there is an odd number of observations (n = 11), so we take the middle value. Quartile 2 or median = 43.
To find the first quartile: take the lower half of the data (the values below the median) and take its middle value. Quartile 1 or 25th percentile = 18.
To find the third quartile: take the upper half of the data (the values above the median) and take its middle value. Quartile 3 or 75th percentile = 86.
The five-number summary:
min = 14
1st quartile = 18
median = 43
3rd quartile = 86
max = 468
Skewness, with mean = 86.8182 and standard deviation $\sigma = 132.0665$:
$$\text{skewness} = \frac{3(\text{mean} - \text{median})}{\sigma} = \frac{3(86.8182 - 43)}{132.0665} = 0.9953667$$
Interquartile range (IQR):
IQR = Q3 - Q1 = 86 - 18 = 68
To find the outliers:
lower limit = Q1 - 2 IQR = 18 - (2 x 68) = -118
upper limit = Q3 + 2 IQR = 86 + (2 x 68) = 222
Hence any value lower than -118 or greater than 222 is an outlier. Only New York (468) lies outside these limits. (The more common 1.5 x IQR rule gives limits of -84 and 188 and flags the same value.)
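The quartile and outlier arithmetic above can be checked with a short Python sketch. It assumes the same convention as the worked solution (the median is left out of both halves when n is odd) and takes the IQR multiplier as a parameter `k`, so the limits can be compared for k = 2 and for the more common k = 1.5. The function names `five_number_summary` and `fences` are made up for this illustration.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: when n is odd, the median is excluded from both halves
    before taking the middle value of each half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def fences(q1, q3, k):
    """Return (lower limit, upper limit) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Stations data, including New York (468)
stations = [14, 16, 18, 22, 38, 43, 53, 53, 86, 144, 468]

minimum, q1, med, q3, maximum = five_number_summary(stations)
lower_limit, upper_limit = fences(q1, q3, k=2)
outliers = [v for v in stations if v < lower_limit or v > upper_limit]

print("five-number summary:", minimum, q1, med, q3, maximum)
print("limits with k = 2:", lower_limit, upper_limit)
print("limits with k = 1.5:", fences(q1, q3, k=1.5))
print("outliers:", outliers)
```

With either multiplier, the only value outside the limits is 468.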
Boxplot for Stations (outlier marked at 468; box at Q1 = 18, median = 43, Q3 = 86)
Vehicles data in ascending order:
60 100 102 136 252 371 408 669 950 1190 6333
Q1 = 102 (3rd value), Q2 = 371 (6th value), Q3 = 950 (9th value)
Arrange the data in ascending order.
To find the median: there is an odd number of observations (n = 11), so we take the middle value. Quartile 2 or median = 371.
To find the first quartile: take the lower half of the data (the values below the median) and take its middle value. Quartile 1 or 25th percentile = 102.
To find the third quartile: take the upper half of the data (the values above the median) and take its middle value. Quartile 3 or 75th percentile = 950.
The five-number summary:
min = 60
1st quartile = 102
median = 371
3rd quartile = 950
max = 6333
Skewness, with mean = 961 and standard deviation $\sigma = 1820.0118$:
$$\text{skewness} = \frac{3(\text{mean} - \text{median})}{\sigma} = \frac{3(961 - 371)}{1820.0118} = 0.9725212$$
Interquartile range (IQR):
IQR = Q3 - Q1 = 950 - 102 = 848
To find the outliers:
lower limit = Q1 - 2 IQR = 102 - (2 x 848) = -1594
upper limit = Q3 + 2 IQR = 950 + (2 x 848) = 2646
Hence any value lower than -1594 or greater than 2646 is an outlier. Only New York (6333) lies outside these limits. (The more common 1.5 x IQR rule gives limits of -1170 and 2222 and flags the same value.)
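The skewness figure used here is 3(mean - median) divided by the standard deviation, often called Pearson's second skewness coefficient. A minimal Python sketch for checking it on the vehicles data follows; it assumes the standard deviation is the sample one (divisor n - 1), which is what reproduces the 1820.0118 used above. The function name `pearson_median_skewness` is hypothetical.

```python
from statistics import mean, median, stdev


def pearson_median_skewness(values):
    """3 * (mean - median) / s, with s the sample standard deviation (n - 1)."""
    return 3 * (mean(values) - median(values)) / stdev(values)


# Vehicles data, including New York (6333)
vehicles = [60, 100, 102, 136, 252, 371, 408, 669, 950, 1190, 6333]

skew = pearson_median_skewness(vehicles)
print("mean:", mean(vehicles))
print("median:", median(vehicles))
print("sample standard deviation:", round(stdev(vehicles), 4))
print("skewness:", round(skew, 7))
```

A positive value says the mean sits above the median, which is what a single very large observation produces.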
Boxplot for Vehicles (outlier marked at 6333; median = 371, Q3 = 950)
Track data in ascending order:
34 34 42 57 102 108 193 226 246 288 835
Q1 = 42 (3rd value), Q2 = 108 (6th value), Q3 = 246 (9th value)
Arrange the data in ascending order.
To find the median: there is an odd number of observations (n = 11), so we take the middle value. Quartile 2 or median = 108.
To find the first quartile: take the lower half of the data (the values below the median) and take its middle value. Quartile 1 or 25th percentile = 42.
To find the third quartile: take the upper half of the data (the values above the median) and take its middle value. Quartile 3 or 75th percentile = 246.
The five-number summary:
min = 34
1st quartile = 42
median = 108
3rd quartile = 246
max = 835
Skewness, with mean = 196.8182 and standard deviation $\sigma = 230.7145$:
$$\text{skewness} = \frac{3(\text{mean} - \text{median})}{\sigma} = \frac{3(196.8182 - 108)}{230.7145} = 1.1549105$$
Interquartile range (IQR):
IQR = Q3 - Q1 = 246 - 42 = 204
To find the outliers:
lower limit = Q1 - 2 IQR = 42 - (2 x 204) = -366
upper limit = Q3 + 2 IQR = 246 + (2 x 204) = 654
Hence any value lower than -366 or greater than 654 is an outlier. Only New York (835) lies outside these limits. (The more common 1.5 x IQR rule gives limits of -264 and 552 and flags the same value.)
Boxplot for Track (outlier marked at 835; median = 108, Q3 = 246)
Data with New York removed:

City            Track (x)   Station (y)
Atlanta         193         38
Baltimore       34          14
Boston          108         53
Chicago         288         144
Cleveland       42          18
LA              34          16
Miami           57          22
Philadelphia    102         53
SF              246         43
Washington      226         86

Scatter plot of Station (y) against Track (x); x-axis from 0 to 350, y-axis from 0 to 160.
Let x represent Track.
Let y represent Station.

Obs. No.   x      x - x̄   (x - x̄)²   y     y - ȳ    (y - ȳ)²   (x - x̄)(y - ȳ)
1          193    60      3600       38    -10.7    114.49     -642
2          34     -99     9801       14    -34.7    1204.09    3435.3
3          108    -25     625        53    4.3      18.49      -107.5
4          288    155     24025      144   95.3     9082.09    14771.5
5          42     -91     8281       18    -30.7    942.49     2793.7
6          34     -99     9801       16    -32.7    1069.29    3237.3
7          57     -76     5776       22    -26.7    712.89     2029.2
8          102    -31     961        53    4.3      18.49      -133.3
9          246    113     12769      43    -5.7     32.49      -644.1
10         226    93      8649       86    37.3     1391.29    3468.9
Total      1330   0       84288      487   0        14586.1    28209
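n = 10

Mean of Track:
$$\sum x = 1330, \qquad \bar{x} = \frac{1330}{10} = 133$$
$$\sum (x - \bar{x}) = 0, \qquad \sum (x - \bar{x})^2 = 84288$$

Mean of Station:
$$\sum y = 487, \qquad \bar{y} = \frac{487}{10} = 48.7$$
$$\sum (y - \bar{y}) = 0, \qquad \sum (y - \bar{y})^2 = 14586.1$$

Sum of cross products:
$$\sum (x - \bar{x})(y - \bar{y}) = 28209$$

Standard deviation of Track:
$$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{84288}{9}} = 96.7746523$$

Variance of Track:
$$\sigma_x^2 = 96.7746523^2 = 9365.3333333$$

Standard deviation of Station:
$$\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} = \sqrt{\frac{14586.1}{9}} = 40.2576425$$

Variance of Station:
$$\sigma_y^2 = 40.2576425^2 = 1620.6777778$$

Correlation:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} = \frac{28209}{\sqrt{84288 \times 14586.1}} = 0.8045$$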
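The correlation calculation above can be reproduced with a short Python sketch built from the same deviation sums. The data are the ten Track and Station values with New York removed; the function name `deviation_sums` and the variable names are made up for this illustration.

```python
from math import sqrt

# Track (x) and Station (y) for the ten cities, New York removed
track = [193, 34, 108, 288, 42, 34, 57, 102, 246, 226]
station = [38, 14, 53, 144, 18, 16, 22, 53, 43, 86]


def deviation_sums(xs, ys):
    """Return (Sxx, Syy, Sxy): sums of squared deviations and of cross products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxx, syy, sxy


sxx, syy, sxy = deviation_sums(track, station)
n = len(track)

sd_track = sqrt(sxx / (n - 1))    # sample standard deviation of Track
sd_station = sqrt(syy / (n - 1))  # sample standard deviation of Station
r = sxy / sqrt(sxx * syy)         # Pearson correlation coefficient

print("Sxx, Syy, Sxy:", round(sxx, 1), round(syy, 1), round(sxy, 1))
print("standard deviations:", round(sd_track, 4), round(sd_station, 4))
print("correlation:", round(r, 4))
```

The (n - 1) divisors cancel in the ratio, so r can be computed directly from the three sums in the Total row of the table.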