Question

On this worksheet, make an XY scatter plot linked to the following data: X 22 48 37 30 24 10 42 3...

On this worksheet, make an XY scatter plot linked to the following data:

X 22 48 37 30 24 10 42 30 41 29 16 36 45 11 31 26 31 33 46 22 13 22 32 49 35

Y 3872 9312 5217 4230 4536 1820 8274 121 6314 3828 2448 6156 7515 1309 3534 4576 5797 4983 6670 2464 2197 3278 5408 7497 5705

Add a trendline, the regression equation, and R squared to the plot. Add this title: "Scatterplot of X and Y Data". The scatterplot reveals a point outside the point pattern.

Copy the data to a new location in the worksheet. You now have 2 sets of data. Data that are more than 1.5 IQR below Q1 or more than 1.5 IQR above Q3 are considered outliers and must be investigated.

It was determined that the outlying point resulted from data entry error. Remove the outlier in the copy of the data.

Make a new scatterplot linked to the cleaned data without the outlier, and add a title ("Scatterplot without Outlier"), a trendline, and the regression equation label. Compare the regression equations of the two plots.

How did removal of the outlier affect the slope and R squared?

Answer #1

This is a simple problem of visualizing data with a scatter plot and appreciating how removing an outlier changes the quality of the inference drawn from the data.

We start with a scatter plot of the raw data and record the slope, intercept, and R squared of the fitted straight line.

Then we take out the outlier and see how much the fit of the line improves.

[Figure: "Scatterplot of X and Y Data". Y plotted against X with a linear trendline, y = 172.46x - 567.36; one point far from the line is labeled "outlier".]
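For readers who want to check the trendline outside Excel, here is a minimal sketch of an ordinary least-squares fit on the raw data. It is not the Excel procedure the question asks for; the helper name `fit_line` is made up for this illustration, and the data are copied from the question.

```python
# Data copied from the question (25 points, outlier still included).
x = [22, 48, 37, 30, 24, 10, 42, 30, 41, 29, 16, 36, 45,
     11, 31, 26, 31, 33, 46, 22, 13, 22, 32, 49, 35]
y = [3872, 9312, 5217, 4230, 4536, 1820, 8274, 121, 6314, 3828, 2448, 6156, 7515,
     1309, 3534, 4576, 5797, 4983, 6670, 2464, 2197, 3278, 5408, 7497, 5705]


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation, valid for a simple linear fit
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(x, y)
print(f"y = {slope:.2f}x {intercept:+.2f}, R squared = {r_squared:.4f}")
```

The printed slope and intercept should agree with the equation shown on the plot, up to rounding.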

Now that we have a scatter plot of the raw data, let us make one based on the cleaned data, as the question requires.

We need to weed out all values lower than Q1 - 1.5 * IQR or higher than Q3 + 1.5 * IQR.

IQR (interquartile range) is given by Q3 - Q1.

I used the QUARTILE function in Excel to find the quartiles of X and Y, as shown below.

Quartile X Y
Q1 22 3278
Q3 37 6156

IQR(X) = 37 - 22 = 15

IQR(Y) = 6156 - 3278 = 2878
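The quartiles and IQRs above can be reproduced with Python's standard library. This is only a sketch: it assumes that the "inclusive" method of `statistics.quantiles` corresponds to Excel's QUARTILE (the same interpolation as QUARTILE.INC), and the helper name `quartiles_inclusive` is made up for this example.

```python
from statistics import quantiles

# Data copied from the question.
x = [22, 48, 37, 30, 24, 10, 42, 30, 41, 29, 16, 36, 45,
     11, 31, 26, 31, 33, 46, 22, 13, 22, 32, 49, 35]
y = [3872, 9312, 5217, 4230, 4536, 1820, 8274, 121, 6314, 3828, 2448, 6156, 7515,
     1309, 3534, 4576, 5797, 4983, 6670, 2464, 2197, 3278, 5408, 7497, 5705]


def quartiles_inclusive(values):
    """Return (Q1, Q3) using the inclusive method (assumed to match Excel's QUARTILE)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


q1_x, q3_x = quartiles_inclusive(x)
q1_y, q3_y = quartiles_inclusive(y)
iqr_x = q3_x - q1_x
iqr_y = q3_y - q1_y
print("X:", q1_x, q3_x, iqr_x)
print("Y:", q1_y, q3_y, iqr_y)
```

With 25 values, the first and third quartiles fall exactly on the 7th and 19th sorted values, so no interpolation is involved here.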

Now for X:

Q1 - 1.5 * IQR = 22 - 1.5 * 15 = -0.50

Q3 + 1.5 * IQR = 37 + 1.5 * 15 = 59.50

So we keep all X values between -0.50 and 59.50.

For Y:

Q1 - 1.5 * IQR = 3278 - 1.5 * 2878 = -1039

Q3 + 1.5 * IQR = 6156 + 1.5 * 2878 = 10473

So we keep all Y values between -1039 and 10473.
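As a quick check of the limits above, the sketch below recomputes the fences from the quartiles in the table and lists any X or Y value outside them. The helper name `fences` is hypothetical. It turns out that no single X or Y value falls outside its own limits.

```python
# Data copied from the question.
x = [22, 48, 37, 30, 24, 10, 42, 30, 41, 29, 16, 36, 45,
     11, 31, 26, 31, 33, 46, 22, 13, 22, 32, 49, 35]
y = [3872, 9312, 5217, 4230, 4536, 1820, 8274, 121, 6314, 3828, 2448, 6156, 7515,
     1309, 3534, 4576, 5797, 4983, 6670, 2464, 2197, 3278, 5408, 7497, 5705]

# Quartiles taken from the table in the answer.
Q1_X, Q3_X = 22, 37
Q1_Y, Q3_Y = 3278, 6156


def fences(q1, q3, k=1.5):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR (hypothetical helper)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


low_x, high_x = fences(Q1_X, Q3_X)
low_y, high_y = fences(Q1_Y, Q3_Y)

# Values outside the limits, checked separately for X and for Y.
flagged_x = [v for v in x if v < low_x or v > high_x]
flagged_y = [v for v in y if v < low_y or v > high_y]
print("X limits:", low_x, high_x, "flagged:", flagged_x)
print("Y limits:", low_y, high_y, "flagged:", flagged_y)
```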

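Every X and Y value lies inside these limits, so the 1.5 IQR rule applied to X and Y separately does not flag any point. The outlier is instead the point that the scatterplot shows far below the trendline: X = 30, Y = 121 (the other point with X = 30 has Y = 4230). Since the question states that it resulted from a data entry error, it has to be taken out of the copied data.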
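One way to confirm which point sits outside the pattern is to look at vertical distances from the trendline. The sketch below is an added illustration, not part of the Excel steps in the question. It uses the equation reported on the first plot, y = 172.46x - 567.36, and finds the point with the largest absolute residual.

```python
# Data copied from the question.
x = [22, 48, 37, 30, 24, 10, 42, 30, 41, 29, 16, 36, 45,
     11, 31, 26, 31, 33, 46, 22, 13, 22, 32, 49, 35]
y = [3872, 9312, 5217, 4230, 4536, 1820, 8274, 121, 6314, 3828, 2448, 6156, 7515,
     1309, 3534, 4576, 5797, 4983, 6670, 2464, 2197, 3278, 5408, 7497, 5705]

# Trendline coefficients as reported on the first plot.
SLOPE, INTERCEPT = 172.46, -567.36

# Residual = observed Y minus the Y predicted by the trendline.
residuals = [(yi - (SLOPE * xi + INTERCEPT), xi, yi) for xi, yi in zip(x, y)]

# The point farthest from the line, in either direction.
worst = max(residuals, key=lambda r: abs(r[0]))
print(f"largest residual {worst[0]:.2f} at X = {worst[1]}, Y = {worst[2]}")
```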

We then make a new scatter plot from the cleaned data, as below.

[Figure: scatter plot of the cleaned data, with the outlier removed.]

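Now compare the two plots.

Clearly, R squared has moved up from 73.19% to 88.04%, meaning that the trendline now explains more of the variability in Y. Hence the straight line in the second plot is a better fit. The slope hardly changes: refitting the 24 cleaned points gives a slope of about 171.79, against 172.46 before. The removed point has X = 30, close to the mean of X (about 30.4), so it pulled the whole line down, lowering the intercept, without tilting it much.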
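The comparison can be reproduced with a short least-squares sketch that fits the raw data and the cleaned data side by side. The helper name `fit_line` is hypothetical, and the code only illustrates the arithmetic behind Excel's trendline rather than replacing the worksheet steps.

```python
# Data copied from the question.
x = [22, 48, 37, 30, 24, 10, 42, 30, 41, 29, 16, 36, 45,
     11, 31, 26, 31, 33, 46, 22, 13, 22, 32, 49, 35]
y = [3872, 9312, 5217, 4230, 4536, 1820, 8274, 121, 6314, 3828, 2448, 6156, 7515,
     1309, 3534, 4576, 5797, 4983, 6670, 2464, 2197, 3278, 5408, 7497, 5705]


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


# Drop only the data-entry error (30, 121); the valid point (30, 4230) stays.
clean = [(a, b) for a, b in zip(x, y) if (a, b) != (30, 121)]
clean_x = [a for a, _ in clean]
clean_y = [b for _, b in clean]

slope_raw, intercept_raw, r2_raw = fit_line(x, y)
slope_clean, intercept_clean, r2_clean = fit_line(clean_x, clean_y)

print(f"raw:   slope {slope_raw:.2f}, intercept {intercept_raw:.2f}, R squared {r2_raw:.4f}")
print(f"clean: slope {slope_clean:.2f}, intercept {intercept_clean:.2f}, R squared {r2_clean:.4f}")
```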
