Question

Instructions

First, download real estate data from the city of Ames, Iowa:

download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")

Now write code in R to answer the following questions. Make sure the script that you turn in includes the code that you write, the output that you get from that code (in a comment), and a sentence or more answering the question if there was one (also in a comment).

  1. Consider the variable Gr.Liv.Area of the ames data frame, which represents the above ground living area of the house in square feet. Calculate the mean of the values.
  2. Now assume that you'd like to get an idea of the average living area, but you have time to check only 50 of the 2930 houses sold in Ames. Take a sample of size 50 from Gr.Liv.Area without replacement and save it in the variable ames.gr.liv.area.sample. Calculate the mean of the sample. Is the mean for the sample different from the mean for the entire population? Why or why not?
  3. Use the par function with the mfrow argument to set up the plotting space with 2 rows and 1 column. Then calculate the range of Gr.Liv.Area in the data and save it in a variable named area.xlim.
  4. Plot a histogram of Gr.Liv.Area in the first row, setting its x axis limits to match area.xlim. Then use abline to draw a red vertical line representing the mean of the distribution on top of the histogram.
  5. Plot a histogram of the sample of size 50 in the second row, and draw a red vertical line for its mean. Make sure that the x axis limits are the same as those of the other histogram. How different is the plot of the sample from the plot of the full population?
  6. Each time we take a sample, we'll get a different mean. Use replicate to repeat the following process 5000 times:
    • Take a sample of Gr.Liv.Area of size 10
    • Take the mean.
  7. Save these 5000 means in a variable named area.means.10. Plot a histogram of these means and describe the shape of the distribution.
  8. Use replicate two more times, once taking 5000 samples of Gr.Liv.Area of size 50, and once taking 5000 samples of size 100. Save your results in two variables area.means.50 and area.means.100.
  9. Set up the plotting space with 3 rows and 1 column. Then calculate the range of area.means.10 and save it in a variable named area.means.10.xlim.
  10. Plot histograms for area.means.10, area.means.50, and area.means.100 in the three rows. Set all of their x axis limits to match area.means.10.xlim, and set all of their breaks to 20. If you wanted to estimate a mean, and you had to choose between a sample size of 10, 50, and 100, which one would you choose? Explain how the histograms here support your choice.
  11. What would the distribution of means look like if you took samples of size 1? What would it look like if you took samples of size 2930?
Answer #1

download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")

1)

# Mean of the variable Gr.Liv.Area
mean(ames$Gr.Liv.Area) # Output: 1499.69

2)

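# Mean of a sample of 50 values of the variable Gr.Liv.Area
ames.gr.liv.area.sample <- sample(ames$Gr.Liv.Area, 50, replace = FALSE)

# sample() returns the sampled area values themselves (not row indices), so take their mean directly
mean(ames.gr.liv.area.sample) # Output varies from run to run because the sample is random
# The sample mean usually differs a little from the population mean (1499.69) because of
# sampling variability, but a random sample of 50 typically lands close to it.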
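
One detail worth checking in part 2: when sample() is given a data vector, it returns values drawn from that vector, not row positions, so the result can be passed straight to mean(). A minimal sketch on a made-up vector (toy.areas and the seed are illustrative assumptions, not part of the assignment):

```r
# Toy stand-in for ames$Gr.Liv.Area (hypothetical values, for illustration only)
toy.areas <- c(900, 1200, 1500, 1800, 2400, 3100)

set.seed(1)  # arbitrary seed so the draw is repeatable
toy.sample <- sample(toy.areas, 3, replace = FALSE)

# Every sampled element is one of the original values, so it is already data
all(toy.sample %in% toy.areas)

# The sample mean is therefore just the mean of the sampled values
mean(toy.sample)

# Using sampled values as positions would ask for elements 900 and 1200
# of a length-6 vector, which do not exist and come back as NA
toy.areas[c(900, 1200)]
```

With the real data, the same mistake is harder to spot, because many area values happen to be valid row numbers and the indexing silently returns other houses.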

3)

# Set up the plotting space with 2 rows and 1 column
par(mfrow = c(2, 1))

# Save the range of the variable Gr.Liv.Area
area.xlim <- range(ames$Gr.Liv.Area)

4)

# Plot a histogram of the variable Gr.Liv.Area with a red line at its mean
hist(ames$Gr.Liv.Area, xlim = area.xlim, main = "Histogram for Living Area")
abline(v = mean(ames$Gr.Liv.Area), col = "red", lwd = 4)

5)

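# Plot a histogram of the sample of 50 values of Gr.Liv.Area with a red line at its mean
# (the sample already holds area values, so it is plotted directly rather than used as an index)
hist(ames.gr.liv.area.sample, xlim = area.xlim, main = "Histogram for Living Area (Sample of 50)")
abline(v = mean(ames.gr.liv.area.sample), col = "red", lwd = 4)

# The sample histogram usually has a similar center and a similar right-skewed shape to the
# population histogram, but with only 50 values it is coarser and rarely shows the largest houses.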
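
After the two panels are drawn, the 2-row layout stays in effect, so a later single histogram (such as the one in part 7) would be squeezed into a half-height panel. One way to avoid that is to save the old graphics settings and restore them afterwards. The sketch below uses a simulated right-skewed population (toy.population, an assumption standing in for the Ames column) and a null PDF device so it runs without a display:

```r
# Hypothetical right-skewed stand-in for the living-area data
set.seed(42)
toy.population <- rgamma(2930, shape = 9, rate = 0.006)
toy.sample <- sample(toy.population, 50)
toy.xlim <- range(toy.population)

pdf(NULL)  # draw to a null device so the sketch runs without a display

# par() invisibly returns the previous settings when it changes them
old.par <- par(mfrow = c(2, 1))

hist(toy.population, xlim = toy.xlim, main = "Population")
abline(v = mean(toy.population), col = "red", lwd = 2)
hist(toy.sample, xlim = toy.xlim, main = "Sample of 50")
abline(v = mean(toy.sample), col = "red", lwd = 2)

layout.during <- par("mfrow")  # layout while the two panels are active
par(old.par)                   # restore the previous layout
layout.after <- par("mfrow")   # layout after restoring

dev.off()
```

In an interactive session the pdf(NULL) and dev.off() lines can be dropped; the save-and-restore pattern around par() is the part that matters.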

6)

# Repeat 5000 times: take a sample of 10 area values and compute its mean
replicate(5000, mean(sample(ames$Gr.Liv.Area, 10)))

7)

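# Save the 5000 means of samples of size 10 and plot their distribution
area.means.10 <- replicate(5000, mean(sample(ames$Gr.Liv.Area, 10)))
hist(area.means.10)
# The histogram is unimodal and roughly bell-shaped (approximately normal), centered near the
# population mean; with samples of only 10 it tends to keep a slight right skew.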
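
The answer stops at part 7. Parts 8 to 10 follow the same replicate pattern, and a hedged sketch is given below. So that it runs without the download, it uses a simulated right-skewed population (toy.population, an assumption standing in for ames$Gr.Liv.Area); the helper sample.means is a hypothetical name introduced here, not something the assignment defines.

```r
# Hypothetical stand-in for ames$Gr.Liv.Area; swap in the real column if loaded
set.seed(2930)
toy.population <- rgamma(2930, shape = 9, rate = 0.006)

# Hypothetical helper: means of `reps` samples of size n, drawn without replacement
sample.means <- function(x, n, reps = 5000) {
  replicate(reps, mean(sample(x, n)))
}

# Part 8 (plus the size-10 case from part 7)
area.means.10  <- sample.means(toy.population, 10)
area.means.50  <- sample.means(toy.population, 50)
area.means.100 <- sample.means(toy.population, 100)

# Part 9: shared x axis limits taken from the size-10 means
area.means.10.xlim <- range(area.means.10)

pdf(NULL)  # null device so the sketch runs without a display
old.par <- par(mfrow = c(3, 1))

# Part 10: three histograms on the same x axis, 20 breaks each
hist(area.means.10,  xlim = area.means.10.xlim, breaks = 20)
hist(area.means.50,  xlim = area.means.10.xlim, breaks = 20)
hist(area.means.100, xlim = area.means.10.xlim, breaks = 20)

par(old.par)
dev.off()

# Numeric check of what the histograms show: spread of the means by sample size
spreads <- c(n10 = sd(area.means.10),
             n50 = sd(area.means.50),
             n100 = sd(area.means.100))
print(spreads)
```

In this sketch the histograms get narrower as the sample size grows, which is the usual argument for choosing a sample size of 100 in part 10. For part 11, means of samples of size 1 are just single observations, so their distribution matches the population distribution; a sample of size 2930 drawn without replacement is the whole population, so every mean equals the population mean and the distribution collapses to a single value.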
