Problem

The data set families contains information about 43,886 families living in the city of Cyberville. The city has four regions: the Northern region has 10,149 families, the Eastern region has 10,390 families, the Southern region has 13,457 families, and the Western region has 9,890 families. For each family, the following information is recorded:

1. Family type

1: Husband-wife family

2: Male-head family

3: Female-head family

2. Number of persons in family

3. Number of children in family

4. Family income

5. Region

1: North

2: East

3: South

4: West

6. Education level of head of household

31: Less than 1st grade

32: 1st, 2nd, 3rd, or 4th grade

33: 5th or 6th grade

34: 7th or 8th grade

35: 9th grade

36: 10th grade

37: 11th grade

38: 12th grade, no diploma

39: High school graduate, high school diploma, or equivalent

40: Some college but no degree

41: Associate degree in college (occupation/vocation program)

42: Associate degree in college (academic program)

43: Bachelor’s degree (e.g., B.S., B.A., A.B.)

44: Master’s degree (e.g., M.S., M.A., M.B.A.)

45: Professional school degree (e.g., M.D., D.D.S., D.V.M., LL.B., J.D.)

46: Doctoral degree (e.g., Ph.D., Ed.D.)
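Part a.iii depends on which education codes count as "did not receive a high school diploma." Reading the code list, codes 31 through 38 (up to "12th grade, no diploma") fall short of a diploma, and 39 onward mean a diploma, an equivalent, or more. The helper below, `no_diploma` (a hypothetical name, not part of the data set), is a minimal sketch of that indicator under this reading.

```python
# Education codes 31-38 (less than 1st grade through 12th grade, no diploma)
# describe heads of household without a high school diploma; codes 39-46
# describe a diploma or equivalent, or a higher level of education.
NO_DIPLOMA_CODES = range(31, 39)  # 31, 32, ..., 38


def no_diploma(education_code: int) -> bool:
    """Return True if the education code indicates no high school diploma."""
    if not 31 <= education_code <= 46:
        raise ValueError(f"unknown education code: {education_code}")
    return education_code in NO_DIPLOMA_CODES


# Hypothetical codes, for illustration only
codes = [39, 38, 43, 31, 40]
print(sum(no_diploma(c) for c in codes) / len(codes))  # 0.4
```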

In these exercises, you will try to learn about the families of Cyberville by using sampling.

a. Take a simple random sample of 500 families. Estimate the following population parameters, calculate the estimated standard errors of these estimates, and form 95% confidence intervals:

i. The proportion of female-headed families

ii. The average number of children per family

iii. The proportion of heads of households who did not receive a high school diploma

iv. The average family income
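For parts i through iv, the usual estimates for a simple random sample drawn without replacement apply; the proportions in i and iii are just sample means of 0/1 indicators. The sketch below assumes that standard setup, the finite population correction 1 - n/N, and the normal multiplier 1.96 for 95% intervals.

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2

s_{\bar{X}}^2 = \frac{s^2}{n}\left(1 - \frac{n}{N}\right), \qquad
s_{\hat{p}}^2 = \frac{\hat{p}\,(1 - \hat{p})}{n-1}\left(1 - \frac{n}{N}\right)

\text{95\% CI: } \bar{X} \pm 1.96\, s_{\bar{X}}, \qquad \hat{p} \pm 1.96\, s_{\hat{p}}

N = 43{,}886,\ n = 500: \quad 1 - \frac{n}{N} = 1 - \frac{500}{43{,}886} \approx 0.989
```

Because n/N is only about 1% here, the finite population correction changes the standard errors very little.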

Repeat these estimates for five different simple random samples of size 500 and compare the results.
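A minimal sketch of part a, assuming the data have been loaded into plain Python lists. The function name `srs_estimate` is hypothetical, and the population below is made-up data standing in for the real `families` columns; only N = 43,886 comes from the problem. The same call handles a proportion when the sample is a 0/1 indicator.

```python
import math
import random
import statistics


def srs_estimate(sample, population_size, z=1.96):
    """Estimate a population mean from a simple random sample drawn without
    replacement. Returns (estimate, estimated standard error, (lower, upper)),
    where the standard error includes the finite population correction and
    the interval is estimate +/- z * SE (normal approximation).

    For a 0/1 indicator sample the estimate is a proportion p_hat, and
    s^2 / n reduces to p_hat * (1 - p_hat) / (n - 1)."""
    n = len(sample)
    estimate = statistics.fmean(sample)
    s2 = statistics.variance(sample)  # sample variance, divisor n - 1
    se = math.sqrt(s2 / n * (1 - n / population_size))
    return estimate, se, (estimate - z * se, estimate + z * se)


# Hypothetical population of children counts (NOT the Cyberville data);
# substitute the real column from the families data set.
N = 43_886
rng = random.Random(1)
children = [rng.choice([0, 0, 1, 1, 2, 3]) for _ in range(N)]

for rep in range(1, 6):  # five independent simple random samples of 500
    sample = rng.sample(children, 500)
    est, se, (lo, hi) = srs_estimate(sample, N)
    print(f"sample {rep}: mean={est:.3f}  SE={se:.4f}  95% CI=({lo:.3f}, {hi:.3f})")

# A proportion uses the same call on a 0/1 indicator, e.g. for female-headed
# families: srs_estimate([1 if t == 3 else 0 for t in sampled_types], N),
# where sampled_types is a hypothetical list of sampled family-type codes.
```

Comparing the five printed intervals shows how much the estimates move from sample to sample; each interval's width is governed by the estimated standard error.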

b. Take 100 samples of size 400.

i. For each sample, find the average family income.

ii. Find the average and standard deviation of these 100 estimates and make a histogram of the estimates.

iii. Superimpose on the histogram a plot of the normal density with the mean and standard deviation found in part ii, and comment on how well it appears to fit.

iv. Plot the empirical cumulative distribution function (see Section 10.2). On this plot, superimpose the normal cumulative distribution function with mean and standard deviation as earlier. Comment on the fit.

v. Another method for examining a normal approximation is via a normal probability plot (Section 9.9). Make such a plot and comment on what it shows about the approximation.

vi. For each of the 100 samples, find a 95% confidence interval for the population average income. How many of those intervals actually contain the population target?

vii. Take 100 samples of size 100. Compare the averages, standard deviations, and histograms to those obtained for samples of size 400, and explain how the theory of simple random sampling relates to the comparisons.
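The numerical side of part b can be sketched with the standard library alone, leaving the actual drawing of histograms and curves to whatever plotting tool is at hand. All function names are hypothetical, the incomes are simulated stand-ins (NOT the Cyberville data), and the plotting positions (i + 1)/(m + 1) for the normal probability plot are one common convention that may differ from the textbook's.

```python
import math
import random
import statistics
from statistics import NormalDist


def sample_means(population, n, reps, rng):
    """Averages of `reps` simple random samples of size n (without replacement)."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]


def ecdf_vs_normal(values):
    """Largest vertical gap between the empirical CDF of `values` and a normal
    CDF with the same mean and standard deviation (a Kolmogorov-Smirnov-style
    distance; small values indicate a close fit)."""
    dist = NormalDist(statistics.fmean(values), statistics.stdev(values))
    xs = sorted(values)
    m = len(xs)
    # The ECDF jumps from i/m to (i+1)/m at xs[i]; check both sides of each jump.
    return max(max(abs((i + 1) / m - dist.cdf(x)), abs(i / m - dist.cdf(x)))
               for i, x in enumerate(xs))


def normal_plot_points(values):
    """(normal quantile, ordered value) pairs for a normal probability plot.
    Points close to a straight line suggest the normal approximation is good."""
    xs = sorted(values)
    m = len(xs)
    return [(NormalDist().inv_cdf((i + 1) / (m + 1)), x) for i, x in enumerate(xs)]


def ci_coverage(population, n, reps, rng, z=1.96):
    """Number of `reps` approximate 95% intervals (with finite population
    correction) that contain the true population mean."""
    N = len(population)
    mu = statistics.fmean(population)
    hits = 0
    for _ in range(reps):
        s = rng.sample(population, n)
        se = math.sqrt(statistics.variance(s) / n * (1 - n / N))
        hits += abs(statistics.fmean(s) - mu) <= z * se
    return hits


# Hypothetical right-skewed incomes (NOT the Cyberville data).
rng = random.Random(2)
incomes = [rng.lognormvariate(10, 0.8) for _ in range(43_886)]

for n in (400, 100):
    means = sample_means(incomes, n, 100, rng)
    print(f"n={n}: average={statistics.fmean(means):.0f}  "
          f"SD={statistics.stdev(means):.0f}  "
          f"max ECDF gap={ecdf_vs_normal(means):.3f}")
print("intervals containing the population mean:",
      ci_coverage(incomes, 400, 100, rng), "of 100")
```

Simple random sampling theory predicts that the standard error of the sample mean scales roughly like 1/sqrt(n) (the finite population correction adjusts this only slightly here), so the spread of the averages for n = 100 should be about twice that for n = 400, and roughly 95 of 100 intervals would be expected to cover the population mean.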

c. For a simple random sample of 500, compare the incomes of the three family types by comparing histograms and boxplots (see Section 10.6).
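Boxplots in part c are built from a five-number summary of each group. A minimal sketch, assuming the sampled families are available as (family type, income) pairs; the function names and the example records are hypothetical, and the quartiles use the standard library's "inclusive" method, which plotting software may not match exactly.

```python
import statistics
from collections import defaultdict

# Family-type codes from the variable list.
FAMILY_TYPES = {1: "husband-wife", 2: "male-head", 3: "female-head"}


def boxplot_summary(values):
    """(min, lower quartile, median, upper quartile, max): the five numbers a
    basic boxplot is built from. Whisker and outlier rules vary by software."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


def summaries_by_type(records):
    """Group (family_type, income) pairs by family type and summarize each group."""
    groups = defaultdict(list)
    for family_type, income in records:
        groups[family_type].append(income)
    return {FAMILY_TYPES[t]: boxplot_summary(v) for t, v in sorted(groups.items())}


# Hypothetical (family type, income) pairs, for illustration only.
records = [(1, 52000), (1, 61000), (1, 47000), (2, 38000), (2, 41000),
           (3, 23000), (3, 30000), (3, 27000)]
for name, summary in summaries_by_type(records).items():
    print(name, summary)
```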

d. Take simple random samples of size 400 from each of the four regions.

i. Compare the incomes by region by making parallel boxplots.

ii. Does it appear that some regions have larger families than others?

iii. Are there differences in education level among the four regions?
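Part d draws a separate simple random sample from each region, which is in effect a stratified design with the regions as strata. The sketch below assumes each family is stored as a dict with hypothetical keys `region`, `persons`, `income`, and `education`; the function names and the simulated families are illustrative only (NOT the Cyberville data). It reports one summary per question; a full cross-tabulation of the 16 education codes by region would give a fuller answer to iii.

```python
import random
import statistics

REGIONS = {1: "North", 2: "East", 3: "South", 4: "West"}


def sample_by_region(families, n_per_region, rng):
    """Draw an independent simple random sample of n_per_region families from
    each region (each region must contain at least n_per_region families)."""
    by_region = {code: [] for code in REGIONS}
    for fam in families:
        by_region[fam["region"]].append(fam)
    return {REGIONS[code]: rng.sample(members, n_per_region)
            for code, members in by_region.items()}


def region_summary(sample):
    """A few numbers to compare across regions for parts d.i to d.iii."""
    return {
        "median_income": statistics.median(f["income"] for f in sample),
        "mean_persons": statistics.fmean(f["persons"] for f in sample),
        # Education codes 31-38 mean no high school diploma.
        "share_no_diploma": statistics.fmean(31 <= f["education"] <= 38 for f in sample),
    }


rng = random.Random(3)
# Hypothetical families (NOT the Cyberville data): 1,000 per region.
families = [{"region": r, "persons": rng.randint(2, 7),
             "income": rng.lognormvariate(10, 0.7),
             "education": rng.randint(31, 46)}
            for r in REGIONS for _ in range(1000)]
for name, sample in sample_by_region(families, 400, rng).items():
    print(name, region_summary(sample))
```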
