1. Frequency and Relative Frequency Distribution.
What is the shape of the distribution of your sample data?
Which class best estimates the center of the distribution?
Do you have any modes? Are there any outliers?
Make a guess as to the shape of the population distribution from the above histogram.
Use 2 decimal places for all values.
Confidence Interval Construction
Validate the Assumptions/Conditions to construct a 95% Confidence interval.
What is the symbol and value of the critical value for this interval? (You must interpolate.) Write the formula used and show work to calculate.
What is your 95% CI? Fill in the table to the right above.
Interpret your confidence interval in the context of this problem.
What does 95% Confidence mean?
46 |
45 |
54 |
53 |
60 |
68 |
70 |
54 |
44 |
43 |
66 |
60 |
54 |
71 |
43 |
48 |
41 |
63 |
47 |
54 |
48 |
66 |
52 |
58 |
55 |
49 |
50 |
71 |
73 |
54 |
41 |
56 |
58 |
53 |
50 |
49 |
42 |
71 |
47 |
50 |
44 |
66 |
74 |
69 |
63 |
59 |
59 |
55 |
62 |
48 |
Frequency and relative frequency distribution -
Class | Frequency | Relative frequency |
---|---|---|
40-44 | 7 | 0.14 |
45-49 | 9 | 0.18 |
50-54 | 11 | 0.22 |
55-59 | 7 | 0.14 |
60-64 | 5 | 0.10 |
65-69 | 5 | 0.10 |
70-74 | 6 | 0.12 |
75-79 | 0 | 0.00 |
Frequencies for different cost groups
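The class counts above can be reproduced with a short script. This is a minimal sketch, assuming classes of width 5 starting at 40 as in the table; the names `costs`, `CLASS_WIDTH`, `FIRST_LOWER` and `freq_table` are illustrative and not part of the original answer.

```python
from collections import Counter

# The 50 sample costs from the data listing above.
costs = [
    46, 45, 54, 53, 60, 68, 70, 54, 44, 43,
    66, 60, 54, 71, 43, 48, 41, 63, 47, 54,
    48, 66, 52, 58, 55, 49, 50, 71, 73, 54,
    41, 56, 58, 53, 50, 49, 42, 71, 47, 50,
    44, 66, 74, 69, 63, 59, 59, 55, 62, 48,
]

CLASS_WIDTH = 5   # assumed from the table: 40-44, 45-49, ...
FIRST_LOWER = 40  # lower limit of the first class

# Map each value to the lower limit of its class, then count per class.
counts = Counter(
    FIRST_LOWER + ((x - FIRST_LOWER) // CLASS_WIDTH) * CLASS_WIDTH
    for x in costs
)

n = len(costs)
freq_table = []
for lower in range(FIRST_LOWER, 80, CLASS_WIDTH):
    freq = counts.get(lower, 0)
    label = f"{lower}-{lower + CLASS_WIDTH - 1}"
    freq_table.append((label, freq, round(freq / n, 2)))

for label, freq, rel in freq_table:
    print(f"{label} | {freq} | {rel:.2f}")
```

The relative frequency is simply the class count divided by the sample size, so the column should sum to 1.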
Most of the values are clustered on the left (lower) side of the distribution, with a longer tail extending toward the higher values.
Hence, the distribution is slightly right (or positively) skewed.
As we can see from the histogram, the class 50-54 best estimates the center of the sample distribution.
The mode of the sample is 54, as it occurs most often (5 times).
The sample size is 50, so, assuming the sample is random, the sample histogram gives a reasonable first picture of the population distribution.
(The rule of thumb that a sample size of at least 30 is enough refers to the sampling distribution of the sample mean being approximately normal; it does not guarantee that the sample has the same shape as the population.)
Based on this, the population distribution is probably also slightly right skewed.
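A quick numeric check on the skew direction and the mode is to compare the mean, median and mode of the sample. The sketch below uses only Python's standard `statistics` module; the comparison "mean greater than median suggests right skew" is a common heuristic, not a formal test, and the variable names are illustrative.

```python
import statistics

# The 50 sample costs from the data listing above.
costs = [
    46, 45, 54, 53, 60, 68, 70, 54, 44, 43,
    66, 60, 54, 71, 43, 48, 41, 63, 47, 54,
    48, 66, 52, 58, 55, 49, 50, 71, 73, 54,
    41, 56, 58, 53, 50, 49, 42, 71, 47, 50,
    44, 66, 74, 69, 63, 59, 59, 55, 62, 48,
]

mean = statistics.mean(costs)
median = statistics.median(costs)
modes = statistics.multimode(costs)  # every value tied for most frequent

print(f"mean   = {mean:.2f}")
print(f"median = {median:.2f}")
print(f"mode(s) = {modes}")

# Heuristic only: a mean above the median hints at a tail to the right.
if mean > median:
    print("mean > median: consistent with a slight right skew")
```

Here the mean sits a little above the median, which agrees with the slight right skew read off the histogram.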
Outlier detection -
Mean: | 55.5 |
SD: | 9.5 |
# of observations: | 50 |
Outlier detected? | No |
Significance level: | 0.05 (two-sided) |
Critical value of Z: | 1.96 |
(The critical value of z is looked up from the standard normal (z) table.)
Observation | Cost | Z | Significant Outlier? |
---|---|---|---|
1 | 46 | 1.01 | |
2 | 45 | 1.11 | |
3 | 54 | 0.16 | |
4 | 53 | 0.27 | |
5 | 60 | 0.47 | |
6 | 68 | 1.32 | |
7 | 70 | 1.53 | |
8 | 54 | 0.16 | |
9 | 44 | 1.22 | |
10 | 43 | 1.32 | |
11 | 66 | 1.11 | |
12 | 60 | 0.47 | |
13 | 54 | 0.16 | |
14 | 71 | 1.64 | |
15 | 43 | 1.32 | |
16 | 48 | 0.79 | |
17 | 41 | 1.53 | |
18 | 63 | 0.79 | |
19 | 47 | 0.90 | |
20 | 54 | 0.16 | |
21 | 48 | 0.79 | |
22 | 66 | 1.11 | |
23 | 52 | 0.37 | |
24 | 58 | 0.26 | |
25 | 55 | 0.05 | |
26 | 49 | 0.69 | |
27 | 50 | 0.58 | |
28 | 71 | 1.64 | |
29 | 73 | 1.85 | |
30 | 54 | 0.16 | |
31 | 41 | 1.53 | |
32 | 56 | 0.05 | |
33 | 58 | 0.26 | |
34 | 53 | 0.27 | |
35 | 50 | 0.58 | |
36 | 49 | 0.69 | |
37 | 42 | 1.43 | |
38 | 71 | 1.64 | |
39 | 47 | 0.90 | |
40 | 50 | 0.58 | |
41 | 44 | 1.22 | |
42 | 66 | 1.11 | |
43 | 74 | 1.95 | Furthest from the rest, but not a significant outlier (P > 0.05). |
44 | 69 | 1.42 | |
45 | 63 | 0.79 | |
46 | 59 | 0.37 | |
47 | 59 | 0.37 | |
48 | 55 | 0.05 | |
49 | 62 | 0.68 | |
50 | 48 | 0.79 | |
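The Z column above is the absolute distance of each observation from the sample mean, measured in sample standard deviations. A minimal sketch of that screen follows, using the 1.96 cutoff quoted in this answer; the cutoff and the names `Z_CRIT`, `z_scores` and `flagged` are taken as assumptions for illustration, and other outlier rules would use different thresholds.

```python
import statistics

# The 50 sample costs from the data listing above, in observation order.
costs = [
    46, 45, 54, 53, 60, 68, 70, 54, 44, 43,
    66, 60, 54, 71, 43, 48, 41, 63, 47, 54,
    48, 66, 52, 58, 55, 49, 50, 71, 73, 54,
    41, 56, 58, 53, 50, 49, 42, 71, 47, 50,
    44, 66, 74, 69, 63, 59, 59, 55, 62, 48,
]

mean = statistics.mean(costs)
sd = statistics.stdev(costs)  # sample SD (n - 1 in the denominator)

# Absolute z-score of every observation.
z_scores = [abs(x - mean) / sd for x in costs]

Z_CRIT = 1.96  # cutoff used in this answer (two-sided, 0.05)

largest_z = max(z_scores)
largest_obs = z_scores.index(largest_z) + 1  # 1-based observation number
flagged = [i + 1 for i, z in enumerate(z_scores) if z > Z_CRIT]

print(f"largest |z| = {largest_z:.2f} at observation {largest_obs}")
print(f"observations beyond {Z_CRIT}: {flagged}")
```

With this data the most extreme value is observation 43 (cost 74), and no observation exceeds the cutoff.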
Summary statistics -
Mean: | 55.5 |
SD: | 9.5 |
# of observations: | 50 |
Standard error | 1.34 |
Standard error = std dev / sqrt(sample size)
Point estimate | 55.5 |
Margin of error | 2.7 |
Lower bound | 52.8 |
Upper bound | 58.2 |
where margin of error = t value * standard error
(the t value is the two-tailed critical value for 49 df at the 0.05 significance level; interpolating in the t table gives t ≈ 2.01)
point estimate = mean
lower bound = point estimate - margin of error
upper bound = point estimate + margin of error
Hence, the 95% CI is
52.8 to 58.2
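The interval can be checked end to end with the sketch below. It uses the usual s/√n form for the standard error and obtains the critical value by linear interpolation between two standard t-table rows. The table rows used (df = 40 and df = 50) are an assumption about which table is at hand, and the names `T_TABLE`, `t_crit`, `lower` and `upper` are illustrative.

```python
import math
import statistics

# The 50 sample costs from the data listing above.
costs = [
    46, 45, 54, 53, 60, 68, 70, 54, 44, 43,
    66, 60, 54, 71, 43, 48, 41, 63, 47, 54,
    48, 66, 52, 58, 55, 49, 50, 71, 73, 54,
    41, 56, 58, 53, 50, 49, 42, 71, 47, 50,
    44, 66, 74, 69, 63, 59, 59, 55, 62, 48,
]

n = len(costs)
mean = statistics.mean(costs)   # point estimate
sd = statistics.stdev(costs)    # sample SD (n - 1 in the denominator)
se = sd / math.sqrt(n)          # standard error of the mean

# Two-tailed 0.05 critical values from a standard t table
# (assumed rows; many printed tables skip df = 49).
T_TABLE = {40: 2.021, 50: 2.009}

# Linear interpolation for df = n - 1 = 49.
df = n - 1
t_crit = T_TABLE[40] + (df - 40) / (50 - 40) * (T_TABLE[50] - T_TABLE[40])

margin = t_crit * se
lower = mean - margin
upper = mean + margin

print(f"SE = {se:.2f}, t = {t_crit:.2f}, margin = {margin:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

To two decimal places this gives 52.83 to 58.21, which rounds to the 52.8 to 58.2 reported above.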
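Interpretation of the 95% CI -
We are 95% confident that the true population mean cost lies between 52.8 and 58.2.
A 95% CI means -
If we repeatedly drew samples of the same size from the population and built a confidence interval from each one in the same way, about 95 out of every 100 of those intervals would contain the true population mean. It does not mean that 95% of sample means fall inside this particular interval.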
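The repeated-sampling meaning of "95% confidence" can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with a hypothetical mean of 55 and SD of 9.5 (values chosen only to resemble the sample), and `TRUE_MEAN`, `TRUE_SD`, `TRIALS` and `coverage` are illustrative names.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN = 55.0  # hypothetical population mean (assumption)
TRUE_SD = 9.5     # hypothetical population SD (assumption)
N = 50            # sample size, as in this problem
T_CRIT = 2.01     # two-tailed 0.05 critical t value for 49 df
TRIALS = 2000     # number of repeated samples

hits = 0
for _ in range(TRIALS):
    # Draw a fresh sample and build its own 95% interval.
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    if m - T_CRIT * se <= TRUE_MEAN <= m + T_CRIT * se:
        hits += 1

coverage = hits / TRIALS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

Each simulated sample produces a different interval; the fraction of intervals that capture the fixed population mean should come out close to 0.95.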