Question

Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.

a. Use smoothing by bin means to smooth these data, using a bin depth of 3. Illustrate your steps. Comment on the effect of this technique for the given data.

b. How might you determine outliers in the data?

c. What other methods are there for data smoothing?

Answer #1

Answers:

The data contain the attribute age, with the following values for the data tuples:

13,15,16,16,19,20,20,21,22,22,25,25,25,25,30,33,33,35,35,35,35,36,40,45,46,52,70.

There are 27 age values in total.

a) Use smoothing by bin means to smooth these data, using a bin depth of 3. Illustrate your steps. Comment on the effect of this technique for the given data.

Answer:

Smoothing by bin means: in this method, each value in a bin is replaced by the mean value of that bin.

The bin depth is 3. The data are smoothed with the bin means method in the following steps.

Steps:

Step 1: Sort the data (the given data are already sorted).

13,15,16,16,19,20,20,21,22,22,25,25,25,25,30,33,33,35,35,35,35,36,40,45,46,52,70.

Step 2: Partition the sorted data into equal-frequency bins, each holding 3 values (the bin depth).

Total number of bins = number of tuples / depth of bin = 27 / 3 = 9

Bin 1: 13, 15, 16
Bin 2: 16, 19, 20
Bin 3: 20, 21, 22
Bin 4: 22, 25, 25
Bin 5: 25, 25, 30
Bin 6: 33, 33, 35
Bin 7: 35, 35, 35
Bin 8: 36, 40, 45
Bin 9: 46, 52, 70
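The partitioning in Step 2 can be sketched in Python as follows. This is a minimal sketch, assuming the values are already sorted and that the number of values is a multiple of the bin depth (true here, since 27 / 3 = 9); the function name `equal_frequency_bins` is hypothetical.

```python
# Age values from the question, already in increasing order.
AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def equal_frequency_bins(values, depth):
    """Split sorted values into consecutive bins of `depth` values each.

    Assumes len(values) is a multiple of depth, as in this exercise.
    """
    if len(values) % depth != 0:
        raise ValueError("number of values must be a multiple of the bin depth")
    return [values[i:i + depth] for i in range(0, len(values), depth)]


bins = equal_frequency_bins(sorted(AGES), 3)
print(len(bins))  # 9
for number, b in enumerate(bins, start=1):
    print(f"Bin {number}: {b}")
```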

Step 3: Calculate the arithmetic mean of each bin.

The smoothed (mean) value of each bin is:

Bin 1 = (13+15+16)/3 = 44/3 = 14.67

Bin 2 = (16+19+20)/3 = 55/3 = 18.33        

Bin 3 = (20+21+22)/3 = 63/3 = 21  

Bin 4 = (22+25+25)/3 = 72/3 = 24

Bin 5 = (25+25+30)/3 = 80/3 = 26.67       

Bin 6 = (33+33+35)/3 = 101/3 = 33.67

Bin 7 = (35+35+35)/3 = 105/3 = 35

Bin 8 = (36+40+45)/3 = 121/3 = 40.33

Bin 9 = (46+52+70)/3 = 168/3 = 56
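The Step 3 arithmetic can be checked with a short Python sketch. It assumes the same sorted age list and bin depth of 3, and rounds each mean to two decimals to match the hand calculation above; the variable names are illustrative.

```python
AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
DEPTH = 3

# Partition the sorted values into consecutive bins of DEPTH values each.
bins = [AGES[i:i + DEPTH] for i in range(0, len(AGES), DEPTH)]

# Arithmetic mean of each bin, rounded to two decimals as in the text.
bin_means = [round(sum(b) / len(b), 2) for b in bins]

for number, (b, mean) in enumerate(zip(bins, bin_means), start=1):
    expr = "+".join(map(str, b))
    print(f"Bin {number} = ({expr})/{len(b)} = {sum(b)}/{len(b)} = {mean}")
```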

Step 4: Replace each value in each bin by the arithmetic mean calculated for that bin.

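Bin   Values        Smoothed values
1     13, 15, 16    14.67, 14.67, 14.67
2     16, 19, 20    18.33, 18.33, 18.33
3     20, 21, 22    21, 21, 21
4     22, 25, 25    24, 24, 24
5     25, 25, 30    26.67, 26.67, 26.67
6     33, 33, 35    33.67, 33.67, 33.67
7     35, 35, 35    35, 35, 35
8     36, 40, 45    40.33, 40.33, 40.33
9     46, 52, 70    56, 56, 56

Effect of the technique: smoothing by bin means removes the small fluctuations inside each bin, so the 27 original values are reduced to 9 distinct smoothed values and local noise is damped, at the cost of some detail. Because the mean is sensitive to extreme values, the last bin is pulled up by 70: its smoothed value of 56 is higher than two of its three members (46 and 52).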
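All of part (a) can be reproduced with the minimal Python sketch below. The function name `smooth_by_bin_means` is hypothetical; it rounds each mean to two decimals to match the table, and it also compares the number of distinct values and the population standard deviation before and after smoothing as a rough way to see the smoothing effect.

```python
from statistics import pstdev

AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def smooth_by_bin_means(values, depth):
    """Equal-frequency binning, then replace every value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        b = ordered[start:start + depth]
        mean = sum(b) / len(b)          # mean of this bin
        smoothed.extend([round(mean, 2)] * len(b))
    return smoothed


smoothed = smooth_by_bin_means(AGES, 3)
print(smoothed)
print("distinct values before:", len(set(AGES)), "after:", len(set(smoothed)))
print("std dev before: %.2f, after: %.2f" % (pstdev(AGES), pstdev(smoothed)))
```

Only the spread inside each bin is removed, so the overall shape of the data (low, middle and high ages) is kept while the standard deviation drops.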

b. How might you determine outliers in the data?

Determining the outliers in the data:

An outlier is an observation that lies far from the other observations.

Outliers in the data can be identified in several ways:

  1. Clustering: group the data into clusters. Values that do not fall into any cluster can be treated as outliers.
  2. Model deviation: data points that deviate from a fitted model by more than some threshold value can be considered outliers.
  3. Histograms: divide the data into equal-width bins (a histogram) and look for sparsely populated bins that lie away from the rest of the data.
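Method 2 can be illustrated with a minimal Python sketch. As an assumption, the "model" here is simply the mean of the data, the deviation is measured in population standard deviations (a z-score), and the threshold of 3 is a common rule of thumb rather than anything fixed by the question; the function name `zscore_outliers` is hypothetical.

```python
from statistics import mean, pstdev

AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def zscore_outliers(values, threshold=3.0):
    """Return values farther than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]


print(zscore_outliers(AGES))  # [70]
```

For these ages the mean is about 29.96 and the standard deviation about 12.70, so only 70 (about 3.15 standard deviations above the mean) is flagged.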

c. What other methods are there for data smoothing?

Other binning methods for data smoothing are smoothing by bin medians and smoothing by bin boundaries.

Smoothing by bin medians: each value in a bin is replaced by the median of that bin.
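Smoothing by bin medians on the same age data can be sketched in Python as below, assuming the same equal-frequency bins of depth 3; the function name `smooth_by_bin_medians` is hypothetical. Compared with bin means, the last bin becomes 52 rather than 56, because the median is less affected by the extreme value 70.

```python
from statistics import median

AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def smooth_by_bin_medians(values, depth):
    """Equal-frequency binning, then replace every value by its bin median."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        b = ordered[start:start + depth]
        smoothed.extend([median(b)] * len(b))
    return smoothed


smoothed = smooth_by_bin_medians(AGES, 3)
print(smoothed)
```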

Smoothing by bin boundaries

In this method, the minimum and maximum values of a bin are taken as the bin boundaries. Each value in the bin is then replaced by whichever boundary value is closer to it.

Example:

Bin: 8, 9, 15, 17

Here 8 is the minimum value and 17 is the maximum value. 9 is closer to 8, and 15 is closer to 17, so 9 is replaced by 8 and 15 is replaced by 17.

After smoothing by bin boundaries: Bin: 8, 8, 17, 17.
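Smoothing by bin boundaries can be sketched in Python as follows. The function name `smooth_by_bin_boundaries` is hypothetical, and sending a value that is exactly halfway between the boundaries to the minimum is an assumption, since the text does not specify a tie-breaking rule. The sketch reproduces the example bin and also applies the method to the age bins.

```python
AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def smooth_by_bin_boundaries(bin_values):
    """Replace each value by the nearer of the bin's minimum and maximum.

    Ties go to the minimum (an assumed tie-breaking rule).
    """
    low, high = min(bin_values), max(bin_values)
    return [low if v - low <= high - v else high for v in bin_values]


print(smooth_by_bin_boundaries([8, 9, 15, 17]))  # [8, 8, 17, 17]

age_bins = [AGES[i:i + 3] for i in range(0, len(AGES), 3)]
print([smooth_by_bin_boundaries(b) for b in age_bins])
```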

Methods other than binning include regression techniques and classification techniques.

Regression smooths the data by fitting them to a function, for example with linear or multiple linear regression.
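Regression smoothing can be illustrated with a minimal ordinary least-squares sketch in plain Python. The (x, y) points below are hypothetical and chosen only for illustration, and the function name `linear_fit` is also hypothetical; each noisy y value is replaced by the value the fitted line predicts.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    return a, b


# Hypothetical noisy measurements, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b = linear_fit(xs, ys)
smoothed = [a + b * x for x in xs]   # values on the fitted line
print(f"y = {a:.2f} + {b:.2f}x")
print([round(v, 2) for v in smoothed])
```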

Classification techniques can implement concept hierarchies, which smooth the data by rolling up lower-level concepts to higher-level concepts.
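The roll-up idea can be sketched for the age attribute as below. The concept hierarchy (the group names and the cut points 20, 40 and 60) is a hypothetical choice for illustration, not part of the question; each raw age is replaced by the higher-level concept it belongs to, and `roll_up` is a hypothetical helper name.

```python
from collections import Counter

AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# Hypothetical concept hierarchy: [low, high) age ranges mapped to labels.
AGE_GROUPS = [(0, 20, "youth"), (20, 40, "young adult"),
              (40, 60, "middle-aged"), (60, 200, "senior")]


def roll_up(age):
    """Map a raw age to its higher-level concept."""
    for low, high, label in AGE_GROUPS:
        if low <= age < high:
            return label
    raise ValueError(f"no concept covers age {age}")


counts = Counter(roll_up(a) for a in AGES)
print(counts)
```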
