Question

In the following data set, the attribute ?????? is the class label. The information gains for the attributes Age and Salary, respectively, are computed below. If you are going to split the data set into smaller partitions, which of the two would you choose? Explain your answer.

Info(D) = -(5/11) log(5/11) - (6/11) log(6/11) = 2.04

Info_Age(D) = (4/11) I(0,4) + (5/11) I(3,2) + (2/11) I(2,0) = 0.43

Gain(Age) = Info(D) - Info_Age(D) = 2.04 - 0.43 = 1.61

Info_Salary(D) = (4/11) I(1,3) + (5/11) I(2,3) + (2/11) I(2,0) = 1

Gain(Salary) = Info(D) - Info_Salary(D) = 2.04 - 1 = 1.04
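The quantities above can be recomputed directly from the class counts. The following is a minimal sketch, assuming base-2 logarithms and reading I(a,b) as the entropy of a partition that holds a tuples of one class and b of the other. The function names entropy and expected_info are illustrative and not part of the original question.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a tuple of class counts; empty classes contribute 0."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def expected_info(partitions):
    """Weighted average entropy over partitions, each given as class counts."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * entropy(p) for p in partitions)

# Class counts and partitions as they appear in the question.
class_counts = (5, 6)
age_partitions = [(0, 4), (3, 2), (2, 0)]
salary_partitions = [(1, 3), (2, 3), (2, 0)]

info_d = entropy(class_counts)
info_age = expected_info(age_partitions)
info_salary = expected_info(salary_partitions)
gain_age = info_d - info_age
gain_salary = info_d - info_salary

print(f"Info(D)        = {info_d:.3f}")
print(f"Info_Age(D)    = {info_age:.3f}, Gain(Age)    = {gain_age:.3f}")
print(f"Info_Salary(D) = {info_salary:.3f}, Gain(Salary) = {gain_salary:.3f}")
```

Under these assumptions the sketch gives Info(D) ≈ 0.994, Info_Age(D) ≈ 0.441 and Info_Salary(D) ≈ 0.736, so Gain(Age) ≈ 0.553 and Gain(Salary) ≈ 0.258. These differ from the figures quoted above, but the ordering of the two gains is the same.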

Which of the two would we choose? Explain your answer.

data mining

Answer #1

Answer:

To split the data in a decision tree, we use a splitting measure. One such measure is information gain.
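For reference, the usual definitions behind the quantities in the question can be written as follows. This assumes base-2 logarithms; the symbols p_i (fraction of tuples in class i), m (number of classes), v (number of partitions produced by attribute A) and D_j (the j-th partition) are introduced here for illustration and do not appear in the original.

```latex
\begin{align*}
\mathrm{Info}(D)   &= -\sum_{i=1}^{m} p_i \log_2 p_i \\
\mathrm{Info}_A(D) &= \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j) \\
\mathrm{Gain}(A)   &= \mathrm{Info}(D) - \mathrm{Info}_A(D)
\end{align*}
```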

The information gain for the Age attribute is 1.61.

The information gain for the Salary attribute is 1.04.

We choose the attribute with the highest information gain as the decision node.

In this case, we choose the Age attribute to split the data, because its information gain is higher than that of the Salary attribute.
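The selection rule can be sketched in a couple of lines. The dictionary below uses the gain values quoted in the question; the variable names are hypothetical.

```python
# Information gains as quoted in the question (not recomputed here).
gains = {"Age": 1.61, "Salary": 1.04}

# Pick the attribute whose information gain is largest.
best_attribute = max(gains, key=gains.get)
print(best_attribute)  # Age
```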

Thank you.
