In the following data set, the attribute ?????? is the class label. Compute the information gains for the attributes Age and Salary, respectively. If you are going to split the data set into smaller partitions, which of the two would you choose? Explain your answer.
Info(D) = -(5/11) log2(5/11) - (6/11) log2(6/11) = 0.994
Info_Age(D) = (4/11) I(0,4) + (5/11) I(3,2) + (2/11) I(2,0) = (5/11)(0.971) = 0.441
Gain(Age) = Info(D) - Info_Age(D) = 0.994 - 0.441 = 0.553
Info_Salary(D) = (4/11) I(1,3) + (5/11) I(2,3) + (2/11) I(2,0) = (4/11)(0.811) + (5/11)(0.971) = 0.736
Gain(Salary) = Info(D) - Info_Salary(D) = 0.994 - 0.736 = 0.258
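The arithmetic can be checked with a short script. This is a minimal sketch, assuming base-2 logarithms and taking the per-partition class counts directly from the I(., .) terms; the function names `info`, `info_after_split`, and `gain` are illustrative and not part of the original question. With base-2 logarithms the entropy of an 11-tuple set with a 5/6 class split is about 0.994 bits, and it can never exceed 1 bit for two classes.

```python
from math import log2


def info(counts):
    """Entropy in bits of a tuple of class counts, e.g. (5, 6)."""
    total = sum(counts)
    # Empty classes contribute nothing (0 * log 0 is treated as 0).
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def info_after_split(partitions):
    """Weighted entropy after splitting into the given partitions."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * info(p) for p in partitions)


def gain(class_counts, partitions):
    """Information gain = entropy before the split minus entropy after."""
    return info(class_counts) - info_after_split(partitions)


# Class counts for the whole data set and for each attribute's partitions,
# read off the I(., .) terms in the calculation.
class_counts = (5, 6)
partitions = {
    "Age": [(0, 4), (3, 2), (2, 0)],
    "Salary": [(1, 3), (2, 3), (2, 0)],
}

gains = {attr: gain(class_counts, parts) for attr, parts in partitions.items()}
best = max(gains, key=gains.get)

print(f"Info(D) = {info(class_counts):.3f}")
for attr, g in gains.items():
    print(f"Gain({attr}) = {g:.3f}")
print("Split on:", best)
```

Working from class counts rather than raw rows keeps the sketch independent of the data table, which is not reproduced here.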
Answer:
To split the data in a decision tree, we use a splitting measure. One such measure is information gain.
The information gain for the Age attribute is 0.553.
The information gain for the Salary attribute is 0.258.
We choose the attribute with the highest information gain as the decision node.
In this case, we choose the Age attribute to split the data because its information gain is higher than that of the Salary attribute: splitting on Age leaves about 0.441 bits of uncertainty about the class label, compared with about 0.736 bits for Salary.
Thank you..