Question

Given the following six instances each with five attributes (Outlook, Temperature, Humidity, Wind, Day) and one class label,

Answer #1

a. The class attribute takes two values, namely Yes and No. So the entropy of the whole system is:

Entropy of the system = -p/(p+n) X log2 [p/(p+n)] - n/(p+n) X log2 [n/(p+n)]

where p is the number of positive (Yes) instances and n is the number of negative (No) instances.

In the above dataset, p = 3 and n = 3.

Putting the values of p and n in the above equation:

= -p/(p+n) X log2 [p/(p+n)] - n/(p+n) X log2 [n/(p+n)]

= -3/(3+3) X log2 [3/(3+3)] -3/(3+3) X log2 [3/(3+3)]

On calculating we get,

Entropy of class attribute or the whole system = 1

For each value of an attribute we also need the entropy of the corresponding subset, I(pi, ni), computed with the same formula: I(pi, ni) = -pi/(pi+ni) X log2 [pi/(pi+ni)] - ni/(pi+ni) X log2 [ni/(pi+ni)]. The Information Gain of an attribute is then the system entropy minus the weighted average of these subset entropies.
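These hand calculations are easy to check with a short Python helper (a minimal sketch; the function name `entropy` is my own, not from any library):

```python
from math import log2

def entropy(p, n):
    """Two-class entropy I(p, n) for p positive and n negative instances.
    A zero count contributes 0, by the convention 0 * log2(0) = 0."""
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c > 0)

# Entropy of the whole system: p = 3 Yes, n = 3 No
print(round(entropy(3, 3), 3))  # 1.0
```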

b. Information Gain for attribute Outlook:

Outlook takes three values: Sunny, Overcast, and Rain. For each value we count the total number of Yes (pi) and No (ni) instances and compute the entropy of that subset.

Entropy of the subset (Outlook = Sunny) = -pi/(pi+ni) X log2 [pi/(pi+ni)] - ni/(pi+ni) X log2 [ni/(pi+ni)]

Here pi = 1 and ni = 2 for Outlook = Sunny.

So putting in the formula:

Entropy of (Outlook = Sunny) = -1/(1+2) X log2 [1/(1+2)] - 2/(1+2) X log2 [2/(1+2)]

= -1/3 X log2 (1/3) - 2/3 X log2 (2/3)

= -0.333 X (-1.585) - 0.667 X (-0.585)

= 0.528 + 0.390

= 0.918

Similarly, count the Yes and No instances for Outlook = Overcast:

Entropy of (Outlook = Overcast) = -pi/(pi+ni) X log2 [pi/(pi+ni)] - ni/(pi+ni) X log2 [ni/(pi+ni)]

Here pi = 1 and ni = 1 for Outlook = Overcast.

So putting in the formula:

Entropy of (Outlook = Overcast) = -1/(1+1) X log2 [1/(1+1)] - 1/(1+1) X log2 [1/(1+1)]

= 1

Similarly, count the Yes and No instances for Outlook = Rain:

Entropy of (Outlook = Rain) = -pi/(pi+ni) X log2 [pi/(pi+ni)] - ni/(pi+ni) X log2 [ni/(pi+ni)]

Here pi = 1 and ni = 0 for Outlook = Rain.

So putting in the formula:

Entropy of (Outlook = Rain) = -1/(1+0) X log2 [1/(1+0)] - 0/(1+0) X log2 [0/(1+0)]

= 0 (the second term is 0 by the convention 0 X log2 0 = 0)

So the subset entropies for attribute Outlook are:

Outlook Yes(pi) No(ni) Entropy I(pi, ni)
Sunny 1 2 0.918
Overcast 1 1 1
Rain 1 0 0

The expected entropy after splitting on Outlook is the weighted sum of these:

E(Outlook) = (3/6) X 0.918 + (2/6) X 1 + (1/6) X 0 = 0.459 + 0.333 + 0 = 0.792

Information Gain(Outlook) = Entropy of system - E(Outlook) = 1 - 0.792 = 0.208
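The weighted-sum step can be sketched in Python. The per-value (Yes, No) counts are taken from the table above:

```python
from math import log2

def entropy(p, n):
    """Two-class entropy; a zero count contributes 0."""
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c > 0)

# (Yes, No) counts for each Outlook value
counts = {"Sunny": (1, 2), "Overcast": (1, 1), "Rain": (1, 0)}
total = sum(p + n for p, n in counts.values())  # 6 instances

# expected entropy after the split, then the information gain
e_outlook = sum((p + n) / total * entropy(p, n) for p, n in counts.values())
gain = entropy(3, 3) - e_outlook
print(round(e_outlook, 3), round(gain, 3))  # 0.792 0.208
```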

c. Gini Index For attribute Outlook:

Gini Index for a particular value of an attribute:

= 1 - (probability of positive samples)² - (probability of negative samples)²

As before, Outlook takes three values (Sunny, Overcast, and Rain), so we count the number of Yes (pi) and No (ni) instances for each value.

Gini Index of (Outlook = Sunny) = 1 - (probability of positive samples)² - (probability of negative samples)²

Here pi = 1 and ni = 2 for Outlook = Sunny.

So putting in the formula:

Gini Index of (Outlook = Sunny) = 1 - (1/3)² - (2/3)² = 0.444

Gini Index of (Outlook = Overcast) = 1 - (probability of positive samples)² - (probability of negative samples)²

Here pi = 1 and ni = 1 for Outlook = Overcast.

So putting in the formula:

Gini Index of (Outlook = Overcast) = 1 - (1/2)² - (1/2)² = 0.5

Gini Index of (Outlook = Rain) = 1 - (probability of positive samples)² - (probability of negative samples)²

Here pi = 1 and ni = 0 for Outlook = Rain.

So putting in the formula:

Gini Index of (Outlook = Rain) = 1 - (1/1)² - (0/1)² = 0

Outlook Yes(pi) No(ni) Gini Index of particular value of Outlook
Sunny 1 2 0.444
Overcast 1 1 0.5
Rain 1 0 0

Now, Gini Index of attribute Outlook = Weighted Sum of Gini Indexes of values of attribute Outlook

= P(Outlook = Sunny) X Gini(Outlook = Sunny) + P(Outlook = Overcast) X Gini(Outlook = Overcast) + P(Outlook = Rain) X Gini(Outlook = Rain)

= (3/6) X 0.444 + (2/6) X 0.5 + (1/6) X 0

= 0.222 + 0.167 + 0

Gini Index of attribute Outlook = 0.389
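The same weighted-sum pattern applies to the Gini index; a quick Python check, with the per-value counts from the table above:

```python
def gini(p, n):
    """Gini impurity 1 - (p/(p+n))^2 - (n/(p+n))^2 for a two-class subset."""
    total = p + n
    return 1 - (p / total) ** 2 - (n / total) ** 2

counts = {"Sunny": (1, 2), "Overcast": (1, 1), "Rain": (1, 0)}
total = sum(p + n for p, n in counts.values())  # 6 instances

gini_outlook = sum((p + n) / total * gini(p, n) for p, n in counts.values())
print(round(gini_outlook, 3))  # 0.389
```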

d. Information Gain and Gini Index of attribute Day:

The attribute Day has 6 distinct values (Monday through Saturday), one per instance, so each subset contains a single instance and is pure: for every day, either pi or ni is 0. The entropy of each subset is therefore 0, and the Gini index of each subset is also 0. Two examples, Day = Monday and Day = Thursday:

Count the Yes and No instances for Day = Monday:

Entropy of (Day = Monday) = -pi/(pi+ni) X log2 [pi/(pi+ni)] - ni/(pi+ni) X log2 [ni/(pi+ni)]

Here pi = 0 and ni = 1 for Day = Monday.

So putting in the formula:

Entropy of (Day = Monday) = -0/(0+1) X log2 [0/(0+1)] - 1/(0+1) X log2 [1/(0+1)]

= 0

Count the Yes and No instances for Day = Thursday:

Entropy of (Day = Thursday) = -pi/(pi+ni) X log2 [pi/(pi+ni)] - ni/(pi+ni) X log2 [ni/(pi+ni)]

Here pi = 1 and ni = 0 for Day = Thursday.

So putting in the formula:

Entropy of (Day = Thursday) = -1/(1+0) X log2 [1/(1+0)] - 0/(1+0) X log2 [0/(1+0)]

= 0

Similarly, the entropy of every other day's subset is 0, so the weighted entropy E(Day) = 0. Note that this makes the Information Gain of Day maximal, not zero: Information Gain(Day) = Entropy of system - E(Day) = 1 - 0 = 1.

Also,

Gini Index For attribute Day:

Gini Index for a particular value of an attribute:

= 1 - (probability of positive samples)² - (probability of negative samples)²

Day has 6 distinct values, so we calculate the total number of Yes (pi) and No (ni) instances for Day = Monday.

Gini Index of (Day = Monday) = 1 - (probability of positive samples)² - (probability of negative samples)²

Here pi = 0 and ni = 1 for Day = Monday.

So putting in the formula:

Gini Index of (Day = Monday) = 1 - (0/1)² - (1/1)² = 0

Similarly, for Day = Thursday:

Here pi = 1 and ni = 0.

So putting in the formula:

Gini Index of (Day = Thursday) = 1 - (1/1)² - (0/1)² = 0

Similarly, the Gini Index for every day's subset is 0, and hence the weighted Gini Index of attribute Day = 0.
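Both measures for Day can be verified in one pass. Note that under the textbook definition Gain = Entropy(S) - E(attribute), a weighted entropy of 0 gives the maximal gain of 1; it is the per-subset entropies that equal 0. The exact Yes/No assignment per day below is an assumption; only the fact that each day holds one instance, with 3 Yes and 3 No overall, matters:

```python
from math import log2

def entropy(p, n):
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c > 0)

def gini(p, n):
    total = p + n
    return 1 - (p / total) ** 2 - (n / total) ** 2

# one instance per day; assumed assignment of the 3 Yes and 3 No labels
day_counts = [(0, 1), (1, 0), (0, 1), (1, 0), (0, 1), (1, 0)]
total = 6

e_day = sum((p + n) / total * entropy(p, n) for p, n in day_counts)
gini_day = sum((p + n) / total * gini(p, n) for p, n in day_counts)
gain_day = entropy(3, 3) - e_day
print(e_day, gain_day, gini_day)  # 0.0 1.0 0.0
```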

e. Day is not a good feature for the root node even though the impurity measures score it perfectly: under the definitions above it has the maximum possible Information Gain (1) and the minimum possible Gini Index (0). The problem is that each day appears exactly once in the dataset, so splitting on Day simply memorizes the six training instances: every branch holds a single example, and a previously unseen day tells the model nothing about the class. Identifier-like attributes always look ideal to these measures but generalize poorly; this is the well-known bias of information gain toward attributes with many distinct values.

To avoid using Day as the root node, we can remove the attribute from the dataset entirely (or ignore it during attribute selection), use a corrected selection criterion such as gain ratio, or prune the tree afterward if Day was nevertheless chosen, removing nodes that contribute little to classification. In this case it is simplest to drop Day from the dataset, since an identifier-like attribute carries no generalizable information.
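One standard correction for the bias toward many-valued attributes is C4.5's gain ratio, which divides the information gain by the split information, i.e. the entropy of the partition sizes themselves. A sketch using the counts assumed above; interestingly, on a toy set of six instances the identifier-like Day still scores higher than Outlook, which is why simply dropping the attribute is the more reliable fix here:

```python
from math import log2

def entropy_of(counts):
    """Entropy of a list of counts (class counts or partition sizes)."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain_ratio(subsets):
    """subsets: one (Yes, No) pair per attribute value."""
    pos = sum(p for p, _ in subsets)
    neg = sum(n for _, n in subsets)
    total = pos + neg
    weighted = sum((p + n) / total * entropy_of((p, n)) for p, n in subsets)
    gain = entropy_of((pos, neg)) - weighted
    split_info = entropy_of([p + n for p, n in subsets])
    return gain / split_info

outlook = [(1, 2), (1, 1), (1, 0)]
day = [(0, 1), (1, 0), (0, 1), (1, 0), (0, 1), (1, 0)]
print(round(gain_ratio(outlook), 3))  # 0.142
print(round(gain_ratio(day), 3))      # 0.387
```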

Thanks!
