Question

Can you give me a poster for Science Writing

TOPIC: DECISION TREE

Decision Tree Algorithm Pseudocode:-
1) Place the best attribute of the dataset at the root node of the tree.
2) Split the training set into subsets. Subsets should be made in such a way that each subset contains data with the same value for an attribute.
3) Repeat steps 1 and 2 on each subset until you find leaf nodes in all the branches of the tree.
Two criteria for selecting the attribute:-
1) Information gain
2) Gini index
With information gain, the higher the gain value, the more suitable the attribute is to be selected as the root node/internal node of the tree.
With the Gini index, the lower the Gini value, the more suitable the attribute is to be selected as the root node/internal node of the decision tree.

Examples:-

1) Example with the Gini index: similarly, for all the subtrees we calculate the Gini index for each feature.
2) Example for ID3: similarly, do the calculations for all the features at the internal nodes; then we get the final decision tree for this dataset.

Formulas:-
(The worked examples, their solutions, and the formulas were attached as images.)

Answer #1

From what I have understood, you want a good example to explain decision trees and the impurity measures. Here is an elaborated example.

Let us say we want to create a tree that uses chest pain, good blood circulation, and blocked artery status to predict whether or not a patient has heart disease. (Data is as shown in the table attached)
We have to decide which feature will be at the top; in other words, we need to decide which feature will become the root node. To do so, we have to calculate 'impurity': a measure of how mixed the classes (heart disease / no heart disease) are within a node. A node containing only one class is pure; a node with an even mix is maximally impure. To measure impurity, we use the Gini index or Information Gain. Extending this example, we do it something like this.

Assumptions:
number of people with heart disease = x
number of people with no heart disease = y

Let us say that from our data we got the following results:

a) Making 'Chest Pain' the root:
if yes: x=105 and y=39
if no: x=34 and y=125
This means, out of all the people having chest pain, 105 have heart disease, whereas 39 do not. Also, out of all the people not having chest pain, 34 have heart disease, whereas 125 do not.

b) Similarly, making 'Good Blood Circulation' the root:
if yes: x=37 and y=127
if no: x=100 and y=33

c) Making 'Blocked Arteries' the root:
if yes: x=92 and y=31
if no: x=45 and y=129


1) Gini Impurity:
Algorithm:

1) Calculate all of the Gini impurity scores.
2) If the node itself has the lowest score, then there is no point in separating the patients any further, and it becomes a leaf node.
3) If separating the data results in an improvement, then pick the separation with the lowest impurity value.

Formula:
GI = 1 - (probability of yes)^2 - (probability of no)^2
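
The same formula as a small Python helper (a quick sketch; the function name `gini` is my own, not from any library):

def gini(n_yes, n_no):
    """Gini impurity of a node: 1 - P(yes)^2 - P(no)^2."""
    total = n_yes + n_no
    return 1 - (n_yes / total) ** 2 - (n_no / total) ** 2

# e.g. the "chest pain = yes" node worked out below:
print(round(gini(105, 39), 3))  # 0.395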

a) For chest pain:
For yes:
GI = 1 - (105/(105+39))^2 - (39/(105+39))^2
GI = 0.395
For no:
GI = 1 - (34/(34+125))^2 - (125/(34+125))^2
GI = 0.336

Total GI:
Note: The two sides (yes and no) do not contain the same number of patients, so we take a weighted average.

TGI = ((Total of yes)/Total patients * GI of yes) + ((Total of no)/Total patients * GI of no)

TGI = (144/(144+159))*0.395 + (159/(144+159))*0.336
TGI = 0.364

b) Similarly, we calculate for good blood circulation:
TGI = 0.360

c) And for blocked arteries:
TGI = 0.381

Thus we find that Good Blood Circulation has the lowest total Gini impurity, and therefore we use it as the root node.
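
The whole comparison can be scripted in a few lines (a sketch that reuses the counts assumed above; the dictionary keys are my own labels):

def gini(n_yes, n_no):
    """Gini impurity of a node: 1 - P(yes)^2 - P(no)^2."""
    total = n_yes + n_no
    return 1 - (n_yes / total) ** 2 - (n_no / total) ** 2

def weighted_gini(yes_node, no_node):
    """Weighted average of the impurities of the two child nodes."""
    (y1, n1), (y2, n2) = yes_node, no_node
    total = y1 + n1 + y2 + n2
    return (y1 + n1) / total * gini(y1, n1) + (y2 + n2) / total * gini(y2, n2)

# (with disease, without disease) counts for each candidate root, from above
splits = {
    "chest pain":             ((105, 39), (34, 125)),
    "good blood circulation": ((37, 127), (100, 33)),
    "blocked arteries":       ((92, 31),  (45, 129)),
}
for name, (yes_node, no_node) in splits.items():
    print(name, round(weighted_gini(yes_node, no_node), 3))
# chest pain 0.364, good blood circulation 0.36, blocked arteries 0.381
# -> good blood circulation has the lowest impurity, so it becomes the root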

Note: The number of patients in each child node is now different, so the Gini impurity has to be recalculated for the remaining features when choosing the next split.

2) Information Gain:
Algorithm:

1) Calculate all of the gain scores.
2) If the node itself has the highest score, then there is no point in separating the patients any further, and it becomes a leaf node.
3) If separating the data results in an improvement, then pick the separation with the highest score value.

Formula:
(Base of the log is 2. Here p and n are the total numbers of patients with and without heart disease, and pi, ni are the same counts within one value of the attribute.)

Entropy of the class (Ce) = -(p/(p+n)) log(p/(p+n)) - (n/(p+n)) log(n/(p+n))
Entropy of each attribute value (Ei) = -(pi/(pi+ni)) log(pi/(pi+ni)) - (ni/(pi+ni)) log(ni/(pi+ni))
Entropy of the attribute (Ea) = Sum over all values i of ((pi+ni)/(p+n)) * Ei
Gain = Ce - Ea

Ce = -(139/(139+164)) log(139/(139+164)) - (164/(139+164)) log(164/(139+164))
Ce = 0.995, which we round to 1
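
As a quick check in Python (a sketch; the `entropy` helper is my own):

from math import log2

def entropy(p, n):
    """Entropy of a node with p positive and n negative examples (log base 2)."""
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c > 0)

print(round(entropy(139, 164), 3))  # 0.995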

a) For chest pain:
Ei for yes:
Ei = -(105/(105+39)) log(105/(105+39)) - (39/(105+39)) log(39/(105+39))
Ei = 0.842

Ei for no:
Ei = -(34/(34+125)) log(34/(34+125)) - (125/(34+125)) log(125/(34+125))
Ei = 0.749

Ea = ((105+39)/303) * 0.842 + ((34+125)/303) * 0.749
Ea = 0.794

Gain = Ce - Ea = 1 - 0.794
Gain = 0.206

Similarly, calculate for the other attributes and find the highest score.
In this case, the gain for Good Blood Circulation again comes out to be the highest (consistent with it having the lowest Gini impurity above), and thus we make it the root node.
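
The gains for all three candidates can be checked with a short script (a sketch reusing the counts assumed earlier; labels are my own):

from math import log2

def entropy(p, n):
    """Entropy of a node with p positive and n negative examples (log base 2)."""
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c > 0)

def gain(yes_node, no_node, ce):
    """Information gain: class entropy minus the weighted entropy of the split."""
    (p1, n1), (p2, n2) = yes_node, no_node
    total = p1 + n1 + p2 + n2
    ea = (p1 + n1) / total * entropy(p1, n1) + (p2 + n2) / total * entropy(p2, n2)
    return ce - ea

ce = entropy(139, 164)  # ~0.995
splits = {
    "chest pain":             ((105, 39), (34, 125)),
    "good blood circulation": ((37, 127), (100, 33)),
    "blocked arteries":       ((92, 31),  (45, 129)),
}
for name, (yes_node, no_node) in splits.items():
    print(name, round(gain(yes_node, no_node, ce), 3))
# good blood circulation gives the largest gain here, matching the Gini result,
# so it is chosen as the root node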

Note: The number of patients in each child node is now different, so the gain has to be recalculated for the remaining features when choosing the next split.
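
Finally, putting the pieces together: the three-step pseudocode from the question can be written as a small recursive builder (an illustrative Python sketch using information gain; the data layout and every helper name are my own assumptions, not a standard API):

from collections import Counter
from math import log2

def entropy_of(labels):
    """Entropy of a list of class labels (log base 2)."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attribute):
    """Entropy of the parent node minus the weighted entropy after splitting."""
    parent = entropy_of(labels)
    total = len(labels)
    weighted = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        weighted += len(subset) / total * entropy_of(subset)
    return parent - weighted

def build_tree(rows, labels, attributes):
    """Steps 1-3 of the pseudocode: pick the best attribute, split, recurse."""
    if len(set(labels)) == 1:                 # pure node -> leaf
        return labels[0]
    if not attributes:                        # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree([rows[i] for i in keep],
                                       [labels[i] for i in keep],
                                       [a for a in attributes if a != best])
    return tree

# Example call with two made-up patients (rows are dicts of attribute -> value):
rows = [{"chest_pain": "yes", "blocked_arteries": "no"},
        {"chest_pain": "no",  "blocked_arteries": "yes"}]
labels = ["disease", "no disease"]
print(build_tree(rows, labels, ["chest_pain", "blocked_arteries"]))

The recursion mirrors the notes above: after each split, the child subsets are re-evaluated with the remaining attributes until every branch ends in a leaf.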
