Question

Decision Tree

Why would you use a certain value for the max_depth parameter in a decision tree classifier? What is meant by the impurity of a node in a decision tree?


Answer #1

Although a decision tree's theoretical maximum depth is one less than the number of training samples, no algorithm will allow you to reach this point for obvious reasons, one of which is overfitting. It is important to note that this is the number of training samples, not the number of features, because the data can be split multiple times on the same feature.
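As a quick sanity check, here is a minimal scikit-learn sketch (the random data below is purely illustrative) showing that an unconstrained tree grown on n samples never exceeds depth n − 1:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Illustrative data: 50 samples, a single feature, random labels, so
    # the tree must split the same feature repeatedly to purify its leaves.
    rng = np.random.RandomState(0)
    X = rng.rand(50, 1)
    y = rng.randint(0, 2, size=50)

    # max_depth=None (the default): grow until every leaf is pure.
    clf = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(clf.get_depth())  # bounded above by n_samples - 1, i.e. 49 here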

 

To begin, consider the default case of max_depth=None. If no depth is specified, scikit-learn expands the nodes until all leaves are pure, meaning each leaf contains samples of a single class (assuming min_samples_leaf is left at its default of one). Note that most of these hyper-parameters interact with one another, and we will come back to min_samples_leaf shortly. Alternatively, if min_samples_split is set, nodes are expanded until every leaf contains fewer than that minimum number of samples; in practice, a node stops splitting as soon as either stopping condition is met. There are a lot of moving parts here (min_samples_split and min_samples_leaf), so let's start with max_depth and what happens to your model when you change it; once we have gone through min_samples_split and min_samples_leaf, we will have a better sense of how everything fits together.
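To make those defaults concrete, here is a minimal sketch of how the three hyper-parameters are passed to scikit-learn's DecisionTreeClassifier (the explicit values simply restate the defaults; the depth of 5 is an arbitrary example, not a recommendation):

    from sklearn.tree import DecisionTreeClassifier

    # All defaults spelled out: nodes are expanded until every leaf is
    # pure or one of the stopping conditions below is hit first.
    unbounded = DecisionTreeClassifier(
        max_depth=None,       # no depth cap
        min_samples_split=2,  # a node needs at least 2 samples to be split
        min_samples_leaf=1,   # a leaf may hold a single sample
    )

    # The same classifier with an explicit cap on depth.
    bounded = DecisionTreeClassifier(max_depth=5)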

 

A very shallow depth is also detrimental, as it will cause your model to underfit. The best way to determine the optimal depth is to experiment: where overfitting or underfitting begins is highly dependent on the dataset, and there is no one-size-fits-all value. Thus, I usually let the model determine the depth first, then compare my train and test scores to check for overfitting or underfitting, and adjust max_depth accordingly.
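A sketch of that workflow, assuming a generic train/test split (the dataset and the depth grid here are arbitrary choices for illustration):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)  # substitute your own data
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A large train/test gap suggests overfitting; low scores on both
    # suggest underfitting.
    for depth in (1, 2, 3, 5, 10, None):
        clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
        clf.fit(X_train, y_train)
        print(depth, clf.score(X_train, y_train), clf.score(X_test, y_test))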

Impurity

Gini impurity (classification): $\sum_{i=1}^{C} f_i (1 - f_i)$, where $f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.

Entropy (classification): $\sum_{i=1}^{C} -f_i \log(f_i)$, with $f_i$ and $C$ as above.

Variance (regression): $\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2$, where $y_i$ is the label of an instance, $N$ is the number of instances, and $\mu = \frac{1}{N} \sum_{i=1}^{N} y_i$ is their mean.
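Both classification criteria are easy to compute by hand; here is a minimal NumPy sketch (the node's labels below are made up, and base-2 logarithms are used so the entropy comes out in bits):

    import numpy as np

    def gini(labels):
        # f holds the frequency f_i of each unique label at the node.
        _, counts = np.unique(labels, return_counts=True)
        f = counts / counts.sum()
        return np.sum(f * (1 - f))

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        f = counts / counts.sum()
        return np.sum(-f * np.log2(f))

    node = np.array([0, 0, 0, 1, 1])  # hypothetical node: 3 of class 0, 2 of class 1
    print(gini(node))     # 0.48
    print(entropy(node))  # ~0.971 bits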

The information gain is the difference between the impurity of the parent node and the weighted sum of the impurities of the two child nodes. If a split $s$ partitions a dataset $D$ of size $N$ into two datasets $D_{left}$ and $D_{right}$ of sizes $N_{left}$ and $N_{right}$, respectively, the information gain is:

$IG(D, s) = \text{Impurity}(D) - \frac{N_{left}}{N} \, \text{Impurity}(D_{left}) - \frac{N_{right}}{N} \, \text{Impurity}(D_{right})$
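Reusing the gini function from the sketch above, the formula translates directly into code (the parent and child arrays are again hypothetical):

    def information_gain(parent, left, right, impurity=gini):
        # IG(D, s) = Impurity(D) - (N_left / N) * Impurity(D_left)
        #                        - (N_right / N) * Impurity(D_right)
        n = len(parent)
        return (impurity(parent)
                - len(left) / n * impurity(left)
                - len(right) / n * impurity(right))

    parent = np.array([0, 0, 0, 1, 1])
    left, right = parent[:3], parent[3:]  # a split separating the classes perfectly
    print(information_gain(parent, left, right))  # 0.48: all parent impurity removed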

 


answered by: Zahidul Hossain