Question

Decision Tree

Why would you use a certain value for the max_depth parameter in a decision tree classifier? What is meant by the impurity of a node in a decision tree?


Answer #1

Although a decision tree's theoretical maximum depth is one less than the number of training samples, no algorithm will allow you to reach this point for obvious reasons, one of which is overfitting. It is important to note that this is the number of training samples, not the number of features, because the data can be split multiple times on the same feature.
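As a quick sanity check, here is a minimal scikit-learn sketch (the random data below is purely illustrative) showing that an unconstrained tree grown on n samples never exceeds depth n − 1:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Illustrative data: 50 samples, a single feature, random labels, so
    # the tree must split the same feature repeatedly to purify its leaves.
    rng = np.random.RandomState(0)
    X = rng.rand(50, 1)
    y = rng.randint(0, 2, size=50)

    # max_depth=None (the default): grow until every leaf is pure.
    clf = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(clf.get_depth())  # bounded above by n_samples - 1, i.e. 49 here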

 

To begin, consider the default case of max_depth=None. If no depth is specified, scikit-learn expands the nodes until all leaves are pure, meaning each leaf contains samples of a single class (assuming min_samples_leaf is left at its default of one). Note that most of these hyper-parameters interact with one another, and we will come back to min_samples_leaf shortly. Alternatively, if min_samples_split is set, nodes are expanded until every leaf contains fewer than that minimum number of samples; in practice, a node stops splitting as soon as either stopping condition is met. There are a lot of moving parts here (min_samples_split and min_samples_leaf), so let's start with max_depth and what happens to your model when you change it; once we have gone through min_samples_split and min_samples_leaf, we will have a better sense of how everything fits together.
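To make those defaults concrete, here is a minimal sketch of how the three hyper-parameters are passed to scikit-learn's DecisionTreeClassifier (the explicit values simply restate the defaults; the depth of 5 is an arbitrary example, not a recommendation):

    from sklearn.tree import DecisionTreeClassifier

    # All defaults spelled out: nodes are expanded until every leaf is
    # pure or one of the stopping conditions below is hit first.
    unbounded = DecisionTreeClassifier(
        max_depth=None,       # no depth cap
        min_samples_split=2,  # a node needs at least 2 samples to be split
        min_samples_leaf=1,   # a leaf may hold a single sample
    )

    # The same classifier with an explicit cap on depth.
    bounded = DecisionTreeClassifier(max_depth=5)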

 

A very shallow depth is also detrimental, as it will cause your model to underfit. The best way to determine the optimal depth is to experiment: where overfitting or underfitting begins is highly dependent on the dataset, and there is no one-size-fits-all value. Thus, I usually let the model determine the depth first, then compare my train and test scores to check for overfitting or underfitting, and adjust max_depth accordingly.
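A sketch of that workflow, assuming a generic train/test split (the dataset and the depth grid here are arbitrary choices for illustration):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)  # substitute your own data
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A large train/test gap suggests overfitting; low scores on both
    # suggest underfitting.
    for depth in (1, 2, 3, 5, 10, None):
        clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
        clf.fit(X_train, y_train)
        print(depth, clf.score(X_train, y_train), clf.score(X_test, y_test))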

Impurity

Gini impurity (classification): $\sum_{i=1}^{C} f_i (1 - f_i)$, where $f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.

Entropy (classification): $\sum_{i=1}^{C} -f_i \log(f_i)$, with $f_i$ and $C$ as above.

Variance (regression): $\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2$, where $y_i$ is the label of an instance, $N$ is the number of instances, and $\mu = \frac{1}{N} \sum_{i=1}^{N} y_i$ is their mean.
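Both classification criteria are easy to compute by hand; here is a minimal NumPy sketch (the node's labels below are made up, and base-2 logarithms are used so the entropy comes out in bits):

    import numpy as np

    def gini(labels):
        # f holds the frequency f_i of each unique label at the node.
        _, counts = np.unique(labels, return_counts=True)
        f = counts / counts.sum()
        return np.sum(f * (1 - f))

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        f = counts / counts.sum()
        return np.sum(-f * np.log2(f))

    node = np.array([0, 0, 0, 1, 1])  # hypothetical node: 3 of class 0, 2 of class 1
    print(gini(node))     # 0.48
    print(entropy(node))  # ~0.971 bits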

The information gain is the difference between the impurity of the parent node and the weighted sum of the impurities of the two child nodes. If a split $s$ partitions a dataset $D$ of size $N$ into two datasets $D_{left}$ and $D_{right}$ of sizes $N_{left}$ and $N_{right}$, respectively, the information gain is:

$IG(D, s) = \text{Impurity}(D) - \frac{N_{left}}{N} \, \text{Impurity}(D_{left}) - \frac{N_{right}}{N} \, \text{Impurity}(D_{right})$
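Reusing the gini function from the sketch above, the formula translates directly into code (the parent and child arrays are again hypothetical):

    def information_gain(parent, left, right, impurity=gini):
        # IG(D, s) = Impurity(D) - (N_left / N) * Impurity(D_left)
        #                        - (N_right / N) * Impurity(D_right)
        n = len(parent)
        return (impurity(parent)
                - len(left) / n * impurity(left)
                - len(right) / n * impurity(right))

    parent = np.array([0, 0, 0, 1, 1])
    left, right = parent[:3], parent[3:]  # a split separating the classes perfectly
    print(information_gain(parent, left, right))  # 0.48: all parent impurity removed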

 


answered by: Zahidul Hossain