Question

Which of these is NOT true about the evaluation of decision tree performance?
(A) Performance on the training dataset can overestimate performance on future data

(B) Decision trees sometimes overfit the training data

(C) Creating a test dataset simulates the model's performance on unseen data

(D) The model's accuracy is unaffected by the rarity of the outcome.

Answer #1

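A, B and C are all true statements. A decision tree can fit the training data too closely (B), so its performance on the training data can overstate how it will do on future data (A), which is why a separate test dataset is used to simulate unseen data (C).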
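To make A, B and C concrete, here is a minimal sketch in plain Python. It is not a real decision tree: the `predict` function simply returns the label of the closest training point, which is used here as a stand-in for a fully grown tree that ends up with one leaf per training row. The data, the 20% label noise and all function names are assumptions made up for this illustration.

```python
import random

def make_data(n, rng, noise=0.2):
    """Return (x, label) pairs: label is 1 when x > 0.5, flipped with probability `noise`."""
    rows = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # simulate a noisy, imperfectly recorded outcome
        rows.append((x, label))
    return rows

def predict(train, x):
    """Return the label of the closest training point (pure memorization)."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(train, rows):
    """Fraction of rows whose label matches the memorizing model's prediction."""
    return sum(predict(train, x) == y for x, y in rows) / len(rows)

rng = random.Random(0)
train = make_data(400, rng)  # data the model is fitted on
test = make_data(200, rng)   # held-out data, never seen during fitting

train_acc = accuracy(train, train)
test_acc = accuracy(train, test)
print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

Because the model memorizes the noise in the training labels, it scores perfectly on the training set but noticeably worse on the held-out set. The held-out score is the more honest estimate of performance on future data.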

D is the statement that is NOT true. The rarer the outcome, the fewer examples of it the model sees, so the less likely the model is to capture a good representation of it. When the model cannot learn that representation, its accuracy is affected by the rarity of the outcome.

If the data contain many such rare outcomes, the model's accuracy suffers accordingly.

The answer is D.
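A small sketch of why accuracy depends on how rare the outcome is. The numbers are hypothetical (1,000 cases, 10 of them positive), and the "model" is just a rule that always predicts the common class, chosen only to show the effect.

```python
# Hypothetical data: 1,000 cases, only 10 of which have the rare positive outcome.
y_true = [1] * 10 + [0] * 990

# A trivial "model" that always predicts the majority (negative) class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of all cases predicted correctly.
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the rare class: fraction of true positives the model actually finds.
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)

print(f"accuracy: {acc:.2f}")               # 0.99
print(f"recall on rare class: {recall:.2f}")  # 0.00
```

The rule scores 99% accuracy only because the outcome is rare, yet it never detects a single positive case. With a balanced outcome the same rule would score about 50%, so accuracy is clearly not independent of rarity.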
