Question

You are to build trees with varying maximum tree depth for the dataset provided (use maximum tree depths 2-10). For each tree of a given maximum depth, record the accuracy, precision and recall. Plot each of these metrics as a line plot (tree depth on the x axis and % on the y axis). Below is what I have attempted.

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors im [transcription truncated here]
tree_clf = DecisionTreeClassifier (criterion=entropy, max_depth=2)
tree_clf.fit(X_train, y_train)
y_pred = tree_clf.predict [transcription truncated here]

Answer #1
#!/usr/bin/env python
# coding: utf-8

# In[61]:


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix,precision_score,recall_score,accuracy_score


# In[62]:


from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


# In[63]:


df=pd.read_csv("D:/advanced_ML_amit/Activity Recognition from Single Chest-Mounted Accelerometer/1.csv",header=None)


# In[64]:


print(df.head())


# In[65]:


# Now let's separate the independent variables (X) from the dependent variable (Y).
X=df.iloc[:,0:4] # first four columns
print(X.head())


# In[66]:


Y=df.iloc[:,4] # last column(label)
print(Y.head())


# In[67]:


# Now let's split the dataset into training and test sets.

x_train,x_test,y_train,y_test=train_test_split(X,Y,random_state=42,test_size=0.2)


# The loop below runs max_depth from 2 to 10, fits a DecisionTreeClassifier for each depth and stores its precision, recall and accuracy.

# In[68]:


result={}
pre_list=[]
re_list=[]
acc_list=[]
for m_depth in range(2,11):
    clf_tree=DecisionTreeClassifier(random_state=0,max_depth=m_depth)
    clf_fit=clf_tree.fit(x_train,y_train)
    predicted=clf_fit.predict(x_test)
    pres_score=precision_score(y_test,predicted,average='macro')
    re_score=recall_score(y_test,predicted,average='macro')
    acc_score=accuracy_score(y_test,predicted)
    result[m_depth]=[pres_score,re_score,acc_score]
    pre_list.append(pres_score)
    re_list.append(re_score)
    acc_list.append(acc_score)


# In[69]:


print(result)


# In[70]:


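x=np.arange(2,11)
plt.plot(x,pre_list)
plt.plot(x,re_list)
plt.plot(x,acc_list)
# label the axes as the question asks (tree depth on x, metric on y)
plt.xlabel('max_depth')
plt.ylabel('score (fraction, 1.0 = 100%)')
plt.legend(['precision', 'recall', 'accuracy'], loc='lower right')

plt.savefig('scores_line_plot.png')
plt.show()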


# In the plot you can see that the higher the value of max_depth, the better the result, but the scores almost saturate after max_depth=8.
#
# So the question is: can we increase max_depth further for an even better result?
#
# The answer is subjective, but the larger the value of max_depth, the higher the complexity of the tree.
# Beyond some value of max_depth the scores saturate, so there is no need to increase max_depth past that point.
#
# Let's have a look at the confusion matrix when max_depth=10.
#

# In[71]:


# `predicted` still holds the predictions from the last loop iteration (max_depth=10)
print(confusion_matrix(y_test,predicted))


# Only 13 points are misclassified: 12 are predicted as label 3 but actually belong to label 5,
# and 1 is predicted as label 3 but actually belongs to label 6.
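
As an illustrative aside, the way such error counts can be read off a confusion matrix might be sketched as follows. The labels here are small made-up arrays (an assumption for illustration, not the accelerometer data or the notebook's output), and the names `y_true`, `y_hat` and `errors` are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels, NOT the accelerometer data from the answer.
y_true = np.array([3, 3, 5, 5, 5, 6, 6])
y_hat = np.array([3, 3, 3, 5, 5, 3, 6])

labels = [3, 5, 6]
cm = confusion_matrix(y_true, y_hat, labels=labels)

# Rows are true labels, columns are predicted labels, so the diagonal
# holds the correct predictions and everything else is an error.
n_wrong = int(cm.sum() - np.trace(cm))

# Collect each (true label, predicted label) pair that has at least one error.
errors = {
    (labels[i], labels[j]): int(cm[i, j])
    for i in range(len(labels))
    for j in range(len(labels))
    if i != j and cm[i, j] > 0
}

print(cm)
print(n_wrong, errors)
```

Reading the off-diagonal cells this way avoids counting errors by eye when the matrix has many classes.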

I wrote the script above in a Jupyter notebook and then converted it to a .py file.

This is the output plot.

[Plot: precision, recall and accuracy as lines against max_depth (x axis, 2 to 10); the y axis runs from about 0.5 to 1.0]
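
One way to probe the remark that scores saturate as max_depth grows is to compare training and test accuracy at each depth. The sketch below is only a rough illustration under stated assumptions: it uses synthetic data from `make_classification` as a stand-in (not the dataset from the answer), and the variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the dataset used in the answer).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

depths = list(range(2, 11))
train_acc, test_acc = [], []
for depth in depths:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    # Accuracy on the data the tree was fitted on versus held-out data.
    train_acc.append(accuracy_score(y_tr, clf.predict(X_tr)))
    test_acc.append(accuracy_score(y_te, clf.predict(X_te)))

for d, tr, te in zip(depths, train_acc, test_acc):
    print(f"max_depth={d:2d}  train={tr:.3f}  test={te:.3f}")
```

If training accuracy keeps rising while test accuracy levels off, the extra depth is mostly adding complexity, which is one possible criterion for where to stop increasing max_depth.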
