You are to build trees with varying max tree depth for the dataset provided (use maximum tree depths 2-10). For each tree of a given maximum depth, record the accuracy, precision and recall. Plot each of these metrics as a line plot (tree depth on the x axis and % on the y axis). Below is what I have attempted.
#!/usr/bin/env python
# coding: utf-8
# In[61]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix,precision_score,recall_score,accuracy_score
# In[62]:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# In[63]:
df=pd.read_csv("D:/advanced_ML_amit/Activity Recognition from Single Chest-Mounted Accelerometer/1.csv",header=None)
# In[64]:
print(df.head())
# In[65]:
# Now let's separate the independent (X) and dependent (Y) variables.
X=df.iloc[:,0:4] # first four columns (features)
print(X.head())
# In[66]:
Y=df.iloc[:,4] # last column (label)
print(Y.head())
# In[67]:
# Now let's split the dataset into train and test sets (80/20).
x_train,x_test,y_train,y_test=train_test_split(X,Y,random_state=42,test_size=0.2)
# Loop over max_depth = 2..10, fit a DecisionTreeClassifier() with each depth and store its scores.
# In[68]:
result={}
pre_list=[]
re_list=[]
acc_list=[]
for m_depth in range(2,11):
    clf_tree=DecisionTreeClassifier(random_state=0,max_depth=m_depth)
    clf_fit=clf_tree.fit(x_train,y_train)
    predicted=clf_fit.predict(x_test)
    # 'macro' = unweighted mean of the per-class precision / recall
    pres_score=precision_score(y_test,predicted,average='macro')
    re_score=recall_score(y_test,predicted,average='macro')
    acc_score=accuracy_score(y_test,predicted)
    result[m_depth]=[pres_score,re_score,acc_score]
    pre_list.append(pres_score)
    re_list.append(re_score)
    acc_list.append(acc_score)
# In[69]:
print(result)
# In[70]:
x=np.arange(2,11)
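# scores are fractions in [0, 1]; multiply by 100 so the y axis is in %
plt.plot(x,np.array(pre_list)*100,marker='o')
plt.plot(x,np.array(re_list)*100,marker='o')
plt.plot(x,np.array(acc_list)*100,marker='o')
plt.xlabel('max tree depth')
plt.ylabel('score (%)')
plt.legend(['precision', 'recall', 'accuracy'], loc='lower right')
plt.savefig('scores_line_plot.png')
plt.show()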
# In the plot you can see that the higher the value of max_depth, the better the result, but the scores almost saturate after max_depth=8.
#
# So the question is: can we increase max_depth further for even better results?
#
# It is somewhat subjective, but the larger max_depth is, the more complex the tree becomes (and the more prone it is to overfitting).
# After some value of max_depth the scores saturate, so there is no need to increase max_depth beyond that point.
#
# Let's look at the confusion matrix for max_depth=10 (the last tree fitted in the loop).
#
# In[71]:
print(confusion_matrix(y_test,predicted))
# Only 13 points are misclassified: 12 points whose true label is 5 are predicted as label 3,
# and 1 point whose true label is 6 is also predicted as label 3.
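The reading of the confusion matrix above relies on scikit-learn's convention that rows are true labels and columns are predicted labels. A minimal sketch that lists every non-zero off-diagonal cell programmatically instead of reading it by eye; `misclassified_pairs` is a hypothetical helper name and the toy labels are made up for illustration, not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def misclassified_pairs(y_true, y_pred):
    """Return (true_label, predicted_label, count) for every non-zero
    off-diagonal cell of the confusion matrix.

    In scikit-learn's confusion_matrix, rows are true labels and columns
    are predicted labels, both in the order given by `labels`.
    """
    labels = np.unique(np.concatenate([np.asarray(y_true), np.asarray(y_pred)]))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    pairs = []
    for i, true_label in enumerate(labels):
        for j, pred_label in enumerate(labels):
            if i != j and cm[i, j] > 0:
                # .item() turns numpy scalars into plain Python values
                pairs.append((true_label.item(), pred_label.item(), int(cm[i, j])))
    return pairs


# Toy labels (hypothetical, for illustration only)
y_true = [3, 3, 5, 5, 5, 6, 6]
y_pred = [3, 3, 3, 3, 5, 3, 6]
print(misclassified_pairs(y_true, y_pred))  # [(5, 3, 2), (6, 3, 1)]
```

On the real data this would be called as `misclassified_pairs(y_test, predicted)`; summing the counts gives the total number of misclassified test points.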
I wrote this code in a Jupyter notebook and then converted it to a .py file.
This is the output plot.
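On the question raised in the script's comments about whether to keep increasing max_depth, one common approach is to pick the depth by cross-validation on the training data rather than by reading it off the test set, and to look at how quickly the tree grows with depth. A minimal sketch, assuming scikit-learn's `cross_val_score` and synthetic data from `make_classification` as a stand-in for the accelerometer CSV; `depth_sweep` is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def depth_sweep(X, y, depths=range(2, 11), cv=5):
    """For each max_depth, return the mean cross-validated accuracy and
    the size of a tree fitted on all of X, y."""
    results = {}
    for depth in depths:
        clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
        scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
        clf.fit(X, y)  # refit on all data only to inspect the tree size
        results[depth] = {
            "cv_accuracy": scores.mean(),
            "n_leaves": clf.get_n_leaves(),
            "actual_depth": clf.get_depth(),
        }
    return results


# Synthetic stand-in data (assumption): 4 features, 4 classes
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=4, random_state=0)
results = depth_sweep(X, y)
for depth, r in results.items():
    print(depth, round(100 * r["cv_accuracy"], 1), r["n_leaves"], r["actual_depth"])
```

A reasonable rule of thumb is to take the smallest depth whose cross-validated accuracy is close to the best one, since the number of leaves (and so the model's complexity) can keep growing after the scores have flattened out.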