PLEASE USE PYTHON
training error should never increase as the degree of the hypothesis polynomials increases. That is because any high-degree polynomial can "simulate" a lower-degree polynomial by setting its high-order coefficients to zero. Thus nothing is lost and something might be gained by increasing the degree.
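A quick sanity check of this nesting argument, using NumPy's polyfit purely for illustration (not the course helpers used below): at modest degrees, where round-off is negligible, the training MSE is indeed non-increasing.

import numpy as np

rng = np.random.default_rng(0)
x_pts = rng.uniform(0, 4 * np.pi, 100)
y_pts = np.sin(x_pts) + rng.standard_normal(100) / 7

# each degree's hypothesis class contains the one below it,
# so the least-squares training MSE can only stay flat or drop
for deg in range(1, 7):
    coeffs = np.polyfit(x_pts, y_pts, deg)
    print(deg, np.mean((y_pts - np.polyval(coeffs, x_pts)) ** 2))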
But the code below shows that in-sample error actually starts to increase on our dataset for polynomials of very high degree. Why do you think this happens?
CODE BELOW:
## Numerical error
xmin, xmax = 0, 4 * np.pi
x = np.linspace(xmin, xmax, 1000)
D = 30    # number of polynomial sizes to try
N = 100   # points per trial
K = 200   # trials to average over
train_vals = np.zeros((K, D))
test_vals = np.zeros((K, D))
for k in range(K):
    # fresh random sample of N points with noisy sin targets each trial
    shuff = np.random.permutation(len(x))
    x_pts = np.sort(x[shuff][:N])
    y = np.sin(x_pts) + np.random.randn(N) / 7
    for i, deg in enumerate(range(1, D + 1)):
        # design matrix with columns 1, x, ..., x**(deg-1)
        X = np.ones((N, deg))
        for j in range(1, deg):
            X[:, j] = x_pts ** j
        # test_train_split, linear_fit, linear_predict and MAE are the
        # course's own helpers; sklearn equivalents appear further down
        X_train, X_test, y_train, y_test = test_train_split(X, y, 0.13)
        w = linear_fit(X_train, y_train)
        g_train = linear_predict(X_train, w)
        g_test = linear_predict(X_test, w)
        train_vals[k][i] = MAE(g_train, y_train)
        test_vals[k][i] = MAE(g_test, y_test)
tr_vals = np.mean(train_vals, axis=0)
te_vals = np.mean(test_vals, axis=0)
plt.plot(range(D), tr_vals)
plt.title("In-sample error rises due to numerical error")
plt.xlabel("Polynomial degree")
plt.ylabel("MAE")
#plt.axis([0,D,0,2])
plt.show()
# Python code is as follows
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

xmin, xmax = 0, 4 * np.pi
x = np.linspace(xmin, xmax, 1000)
D = 30
N = 100
K = 200
train_vals = np.zeros((K, D))
test_vals = np.zeros((K, D))
for k in range(K):
    shuff = np.random.permutation(len(x))
    x_pts = np.sort(x[shuff][:N])
    y_pts = np.sin(x_pts) + np.random.randn(N) / 7   # targets for THIS sample
    for i, deg in enumerate(range(1, D + 1)):
        X = np.ones((N, deg))            # columns 1, x, ..., x**(deg-1)
        for j in range(1, deg):
            X[:, j] = x_pts ** j
        X_train, X_test, y_train, y_test = train_test_split(
            X, y_pts, test_size=0.13)    # fit against y_pts, not a stale y
        # X already contains the constant column, so skip the intercept
        model = LinearRegression(fit_intercept=False).fit(X_train, y_train)
        train_vals[k][i] = mean_absolute_error(y_train, model.predict(X_train))
        test_vals[k][i] = mean_absolute_error(y_test, model.predict(X_test))
tr_vals = np.mean(train_vals, axis=0)
te_vals = np.mean(test_vals, axis=0)
plt.plot(range(D), tr_vals)
plt.title("In-sample error rises due to numerical error")
plt.xlabel("Polynomial degree")
plt.ylabel("MAE")
#plt.axis([0,D,0,2])
plt.show()
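The "numerical error" in the title can be made concrete: the design matrix built above is a Vandermonde matrix, and on [0, 4*pi] its condition number blows up with the degree. Here is a minimal sketch in plain NumPy, just to illustrate the conditioning claim:

import numpy as np

x_pts = np.linspace(0, 4 * np.pi, 100)
for deg in (5, 10, 15, 20, 25, 29):
    V = np.vander(x_pts, deg + 1, increasing=True)   # columns 1, x, ..., x**deg
    print(deg, np.linalg.cond(V))

Once the condition number approaches the reciprocal of machine epsilon (about 10**16), the least-squares coefficients are dominated by round-off, which is exactly where the training-error curve turns upward.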
This is the graph produced by the code above: the training error increases sharply at the highest degrees. Plotting the targets, e.g. plt.plot(np.linspace(-1,1,100), y_pts), shows a perfectly well-behaved curve, but plotting the design matrix columns, plt.plot(np.linspace(-1,1,100), X), exposes the culprit: the columns x**j differ by dozens of orders of magnitude (x**29 is roughly 10**32 at x = 4*pi), and that huge dynamic range makes the least-squares solve numerically unstable, which is what produces the abrupt jump in error. Even once you fix the conditioning you still won't get a perfect answer, because very high-degree polynomials overfit the noise; a model family that follows the sinusoidal shape directly would generalize better than a linear combination of monomials. One standard way to fix the conditioning is sketched below.
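Consistent with the [-1, 1] axes in the plots above, the remedy is to rescale x into [-1, 1] or, better, to fit in an orthogonal basis. A minimal sketch using numpy.polynomial's Chebyshev class, which maps x into [-1, 1] internally; this is my suggested substitute, not part of the original code:

import numpy as np
from numpy.polynomial import Chebyshev

rng = np.random.default_rng(0)
x_pts = np.sort(rng.uniform(0, 4 * np.pi, 100))
y_pts = np.sin(x_pts) + rng.standard_normal(100) / 7

for deg in (5, 15, 29):
    fit = Chebyshev.fit(x_pts, y_pts, deg)            # well-conditioned basis on [-1, 1]
    print(deg, np.mean(np.abs(fit(x_pts) - y_pts)))   # training MAE stays sane

With this basis the training MAE decreases (weakly) with degree, as the nesting argument predicts; any remaining rise in test error at high degree is genuine overfitting rather than round-off.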