STATISTICS PROBABILITY AND CODE IN PYTHON TO PLOT THE GRAPH.
First, create the data frame with pandas using the code below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(columns = ["year", "gdp"])
year = [1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000,
2010]
gdp = [1.015, 1.33, 2.29, 3.26, 4.951, 6.759, 9.366, 13.131,
15.599]
df["year"] = year
df["gdp"] = gdp
Dataframe looks like:
| | year | gdp |
|---|---|---|
| 0 | 1930 | 1.015 |
| 1 | 1940 | 1.330 |
| 2 | 1950 | 2.290 |
| 3 | 1960 | 3.260 |
| 4 | 1970 | 4.951 |
| 5 | 1980 | 6.759 |
| 6 | 1990 | 9.366 |
| 7 | 2000 | 13.131 |
| 8 | 2010 | 15.599 |
1.) Plotting the above data:
plt.plot( 'year', 'gdp', data=df, linestyle='-', marker='o')
plt.xlabel("Year")
plt.ylabel("GDP")
plt.title("Real US GDP(in trillions)")
plt.show()
2.) Finding a mathematical relationship between year and GDP
Let's first check the correlation between year and GDP to see whether a linear relationship is plausible.
df["year"].corr(df["gdp"])
O/P = 0.96
The correlation is very high, which suggests that a straight line is a reasonable first description of the relationship between these two variables.
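As a cross-check, the Pearson correlation that `Series.corr` computes by default can be reproduced from its definition. This is a minimal sketch that re-enters the same data; the names `x`, `y`, and `r` are illustrative and not part of the original answer.

```python
import numpy as np

# Same data as in the DataFrame above (re-entered so the block runs on its own)
x = np.array([1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010], dtype=float)
y = np.array([1.015, 1.33, 2.29, 3.26, 4.951, 6.759, 9.366, 13.131, 15.599])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), using centered values
dx = x - x.mean()
dy = y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(f"{r:.2f}")  # 0.96
```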
Let's fit a linear regression model to the above data in Python.
from sklearn import linear_model
# Plain ordinary least squares; the old normalize argument was removed in scikit-learn 1.2
model = linear_model.LinearRegression()
model.fit(df[["year"]], df["gdp"])
Note that the model above is fit on the actual (untransformed) values of year and gdp.
Checking the coefficient and intercept to build the equation:
model.intercept_
-359.319
model.coef_
0.18565
So the linear equation becomes:
GDP = 0.18565 * year - 359.319
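With a single predictor, the fitted coefficients can be verified by hand using the ordinary least squares closed form: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below is an independent check that uses NumPy only; the variable names are illustrative.

```python
import numpy as np

# Same data as in the DataFrame above
x = np.array([1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010], dtype=float)
y = np.array([1.015, 1.33, 2.29, 3.26, 4.951, 6.759, 9.366, 13.131, 15.599])

# Centered sum of squares of x and centered cross-product of x and y
sxx = ((x - x.mean()) ** 2).sum()
sxy = ((x - x.mean()) * (y - y.mean())).sum()

slope = sxy / sxx
intercept = y.mean() - slope * x.mean()

print(f"{slope:.5f} {intercept:.3f}")  # 0.18565 -359.319
```

The slope says that, under this linear model, GDP rises by roughly 0.186 trillion per year, or about 1.86 trillion per decade.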
Now predict the values of gdp with the fitted model:
df["gdp_predicted"] = model.predict(df[["year"]])
| | year | gdp | gdp_predicted |
|---|---|---|---|
| 0 | 1930 | 1.015 | -1.014778 |
| 1 | 1940 | 1.330 | 0.841722 |
| 2 | 1950 | 2.290 | 2.698222 |
| 3 | 1960 | 3.260 | 4.554722 |
| 4 | 1970 | 4.951 | 6.411222 |
| 5 | 1980 | 6.759 | 8.267722 |
| 6 | 1990 | 9.366 | 10.124222 |
| 7 | 2000 | 13.131 | 11.980722 |
| 8 | 2010 | 15.599 | 13.837222 |
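The gap between the gdp and gdp_predicted columns is the residual for each year. The sketch below shows where the line overestimates or underestimates GDP. It assumes `np.polyfit` as a stand-in for the scikit-learn model (both perform an ordinary least squares fit of a straight line), and the variable names are illustrative.

```python
import numpy as np

# Same data as in the DataFrame above
x = np.array([1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010], dtype=float)
y = np.array([1.015, 1.33, 2.29, 3.26, 4.951, 6.759, 9.366, 13.131, 15.599])

# Degree-1 least squares fit; polyfit returns the highest-order coefficient first
slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept

# Residual = observed - predicted; a positive value means the line underestimates GDP
residuals = y - predicted

for year, res in zip(x.astype(int), residuals):
    print(year, f"{res:+.3f}")
```

With this data the residuals are positive for 1930, 1940, 2000, and 2010 and negative for the decades in between, and the fitted value for 1930 is negative, as the table above also shows.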
Now checking the coefficient of determination:
from sklearn.metrics import r2_score
r2_score(df["gdp"], df["gdp_predicted"])
0.9298
The coefficient of determination is very high, indicating that a simple linear model alone captures a large share of the variation in GDP.
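The coefficient of determination can also be computed from its definition, R² = 1 - SS_res / SS_tot. The sketch below uses illustrative names and assumes `np.polyfit` in place of the scikit-learn model. It also checks a standard identity: for a straight-line fit with one predictor, R² equals the square of the Pearson correlation.

```python
import numpy as np

# Same data as in the DataFrame above
x = np.array([1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010], dtype=float)
y = np.array([1.015, 1.33, 2.29, 3.26, 4.951, 6.759, 9.366, 13.131, 15.599])

slope, intercept = np.polyfit(x, y, 1)  # ordinary least squares line
predicted = slope * x + intercept

ss_res = ((y - predicted) ** 2).sum()  # variation left unexplained by the line
ss_tot = ((y - y.mean()) ** 2).sum()   # total variation around the mean
r2 = 1 - ss_res / ss_tot

r = np.corrcoef(x, y)[0, 1]            # Pearson correlation
print(bool(np.isclose(r2, r ** 2)))    # True
```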
Checking for Root Mean Squared Error:
from sklearn.metrics import mean_squared_error
rms = np.sqrt(mean_squared_error(df["gdp"],
df["gdp_predicted"]))
print(rms)
1.32
The error is small relative to the larger GDP values, so the model tracks the overall trend well, although it is less accurate for the early decades, when GDP was only 1 to 3 trillion.
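Because RMSE is expressed in the same units as GDP (trillions), it helps to compare it with the scale of the data. The sketch below computes RMSE from its definition and divides it by the mean GDP. That ratio is only a rough, illustrative yardstick and not a standard acceptance threshold. The names are illustrative, and `np.polyfit` is assumed as a stand-in for the scikit-learn model.

```python
import numpy as np

# Same data as in the DataFrame above
x = np.array([1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010], dtype=float)
y = np.array([1.015, 1.33, 2.29, 3.26, 4.951, 6.759, 9.366, 13.131, 15.599])

slope, intercept = np.polyfit(x, y, 1)  # ordinary least squares line
predicted = slope * x + intercept

# RMSE = square root of the mean squared residual
rmse = np.sqrt(np.mean((y - predicted) ** 2))

# Rough sense of scale: RMSE as a fraction of the average GDP value
relative_rmse = rmse / y.mean()

print(f"RMSE: {rmse:.2f}, relative to mean GDP: {relative_rmse:.2f}")
```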
3.) Let's draw our fitted line on top of our original graph:
plt.plot( 'year', 'gdp', data=df, linestyle='-',
marker='o', label = "original")
plt.plot( 'year', 'gdp_predicted', data=df, linestyle='-',
marker='o', label="predicted")
plt.xlabel("Year")
plt.ylabel("GDP")
plt.title("Real US GDP(in trillions)")
plt.legend()
plt.show()
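The above plot shows that the fitted line does not pass exactly through the original points, so there is some error in the predicted values. That is expected: a model that reproduced every original value exactly would very likely be overfitting. Given the high coefficient of determination, the simple linear model already describes the overall trend well.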