Question

Problem 1 (Logistic Regression and KNN). In this problem, we predict Direction using the data Weekly.csv....

Problem 1 (Logistic Regression and KNN). In this problem, we predict Direction using the data Weekly.csv.

a. i. Split the data into one training set and one testing set. The training set contains observations from 1990 to 2008 (Hint: we can use a Boolean vector train=(Year < 2009)). The testing set contains observations from 2009 and 2010 (Hint: since train is a Boolean vector, use the ! operator to negate its elements and obtain the testing set, e.g. [!train]). Using the training data, develop a logistic regression model for Direction with the five lag variables as predictors. Which predictors appear to be statistically significant? What are their respective odds ratios?
   ii. Use the model obtained in (i) to make predictions on the testing data. Use 0.5 as the threshold for prediction. Show the confusion matrix and compute the prediction accuracy.
b. Perform 5-fold cross-validation on the original data using logistic regression and 10-NN respectively, with the five lag variables as predictors. Which model is better?

Problem 2 (Classification Tree). In this problem, we predict Purchase with all the other variables as predictors using the data OJ.csv. Split the data into one training set containing a random sample of 80% of the observations, and one testing set with the remaining 20%. Fit a tree to the training data.

a. Apply the cv.tree() function with ten-fold cross-validation to the training set to determine the optimal tree size. Produce a pruned tree corresponding to the optimal tree size obtained. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes. Plot this tree.
b. Apply the tree to make predictions on the testing data. Show the confusion matrix and compute the prediction accuracy.
c. Use random forests to analyze this data. Do random forests provide a better result?

Deliverables

1. Group submission. Submit one report and one set of code. Please include a cover page on the report listing all team members' names.
2. Two files: the R code and the report are submitted to Blackboard as two separate files. Screenshots of R code are not accepted.
3. The report should contain the answer to each question. No raw R output or software screenshots should be included in the report, except plots.
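
The workflow in Problem 1 (Boolean split, logistic fit, odds ratios, 0.5 threshold, confusion matrix, 5-fold cross-validation) might be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `weekly` is simulated here as a stand-in so the block runs on its own, and it is not the real Weekly.csv. The column names (Year, Lag1 to Lag5, Direction) are taken from the problem statement; the factor levels "Down"/"Up" and every other name are hypothetical.

```r
# Simulated stand-in for Weekly.csv (hypothetical values, NOT the real data).
# With the real file one would presumably start from read.csv("Weekly.csv").
set.seed(1)
n <- 1000
weekly <- data.frame(
  Year = sample(1990:2010, n, replace = TRUE),
  Lag1 = rnorm(n), Lag2 = rnorm(n), Lag3 = rnorm(n),
  Lag4 = rnorm(n), Lag5 = rnorm(n)
)
weekly$Direction <- factor(
  ifelse(runif(n) < plogis(0.2 + 0.3 * weekly$Lag2), "Up", "Down"),
  levels = c("Down", "Up")  # assumed level names; glm models P(second level)
)

# Boolean split from the hint: TRUE for 1990-2008, FALSE for 2009-2010.
train <- weekly$Year < 2009
test_set <- weekly[!train, ]

# Logistic regression on the training rows with the five lag variables.
lag_formula <- Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5
fit <- glm(lag_formula, data = weekly[train, ], family = binomial)

coef_table <- summary(fit)$coefficients  # estimates, std. errors, z, p-values
odds_ratios <- exp(coef(fit))            # odds ratio per one-unit increase

# Predict on the testing rows and classify with a 0.5 threshold.
prob_up <- predict(fit, newdata = test_set, type = "response")
pred <- factor(ifelse(prob_up > 0.5, "Up", "Down"), levels = c("Down", "Up"))

conf_mat <- table(Predicted = pred, Actual = test_set$Direction)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# 5-fold cross-validation accuracy for the same logistic model.
folds <- sample(rep(1:5, length.out = n))
cv_acc <- sapply(1:5, function(k) {
  m <- glm(lag_formula, data = weekly[folds != k, ], family = binomial)
  p <- predict(m, newdata = weekly[folds == k, ], type = "response")
  mean(ifelse(p > 0.5, "Up", "Down") == weekly$Direction[folds == k])
})

print(coef_table)
print(odds_ratios)
print(conf_mat)
print(accuracy)
print(mean(cv_acc))
```

The p-values in `coef_table` are what one would inspect for significance, and `exp(coef(fit))` converts the log-odds coefficients into odds ratios. For the 10-NN comparison in part (b), the same fold loop could wrap `class::knn(train, test, cl, k = 10)` in place of `glm`, so that both models are scored on identical folds.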


Similar Homework Help Questions
  • In this assignment, you will be using a regression tree to analyze some data about contract...

    In this assignment, you will be using a regression tree to analyze some data about contract negotiations. Athlete Contract Negotiations (regression tree). Casey Deesel is a sports agent negotiating a contract for Titus Johnston, an athlete in the National Football League (NFL). An important aspect of any NFL contract is the amount of guaranteed money over the life of the contract. Casey has gathered data on 506 NFL athletes who have recently signed new contracts. Each observation (NFL athlete) includes...

  • Explain whether each scenario is a classification or regression problem, and indicate whether we are most...

    Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide n, number of observations and p, number of predictors. a. We collect a set of data on the top 500 firms in the US. For each firm we record profit, number of employees, industry and the CEO salary. We are interested in understanding which factors affect CEO salary b. We are considering launching a new product and...

  • For this exercise we will run a regression using Swiss demographic data from around 1888. The...

    For this exercise we will run a regression using Swiss demographic data from around 1888. The sample is a cross-section of French-speaking counties in Switzerland. These data come with the R package datasets. The first step is to load the package into your current environment by typing the command library(datasets) into the R console. This loads a number of datasets including one called swiss. Type help(swiss) in the console for additional details. The basic variable definitions are as...

  • 12.1 Personal Loan Acceptance. Universal Bank is a relatively young bank growing rapidly in terms of...

    12.1 Personal Loan Acceptance. Universal Bank is a relatively young bank growing rapidly in terms of overall customer acquisition. The majority of these customers are liability customers with varying sizes of relationship with the bank. The customer base of asset customers is quite small, and the bank is interested in expanding this base rapidly to bring in more loan business. In particular, it wants to explore ways of converting its liability customers to personal loan customers. A campaign the bank...

  • Performance Metrics: Which of the following are terms used for performance metrics a. Specificity & Precision...

    Performance Metrics: Which of the following are terms used for performance metrics a. Specificity & Precision b. Precision & Recall c. Recall & Sensitivity d. band e All of the above 9. Performance Metrics: When looking at the ROC/AUC curve, what are the values being compared represented on the x-axis and y-axis? a. False Positive Rate and True Positive Rate b. Precision and True Positive Rate c. False Positive Rate and Precision d. True Positive Rate and Specificity e. None...

  • 1. Using question 12 (delaying major purchases) as the response variable (Y) compute a regression model...

    1. Using question 12 (delaying major purchases) as the response variable (Y) compute a regression model with the following questions 9, 25 (gender: males as 0 and females coded as 1) as your predictor variables. You will have to use the data set Economic Gun Legislation Survey Regression Exercise posted for Week 9 on the webpage. Please do the following in exactly this order: a. Excel Output b. Model: write down the model in the form y = b0 + b1X1 + b2X2 +...

  • The Book of R (Question 20.2) Please answer using R code. Continue using the survey data...

    The Book of R (Question 20.2) Please answer using R code. Continue using the survey data frame from the package MASS for the next few exercises. The survey data set has a variable named Exer , a factor with k = 3 levels describing the amount of physical exercise time each student gets: none, some, or frequent. Obtain a count of the number of students in each category and produce side-by-side boxplots of student height split by exercise. Assuming independence...

  • 1. For each of the following regression models, write down the X matrix and 3 vector....

    1. For each of the following regression models, write down the X matrix and $\beta$ vector. Assume in both cases that there are four observations. (a) $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1 X_2 + \varepsilon$ (b) $\log Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ 2. For each of the following regression models, write down the X matrix and $\beta$ vector. Assume in both cases that there are five observations. (a) $Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1^2 + \varepsilon$ (b) $\sqrt{Y} = \beta_0 + \beta_1 X_1 + \beta_2 \log_{10} X_2 + \varepsilon$ 3. If adding predictor variables to a regression model never reduces $R^2$, why...

  • I need help interpreting logistic regression results to answer the following question: Do GRE scores, undergraduate...

    I need help interpreting logistic regression results to answer the following question: Do GRE scores, undergraduate GPA and the prestige (yes or no) of the undergraduate program affect admission (yes or no) into graduate school? [Plot output: Logistic Fit of ADMIT 2 By GRE; Contingency Analysis of ADMIT 2 By TOPNOTCH 2 (Mosaic Plot); Logistic Fit of ADMIT 2 By GPA]...

  • The data set below contains information about the gasoline mileage performance for 32 automob...

    Please answer the following using the R code provided. The data set below contains information about the gasoline mileage performance for 32 automobiles. We are interested in developing a model to predict the miles per gallon (y) using related predictor variables. The variables in the study are Dependent variable: Miles per gallon (y) Independent variables: ri horsepower (ft-lb) ra: torque (ft-lb) r: horsepower+torque (ft-lb) rs: carburetor (barrels) (a) We first start by fitting a model using y and...
