Question

- For the spam data, partition the data into 2/3 training and 1/3 test data.

- Find the 12 variables whose two-sample t-test statistics are largest in absolute value.

Hint: you may use the apply function to get the 12 variable names.

- Build a GAM for the spam training data using the 12 variables whose two-sample t-test statistics (in absolute value) rank in the top 12.

How can this be coded in R?

Answer #1

Only the R code is provided.

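## a1: Splitting the data ##

library(kernlab)   # provides the spam data set
data(spam)         # load spam into the workspace
set.seed(1)        # arbitrary seed, only so the split is reproducible
select <- sample(nrow(spam), ceiling(nrow(spam) / 3))   # row indices for the 1/3 test set
test <- spam[select, ]
train <- spam[-select, ]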

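## a2: Fitting an initial model and finding the best 12 variables ##

# Full logistic regression on all predictors
l <- glm(type ~ ., data = train, family = "binomial")

# Wald-test p-values of the predictors (intercept dropped), smallest first.
# Note: this ranks variables by their p-values in the joint model, which is
# not the same as ranking them by two-sample t statistics.
pvals <- summary(l)$coefficients[-1, 4]
best <- names(sort(pvals))[1:12]   # names of the 12 best variables

## b: Final GLM ##

# Logistic regression (a GLM, not a GAM) on the 12 selected variables
train_final <- train[, c(best, "type")]
final <- glm(type ~ ., data = train_final, family = "binomial")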
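The question asks for a ranking by two-sample t statistics and for a GAM, so the following is a minimal sketch of that variant. It assumes the kernlab and mgcv packages are installed. The seed, the 0/1 response column `is_spam`, the basis size `k = 5`, the REML smoothing method, and the 0.5 classification cutoff are all illustrative choices, not part of the original answer.

```r
library(kernlab)   # spam data set
library(mgcv)      # gam() with s() smooth terms

data(spam)
set.seed(1)                                   # arbitrary seed for reproducibility
test_idx <- sample(nrow(spam), ceiling(nrow(spam) / 3))
test  <- spam[test_idx, ]
train <- spam[-test_idx, ]

# Predictor columns: everything except the class label
predictors <- setdiff(names(spam), "type")

# Welch two-sample t statistic (spam vs. nonspam) for each predictor,
# computed on the training data only
t_stats <- sapply(train[predictors], function(x) {
  unname(t.test(x ~ train$type)$statistic)
})

# Names of the 12 predictors with the largest |t|
top12 <- names(sort(abs(t_stats), decreasing = TRUE))[1:12]
print(top12)

# Numeric 0/1 response (assumed helper column, 1 = spam)
train$is_spam <- as.integer(train$type == "spam")

# is_spam ~ s(v1, k = 5) + ... + s(v12, k = 5)
gam_formula <- reformulate(sprintf("s(%s, k = 5)", top12), response = "is_spam")
gam_fit <- gam(gam_formula, data = train, family = binomial, method = "REML")

# Test-set misclassification rate at a 0.5 probability cutoff
p_spam <- predict(gam_fit, newdata = test, type = "response")
test_error <- mean((p_spam > 0.5) != (test$type == "spam"))
print(test_error)
```

`summary(gam_fit)` reports the effective degrees of freedom of each smooth term, which indicates which of the 12 variables enter the model nonlinearly. Because many spam predictors are heavily skewed, a transform such as `log(x + 0.1)` before smoothing may be worth trying.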
