library('MASS')
set.seed(548)
propTraining <- 0.5
propTesting <- 0.5
nTraining <- floor(propTraining*nrow(Boston))
nTesting <- floor(propTesting*nrow(Boston))
nrow(Boston)
nTraining
nTesting
# find indices for training and test sets
indicesTraining <-
sort(sample(1:nrow(Boston),size=nTraining))
indicesTesting <- setdiff(1:nrow(Boston),indicesTraining)
indicesTraining
indicesTesting
# make training and testing dataframe
BostonTrain <- Boston[indicesTraining,]
BostonTest <- Boston[indicesTesting,]
nrow(BostonTrain)
nrow(BostonTest)
head(BostonTrain)
head(BostonTest)
(a)
# Check whether the column 'nox' is greater than 0.5 and save the logical result (TRUE/FALSE) in a new column named 'noxProp'
BostonTrain$noxProp <- (BostonTrain$nox > 0.5)
BostonTest$noxProp <- (BostonTest$nox > 0.5)
fit.logistic <- glm(noxProp~age+dis,family = "binomial",data=BostonTrain)
# fit a logistic regression model, where the dependent variable Y is noxProp with value FALSE (0) or TRUE (1)
# and the independent variables X1, X2 are 'age' and 'dis'
head(BostonTest$noxProp)
(b)
> summary(fit.logistic)

Call:
glm(formula = noxProp ~ age + dis, family = "binomial", data = BostonTrain)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.4833  -0.1914   0.1756   0.3052   2.6807

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.62335    1.13932   0.547    0.584
age          0.05405    0.01138   4.751 2.02e-06 ***
dis         -0.96741    0.17516  -5.523 3.33e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 334.88  on 252  degrees of freedom
Residual deviance: 127.95  on 250  degrees of freedom
AIC: 133.95

Number of Fisher Scoring iterations: 6
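Going through the summary, we can conclude that age and dis are both significant, as the p-values for both variables are far below 0.05.
The fitted logistic regression equation is:
log(p / (1 - p)) = 0.62335 + 0.05405 * age - 0.96741 * dis
where p is the probability that noxProp is TRUE (nox > 0.5). The right-hand side is the log-odds of that event, not Y itself.
Going by the equation, the coefficient of dis is negative, so the greater the value of dis, the greater the chance that nox is not above 0.5. The coefficient of age is positive, so a higher age raises the chance that nox exceeds 0.5.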
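As a rough illustration of how the coefficients turn into a probability, the sketch below plugs the estimates from the summary into the linear predictor and applies the inverse logit with base R's plogis(). The input values age = 70 and dis = 3 are hypothetical, chosen only for the example.

```r
# Coefficients copied from the summary(fit.logistic) output
b0   <- 0.62335
bAge <- 0.05405
bDis <- -0.96741

# Hypothetical observation (illustrative values, not taken from the data)
age <- 70
dis <- 3

logOdds <- b0 + bAge * age + bDis * dis  # linear predictor on the log-odds scale
prob    <- plogis(logOdds)               # inverse logit: 1 / (1 + exp(-logOdds))

logOdds
prob
```

A positive log-odds corresponds to a probability above 0.5, so this hypothetical observation would be classified as noxProp = TRUE under a 0.5 probability cutoff.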
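(c) Confusion matrix on training data:
BostonTrain$Predicted <- predict(fit.logistic,BostonTrain) # predict on the training data; predict() returns log-odds (link scale) by default
table(BostonTrain$noxProp, BostonTrain$Predicted > 0.5) # rows = actual, columns = predicted; the 0.5 cutoff is applied to the log-odds here, use type = "response" to cut on the probability instead
        FALSE TRUE
  FALSE    84   11
  TRUE     13  145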
(d) Confusion matrix on test data:
BostonTest$Predicted <- predict(fit.logistic,BostonTest) # predict on the test data; log-odds (link scale) by default
table(BostonTest$noxProp, BostonTest$Predicted > 0.5) # rows = actual, columns = predicted
        FALSE TRUE
  FALSE    84   13
  TRUE     22  134
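One point worth checking in (c) and (d): predict() on a glm object defaults to type = "link", so the predicted values are log-odds, and a 0.5 cutoff on that scale corresponds to a probability of about 0.62 (plogis(0.5)). The sketch below shows the two scales side by side. It assumes a fit on the full Boston data rather than the train/test split used above, and the names dat and fit are illustrative only, so its table will not match the ones reported here.

```r
library(MASS)  # provides the Boston data set

# Illustrative fit on the full data (an assumption, not the split used above)
dat <- Boston
dat$noxProp <- dat$nox > 0.5
fit <- glm(noxProp ~ age + dis, family = "binomial", data = dat)

logOdds <- predict(fit, dat)                     # default type = "link"
prob    <- predict(fit, dat, type = "response")  # fitted probabilities

# The two scales are related by the inverse logit
all.equal(unname(prob), unname(plogis(logOdds)))

# Confusion matrix using a 0.5 cutoff on the probability scale
table(actual = dat$noxProp, predicted = prob > 0.5)
```

Cutting the probability at 0.5 is equivalent to cutting the log-odds at 0, which is why the choice of type matters for the resulting counts.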
(e) The false positive rate (FPR) is the proportion of observations that are actually FALSE (0) but were classified as TRUE (1) by the model: FPR = FP / (FP + TN).
For the training data, 11 observations that are actually FALSE were predicted as TRUE, and 84 were correctly predicted as FALSE,
hence FPR = 11/(11+84) = 0.1157895
Likewise, for the test data set:
FPR = 13/(13+84) = 0.1340206
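The same arithmetic can be reproduced in R from the reported counts. This is a minimal sketch: the matrices are typed in by hand from the confusion matrices in (c) and (d), and the helper name fpr is hypothetical.

```r
# Counts copied from the confusion matrices (rows = actual, columns = predicted)
cmTrain <- matrix(c(84, 13, 11, 145), nrow = 2,
                  dimnames = list(actual    = c("FALSE", "TRUE"),
                                  predicted = c("FALSE", "TRUE")))
cmTest  <- matrix(c(84, 22, 13, 134), nrow = 2,
                  dimnames = dimnames(cmTrain))

# FPR = FP / (FP + TN): share of actual-FALSE rows that were predicted TRUE
fpr <- function(cm) cm["FALSE", "TRUE"] / sum(cm["FALSE", ])

fpr(cmTrain)  # 11 / (11 + 84)
fpr(cmTest)   # 13 / (13 + 84)
```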