Question

When doing linear regression in Jupyter, how does one determine which columns to drop?

engineering Computer-Science

Homework Answers

Answer #1

Which columns to drop:

● Randomly generated fields whose only purpose is unique identification (for example, a row ID).

● Columns that leak information from the future, that is, values that would not be known at prediction time.

● Columns that contain redundant information already captured by another column.

● Columns that would need a lot of processing, or additional data, before they could become potentially useful.

● Columns with too many unique values that each occur only once. Also drop columns that add no information to the model, such as a column containing only one unique value.

● Columns whose values are not clean or are largely missing, when cleaning or imputing them is not practical.
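Several of the checks above can be sketched in pandas. This is a minimal illustration, not a definitive method: the function name suggest_columns_to_drop, the cutoffs max_missing and max_corr, and the example data are all assumptions made for this sketch. Leakage and processing cost cannot be detected automatically and still need domain knowledge.

```python
import pandas as pd

def suggest_columns_to_drop(df, target, max_missing=0.5, max_corr=0.95):
    """Return {column: reason} for feature columns that look like drop candidates.

    Hypothetical helper; the thresholds are illustrative defaults, not standards.
    """
    features = df.drop(columns=[target])
    n_rows = len(features)
    reasons = {}

    for col in features.columns:
        n_unique = features[col].nunique(dropna=True)
        if n_unique <= 1:
            reasons[col] = "constant"
        elif n_unique == n_rows and not pd.api.types.is_float_dtype(features[col]):
            # Non-float column with a distinct value in every row.
            reasons[col] = "unique per row (likely an identifier)"
        elif features[col].isna().mean() > max_missing:
            reasons[col] = "mostly missing"

    # Redundancy: flag the later column of any highly correlated numeric pair.
    numeric = features.select_dtypes(include="number")
    numeric = numeric.drop(columns=list(reasons), errors="ignore")
    corr = numeric.corr().abs()
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a not in reasons and b not in reasons and corr.loc[a, b] > max_corr:
                reasons[b] = f"highly correlated with {a}"
    return reasons

# Made-up example data, with weight_kg as the regression target.
example = pd.DataFrame({
    "row_id": [101, 102, 103, 104, 105, 106],
    "country": ["US"] * 6,
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0, 200.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8, 78.7],
    "age": [23, 35, 31, 52, 46, 23],
    "notes": [None, None, None, None, "x", "y"],
    "weight_kg": [50.0, 58.0, 69.0, 77.0, 92.0, 95.0],
})
candidates = suggest_columns_to_drop(example, target="weight_kg")
for column, reason in candidates.items():
    print(f"{column}: {reason}")
```

The result is only a list of candidates to review. A column flagged here may still be worth keeping, for instance if a mostly missing column can be imputed.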

Drop outlier rows using the Z-score and the IQR:

● The Z-score and the IQR are both used to detect outliers.

● The Z-score measures how many standard deviations a data point lies above or below the mean.

● The interquartile range (IQR) measures variability as the spread of the middle 50% of a dataset. It is based on dividing the dataset into quartiles at Q1, Q2, and Q3, with IQR = Q3 - Q1.

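● Specify a threshold, for example threshold = 3, and then apply the following code to remove the outliers. Here z holds the absolute Z-scores of the numeric columns of dataset_o, and Q1, Q3, and IQR = Q3 - Q1 are the per-column quartiles and interquartile range of dataset_o1:

dataset_o = dataset_o[(z < threshold).all(axis=1)]

dataset_out = dataset_o1[~((dataset_o1 < (Q1 - 1.5 * IQR)) | (dataset_o1 > (Q3 + 1.5 * IQR))).any(axis=1)]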
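The two filters above depend on z, Q1, Q3, and IQR being computed first. The following is a minimal end-to-end sketch using only pandas; the example data and the names dataset, numeric, dataset_z, and dataset_iqr are assumptions made for illustration. The Z-score here uses the population standard deviation (ddof=0), which is one common convention.

```python
import pandas as pd

# Made-up example data; the value 300.0 in "y" is a deliberate outlier.
dataset = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
    "y": [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 13.0, 12.0, 11.0, 12.0, 300.0],
})
numeric = dataset.select_dtypes(include="number")

# Z-score filter: keep rows where every numeric value is within
# `threshold` standard deviations of its column mean.
threshold = 3
z = ((numeric - numeric.mean()) / numeric.std(ddof=0)).abs()
dataset_z = dataset[(z < threshold).all(axis=1)]

# IQR filter: drop rows where any numeric value falls outside
# [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] for its column.
Q1 = numeric.quantile(0.25)
Q3 = numeric.quantile(0.75)
IQR = Q3 - Q1
is_outlier = ((numeric < (Q1 - 1.5 * IQR)) | (numeric > (Q3 + 1.5 * IQR))).any(axis=1)
dataset_iqr = dataset[~is_outlier]

print(len(dataset), len(dataset_z), len(dataset_iqr))
```

Both filters remove rows, not columns. In very small samples the Z-score of a single extreme value is bounded, so a threshold of 3 may never trigger, and the IQR rule can be the more practical choice there.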
