Question

Python Assignment

In this assignment, you will use Pandas library to perform analysis on the dataset stored in the following csv file: breast-cancer-wisconsin.csv.

Please write script(s) to do the following:

1. Read the CSV file and convert the dataset into a DataFrame object.

2. Persist the dataset into a SQL table and a JSON file.
   • Write the content of the DataFrame object into an SQLite database table. This will convert the dataset into a SQL table format. You can define your own database and table name.
   • Write the content of the DataFrame object into a JSON file. This will convert the dataset into a JSON format. You can decide which JSON format (columns, records or split) you would like to use.

3. Calculate the mean and standard deviation for every (numerical) column using DataFrame methods.

4. Use the DataFrame data visualization methods to draw either a boxplot or a kernel density (KDE) diagram showing the distribution of each column of the DataFrame object. Compare the plots and determine which columns have distributions of similar shape.

5. Use the DataFrame method to calculate the correlation coefficient between any two columns. Also draw scatter plots to show how any two columns are correlated. Use the correlation coefficients and scatter plots to determine whether any two columns are positively correlated, negatively correlated or not correlated.

6. Use the class column to group the records in the dataset and repeat steps 3 and 4 for all groups.

Answer #1

import pandas as pd
data = pd.read_csv("breast-cancer-wisconsin.csv")
data.head()
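
The snippet above only covers step 1. A minimal sketch of steps 2 and 3 might look like the following. The database, table and JSON file names are placeholders (the assignment leaves them open), and the helper names persist and summarize are made up for illustration. Only numeric columns are summarized; a column that contains non-numeric placeholders would be read as text and skipped.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def persist(df, db_path="breast_cancer.db", table="breast_cancer",
            json_path="breast_cancer.json", orient="records"):
    """Write df to an SQLite table and a JSON file (all names are placeholders)."""
    with closing(sqlite3.connect(db_path)) as conn:
        # if_exists="replace" lets the script be re-run without an error.
        df.to_sql(table, conn, if_exists="replace", index=False)
        conn.commit()
    # orient may be "records", "columns" or "split", as the assignment allows.
    df.to_json(json_path, orient=orient)


def summarize(df):
    """Return the mean and standard deviation of every numeric column."""
    numeric = df.select_dtypes(include="number")
    return numeric.agg(["mean", "std"])
```

With the DataFrame loaded above, persist(data) would write both files and print(summarize(data)) would show one row for the mean and one for the standard deviation.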

# Loads the Pima Indians diabetes data from the URL below (a different
# dataset from breast-cancer-wisconsin.csv) and rescales its features to [0, 1].
import numpy
import pandas
from sklearn.preprocessing import MinMaxScaler

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:, 0:8]  # first eight columns: input features
Y = array[:, 8]    # last column: class label
scaler = MinMaxScaler(feature_range=(0, 1))
rescaledX = scaler.fit_transform(X)
numpy.set_printoptions(precision=3)
print(rescaledX[0:5, :])  # first five rescaled rows
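
For steps 4 to 6 of the assignment, a rough sketch could look like this. It assumes the grouping column is literally named "class" (the real header in the CSV may differ), the 0.3 cut-off used to label a correlation is an arbitrary choice for illustration, and all function names are hypothetical. The plotting helpers rely on the pandas plotting methods, which need matplotlib installed; boxplots are used here because the KDE plot additionally needs scipy.

```python
import pandas as pd


def correlations(df):
    """Pearson correlation matrix of the numeric columns."""
    return df.select_dtypes(include="number").corr()


def label_correlation(r, threshold=0.3):
    """Rough label for a coefficient; the threshold is an arbitrary choice."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "none"


def grouped_stats(df, by="class"):
    """Mean and standard deviation of each numeric column within each group."""
    cols = df.select_dtypes(include="number").columns.drop(by, errors="ignore")
    return df.groupby(by)[list(cols)].agg(["mean", "std"])


def box_plots(df):
    """One boxplot per numeric column (requires matplotlib)."""
    numeric = df.select_dtypes(include="number")
    return numeric.plot.box(subplots=True, figsize=(14, 4))


def scatter(df, x, y):
    """Scatter plot of column y against column x (requires matplotlib)."""
    return df.plot.scatter(x=x, y=y)
```

To repeat the plots per class, one option is to loop over the groups, for example: for name, part in data.groupby("class"): box_plots(part). Each helper returns the matplotlib axes, so the figures can be shown or saved by the caller.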
