Question

(a) Load the data file data/tips.csv into a pandas DataFrame called tips_df using the pandas read_table() function. Check the first five rows.

(b) Create a new dataframe called tips by randomly sampling 6 records from the dataframe tips_df. Refer to the sample() function documentation.

(c) Add a new column to tips called idx as a list ['one', 'two', 'three', 'four', 'five', 'six'] and then later assign it as the index of tips dataframe. Display the dataframe.

(d) Create a new Series called kids as Series([1, 2, 1], index = ['two', 'five', 'six']). Assign the series as a new column in the dataframe.

(e) List the various columns in the dataframe using the columns attribute of the dataframe. Also, check the various column datatypes in the dataframe.

(f) Transpose the dataframe tips.

(g) Check the name of the dataframe index. If there isn't one, assign a new name.

(h) Check the name of the dataframe columns. If there isn't one, assign a new name.

(i) List the rows in the dataframe using the values attribute of the dataframe. Check the datatype of the result.

(j) Check if 'time' is one of the columns in the dataframe. Use set-like operation in.

(k) Check if 'six' is one of the index values in the dataframe. Use set-like operation in.

(l) Check if 'seven' is one of the index values in the dataframe. Use set-like operation in.
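
Parts (c), (d) and (g) to (l) mostly come down to how pandas aligns on index labels and how `in` behaves on an Index. Below is a minimal sketch, assuming a small hand-made frame in place of the six sampled rows (the values and the names `field` and `row_label` are illustrative, not taken from data/tips.csv):

```python
import pandas as pd

# Hypothetical stand-in for the six sampled rows; values are made up.
tips = pd.DataFrame(
    {
        "total_bill": [44.3, 20.27, 18.28, 18.43, 24.71, 16.4],
        "tip": [2.5, 2.83, 4.0, 3.0, 5.85, 2.5],
        "time": ["Dinner", "Lunch", "Lunch", "Dinner", "Lunch", "Lunch"],
    }
)

# (c) add idx as a column, then promote it to the index
tips["idx"] = ["one", "two", "three", "four", "five", "six"]
tips = tips.set_index("idx")

# (d) column assignment aligns on index labels; unmatched rows become NaN
kids = pd.Series([1, 2, 1], index=["two", "five", "six"])
tips["kids"] = kids

# (g), (h) give the index and the columns axis a name only if they lack one
if tips.index.name is None:
    tips.index.name = "row_label"  # hypothetical name
if tips.columns.name is None:
    tips.columns.name = "field"    # hypothetical name

# (i) the values attribute returns the rows as a NumPy array
rows = tips.values

# (j) to (l) membership tests against the column labels and index labels
has_time = "time" in tips.columns
has_six = "six" in tips.index
has_seven = "seven" in tips.index
```

Because `set_index("idx")` already names the index `idx`, only the columns axis receives a new name in this sketch.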

---------------------------------------------------------------------------

(a) Add a new row with the following values [18.0, 4.0, 'Male', 'No', 'Mon', 'Lunch', 3, 1.0, True] to the tips dataframe with a duplicate index value six.

(b) Select all occurrences of the index six. Hint: Use the loc attribute for retrieving rows by label.

(c) Reset the index for the dataframe. Hint: Use reset_index.

(d) Reindex using day column. Hint: Use set_index.

(e) Now, revert back to using the index column as the index.

(f) Drop the newly added row from the tips dataframe with duplicate index value six. Hint: First, reset the index, then use drop_duplicates function and reassign the index back to normal.

(g) Drop the row with index value six. Hint: Use drop.

(h) Drop the columns kids and kidcheck.

(i) Drop the column size.

------------------------------------------------------------------------

(a) Select two columns tip and sex from the dataframe.

(b) Select one column sex from the dataframe.

(c) Select the first 3 rows using slicing notation.

(d) Select the first 4 rows using the index labels. Note: Slicing with index labels behaves differently than normal Python slicing.

(e) Select the rows where the value of sex is Male. Hint: Use boolean array.

(f) Select the rows where tip is greater than 2.

(g) Select the column smoker for the rows where tip is greater than 2. Hint: Use loc.

(h) Select the columns smoker and total_bill for the rows where tip is greater than 3. Hint: Use loc.

(i) For the rows where sex is Male, assign the value of tip to 5.

(j) Check what happens when you compare the dataframe with the following scalar boolean expression: tips < 2. Interpret what is happening and why.

(k) Select the third and second columns (in that order) for the third row in the dataframe using integer indexing. Hint: Use iloc.

(l) Select the third and second columns (in that order) for the third and fifth rows (in that order) in the dataframe using integer indexing. Hint: Use iloc.

(m) Select all the rows and the third and second columns (in that order) using integer indexing for cases where the tip value is greater than 3. Hint: Use iloc.
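
Parts (e) to (m) could be approached along these lines. This is only a sketch on a hypothetical five-row frame (the values are made up), not the sampled data itself. For (j) it compares just the numeric columns. On a frame that still holds text columns, the same comparison would be expected to fail with a TypeError, since strings cannot be ordered against a number.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataframe; values are made up.
tips = pd.DataFrame(
    {
        "total_bill": [44.3, 20.27, 18.28, 18.43, 24.71],
        "tip": [2.5, 2.83, 4.0, 3.0, 5.85],
        "sex": ["Female", "Female", "Male", "Male", "Male"],
        "smoker": ["Yes", "No", "No", "No", "No"],
    },
    index=["one", "two", "three", "four", "five"],
)

# (e), (f) a boolean array selects the rows where it is True
males = tips[tips["sex"] == "Male"]
big_tips = tips[tips["tip"] > 2]

# (g), (h) loc takes a row mask plus one or more column labels
smoker_col = tips.loc[tips["tip"] > 2, "smoker"]
two_cols = tips.loc[tips["tip"] > 3, ["smoker", "total_bill"]]

# (j) comparing with a scalar is applied element by element
small = tips[["total_bill", "tip"]] < 2

# (k), (l) iloc is purely positional: third row is 2, third/second columns are 2, 1
third_row = tips.iloc[2, [2, 1]]
two_rows = tips.iloc[[2, 4], [2, 1]]

# (m) iloc rejects a boolean Series, so the mask is passed as a NumPy array
mask = (tips["tip"] > 3).to_numpy()
masked = tips.iloc[mask, [2, 1]]

# (i) assign through loc in a single step to avoid chained indexing
tips.loc[tips["sex"] == "Male", "tip"] = 5
```

The assignment for (i) is done last so that the earlier selections still see the original tip values.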

------------------------------------------------------------------------

(a) Create two sample dataframes with 6 records tips1 and tips2 from tips_df dataframe. tips_df.sample(n = 6).

(b) Append tips2 to tips1.

(c) Assign the value np.nan to the smoker column for all the records in tips1 where sex is Male.

(d) Use forward fill to fill missing values in the smoker column in tips1. Hint: Use fillna.
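
A possible sketch for this section, with several assumptions. `tips_df` is a small hypothetical stand-in. Part (c) is read as blanking the smoker value on rows where sex is Male. `pd.concat` stands in for "append", because `DataFrame.append` was removed in pandas 2.0. The forward fill uses `Series.ffill()`, which does the same thing as `fillna` with the forward-fill method and avoids the deprecation warning that newer pandas versions raise for the latter.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tips_df; values are made up.
tips_df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
        "sex": ["Female", "Male", "Male", "Male", "Female", "Male", "Male", "Male"],
        "smoker": ["No", "No", "Yes", "No", "Yes", "No", "No", "Yes"],
    }
)

# (a) two random samples of six records; random_state only makes the run repeatable
tips1 = tips_df.sample(n=6, random_state=0)
tips2 = tips_df.sample(n=6, random_state=1)

# (b) stack tips2 underneath tips1
tips1 = pd.concat([tips1, tips2])

# (c) blank out smoker on the rows where sex is Male (assumed reading)
tips1.loc[tips1["sex"] == "Male", "smoker"] = np.nan
missing_before = int(tips1["smoker"].isna().sum())

# (d) forward fill: each gap takes the last non-missing value above it
tips1["smoker"] = tips1["smoker"].ffill()
missing_after = int(tips1["smoker"].isna().sum())
```

Any missing values at the very top of the column have nothing above them to copy, so they stay missing after a forward fill.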

------------------------------------------------------------------------

(a) Find the descriptive statistics for the dataframe tips1. Notice how the statistics are reported only for numeric columns.

(b) Create a new dataframe tips3 that only contains columns with numeric values from the dataframe tips1. Find the descriptive statistics for tips3.

(c) Compute the sum of all rows in each column in tips3.

(d) Compute the sum of all columns for each row in tips3.

(e) Compute the cumulative sums for values in each row for every column in tips3.

(f) Compute the correlation and covariance of the columns in tips3.

(g) Use the corrwith DataFrame method to find the correlation of all the columns with the column total_bill.
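
The statistics parts might look roughly like this. It is a sketch on a tiny hypothetical `tips1` (the values are made up), and it reads part (e) as a running total down each column, which is the pandas default for `cumsum`.

```python
import pandas as pd

# Hypothetical stand-in for tips1; values are made up.
tips1 = pd.DataFrame(
    {
        "total_bill": [10.0, 20.0, 30.0, 40.0],
        "tip": [1.0, 3.0, 4.0, 6.0],
        "sex": ["Female", "Male", "Male", "Female"],
        "size": [2, 2, 3, 4],
    }
)

# (a) describe() reports on the numeric columns by default
stats1 = tips1.describe()

# (b) keep only the numeric columns
tips3 = tips1.select_dtypes(include="number")
stats3 = tips3.describe()

col_sums = tips3.sum(axis=0)    # (c) one total per column
row_sums = tips3.sum(axis=1)    # (d) one total per row
running = tips3.cumsum(axis=0)  # (e) running total down each column

corr = tips3.corr()             # (f) pairwise correlation matrix
cov = tips3.cov()               # (f) pairwise covariance matrix

# (g) correlation of every column with total_bill
vs_bill = tips3.corrwith(tips3["total_bill"])
```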

------------------------------------------------------------------------

(a) Create a new dataframe tips4 that only contains columns with non-numeric values from the dataframe tips1. Describe tips4 data.

(b) Get the counts of unique values of the days in tips4.

(c) Create a boolean array called mask that only retrieves records in tips4 that have day values as Thur or Sat.
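
For this last section, one way to go about it is sketched below. It assumes a small hypothetical `tips1` (the values are made up) and uses `isin` to build the mask.

```python
import pandas as pd

# Hypothetical stand-in for tips1; values are made up.
tips1 = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
        "sex": ["Female", "Male", "Male", "Male", "Female", "Male"],
        "day": ["Thur", "Sat", "Sun", "Thur", "Fri", "Sat"],
    }
)

# (a) keep only the non-numeric columns and describe them
tips4 = tips1.select_dtypes(exclude="number")
summary = tips4.describe()  # count, unique, top, freq for text columns

# (b) how often each day occurs
day_counts = tips4["day"].value_counts()

# (c) True where day is Thur or Sat
mask = tips4["day"].isin(["Thur", "Sat"])
thur_or_sat = tips4[mask]
```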

I have been able to complete the top sections; I just need the bolded one. The questions are all linked, so I am posting them all.

python programming language

Answer #1

NOTE: FEEL FREE TO ASK ANY DOUBTS OR REQUEST CORRECTIONS IN THE COMMENT SECTION.

NOTE: AS PER THE CHEGG RULES & GUIDELINES, WE CAN ANSWER ONLY 4 SUB-PARTS OF A MULTI-PART QUESTION.

  • I HAD ALREADY ANSWERED SOME PARTS PREVIOUSLY; I'M ATTACHING THEM BELOW AS WELL.
  • AFTER THEM, I'M GIVING YOU THE NEXT 4 SUB-PARTS.
  • PLEASE RE-UPLOAD THE REMAINING QUESTIONS, MENTIONING THE QUESTION NUMBERS. PLEASE UNDERSTAND.

PARTS I HAD DONE PREVIOUSLY

#!/usr/bin/env python
# coding: utf-8

# ### importing pandas

# In[1]:


import pandas as pd


# ### Rows and Columns list

# In[2]:


rows = ['one','two','three','four','five','six']
column = ['total_bill','tip','sex','smoker','day','time','size','kids']


# ### Data for dataframe from given Question

# In[3]:


data=[
[44.3,2.5,'Female','Yes','Sat','Dinner',3,None],
[20.27,2.83,'Female','No','Thur','Lunch',2,1.0],
[18.28,4.0,'Male','No','Thur','Lunch',2,None],
[18.433,3.0,'Male','No','Sun','Dinner',4,None],
[24.71,5.85,'Male','No','Thur','Lunch',2,2.0],
[16.4,2.5,'Female','Yes','Thur','Lunch',2,1.0]
]


# ### Creating dataframe with above data and column and row

# In[4]:


tips = pd.DataFrame(data,index=rows,columns=column)

# Printing dataframe
print(tips)


# ## Problem a

# ### Creating new data frame with row index 'six' and new column 'kidcheck'

# In[5]:


# [18.0,4.0,'Male','No','Mon','Lunch',3,1.0,True]

newData=[[18.0,4.0,'Male','No','Mon','Lunch',3,1.0,True]]
column.append('kidcheck')


# In[6]:


newRow = pd.DataFrame(newData,index=['six'],columns=column)


# ### Concatenating above data with tips data

# In[7]:


dataframes = [tips, newRow]
tips = pd.concat(dataframes,sort=False)


# In[8]:


# printing data
print(tips)


# ## Problem b

# ### Select all occurrences of index 'six'

# In[9]:


sixIndex = tips.loc['six']
print(sixIndex)


# ## Problem c

# ### Reset index for the dataframe

# In[10]:


tips.reset_index(inplace = True)
print(tips)


# ## Problem d

# ### Re index using 'day' column

# In[11]:


tips=tips.set_index(['day'])
print(tips)


# ## Problem e

# ### Revert back to 'index' column as the 'index'

# In[12]:


# setting 'index' column as index
# reset_index() first so the current 'day' index is kept as a column
tips=tips.reset_index().set_index(['index'])
print(tips)


# ## Problem f

# ### Drop the newly added index from dataframe, drop duplicate index 'six'

# In[13]:


# Reset index
tips.reset_index(inplace = True)
# Dropping duplicate rows at index
tips = tips.drop_duplicates(subset=['index'])
# Setting index as index
tips = tips.set_index(['index'])
# printing dataframe
print(tips)


# ## Problem g

# ### Drop the row with index value 'six'

# In[14]:


tips.drop(['six'],inplace=True)
print(tips)


# ## Problem h

# ### Drop the column kids and kidcheck

# In[15]:


tips.drop(['kids','kidcheck'],axis=1,inplace=True)
print(tips)


# ## Problem i

# ### Drop the column size

# In[16]:


tips.drop(['size'],axis=1,inplace=True)
print(tips)

HERE ARE THE NEXT 4 SUB PARTS FROM THE QUESTION

# # NEXT 4 SUB PARTS

# ### a) select two columns tip and sex from dataframe

# In[17]:


# printing columns tip and sex
print(tips.loc[:,['tip','sex']])


# ### b) select one column sex from dataframe

# In[18]:


# Printing one column sex from dataframe
print(tips.loc[:,['sex']])


# ### c) select first 3 rows using slicing notation

# In[19]:


# Printing rows 1 to 3
# 0 index = 1st row
print(tips.iloc[0:3])


# ### d) select the first 4 rows using the index labels. Note: slicing with index labels behaves differently than normal python slicing.

# In[20]:


# Getting the labels of the first and fourth rows
first_label, fourth_label = tips.index[0], tips.index[3]

# slicing by label with loc includes BOTH end labels,
# unlike normal python slicing where the stop is excluded
print(tips.loc[first_label:fourth_label])
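
The note in part (d) is about label slices keeping their end label. Here is a small sketch of that difference, using a hypothetical five-row frame rather than the tips data above:

```python
import pandas as pd

# Hypothetical frame with the same style of row labels; values are made up.
demo = pd.DataFrame(
    {"tip": [2.5, 2.83, 4.0, 3.0, 5.85]},
    index=["one", "two", "three", "four", "five"],
)

# Label slice: the stop label 'four' is included, so four rows come back
by_label = demo.loc["one":"four"]

# Positional slice: the stop position 4 is excluded, so again four rows
by_position = demo.iloc[0:4]

# Same stop written as a position (3) drops the row labelled 'four'
shorter = demo.iloc[0:3]
```

Label slicing like this relies on the index labels being unique, which holds for the tips frame once the duplicate 'six' row has been dropped.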

