Question


#importing file
users = pd.read_table('u.user', sep='|', index_col='user_id')

Describe and show the dataframe

In [ ]:

 
# describe information of all columns
# describe information of all numeric columns only
# describe information of all object columns only
# show first 10 rows of users dataframe

Detecting duplicate rows

In [10]:

 
# check whether a row is identical to a previous row
# count all duplicate rows in the dataframe
# show only duplicate rows in the dataframe
# drop all duplicate rows in the dataframe
# check a single specific column for duplicates
# specify more than one column when checking for duplicates

In [11]:

 
# display the 3 most frequent occupations in 'users'
# change the data type of the column 'age' from int to float
# for each occupation, calculate the minimum and maximum ages

In [12]:

 
# for each occupation in 'users', count the number of occurrences
# plot a bar chart of the output above for each occupation

In [13]:

# for each occupation, calculate the mean age
# plot a pie chart of the output above

In [14]:

 
# for each combination of occupation and gender, calculate the mean age
# plot a bar chart of the output above for each occupation and gender

In [15]:

# sort 'users' by 'occupation' and then by 'age' (in a single command)

u.user dataset

user_id|age|gender|occupation|zip_code
1|24|M|technician|85711
2|53|F|other|94043
3|23|M|writer|32067
4|24|M|technician|43537
5|33|F|other|15213
6|42|M|executive|98101
7|57|M|administrator|91344
8|36|M|administrator|05201
9|29|M|student|01002
10|53|M|lawyer|90703
11|39|F|other|30329
12|28|F|other|06405
13|47|M|educator|29206
14|45|M|scientist|55106
15|49|F|educator|97301
16|21|M|entertainment|10309
17|30|M|programmer|06355
18|35|F|other|37212
19|40|M|librarian|02138
20|42|F|homemaker|95660
21|26|M|writer|30068
22|25|M|writer|40206
23|30|F|artist|48197
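The zip_code column holds values with leading zeros (05201, 01002, 06405). When read_table infers that column as integers, those zeros are dropped. Here is a minimal sketch that keeps zip_code as text through the dtype argument. It assumes the pipe-separated layout shown above and reads a few of those rows from an in-memory string instead of the u.user file:

```python
import io

import pandas as pd

# A few rows copied from the u.user sample (pipe-separated)
raw = """user_id|age|gender|occupation|zip_code
1|24|M|technician|85711
8|36|M|administrator|05201
9|29|M|student|01002
"""

# dtype={'zip_code': str} keeps ZIP codes as text, so leading zeros survive
users = pd.read_table(io.StringIO(raw), sep='|', index_col='user_id',
                      dtype={'zip_code': str})

print(users.loc[8, 'zip_code'])  # 05201
```

Without the dtype argument, the same rows would give 5201 and 1002 for users 8 and 9.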

Answer #1

#importing file
import pandas as pd
import numpy as np
users = pd.read_table('u.user', sep='|', index_col='user_id')

# describe information of all columns
print(users.describe(include='all'))

# describe information of all numeric columns only
print(users.describe(include=[np.number]))

# describe information of all object columns only
print(users.describe(include=[object]))

# show first 10 rows of users dataframe
print(users.head(10))
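By default, describe() summarizes only the numeric columns, so on this data it reports on age alone. The small sketch below compares the default with include='all', which adds the object columns and fills statistics that do not apply to a column with NaN. It uses three of the sample rows, read from a string rather than the file, with zip_code kept as text:

```python
import io

import pandas as pd

raw = """user_id|age|gender|occupation|zip_code
1|24|M|technician|85711
2|53|F|other|94043
3|23|M|writer|32067
"""
users = pd.read_table(io.StringIO(raw), sep='|', index_col='user_id',
                      dtype={'zip_code': str})

# Default: only numeric columns are summarized
numeric_summary = users.describe()

# include='all': numeric and object columns together; statistics that do not
# apply to a column (e.g. mean for gender, unique for age) are NaN
full_summary = users.describe(include='all')

print(list(numeric_summary.columns))  # ['age']
print(list(full_summary.columns))     # ['age', 'gender', 'occupation', 'zip_code']
```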

# check whether a row is identical to a previous row
# (user_id is the index, so only age, gender, occupation and zip_code are compared)
print(users.duplicated())

# count all duplicate rows in the dataframe
print(users.duplicated().sum())

# show only duplicate rows in the dataframe
print(users[users.duplicated()])

# drop all duplicate rows in the dataframe
print(users.drop_duplicates())

# check a single specific column for duplicates
print(users.duplicated(subset=['gender']))

# specify more than one column when checking for duplicates
print(users.duplicated(subset=['occupation', 'gender']))
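DataFrame.duplicated() and drop_duplicates() compare column values only; the user_id index is ignored. Both accept a keep parameter that decides which copy counts as the duplicate. In the sample data, users 1 and 4 share age, gender and occupation. The sketch below uses those rows with subset=['age', 'gender', 'occupation'] to show the three settings:

```python
import io

import pandas as pd

raw = """user_id|age|gender|occupation|zip_code
1|24|M|technician|85711
4|24|M|technician|43537
5|33|F|other|15213
"""
users = pd.read_table(io.StringIO(raw), sep='|', index_col='user_id',
                      dtype={'zip_code': str})

cols = ['age', 'gender', 'occupation']

# keep='first' (default): the first occurrence is not flagged, later repeats are
first = users.duplicated(subset=cols)
# keep='last': the last occurrence is not flagged, earlier repeats are
last = users.duplicated(subset=cols, keep='last')
# keep=False: every row that has a repeat anywhere is flagged
every = users.duplicated(subset=cols, keep=False)

print(first.tolist())  # [False, True, False]
print(last.tolist())   # [True, False, False]
print(every.tolist())  # [True, True, False]

# drop_duplicates applies the same subset/keep rules
print(users.drop_duplicates(subset=cols).index.tolist())  # [1, 5]
```

Comparing all columns instead finds no duplicates here, because the ZIP codes differ.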

# display the 3 most frequent occupations in 'users'
print(users['occupation'].value_counts()[:3])

# change the data type of the column 'age' from int to float
convert_col = {'age': float}
users = users.astype(convert_col)
print(users.dtypes)

# for each occupation, calculate the minimum and maximum ages
print(users.groupby('occupation')['age'].agg(['min', 'max']))
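On a single grouped column, agg() no longer accepts a dict that maps a new name to a list of functions, such as {'Min N Max': ['min', 'max']}. This "nested renamer" was deprecated in pandas 0.20 and raises SpecificationError from pandas 1.0 on. If custom output names are wanted, named aggregation (pandas 0.25 and later) passes them as keyword arguments. The sketch below runs it on a few of the sample rows:

```python
import io

import pandas as pd

raw = """user_id|age|gender|occupation|zip_code
1|24|M|technician|85711
4|24|M|technician|43537
6|42|M|executive|98101
7|57|M|administrator|91344
8|36|M|administrator|05201
"""
users = pd.read_table(io.StringIO(raw), sep='|', index_col='user_id',
                      dtype={'zip_code': str})

# Named aggregation: each keyword becomes an output column name,
# each value is the aggregation applied to 'age'
age_range = users.groupby('occupation')['age'].agg(min_age='min', max_age='max')
print(age_range)
```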

# for each occupation in 'users', count the number of occurrences
output = users['occupation'].value_counts()
print(output)

# plot a bar chart of the output above for each occupation
output.plot.bar(rot=0)

# for each occupation, calculate the mean age
output = users.groupby('occupation')['age'].mean()
print(output)

# plot a pie chart of the output above
output.plot.pie(subplots=True, figsize=(10,10))

# for each combination of occupation and gender, calculate the mean age
output = users.groupby(['occupation', 'gender'])['age'].mean()
print(output)

# plot a bar chart of the output above for each occupation and gender
output.plot.bar(rot=0)
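Plotting the grouped mean ages directly gives one bar per (occupation, gender) pair. Another option is to call unstack('gender') first. This turns gender into columns so each occupation gets one group of bars, and combinations that never occur become NaN. The sketch below uses six of the sample rows and only builds the table. Calling .plot.bar() on the result would draw the chart, assuming matplotlib is installed:

```python
import io

import pandas as pd

raw = """user_id|age|gender|occupation|zip_code
2|53|F|other|94043
5|33|F|other|15213
11|39|F|other|30329
3|23|M|writer|32067
21|26|M|writer|30068
22|25|M|writer|40206
"""
users = pd.read_table(io.StringIO(raw), sep='|', index_col='user_id',
                      dtype={'zip_code': str})

# Mean age per (occupation, gender): a Series with a two-level index
mean_age = users.groupby(['occupation', 'gender'])['age'].mean()

# Move the 'gender' level into columns: one row per occupation,
# one column per gender; missing combinations are NaN
table = mean_age.unstack('gender')
print(table)
```

In this sample every "other" user is female and every writer is male, so the other/M and writer/F cells are NaN.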

# sort 'users' by 'occupation' and then by 'age' (in a single command)
print(users.sort_values(by=['occupation', 'age']))
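sort_values also takes a list for ascending, with one flag per sort key. The sketch below uses six of the sample rows, shuffled on purpose in the string. It orders occupations from A to Z, and within each occupation sorts ages from oldest to youngest:

```python
import io

import pandas as pd

raw = """user_id|age|gender|occupation|zip_code
7|57|M|administrator|91344
3|23|M|writer|32067
8|36|M|administrator|05201
21|26|M|writer|30068
6|42|M|executive|98101
22|25|M|writer|40206
"""
users = pd.read_table(io.StringIO(raw), sep='|', index_col='user_id',
                      dtype={'zip_code': str})

# Both keys ascending: occupation A to Z, then youngest first
default_order = users.sort_values(by=['occupation', 'age'])

# One flag per key: occupation ascending, age descending
mixed_order = users.sort_values(by=['occupation', 'age'], ascending=[True, False])

print(default_order.index.tolist())  # [8, 7, 6, 3, 22, 21]
print(mixed_order.index.tolist())    # [7, 8, 6, 21, 22, 3]
```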
