Create a Jupyter notebook with the title “Homework 2”, your name, and the course number, CPSMA 4313. Load any libraries you will use in a code cell at the beginning.

1. Gather the Fitbit data, which was originally published on Kaggle and is also provided in the GitHub repo for this course. The link is https://raw.githubusercontent.com/nurfnick/Data Viz/main/Activity Dataset V1.csv. I use ‘quotes’ when referring to a column in the dataset.

(a) (10 points) Store the data as a pandas dataframe. Examine the datatype of each column and comment on whether it is appropriate.

(b) (10 points) Remove the ‘unnamed’ column, which only repeats the index.

(c) (10 points) Clean the column names to remove the unit declaration, (%), using regular expressions. The column name should not have any trailing spaces after cleaning. You will only receive partial credit for simply renaming columns without using regular expressions.

(d) (10 points) Convert the ‘activity day’ column to a datetime format.

(e) (10 points) Impute ‘total steps’ by replacing each ‘NaN’ with an appropriate number of steps. Convert the column to an appropriate datatype.

(f) (10 points) Convert the non-empty ‘avg pace’ values into floats that still represent the information contained in the column. Recall that there are 60 seconds in one minute, so 3:30 is equivalent to 3.5 minutes.

(g) (10 points) Group data by ‘workout type’ and find the mean, median, count and standard deviation of ‘calories’.

(h) (10 points) Create an indicator column that identifies if the activity achieved 30% or more ‘aerobic’ activity.

(i) (10 points) Which day of the week (Monday, Tuesday, etc.) and ‘workout type’ have the maximum ‘max cadence’?
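
As a hint for several of the parts above, here is a minimal sketch of the pandas techniques involved, run on a small made-up table rather than the real data. The column names, the values, the helper `pace_to_minutes`, and the new column names ‘day of week’ and ‘aerobic 30 plus’ are all assumptions for illustration. The real CSV may name its columns differently, so treat this as one possible approach and not as the expected solution.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Fitbit data. These column names and values are
# invented for illustration and may not match the real CSV.
toy = pd.DataFrame({
    "activity day": ["2022-01-03", "2022-01-04", "2022-01-05", "2022-01-08"],
    "workout type": ["Run", "Walk", "Run", "Walk"],
    "aerobic (%)": [35.0, 10.0, 29.9, 30.0],
    "avg pace": ["3:30", None, "5:15", "10:00"],
    "calories": [300, 120, 450, 150],
    "max cadence": [170, 110, 182, 115],
})

# Regular expression: strip a trailing parenthesised unit such as " (%)",
# together with any spaces around it, from every column name.
toy.columns = toy.columns.str.replace(r"\s*\(.*?\)\s*$", "", regex=True)

# Parse the date strings, then pull out the weekday name.
toy["activity day"] = pd.to_datetime(toy["activity day"])
toy["day of week"] = toy["activity day"].dt.day_name()


def pace_to_minutes(pace):
    """Hypothetical helper: turn 'M:SS' into fractional minutes, keep missing as NaN."""
    if pd.isna(pace):
        return np.nan
    minutes, seconds = pace.split(":")
    return int(minutes) + int(seconds) / 60


toy["avg pace"] = toy["avg pace"].apply(pace_to_minutes)

# Boolean indicator: True where aerobic activity is 30% or more.
toy["aerobic 30 plus"] = toy["aerobic"] >= 30

# Several summary statistics of one column, computed per group.
summary = toy.groupby("workout type")["calories"].agg(
    ["mean", "median", "count", "std"]
)

# Row holding the largest 'max cadence', to read off its day and workout type.
top = toy.loc[toy["max cadence"].idxmax()]

print(toy.dtypes)
print(summary)
print(top[["day of week", "workout type", "max cadence"]])
```

The regular expression only touches names that end in a parenthesised unit, so the other column names pass through unchanged. Missing paces stay as NaN, which lets the column become a float column without inventing values.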
