Pandas Exercise 7: Visualization
This continues my practice with the pandas exercises from guipsamora. This set is interesting because it covers basic visualization exercises with Matplotlib.
Visualizing Chipotle’s Data
This time we are going to pull data directly from the internet. Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.
Step 1. Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter
# set this so the graphs open internally
%matplotlib inline
Step 2. Import the dataset from this address.
Step 3. Assign it to a variable called chipo.
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url, sep ='\t')
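To see what `sep='\t'` changes, here is a minimal sketch that parses a small tab-separated string with the same call. The two sample rows are made up for illustration and only mimic the layout of the Chipotle file; `sample_tsv` and `sample` are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical two-row sample in a tab-separated layout (not the real file).
sample_tsv = (
    "order_id\tquantity\titem_name\titem_price\n"
    "1\t1\tIzze\t$3.39\n"
    "2\t2\tChicken Bowl\t$16.98\n"
)

# sep='\t' tells read_csv to split fields on tabs instead of commas.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(sample.shape)   # number of rows and columns that were parsed
print(sample.dtypes)  # item_price is still text because of the '$' prefix
```

Without `sep='\t'`, `read_csv` would look for commas and would not split these lines into the four intended columns.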
Step 4. See the first 10 entries
# head() returns the first 5 rows by default; use chipo.head(10) for the first 10
chipo.head()
 | order_id | quantity | item_name | choice_description | item_price |
---|---|---|---|---|---|
0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | $2.39 |
1 | 1 | 1 | Izze | [Clementine] | $3.39 |
2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | $2.39 |
4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
Step 5. Create a histogram of the top 5 items bought
# Sum the quantity per item, keep the five best sellers, and draw them as bars
chipo.groupby('item_name')['quantity'].sum().sort_values(ascending=False).head(5).plot(kind='bar')
<AxesSubplot:xlabel='item_name'>
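The group, sum, sort, head chain is easier to follow on a tiny table. The sketch below uses made-up orders (the `toy` frame is hypothetical, not the Chipotle data) and stops before plotting so the intermediate result is visible.

```python
import pandas as pd

# Hypothetical toy orders, only to show the group -> sum -> sort -> head chain.
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Chips", "Izze", "Chicken Bowl"],
    "quantity": [2, 1, 1, 3, 1, 1],
})

top_items = (
    toy.groupby("item_name")["quantity"]  # one group per item
    .sum()                                # total quantity per item
    .sort_values(ascending=False)         # best sellers first
    .head(2)                              # keep the top two
)
print(top_items)
```

Calling `top_items.plot(kind='bar')` on the result would draw one bar per remaining item.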
Step 6. Create a scatterplot with the number of items ordered per order price
Hint: Price should be in the X-axis and Items ordered in the Y-axis
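# Strip the leading '$' and convert the price to a float
chipo['item_price'] = chipo['item_price'].str.lstrip('$').astype(float)
# Total the quantity and price per order, then plot price (x) against items ordered (y)
chipo.groupby('order_id')[['quantity', 'item_price']].sum().plot(x='item_price', y='quantity', kind='scatter')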
<AxesSubplot:xlabel='item_price', ylabel='quantity'>
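As a small illustration of the two steps behind this plot (cleaning the price strings, then totalling per order), here is a sketch on three made-up rows. The `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical rows: order 1 has two lines, order 2 has one line.
toy = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 1, 2],
    "item_price": ["$2.39", "$3.39", "$16.98"],
})

# Remove the '$' prefix so the column can be converted to numbers.
toy["item_price"] = toy["item_price"].str.lstrip("$").astype(float)

# One row per order: total items and total price.
per_order = toy.groupby("order_id")[["quantity", "item_price"]].sum()
print(per_order)
```

Each row of `per_order` becomes one point in the scatterplot, with `item_price` on the X-axis and `quantity` on the Y-axis.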
Step 7. BONUS: Create a question and a graph to answer your own question.
# How is the total quantity sold distributed across items?
chipo.groupby('item_name')['quantity'].sum().plot(kind='hist', bins=15)
<AxesSubplot:ylabel='Frequency'>
Scores Dataset
Introduction:
This time you will create the data.
Exercise based on Chris Albon's work; the credits belong to him.
Step 1. Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Step 2. Create the DataFrame that should look like the one below.
# Use a name other than dict so the built-in type is not shadowed
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
            'age': [42, 52, 36, 24, 73],
            'female': [0, 1, 1, 0, 1],
            'preTestScore': [4, 24, 31, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(raw_data)
df
 | first_name | last_name | age | female | preTestScore | postTestScore |
---|---|---|---|---|---|---|
0 | Jason | Miller | 42 | 0 | 4 | 25 |
1 | Molly | Jacobson | 52 | 1 | 24 | 94 |
2 | Tina | Ali | 36 | 1 | 31 | 57 |
3 | Jake | Milner | 24 | 0 | 2 | 62 |
4 | Amy | Cooze | 73 | 1 | 3 | 70 |
Step 3. Create a Scatterplot of preTestScore and postTestScore, with the size of each point determined by age
Hint: Don’t forget to place the labels
sns.scatterplot(data=df, x='preTestScore', y='postTestScore', size='age')
<AxesSubplot:xlabel='preTestScore', ylabel='postTestScore'>
Step 4. Create a Scatterplot of preTestScore and postTestScore.
This time the size should be 4.5 times the postTestScore and the color determined by sex
sns.scatterplot(x=df['preTestScore'], y=df['postTestScore'], s=df['postTestScore']*4.5, hue=df['female'])
<AxesSubplot:xlabel='preTestScore', ylabel='postTestScore'>
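For comparison, the same idea can be sketched with plain Matplotlib, where the marker sizes, colors, and labels are all set explicitly. This is only one possible way to do it, not the exercise's reference solution; the `Agg` backend line is an assumption so the snippet also runs outside a notebook, and `scores` repeats the three columns from the DataFrame above.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Same values as the preTestScore, postTestScore and female columns above.
scores = pd.DataFrame({
    "preTestScore": [4, 24, 31, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70],
    "female": [0, 1, 1, 0, 1],
})

# Marker size is 4.5 times the post-test score; color follows the female flag.
sizes = scores["postTestScore"] * 4.5

fig, ax = plt.subplots()
ax.scatter(scores["preTestScore"], scores["postTestScore"], s=sizes, c=scores["female"])

# Labels are not added automatically by ax.scatter, so set them by hand.
ax.set_title("preTestScore x postTestScore")
ax.set_xlabel("preTestScore")
ax.set_ylabel("postTestScore")
```

In a notebook with `%matplotlib inline`, the backend line can be dropped and the figure renders under the cell.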
BONUS: Create your own question and answer it.
Visualizing the Titanic Disaster
Introduction:
This exercise is based on the Titanic disaster dataset available at Kaggle.
To know more about the variables check here
Step 1. Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
Step 2. Import the dataset from this address
Step 3. Assign it to a variable titanic
url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/07_Visualization/Titanic_Desaster/train.csv'
titanic = pd.read_csv(url)
Step 4. Set PassengerId as the index
titanic.set_index(['PassengerId'], inplace=True)
Step 5. Create a pie chart presenting the male/female proportion
titanic['Sex'].value_counts().plot(kind='pie')
<AxesSubplot:ylabel='Sex'>
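The pie chart is drawn from the counts that `value_counts()` returns. A minimal sketch on a made-up column (the `sex` Series is hypothetical) shows the counts and the matching proportions:

```python
import pandas as pd

# Hypothetical four-passenger column, just to show what feeds the pie chart.
sex = pd.Series(["male", "female", "male", "male"], name="Sex")

counts = sex.value_counts()                # absolute counts per category
shares = sex.value_counts(normalize=True)  # same categories as fractions of the total

print(counts)
print(shares)
```

Passing `autopct='%1.1f%%'` to `.plot(kind='pie')` forwards it to Matplotlib's `pie`, which prints these percentages on the wedges.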
Step 6. Create a scatterplot with the Fare paid and the Age, differing the plot color by gender
sns.scatterplot(x=titanic['Fare'], y=titanic['Age'], hue=titanic['Sex'])
<AxesSubplot:xlabel='Fare', ylabel='Age'>
Step 7. How many people survived?
titanic['Survived'].sum()
342
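Summing works here because `Survived` is coded as 0 or 1, so the sum counts the ones. A short sketch on made-up values (the `survived` Series is hypothetical) shows the count and, as a natural follow-up, the survival rate:

```python
import pandas as pd

# Hypothetical 0/1 flags: 1 means the passenger survived.
survived = pd.Series([0, 1, 1, 0, 0], name="Survived")

n_survived = int(survived.sum())  # number of ones
rate = survived.mean()            # share of ones among all rows

print(n_survived, rate)
```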
Step 8. Create a histogram with the Fare paid
sns.histplot(x=titanic['Fare'])
<AxesSubplot:xlabel='Fare', ylabel='Count'>
BONUS: Create your own question and answer it.