Pandas Exercise 7: Visualization

This post continues my practice with the Pandas exercises from guipsamora. This set is interesting because it covers basic visualization with Matplotlib and seaborn.

Visualizing Chipotle’s Data

This time we are going to pull data directly from the internet. Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

Step 1. Import the necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter

# set this so the graphs open internally
%matplotlib inline

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called chipo.

url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url, sep='\t')
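
To see what sep='\t' does without touching the network, here is a minimal sketch that parses a couple of hand-written rows. The sample text and the name sample_tsv are made up for illustration; only the tab-separated layout mirrors the real file.

```python
import io

import pandas as pd

# Hypothetical two-row sample in a tab-separated layout like chipotle.tsv
sample_tsv = (
    "order_id\tquantity\titem_name\titem_price\n"
    "1\t1\tIzze\t$3.39\n"
    "2\t2\tChicken Bowl\t$16.98\n"
)

# sep='\t' tells read_csv to split columns on tabs instead of commas
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(sample)
```

Note that item_price is read as text because of the dollar sign, which is why it has to be converted before plotting prices later on.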

Step 4. See the first 10 entries

chipo.head()  # head() shows 5 rows by default; use chipo.head(10) for the first 10 entries
   order_id  quantity  item_name                              choice_description                                 item_price
0  1         1         Chips and Fresh Tomato Salsa           NaN                                                $2.39
1  1         1         Izze                                   [Clementine]                                       $3.39
2  1         1         Nantucket Nectar                       [Apple]                                            $3.39
3  1         1         Chips and Tomatillo-Green Chili Salsa  NaN                                                $2.39
4  2         2         Chicken Bowl                           [Tomatillo-Red Chili Salsa (Hot), [Black Beans...  $16.98

Step 5. Create a histogram of the top 5 items bought

# select quantity before summing so the text columns are never summed
chipo.groupby('item_name')['quantity'].sum().sort_values(ascending=False).head(5).plot(kind='bar')
<AxesSubplot:xlabel='item_name'>

[Figure: bar chart of total quantity for the five most-ordered items]
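
The step says "histogram", but item names are categories rather than a numeric distribution, so a bar chart is the natural fit. Below is a minimal sketch of the same top-N pattern with explicit axis labels. The orders DataFrame is hypothetical sample data, not the Chipotle file, and nlargest is just one way to pick the top rows.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical order lines, used only to illustrate the pattern
orders = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Chips", "Izze", "Steak Burrito"],
    "quantity": [2, 1, 1, 4, 1, 1],
})

# Total quantity per item, then keep the three largest totals
top_items = orders.groupby("item_name")["quantity"].sum().nlargest(3)

ax = top_items.plot(kind="bar")
ax.set_xlabel("Item")
ax.set_ylabel("Quantity ordered")
ax.set_title("Most ordered items")
```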

Step 6. Create a scatterplot with the number of items ordered per order price

Hint: Price should be in the X-axis and Items ordered in the Y-axis

# strip the leading '$', then total the quantity and price of each order
chipo['item_price'] = chipo['item_price'].str.lstrip('$').astype(float)
orders = chipo.groupby('order_id')[['quantity', 'item_price']].sum()
orders.plot(x='item_price', y='quantity', kind='scatter')
<AxesSubplot:xlabel='item_price', ylabel='quantity'>

[Figure: scatter plot of quantity against item_price]
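
Since the step asks for one point per order, it helps to check the aggregation on a few rows small enough to verify by hand. This is a rough sketch on made-up data: the lines DataFrame and its values are hypothetical, and it assumes prices are strings like "$2.39 " with a leading dollar sign and possible trailing space.

```python
import pandas as pd

# Hypothetical order lines shaped like the Chipotle data
lines = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3],
    "quantity": [1, 1, 2, 1, 1],
    "item_price": ["$2.39 ", "$3.39 ", "$16.98 ", "$8.75 ", "$1.25 "],
})

# Remove surrounding whitespace and the dollar sign, then convert to float
lines["item_price"] = lines["item_price"].str.strip().str.lstrip("$").astype(float)

# One row per order: total items and total price
per_order = lines.groupby("order_id")[["quantity", "item_price"]].sum()
print(per_order)

# Price on the X-axis, items ordered on the Y-axis, as the hint asks
ax = per_order.plot(x="item_price", y="quantity", kind="scatter")
ax.set_xlabel("Order price ($)")
ax.set_ylabel("Items ordered")
```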

Step 7. BONUS: Create a question and a graph to answer your own question.

# Question: how is the total quantity ordered distributed across menu items?
chipo.groupby('item_name')['quantity'].sum().plot(kind='hist', bins=15)
<AxesSubplot:ylabel='Frequency'>

[Figure: histogram of quantity with 15 bins]

Scores Dataset

Introduction:

This time you will create the data.

Exercise based on Chris Albon work, the credits belong to him.

Step 1. Import the necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Step 2. Create the DataFrame that should look like the one below.

# named raw_data rather than dict so the built-in dict is not shadowed
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
            'age': [42, 52, 36, 24, 73],
            'female': [0, 1, 1, 0, 1],
            'preTestScore': [4, 24, 31, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70]}

df = pd.DataFrame(raw_data)
df
   first_name  last_name  age  female  preTestScore  postTestScore
0  Jason       Miller     42   0       4             25
1  Molly       Jacobson   52   1       24            94
2  Tina        Ali        36   1       31            57
3  Jake        Milner     24   0       2             62
4  Amy         Cooze      73   1       3             70

Step 3. Create a Scatterplot of preTestScore and postTestScore, with the size of each point determined by age

Hint: Don’t forget to place the labels

sns.scatterplot(data=df, x='preTestScore', y='postTestScore', size='age')
<AxesSubplot:xlabel='preTestScore', ylabel='postTestScore'>

[Figure: scatter plot of postTestScore against preTestScore, point size by age]
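
seaborn fills in the axis labels from the column names automatically. For the hint about labels, here is a sketch of the same plot in plain Matplotlib where the labels are set by hand. The scores DataFrame repeats the values from Step 2 under a new, hypothetical name so the block runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same values as the DataFrame built in Step 2 (only the columns needed here)
scores = pd.DataFrame({
    "age": [42, 52, 36, 24, 73],
    "preTestScore": [4, 24, 31, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70],
})

fig, ax = plt.subplots()
# s= takes marker areas in points squared, so age is passed directly as the size
ax.scatter(scores["preTestScore"], scores["postTestScore"], s=scores["age"])
ax.set_xlabel("preTestScore")
ax.set_ylabel("postTestScore")
ax.set_title("preTestScore vs. postTestScore (point size = age)")
```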

Step 4. Create a Scatterplot of preTestScore and postTestScore.

This time the size should be 4.5 times the postTestScore and the color determined by sex.

sns.scatterplot(data=df, x='preTestScore', y='postTestScore', s=df['postTestScore'] * 4.5, hue='female')
<AxesSubplot:xlabel='preTestScore', ylabel='postTestScore'>

[Figure: scatter plot of postTestScore against preTestScore, colored by female]
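
If the 0/1 legend is hard to read, one option is to draw each group separately in Matplotlib and label it. This is only a sketch: the scores name is hypothetical, and it assumes 1 in the female column means female and 0 means male.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same values as the Step 2 DataFrame (only the columns needed here)
scores = pd.DataFrame({
    "female": [0, 1, 1, 0, 1],
    "preTestScore": [4, 24, 31, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70],
})

fig, ax = plt.subplots()
# One scatter call per group, so each group gets its own color and legend entry
for flag, group in scores.groupby("female"):
    ax.scatter(
        group["preTestScore"],
        group["postTestScore"],
        s=group["postTestScore"] * 4.5,  # marker area is 4.5 times postTestScore
        label="female" if flag == 1 else "male",
    )
ax.set_xlabel("preTestScore")
ax.set_ylabel("postTestScore")
ax.legend(title="sex")
```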

BONUS: Create your own question and answer it.


Visualizing the Titanic Disaster

Introduction:

This exercise is based on the Titanic disaster dataset available on Kaggle.
To learn more about the variables, see the dataset description on Kaggle.

Step 1. Import the necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

Step 2. Import the dataset from this address

Step 3. Assign it to a variable titanic

url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/07_Visualization/Titanic_Desaster/train.csv'
titanic = pd.read_csv(url)

Step 4. Set PassengerId as the index

titanic.set_index(['PassengerId'], inplace=True)

Step 5. Create a pie chart presenting the male/female proportion

titanic['Sex'].value_counts().plot(kind='pie')
<AxesSubplot:ylabel='Sex'>

[Figure: pie chart of passenger counts by Sex]
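
A pie chart of raw counts does not show the actual proportions as numbers. Here is a small sketch that computes the shares and prints them on the wedges. The sex Series is hypothetical sample data, not the Titanic file.

```python
import pandas as pd

# Hypothetical passenger sample
sex = pd.Series(
    ["male", "female", "male", "male", "female", "male", "male", "female"],
    name="Sex",
)

# normalize=True returns shares that sum to 1 instead of raw counts
shares = sex.value_counts(normalize=True)
print(shares)

# autopct writes each wedge's percentage on the chart
ax = shares.plot(kind="pie", autopct="%.1f%%")
ax.set_ylabel("")  # drop the default axis label that pandas adds to pie charts
```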

Step 6. Create a scatterplot of the Fare paid and the Age, with the point color set by gender

sns.scatterplot(x=titanic['Fare'], y=titanic['Age'], hue=titanic['Sex'])
<AxesSubplot:xlabel='Fare', ylabel='Age'>

[Figure: scatter plot of Age against Fare, colored by Sex]

Step 7. How many people survived?

titanic['Survived'].sum()
342
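
Summing works here because Survived is a 0/1 flag, so the ones add up to the number of survivors. The sketch below shows that idea next to two related views, using a hypothetical Series of made-up flags rather than the real column.

```python
import pandas as pd

# Hypothetical 0/1 survival flags
survived = pd.Series([0, 1, 1, 0, 0, 1, 0, 0], name="Survived")

n_survived = int(survived.sum())   # the 1s add up to the number of survivors
survival_rate = survived.mean()    # the mean of 0/1 flags is the share of survivors
counts = survived.value_counts()   # how many rows fall under each outcome

print(n_survived, survival_rate)
print(counts)
```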

Step 8. Create a histogram with the Fare paid

sns.histplot(x=titanic['Fare'])
<AxesSubplot:xlabel='Fare', ylabel='Count'>

[Figure: histogram of Fare]
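
histplot picks the bins automatically. To control them, one approach is to pass explicit bin edges, sketched here in plain Matplotlib. The fares values and the bin edges are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical fares, not taken from the Titanic file
fares = pd.Series([7.25, 71.28, 7.93, 53.10, 8.05, 8.46, 51.86, 21.08], name="Fare")

fig, ax = plt.subplots()
# Explicit bin edges give the ranges 0-20, 20-40, 40-60 and 60-80
counts, edges, _ = ax.hist(fares, bins=[0, 20, 40, 60, 80])
ax.set_xlabel("Fare")
ax.set_ylabel("Count")
```

The returned counts array holds the number of fares in each bin, which is handy for checking the chart against the data.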

BONUS: Create your own question and answer it.

