Pandas Exercise 7 : Visualization

This post continues my practice with the Pandas exercises from guipsamora. This set is interesting because it covers basic visualization with Matplotlib.

Visualizing Chipotle’s Data

This time we are going to pull data directly from the internet. Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

Step 1. Import the necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter

# render the graphs inline in the notebook
%matplotlib inline

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called chipo.

url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url, sep ='\t')
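To check the parsing options without touching the network, here is a minimal sketch that reads a tab-separated snippet from memory. The two rows mirror the first rows of the dataset shown in this post; `sample_tsv` and `sample` are names made up for this illustration, and writing the missing description as `NULL` is an assumption about how the file encodes it.

```python
import io

import pandas as pd

# Hypothetical offline sample: two rows in the same layout as the Chipotle file.
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\tNULL\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
)

# sep="\t" tells read_csv that columns are separated by tabs rather than commas.
# The literal NULL is one of the strings read_csv treats as missing by default.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(sample.dtypes)
```

Note that `item_price` comes back as text because of the dollar sign, which is why it has to be converted before any numeric plot.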

Step 4. See the first 10 entries

chipo.head(10)  # head() defaults to 5 rows; only the first five are reproduced below

	order_id	quantity	item_name	choice_description	item_price
0	1	1	Chips and Fresh Tomato Salsa	NaN	$2.39
1	1	1	Izze	[Clementine]	$3.39
2	1	1	Nantucket Nectar	[Apple]	$3.39
3	1	1	Chips and Tomatillo-Green Chili Salsa	NaN	$2.39
4	2	2	Chicken Bowl	[Tomatillo-Red Chili Salsa (Hot), [Black Beans...	$16.98

Step 5. Create a histogram of the top 5 items bought

# Sum the quantity per item, sort descending, and plot the five best sellers as a bar chart
chipo.groupby('item_name')['quantity'].sum().sort_values(ascending=False).head(5).plot(kind='bar')

<AxesSubplot:xlabel='item_name'>
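The group, sum, sort, and head chain is easier to follow on a tiny table. This is only a sketch: the `orders` frame and its values are invented for illustration and are not taken from the Chipotle data.

```python
import pandas as pd

# Hypothetical miniature order table; the values are invented.
orders = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Chips", "Izze", "Steak Burrito"],
    "quantity": [2, 1, 1, 1, 3, 1],
})

# Sum the quantity per item, then sort so the best sellers come first.
top_items = (
    orders.groupby("item_name")["quantity"]
    .sum()
    .sort_values(ascending=False)
    .head(2)
)
print(top_items)
```

Selecting the `quantity` column before calling `sum()` keeps the aggregation away from text columns, which is the main design choice here.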

Step 6. Create a scatterplot with the number of items ordered per order price

Hint: Price should be in the X-axis and Items ordered in the Y-axis

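# Strip the leading '$' and convert the price strings to floats
chipo['item_price'] = chipo['item_price'].str.lstrip('$').astype('float')

# Sum only the numeric columns per item, then plot price against quantity
chipo.groupby('item_name')[['quantity', 'item_price']].sum().plot(x='item_price', y='quantity', kind='scatter')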

<AxesSubplot:xlabel='item_price', ylabel='quantity'>
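The step wording asks for items per order price, and one reading of that is to aggregate by `order_id` rather than by `item_name`. The sketch below follows that assumption on an invented sample; the `sample` frame, its values, and the axis label text are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample in the same layout as chipo; the values are invented.
sample = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3],
    "quantity": [1, 1, 2, 1, 1],
    "item_price": ["$2.39", "$3.39", "$16.98", "$10.98", "$1.69"],
})

# Strip the leading dollar sign and convert the price strings to floats.
sample["item_price"] = sample["item_price"].str.lstrip("$").astype(float)

# One row per order: total items and total price.
per_order = sample.groupby("order_id")[["quantity", "item_price"]].sum()

# Price on the X-axis, items ordered on the Y-axis, as the hint suggests.
ax = per_order.plot(x="item_price", y="quantity", kind="scatter")
ax.set_xlabel("Order price ($)")
ax.set_ylabel("Items ordered")
```

Grouping by order gives one point per order, whereas grouping by item gives one point per menu item, so the two charts answer different questions.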

Step 7. BONUS: Create a question and a graph to answer your own question.

# Histogram of the total quantity sold per item, in 15 bins
chipo.groupby('item_name')['quantity'].sum().plot(kind='hist', bins=15)

<AxesSubplot:ylabel='Frequency'>

Scores Dataset

Introduction:

This time you will create the data.

This exercise is based on Chris Albon's work; the credit belongs to him.

Step 1. Import the necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Step 2. Create the DataFrame that should look like the one below.

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
            'age': [42, 52, 36, 24, 73],
            'female': [0, 1, 1, 0, 1],
            'preTestScore': [4, 24, 31, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70]}

df = pd.DataFrame(raw_data)
df

	first_name	last_name	age	female	preTestScore	postTestScore
0	Jason	Miller	42	0	4	25
1	Molly	Jacobson	52	1	24	94
2	Tina	Ali	36	1	31	57
3	Jake	Milner	24	0	2	62
4	Amy	Cooze	73	1	3	70

Step 3. Create a Scatterplot of preTestScore and postTestScore, with the size of each point determined by age

Hint: Don’t forget to place the labels

sns.scatterplot(data=df, x='preTestScore', y='postTestScore', size='age')

<AxesSubplot:xlabel='preTestScore', ylabel='postTestScore'>
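The hint about labels can also be handled explicitly. Here is a minimal sketch using plain Matplotlib instead of seaborn, which is a swapped-in technique and not the call used above; the `scores` name and the title text are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The numeric columns of the Scores table from Step 2.
scores = pd.DataFrame({
    "age": [42, 52, 36, 24, 73],
    "preTestScore": [4, 24, 31, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70],
})

fig, ax = plt.subplots()
# s sets the marker area in points squared; here it is simply the age.
ax.scatter(scores["preTestScore"], scores["postTestScore"], s=scores["age"])

# Set the title and axis labels by hand.
ax.set_title("preTestScore x postTestScore")
ax.set_xlabel("preTestScore")
ax.set_ylabel("postTestScore")
```

Seaborn fills in the axis labels from the column names on its own, so the manual calls matter mostly when plotting with Matplotlib directly or when a custom title is wanted.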

Step 4. Create a Scatterplot of preTestScore and postTestScore.

This time the size should be 4.5 times the postTestScore and the color determined by sex

sns.scatterplot(x=df['preTestScore'], y=df['postTestScore'], s=df['postTestScore']*4.5, hue=df['female'])

<AxesSubplot:xlabel='preTestScore', ylabel='postTestScore'>
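To make the size rule concrete, the sketch below computes the marker sizes first and then passes them to Matplotlib's `scatter`, which is used here as an alternative to seaborn. The `scores` and `sizes` names are hypothetical, and coloring by the 0/1 `female` column through `c=` is one possible way to encode sex.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The relevant columns of the Scores table from Step 2.
scores = pd.DataFrame({
    "female": [0, 1, 1, 0, 1],
    "preTestScore": [4, 24, 31, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70],
})

# Marker size is 4.5 times the postTestScore, as the step asks.
sizes = scores["postTestScore"] * 4.5

fig, ax = plt.subplots()
# c maps the 0/1 female flag onto the default colormap, giving two colors.
points = ax.scatter(
    scores["preTestScore"], scores["postTestScore"], s=sizes, c=scores["female"]
)
ax.set_xlabel("preTestScore")
ax.set_ylabel("postTestScore")
```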

BONUS: Create your own question and answer it.

Visualizing the Titanic Disaster

Introduction:

This exercise is based on the Titanic disaster dataset available on Kaggle.
To learn more about the variables, see the dataset description on Kaggle.

Step 1. Import the necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

Step 2. Import the dataset from this address

Step 3. Assign it to a variable titanic

url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/07_Visualization/Titanic_Desaster/train.csv'
titanic = pd.read_csv(url)

Step 4. Set PassengerId as the index

titanic.set_index(['PassengerId'], inplace=True)
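An alternative to `inplace=True` is to set the index while reading the file, or to assign the result of `set_index` back to a variable. The sketch below shows both on an in-memory CSV; the three rows are an invented excerpt with only a few of the Titanic columns.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt; the values are invented for illustration.
csv_text = (
    "PassengerId,Survived,Sex,Fare\n"
    "1,0,male,7.25\n"
    "2,1,female,71.28\n"
    "3,1,female,7.92\n"
)

# Option 1: let read_csv use the column as the index directly.
passengers = pd.read_csv(io.StringIO(csv_text), index_col="PassengerId")

# Option 2: read first, then return a new frame with the index set.
same = pd.read_csv(io.StringIO(csv_text)).set_index("PassengerId")

print(passengers.equals(same))
```

Both options avoid mutating a frame in place, which tends to be easier to follow when notebook cells are re-run out of order.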

Step 5. Create a pie chart presenting the male/female proportion

titanic['Sex'].value_counts().plot(kind='pie')

<AxesSubplot:ylabel='Sex'>
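A pie chart is easier to read with the percentages printed on the slices. This sketch assumes a made-up `sex` Series with five values rather than the real column, and uses the `autopct` option that pandas forwards to Matplotlib's `pie`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample of the Sex column; the values are invented.
sex = pd.Series(["male", "male", "female", "male", "female"], name="Sex")

# normalize=True returns proportions instead of raw counts.
share = sex.value_counts(normalize=True)

# autopct formats each slice's percentage; the format string prints whole percents.
ax = share.plot(kind="pie", autopct="%.0f%%")
ax.set_ylabel("")  # hide the axis label next to the pie
```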

Step 6. Create a scatterplot with the Fare paid and the Age, with the plot color differing by gender

sns.scatterplot(x=titanic['Fare'], y=titanic['Age'], hue=titanic['Sex'])

<AxesSubplot:xlabel='Fare', ylabel='Age'>

Step 7. How many people survived?

titanic['Survived'].sum()
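Summing works because `Survived` is coded as 0 or 1, so the total equals the number of survivors. The same coding gives the survival rate through the mean. A small sketch on an invented Series (the `survived` values are not from the dataset):

```python
import pandas as pd

# Hypothetical 0/1 survival flags; the values are invented.
survived = pd.Series([0, 1, 1, 0, 1, 0, 0, 1], name="Survived")

n_survived = int(survived.sum())   # count of ones
survival_rate = survived.mean()    # share of ones
counts = survived.value_counts()   # passengers per outcome

print(n_survived, survival_rate)
print(counts)
```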

Step 8. Create a histogram with the Fare paid

sns.histplot(x=titanic['Fare'])

<AxesSubplot:xlabel='Fare', ylabel='Count'>
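The shape of a histogram depends on how the bins are chosen. As a sketch of what happens behind the plot, the example below counts invented fares into explicit bins with NumPy and then draws the same bins with Matplotlib; the `fares` values and the bin edges are assumptions picked for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical fares; the values are invented.
fares = pd.Series([7.25, 71.28, 7.92, 53.1, 8.05, 8.46, 51.86, 21.08], name="Fare")

# Count how many fares fall into each interval defined by the edges.
counts, edges = np.histogram(fares, bins=[0, 10, 30, 60, 100])

fig, ax = plt.subplots()
# Reuse the same edges so the bars match the counts computed above.
ax.hist(fares, bins=edges)
ax.set_xlabel("Fare")
ax.set_ylabel("Count")
```

Fixed edges like these make two histograms comparable, while an integer `bins` value lets the library pick equal-width bins from the data range.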

BONUS: Create your own question and answer it.

Gama Candra Tri Kartika