Pandas Exercise 5 : Merge

3 minute read

The continuity of my practice on Pandas exercise from guisapmora.

Fictitious Names Dataset

Introduction:

This time you will create a data again

Special thanks to Chris Albon for sharing the dataset and materials. All the credits to this exercise belongs to him.

In order to understand about it go here.

Step 1. Import the necessary libraries

import pandas as pd

Step 2. Create the 3 DataFrames based on the following raw data

raw_data_1 = {
        'subject_id': ['1', '2', '3', '4', '5'],
        'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 
        'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}

raw_data_2 = {
        'subject_id': ['4', '5', '6', '7', '8'],
        'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 
        'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}

raw_data_3 = {
        'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
        'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}

Step 3. Assign each to a variable called data1, data2, data3

data1 = pd.DataFrame(raw_data_1)
data2 = pd.DataFrame(raw_data_2)
data3 = pd.DataFrame(raw_data_3)

Step 4. Join the two dataframes along rows and assign all_data

all_data = pd.concat((data1, data2), axis=0)

Step 5. Join the two dataframes along columns and assing to all_data_col

all_data_col = pd.concat((data1, data2), axis=1)

Step 6. Print data3

data3

	subject_id	test_id
0	1	51
1	2	15
2	3	15
3	4	61
4	5	16
5	7	14
6	8	15
7	9	1
8	10	61
9	11	16

Step 7. Merge all_data and data3 along the subject_id value

all_data.merge(data3, on=['subject_id'])

	subject_id	first_name	last_name	test_id
0	1	Alex	Anderson	51
1	2	Amy	Ackerman	15
2	3	Allen	Ali	15
3	4	Alice	Aoni	61
4	4	Billy	Bonder	61
5	5	Ayoung	Atiches	16
6	5	Brian	Black	16
7	7	Bryce	Brice	14
8	8	Betty	Btisan	15

Step 8. Merge only the data that has the same ‘subject_id’ on both data1 and data2

data1.merge(data2, on=['subject_id'], how='inner')

	subject_id	first_name_x	last_name_x	first_name_y	last_name_y
0	4	Alice	Aoni	Billy	Bonder
1	5	Ayoung	Atiches	Brian	Black

Step 9. Merge all values in data1 and data2, with matching records from both sides where available.

data1.merge(data2, on=['subject_id'], how='outer')

	subject_id	first_name_x	last_name_x	first_name_y	last_name_y
0	1	Alex	Anderson	NaN	NaN
1	2	Amy	Ackerman	NaN	NaN
2	3	Allen	Ali	NaN	NaN
3	4	Alice	Aoni	Billy	Bonder
4	5	Ayoung	Atiches	Brian	Black
5	6	NaN	NaN	Bran	Balwner
6	7	NaN	NaN	Bryce	Brice
7	8	NaN	NaN	Betty	Btisan

MPG Dataset

Introduction:

The following exercise utilizes data from UC Irvine Machine Learning Repository

Step 1. Import the necessary libraries

import pandas as pd

Step 2. Import the first dataset cars1 and cars2.

### Step 3. Assign each to a variable called cars1 and cars2

url1 = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv'
url2 = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv'

cars1 = pd.read_csv(url1)
cars2 = pd.read_csv(url2)

Step 4. Oops, it seems our first dataset has some unnamed blank columns, fix cars1

cars1 = cars1.dropna(how ='all', axis=1)

Step 5. What is the number of observations in each dataset?

print(cars1.shape[0])
print(cars2.shape[0])

198
200

Step 6. Join cars1 and cars2 into a single DataFrame called cars

cars = pd.concat((cars1, cars1), axis=0)
cars.columns

Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'model', 'origin', 'car'],
      dtype='object')

Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

import numpy as np

owners = np.random.randint(low=15000, high=73000, size= cars.shape[0])

Step 8. Add the column owners to cars

cars['owner'] = owners

Share on

Twitter Facebook LinkedIn

Gama Candra Tri Kartika

Pandas Exercise 5 : Merge

Fictitious Names Dataset

Introduction:

Step 1. Import the necessary libraries

Step 2. Create the 3 DataFrames based on the following raw data

Step 3. Assign each to a variable called data1, data2, data3

Step 4. Join the two dataframes along rows and assign all_data

Step 5. Join the two dataframes along columns and assing to all_data_col

Step 6. Print data3

Step 7. Merge all_data and data3 along the subject_id value

Step 8. Merge only the data that has the same ‘subject_id’ on both data1 and data2

Step 9. Merge all values in data1 and data2, with matching records from both sides where available.

MPG Dataset

Introduction:

Step 1. Import the necessary libraries

Step 2. Import the first dataset cars1 and cars2.

Step 4. Oops, it seems our first dataset has some unnamed blank columns, fix cars1

Step 5. What is the number of observations in each dataset?

Step 6. Join cars1 and cars2 into a single DataFrame called cars

Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

Step 8. Add the column owners to cars

Share on

Leave a comment

You may also enjoy

Busy From Works

Long Updates

Back to back meetings