Stocks Finance Exercise

7 minute read

In this data project we will focus on exploratory data analysis of stock prices. Keep in mind, this project is just meant to practice your visualization and pandas skills, it is not meant to be a robust financial analysis or be taken as financial advice.


NOTE: This project is fairly challenging: it introduces a lot of new concepts and asks you to look things up on your own (we’ll point you in the right direction) to solve the tasks. Feel free to treat the solutions notebook and video as a “walkthrough” project if you don’t want to look things up yourself. You’ll still learn a lot that way!


We’ll focus on bank stocks and see how they progressed throughout the financial crisis all the way to early 2016. You can download the notebooks here if you want to try it all by yourself.


Now let’s get started.

Exercises

Get the Data

In this section we will learn how to use pandas-datareader to read stock data directly from the internet.

First we need to start with the proper imports, which we’ve already laid out for you here.

Note: You’ll need to install pandas-datareader for this to work! pandas-datareader allows you to read stock information directly from the internet. Install it with `pip install pandas-datareader`, or just follow along with the video lecture.

The Imports

Already filled out for you.

from pandas_datareader import data, wb
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
plt.rcParams['figure.figsize'] = (20.0, 10.0)

Data

We need to get data using pandas datareader. We will get stock information for the following banks:

  • Bank of America
  • CitiGroup
  • Goldman Sachs
  • JPMorgan Chase
  • Morgan Stanley
  • Wells Fargo

**Figure out how to get the stock data from Jan 1st 2006 to Jan 1st 2016 for each of these banks. Set each bank to be a separate dataframe, with the variable name for that bank being its ticker symbol. This will involve a few steps:**

  1. Use datetime to set start and end datetime objects.
  2. Figure out the ticker symbol for each bank.
  3. Figure out how to use datareader to grab info on the stock.

**Use this documentation page for hints and instructions (it should just be a matter of replacing certain values). Google Finance was the original source for this exercise, but that API has since been shut down, so we use Yahoo Finance instead. For example:**

# Set the date range first, since the DataReader calls use it
start = pd.to_datetime("2006-1-1")
end = pd.to_datetime("2016-1-1")

# Bank of America
BAC = data.DataReader("BAC", "yahoo", start, end)
# CitiGroup
C = data.DataReader("C", "yahoo", start, end)
# Goldman Sachs
GS = data.DataReader("GS", "yahoo", start, end)
# JPMorgan Chase
JPM = data.DataReader("JPM", "yahoo", start, end)
# Morgan Stanley
MS = data.DataReader("MS", "yahoo", start, end)
# Wells Fargo
WFC = data.DataReader("WFC", "yahoo", start, end)
# The 'google' source no longer works; see
# https://github.com/pydata/pandas-datareader/issues/768

**Create a list of the ticker symbols (as strings) in alphabetical order. Call this list: tickers**

tickers = "BAC C GS JPM MS WFC".split()

**Use pd.concat to concatenate the bank dataframes together to a single data frame called bank_stocks. Set the keys argument equal to the tickers list. Also pay attention to what axis you concatenate on.**

bank_stocks = pd.concat([BAC, C, GS, JPM, MS, WFC], axis=1, keys=tickers)

**Set the column name levels (this is filled out for you):**

bank_stocks.columns.names = ['Bank Ticker','Stock Info']
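To see what `pd.concat` with `keys` does (and to preview the `.xs` cross-sections used later), here is a minimal sketch on two tiny hypothetical frames; the tickers and numbers are made up:

```python
import pandas as pd

# Two small single-ticker frames, stand-ins for BAC, C, etc.
aaa = pd.DataFrame({'Close': [10.0, 11.0], 'Volume': [100, 200]})
bbb = pd.DataFrame({'Close': [20.0, 19.0], 'Volume': [300, 400]})

# keys= adds an outer column level, one label per input frame
demo = pd.concat([aaa, bbb], axis=1, keys=['AAA', 'BBB'])
demo.columns.names = ['Bank Ticker', 'Stock Info']

# Indexing the outer level recovers the original frame
print(demo['AAA']['Close'].tolist())  # [10.0, 11.0]

# .xs slices across the inner level: one 'Close' column per ticker
print(demo.xs(key='Close', axis=1, level='Stock Info').max().tolist())  # [11.0, 20.0]
```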

**Check the head of the bank_stocks dataframe.**

bank_stocks.head()
Bank Ticker        BAC                                                         ...
Stock Info        High        Low       Open      Close      Volume  Adj Close ...
Date                                                                           ...
2006-01-03   47.180000  46.150002  46.919998  47.080002  16296700.0  34.811729 ...
2006-01-04   47.240002  46.450001  47.000000  46.580002  17757900.0  34.442013 ...
2006-01-05   46.830002  46.320000  46.580002  46.639999  14970700.0  34.486385 ...
2006-01-06   46.910000  46.349998  46.799999  46.570000  12599800.0  34.434616 ...
2006-01-09   46.970001  46.360001  46.720001  46.599998  15619400.0  34.456806 ...

5 rows × 36 columns

EDA

Let’s explore the data a bit! Before continuing, I encourage you to check out the documentation on Multi-Level Indexing and Using .xs. Reference the solutions if you cannot figure out how to use .xs(), since that will be a major part of this project.

**What is the max Close price for each bank’s stock throughout the time period?**

bank_stocks.xs(key='Close',axis=1,level='Stock Info').max()
Bank Ticker
BAC     54.900002
C      564.099976
GS     247.919998
JPM     70.080002
MS      89.300003
WFC     58.520000
dtype: float64

**Create a new empty DataFrame called returns. This dataframe will contain the returns for each bank’s stock. Returns are typically defined by:**

\[r_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1\]
returns = pd.DataFrame()

**We can use pandas’ pct_change() method on the Close column to create a column representing this return value. Write a for loop that, for each bank stock ticker, creates this returns column and sets it as a column in the returns DataFrame.**

for tick in tickers:
    returns[tick+' Return'] = bank_stocks[tick]['Close'].pct_change()
returns.head()
BAC Return C Return GS Return JPM Return MS Return WFC Return
2006-01-03 NaN NaN NaN NaN NaN NaN
2006-01-04 -0.010620 -0.018462 -0.013812 -0.014183 0.000686 -0.011599
2006-01-05 0.001288 0.004961 -0.000393 0.003029 0.002742 -0.001110
2006-01-06 -0.001501 0.000000 0.014169 0.007046 0.001025 0.005874
2006-01-09 0.000644 -0.004731 0.012030 0.016242 0.010586 -0.000158
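`pct_change()` implements exactly the return formula above; a quick sketch on made-up prices shows it agrees with the manual computation:

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])

# r_t = p_t / p_{t-1} - 1; the first row has no predecessor, so it's NaN
auto = prices.pct_change()
manual = prices / prices.shift(1) - 1

print([round(x, 10) for x in auto.tolist()[1:]])  # [0.1, -0.1]
```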

**Create a pairplot using seaborn of the returns dataframe. What stock stands out to you? Can you figure out why?**

sns.pairplot(returns)
<seaborn.axisgrid.PairGrid at 0x2b15117fe50>

png

**Using this returns DataFrame, figure out on what dates each bank stock had the best and worst single day returns. You should notice that 4 of the banks share the same day for the worst drop. Did anything significant happen that day?**

returns.idxmax()
BAC Return   2009-04-09
C Return     2008-11-24
GS Return    2008-11-24
JPM Return   2009-01-21
MS Return    2008-10-13
WFC Return   2008-07-16
dtype: datetime64[ns]
returns.idxmin()
BAC Return   2009-01-20
C Return     2009-02-27
GS Return    2009-01-20
JPM Return   2009-01-20
MS Return    2008-10-09
WFC Return   2009-01-20
dtype: datetime64[ns]
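Note that `idxmax`/`idxmin` return the index *label* of the extreme value (here, a date) rather than the value itself; a minimal sketch with made-up returns:

```python
import pandas as pd

idx = pd.to_datetime(['2009-01-19', '2009-01-20', '2009-01-21'])
r = pd.Series([0.01, -0.05, 0.03], index=idx, name='XYZ Return')

print(r.idxmin())  # Timestamp('2009-01-20 00:00:00') -- the worst single day
print(r.idxmax())  # Timestamp('2009-01-21 00:00:00') -- the best single day
```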

Something clearly happened on 2009-01-20, so let’s check it out. According to https://www.onthisday.com/date/2009/january/20, Barack Obama was inaugurated that day as the 44th President of the United States of America, becoming the country’s first African-American president.

**You should have noticed that Citigroup’s largest drop and biggest gain were very close to one another. Did anything significant happen in that time frame?**

**Take a look at the standard deviation of the returns. Which stock would you classify as the riskiest over the entire time period? Which would you classify as the riskiest for the year 2015?**

returns.std()
BAC Return    0.036647
C Return      0.038672
GS Return     0.025390
JPM Return    0.027667
MS Return     0.037819
WFC Return    0.030238
dtype: float64
returns.loc["2015-01-01" : "2015-12-31"].std()
BAC Return    0.016163
C Return      0.015289
GS Return     0.014046
JPM Return    0.014017
MS Return     0.016249
WFC Return    0.012591
dtype: float64

According to the standard deviations, Morgan Stanley was the riskiest stock to invest in during 2015.
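The same answer can be read off programmatically: `std().idxmax()` picks the column with the largest standard deviation. A sketch with hypothetical returns:

```python
import pandas as pd

r = pd.DataFrame({'AAA Return': [0.01, -0.01, 0.02],
                  'BBB Return': [0.05, -0.06, 0.04]})

# Larger standard deviation = more volatile = riskier by this measure
print(r.std().idxmax())  # BBB Return
```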

**Create a distplot using seaborn of the 2015 returns for Morgan Stanley**

sns.distplot(returns["MS Return"].loc["2015-01-01" : "2015-12-31"])
<AxesSubplot:xlabel='MS Return'>

png

**Create a distplot using seaborn of the 2008 returns for CitiGroup**

sns.distplot(returns["C Return"].loc["2008-01-01" : "2008-12-31"])
<AxesSubplot:xlabel='C Return'>

png


More Visualization

A lot of this project will focus on visualizations. Feel free to use any of your preferred visualization libraries to try to recreate the described plots below, seaborn, matplotlib, plotly and cufflinks, or just pandas.


**Create a line plot showing Close price for each bank for the entire index of time. (Hint: Try using a for loop, or use .xs to get a cross section of the data.)**

sns.lineplot(data=bank_stocks.xs(key='Close', axis=1, level='Stock Info'), palette="tab10")
<AxesSubplot:>

png

sns.lineplot(data=bank_stocks.xs(key='Close', axis=1, level='Stock Info').rolling(window=90).mean(), palette="tab10")
<AxesSubplot:>

png

sns.lineplot(data=bank_stocks.xs(key='Close', axis=1, level='Stock Info').rolling(window=365).mean(), palette="tab10")
<AxesSubplot:>

png

Moving Averages

Let’s analyze the moving averages for these stocks in the year 2008.

**Plot the rolling 30 day average against the Close Price for Bank Of America’s stock for the year 2008**

sns.lineplot(data=BAC["Close"].loc['2008-01-01':'2009-01-01'])
sns.lineplot(data=BAC["Close"].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean())
<AxesSubplot:>

png
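As a reminder of how the rolling mean behaves: `rolling(window=n).mean()` is NaN for the first n-1 rows and the trailing n-point average afterwards. A tiny sketch:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
roll = s.rolling(window=3).mean()
print(roll.tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```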

**Create a heatmap of the correlation between the stocks Close Price.**

sns.heatmap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr())
<AxesSubplot:xlabel='Bank Ticker', ylabel='Bank Ticker'>

png
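`.corr()` computes pairwise Pearson correlations between columns, which is exactly what feeds the heatmap. A sketch with made-up series, one of which is a perfect multiple of another:

```python
import pandas as pd

close = pd.DataFrame({'AAA': [1.0, 2.0, 3.0],
                      'BBB': [2.0, 4.0, 6.0],   # exactly 2x AAA
                      'CCC': [3.0, 1.0, 2.0]})
corr = close.corr()
print(round(corr.loc['AAA', 'BBB'], 6))  # 1.0 -- perfectly correlated
```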

**Optional: Use seaborn’s clustermap to cluster the correlations together:**

sns.clustermap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr())
<seaborn.matrix.ClusterGrid at 0x2b166301c70>

png

Great Job!

Definitely a lot of more specific finance topics here, so don’t worry if you didn’t understand them all! The only thing you should be concerned with understanding are the basic pandas and visualization operations.
