8 minute read

The continuity of my practice on Pandas exercise from guisapmora.

United States - Crime Rates - 1960 - 2014

Introduction:

This time you will create a data

Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

Step 1. Import the necessary libraries Dataset

import pandas as pd

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called crime.

url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/04_Apply/US_Crime_Rates/US_Crime_Rates_1960_2014.csv'
crime = pd.read_csv(url)

Step 4. What is the type of the columns?

crime.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Year                55 non-null     int64
 1   Population          55 non-null     int64
 2   Total               55 non-null     int64
 3   Violent             55 non-null     int64
 4   Property            55 non-null     int64
 5   Murder              55 non-null     int64
 6   Forcible_Rape       55 non-null     int64
 7   Robbery             55 non-null     int64
 8   Aggravated_assault  55 non-null     int64
 9   Burglary            55 non-null     int64
 10  Larceny_Theft       55 non-null     int64
 11  Vehicle_Theft       55 non-null     int64
dtypes: int64(12)
memory usage: 5.3 KB
Have you noticed that the type of Year is int64. But pandas has a different type to work with Time Series. Let’s see it now.

Step 5. Convert the type of the column Year to datetime64

crime['Year'] = pd.to_datetime(crime['Year'], format='%Y')

Step 6. Set the Year column as the index of the dataframe

crime.set_index(['Year'])
Population Total Violent Property Murder Forcible_Rape Robbery Aggravated_assault Burglary Larceny_Theft Vehicle_Theft
Year
1960-01-01 179323175 3384200 288460 3095700 9110 17190 107840 154320 912100 1855400 328200
1961-01-01 182992000 3488000 289390 3198600 8740 17220 106670 156760 949600 1913000 336000
1962-01-01 185771000 3752200 301510 3450700 8530 17550 110860 164570 994300 2089600 366800
1963-01-01 188483000 4109500 316970 3792500 8640 17650 116470 174210 1086400 2297800 408300
1964-01-01 191141000 4564600 364220 4200400 9360 21420 130390 203050 1213200 2514400 472800
1965-01-01 193526000 4739400 387390 4352000 9960 23410 138690 215330 1282500 2572600 496900
1966-01-01 195576000 5223500 430180 4793300 11040 25820 157990 235330 1410100 2822000 561200
1967-01-01 197457000 5903400 499930 5403500 12240 27620 202910 257160 1632100 3111600 659800
1968-01-01 199399000 6720200 595010 6125200 13800 31670 262840 286700 1858900 3482700 783600
1969-01-01 201385000 7410900 661870 6749000 14760 37170 298850 311090 1981900 3888600 878500
1970-01-01 203235298 8098000 738820 7359200 16000 37990 349860 334970 2205000 4225800 928400
1971-01-01 206212000 8588200 816500 7771700 17780 42260 387700 368760 2399300 4424200 948200
1972-01-01 208230000 8248800 834900 7413900 18670 46850 376290 393090 2375500 4151200 887200
1973-01-01 209851000 8718100 875910 7842200 19640 51400 384220 420650 2565500 4347900 928800
1974-01-01 211392000 10253400 974720 9278700 20710 55400 442400 456210 3039200 5262500 977100
1975-01-01 213124000 11292400 1039710 10252700 20510 56090 470500 492620 3265300 5977700 1009600
1976-01-01 214659000 11349700 1004210 10345500 18780 57080 427810 500530 3108700 6270800 966000
1977-01-01 216332000 10984500 1029580 9955000 19120 63500 412610 534350 3071500 5905700 977700
1978-01-01 218059000 11209000 1085550 10123400 19560 67610 426930 571460 3128300 5991000 1004100
1979-01-01 220099000 12249500 1208030 11041500 21460 76390 480700 629480 3327700 6601000 1112800
1980-01-01 225349264 13408300 1344520 12063700 23040 82990 565840 672650 3795200 7136900 1131700
1981-01-01 229146000 13423800 1361820 12061900 22520 82500 592910 663900 3779700 7194400 1087800
1982-01-01 231534000 12974400 1322390 11652000 21010 78770 553130 669480 3447100 7142500 1062400
1983-01-01 233981000 12108600 1258090 10850500 19310 78920 506570 653290 3129900 6712800 1007900
1984-01-01 236158000 11881800 1273280 10608500 18690 84230 485010 685350 2984400 6591900 1032200
1985-01-01 238740000 12431400 1328800 11102600 18980 88670 497870 723250 3073300 6926400 1102900
1986-01-01 240132887 13211869 1489169 11722700 20613 91459 542775 834322 3241410 7257153 1224137
1987-01-01 242282918 13508700 1483999 12024700 20096 91110 517704 855088 3236184 7499900 1288674
1988-01-01 245807000 13923100 1566220 12356900 20680 92490 542970 910090 3218100 7705900 1432900
1989-01-01 248239000 14251400 1646040 12605400 21500 94500 578330 951710 3168200 7872400 1564800
1990-01-01 248709873 14475600 1820130 12655500 23440 102560 639270 1054860 3073900 7945700 1635900
1991-01-01 252177000 14872900 1911770 12961100 24700 106590 687730 1092740 3157200 8142200 1661700
1992-01-01 255082000 14438200 1932270 12505900 23760 109060 672480 1126970 2979900 7915200 1610800
1993-01-01 257908000 14144800 1926020 12218800 24530 106010 659870 1135610 2834800 7820900 1563100
1994-01-01 260341000 13989500 1857670 12131900 23330 102220 618950 1113180 2712800 7879800 1539300
1995-01-01 262755000 13862700 1798790 12063900 21610 97470 580510 1099210 2593800 7997700 1472400
1996-01-01 265228572 13493863 1688540 11805300 19650 96250 535590 1037050 2506400 7904700 1394200
1997-01-01 267637000 13194571 1634770 11558175 18208 96153 498534 1023201 2460526 7743760 1354189
1998-01-01 270296000 12475634 1531044 10944590 16914 93103 446625 974402 2329950 7373886 1240754
1999-01-01 272690813 11634378 1426044 10208334 15522 89411 409371 911740 2100739 6955520 1152075
2000-01-01 281421906 11608072 1425486 10182586 15586 90178 408016 911706 2050992 6971590 1160002
2001-01-01 285317559 11876669 1439480 10437480 16037 90863 423557 909023 2116531 7092267 1228391
2002-01-01 287973924 11878954 1423677 10455277 16229 95235 420806 891407 2151252 7057370 1246646
2003-01-01 290690788 11826538 1383676 10442862 16528 93883 414235 859030 2154834 7026802 1261226
2004-01-01 293656842 11679474 1360088 10319386 16148 95089 401470 847381 2144446 6937089 1237851
2005-01-01 296507061 11565499 1390745 10174754 16740 94347 417438 862220 2155448 6783447 1235859
2006-01-01 299398484 11401511 1418043 9983568 17030 92757 447403 860853 2183746 6607013 1192809
2007-01-01 301621157 11251828 1408337 9843481 16929 90427 445125 855856 2176140 6568572 1095769
2008-01-01 304374846 11160543 1392628 9767915 16442 90479 443574 842134 2228474 6588046 958629
2009-01-01 307006550 10762956 1325896 9337060 15399 89241 408742 812514 2203313 6338095 795652
2010-01-01 309330219 10363873 1251248 9112625 14772 85593 369089 781844 2168457 6204601 739565
2011-01-01 311587816 10258774 1206031 9052743 14661 84175 354772 752423 2185140 6151095 716508
2012-01-01 313873685 10219059 1217067 9001992 14866 85141 355051 762009 2109932 6168874 723186
2013-01-01 316497531 9850445 1199684 8650761 14319 82109 345095 726575 1931835 6018632 700294
2014-01-01 318857056 9475816 1197987 8277829 14249 84041 325802 741291 1729806 5858496 689527

Step 7. Delete the Total column

crime = crime.drop(['Total'], axis=1)

Step 8. Group the year by decades and sum the values

Pay attention to the Population column number, summing this column is a mistake

crime.groupby((crime['Year'].dt.year // 10) * 10).sum()
Population Violent Property Murder Forcible_Rape Robbery Aggravated_assault Burglary Larceny_Theft Vehicle_Theft
Year
1960 1915053175 4134930 45160900 106180 236720 1633510 2158520 13321100 26547700 5292100
1970 2121193298 9607930 91383800 192230 554570 4159020 4702120 28486000 53157800 9739900
1980 2371370069 14074328 117048900 206439 865639 5383109 7619130 33073494 72040253 11935411
1990 2612825258 17527048 119053499 211664 998827 5748930 10568963 26750015 77679366 14624418
2000 2947969117 13968056 100944369 163068 922499 4230366 8652124 21565176 67970291 11412834
2010 1570146307 6072017 44095950 72867 421059 1749809 3764142 10125170 30401698 3569080

Step 9. What is the most dangerous decade to live in the US?

crime.groupby((crime['Year'].dt.year // 10) * 10).sum().\
drop(['Population'], axis=1).sum(axis=1).sort_values(ascending=False).head(1)
Year
1990    273162730
dtype: int64

Student Alcohol Consumption Dateset

Introduction:

This time you will download a dataset from the UCI.

Step 1. Import the necessary libraries

import pandas as pd

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called df.

url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/04_Apply/Students_Alcohol_Consumption/student-mat.csv'
df = pd.read_csv(url)

Step 4. For the purpose of this exercise slice the dataframe from ‘school’ until the ‘guardian’ column

df = df.loc[:, 'school':'guardian']

Step 5. Create a lambda function that will capitalize strings.

cap_fun = lambda x : x.capitalize()

Step 6. Capitalize both Mjob and Fjob

df['Mjob'].apply(cap_fun)
df['Fjob'].apply(cap_fun)
0       Teacher
1         Other
2         Other
3      Services
4         Other
         ...   
390    Services
391    Services
392       Other
393       Other
394     At_home
Name: Fjob, Length: 395, dtype: object

Step 7. Print the last elements of the data set.

df.tail()
school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason guardian
390 MS M 20 U LE3 A 2 2 Services Services course other
391 MS M 17 U LE3 T 3 1 Services Services course mother
392 MS M 21 R GT3 T 1 1 Other Other course other
393 MS M 18 R LE3 T 3 2 Services Other course mother
394 MS M 19 U LE3 T 1 1 Other At_home course father

Step 8. Did you notice the original dataframe is still lowercase? Why is that? Fix it and capitalize Mjob and Fjob.

df['Mjob'] = df['Mjob'].apply(cap_fun)
df['Fjob'] = df['Fjob'].apply(cap_fun)
majority = lambda x: True if(x > 17) else False
df['legal_drinker'] = df['age'].apply(majority)

Step 10. Multiply every number of the dataset by 10.

I know this makes no sense, don’t forget it is just an exercise
df.apply(lambda x : x*10)
school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason guardian legal_drinker
0 GPGPGPGPGPGPGPGPGPGP FFFFFFFFFF 180 UUUUUUUUUU GT3GT3GT3GT3GT3GT3GT3GT3GT3GT3 AAAAAAAAAA 40 40 At_homeAt_homeAt_homeAt_homeAt_homeAt_homeAt_h... TeacherTeacherTeacherTeacherTeacherTeacherTeac... coursecoursecoursecoursecoursecoursecoursecour... mothermothermothermothermothermothermothermoth... 10
1 GPGPGPGPGPGPGPGPGPGP FFFFFFFFFF 170 UUUUUUUUUU GT3GT3GT3GT3GT3GT3GT3GT3GT3GT3 TTTTTTTTTT 10 10 At_homeAt_homeAt_homeAt_homeAt_homeAt_homeAt_h... OtherOtherOtherOtherOtherOtherOtherOtherOtherO... coursecoursecoursecoursecoursecoursecoursecour... fatherfatherfatherfatherfatherfatherfatherfath... 0
2 GPGPGPGPGPGPGPGPGPGP FFFFFFFFFF 150 UUUUUUUUUU LE3LE3LE3LE3LE3LE3LE3LE3LE3LE3 TTTTTTTTTT 10 10 At_homeAt_homeAt_homeAt_homeAt_homeAt_homeAt_h... OtherOtherOtherOtherOtherOtherOtherOtherOtherO... otherotherotherotherotherotherotherotherothero... mothermothermothermothermothermothermothermoth... 0
3 GPGPGPGPGPGPGPGPGPGP FFFFFFFFFF 150 UUUUUUUUUU GT3GT3GT3GT3GT3GT3GT3GT3GT3GT3 TTTTTTTTTT 40 20 HealthHealthHealthHealthHealthHealthHealthHeal... ServicesServicesServicesServicesServicesServic... homehomehomehomehomehomehomehomehomehome mothermothermothermothermothermothermothermoth... 0
4 GPGPGPGPGPGPGPGPGPGP FFFFFFFFFF 160 UUUUUUUUUU GT3GT3GT3GT3GT3GT3GT3GT3GT3GT3 TTTTTTTTTT 30 30 OtherOtherOtherOtherOtherOtherOtherOtherOtherO... OtherOtherOtherOtherOtherOtherOtherOtherOtherO... homehomehomehomehomehomehomehomehomehome fatherfatherfatherfatherfatherfatherfatherfath... 0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
390 MSMSMSMSMSMSMSMSMSMS MMMMMMMMMM 200 UUUUUUUUUU LE3LE3LE3LE3LE3LE3LE3LE3LE3LE3 AAAAAAAAAA 20 20 ServicesServicesServicesServicesServicesServic... ServicesServicesServicesServicesServicesServic... coursecoursecoursecoursecoursecoursecoursecour... otherotherotherotherotherotherotherotherothero... 10
391 MSMSMSMSMSMSMSMSMSMS MMMMMMMMMM 170 UUUUUUUUUU LE3LE3LE3LE3LE3LE3LE3LE3LE3LE3 TTTTTTTTTT 30 10 ServicesServicesServicesServicesServicesServic... ServicesServicesServicesServicesServicesServic... coursecoursecoursecoursecoursecoursecoursecour... mothermothermothermothermothermothermothermoth... 0
392 MSMSMSMSMSMSMSMSMSMS MMMMMMMMMM 210 RRRRRRRRRR GT3GT3GT3GT3GT3GT3GT3GT3GT3GT3 TTTTTTTTTT 10 10 OtherOtherOtherOtherOtherOtherOtherOtherOtherO... OtherOtherOtherOtherOtherOtherOtherOtherOtherO... coursecoursecoursecoursecoursecoursecoursecour... otherotherotherotherotherotherotherotherothero... 10
393 MSMSMSMSMSMSMSMSMSMS MMMMMMMMMM 180 RRRRRRRRRR LE3LE3LE3LE3LE3LE3LE3LE3LE3LE3 TTTTTTTTTT 30 20 ServicesServicesServicesServicesServicesServic... OtherOtherOtherOtherOtherOtherOtherOtherOtherO... coursecoursecoursecoursecoursecoursecoursecour... mothermothermothermothermothermothermothermoth... 10
394 MSMSMSMSMSMSMSMSMSMS MMMMMMMMMM 190 UUUUUUUUUU LE3LE3LE3LE3LE3LE3LE3LE3LE3LE3 TTTTTTTTTT 10 10 OtherOtherOtherOtherOtherOtherOtherOtherOtherO... At_homeAt_homeAt_homeAt_homeAt_homeAt_homeAt_h... coursecoursecoursecoursecoursecoursecoursecour... fatherfatherfatherfatherfatherfatherfatherfath... 10

395 rows × 13 columns

Leave a comment