Pandas Exercise 4 : Apply
The continuity of my practice on Pandas exercise from guisapmora.
United States - Crime Rates - 1960 - 2014
Introduction:
This time you will create a data
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.
Step 1. Import the necessary libraries Dataset
import pandas as pd
Step 2. Import the dataset from this address.
Step 3. Assign it to a variable called crime.
url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/04_Apply/US_Crime_Rates/US_Crime_Rates_1960_2014.csv'
crime = pd.read_csv(url)
Step 4. What is the type of the columns?
crime.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year 55 non-null int64
1 Population 55 non-null int64
2 Total 55 non-null int64
3 Violent 55 non-null int64
4 Property 55 non-null int64
5 Murder 55 non-null int64
6 Forcible_Rape 55 non-null int64
7 Robbery 55 non-null int64
8 Aggravated_assault 55 non-null int64
9 Burglary 55 non-null int64
10 Larceny_Theft 55 non-null int64
11 Vehicle_Theft 55 non-null int64
dtypes: int64(12)
memory usage: 5.3 KB
Have you noticed that the type of Year is int64. But pandas has a different type to work with Time Series. Let’s see it now.
Step 5. Convert the type of the column Year to datetime64
crime['Year'] = pd.to_datetime(crime['Year'], format='%Y')
Step 6. Set the Year column as the index of the dataframe
crime.set_index(['Year'])
Population | Total | Violent | Property | Murder | Forcible_Rape | Robbery | Aggravated_assault | Burglary | Larceny_Theft | Vehicle_Theft | |
---|---|---|---|---|---|---|---|---|---|---|---|
Year | |||||||||||
1960-01-01 | 179323175 | 3384200 | 288460 | 3095700 | 9110 | 17190 | 107840 | 154320 | 912100 | 1855400 | 328200 |
1961-01-01 | 182992000 | 3488000 | 289390 | 3198600 | 8740 | 17220 | 106670 | 156760 | 949600 | 1913000 | 336000 |
1962-01-01 | 185771000 | 3752200 | 301510 | 3450700 | 8530 | 17550 | 110860 | 164570 | 994300 | 2089600 | 366800 |
1963-01-01 | 188483000 | 4109500 | 316970 | 3792500 | 8640 | 17650 | 116470 | 174210 | 1086400 | 2297800 | 408300 |
1964-01-01 | 191141000 | 4564600 | 364220 | 4200400 | 9360 | 21420 | 130390 | 203050 | 1213200 | 2514400 | 472800 |
1965-01-01 | 193526000 | 4739400 | 387390 | 4352000 | 9960 | 23410 | 138690 | 215330 | 1282500 | 2572600 | 496900 |
1966-01-01 | 195576000 | 5223500 | 430180 | 4793300 | 11040 | 25820 | 157990 | 235330 | 1410100 | 2822000 | 561200 |
1967-01-01 | 197457000 | 5903400 | 499930 | 5403500 | 12240 | 27620 | 202910 | 257160 | 1632100 | 3111600 | 659800 |
1968-01-01 | 199399000 | 6720200 | 595010 | 6125200 | 13800 | 31670 | 262840 | 286700 | 1858900 | 3482700 | 783600 |
1969-01-01 | 201385000 | 7410900 | 661870 | 6749000 | 14760 | 37170 | 298850 | 311090 | 1981900 | 3888600 | 878500 |
1970-01-01 | 203235298 | 8098000 | 738820 | 7359200 | 16000 | 37990 | 349860 | 334970 | 2205000 | 4225800 | 928400 |
1971-01-01 | 206212000 | 8588200 | 816500 | 7771700 | 17780 | 42260 | 387700 | 368760 | 2399300 | 4424200 | 948200 |
1972-01-01 | 208230000 | 8248800 | 834900 | 7413900 | 18670 | 46850 | 376290 | 393090 | 2375500 | 4151200 | 887200 |
1973-01-01 | 209851000 | 8718100 | 875910 | 7842200 | 19640 | 51400 | 384220 | 420650 | 2565500 | 4347900 | 928800 |
1974-01-01 | 211392000 | 10253400 | 974720 | 9278700 | 20710 | 55400 | 442400 | 456210 | 3039200 | 5262500 | 977100 |
1975-01-01 | 213124000 | 11292400 | 1039710 | 10252700 | 20510 | 56090 | 470500 | 492620 | 3265300 | 5977700 | 1009600 |
1976-01-01 | 214659000 | 11349700 | 1004210 | 10345500 | 18780 | 57080 | 427810 | 500530 | 3108700 | 6270800 | 966000 |
1977-01-01 | 216332000 | 10984500 | 1029580 | 9955000 | 19120 | 63500 | 412610 | 534350 | 3071500 | 5905700 | 977700 |
1978-01-01 | 218059000 | 11209000 | 1085550 | 10123400 | 19560 | 67610 | 426930 | 571460 | 3128300 | 5991000 | 1004100 |
1979-01-01 | 220099000 | 12249500 | 1208030 | 11041500 | 21460 | 76390 | 480700 | 629480 | 3327700 | 6601000 | 1112800 |
1980-01-01 | 225349264 | 13408300 | 1344520 | 12063700 | 23040 | 82990 | 565840 | 672650 | 3795200 | 7136900 | 1131700 |
1981-01-01 | 229146000 | 13423800 | 1361820 | 12061900 | 22520 | 82500 | 592910 | 663900 | 3779700 | 7194400 | 1087800 |
1982-01-01 | 231534000 | 12974400 | 1322390 | 11652000 | 21010 | 78770 | 553130 | 669480 | 3447100 | 7142500 | 1062400 |
1983-01-01 | 233981000 | 12108600 | 1258090 | 10850500 | 19310 | 78920 | 506570 | 653290 | 3129900 | 6712800 | 1007900 |
1984-01-01 | 236158000 | 11881800 | 1273280 | 10608500 | 18690 | 84230 | 485010 | 685350 | 2984400 | 6591900 | 1032200 |
1985-01-01 | 238740000 | 12431400 | 1328800 | 11102600 | 18980 | 88670 | 497870 | 723250 | 3073300 | 6926400 | 1102900 |
1986-01-01 | 240132887 | 13211869 | 1489169 | 11722700 | 20613 | 91459 | 542775 | 834322 | 3241410 | 7257153 | 1224137 |
1987-01-01 | 242282918 | 13508700 | 1483999 | 12024700 | 20096 | 91110 | 517704 | 855088 | 3236184 | 7499900 | 1288674 |
1988-01-01 | 245807000 | 13923100 | 1566220 | 12356900 | 20680 | 92490 | 542970 | 910090 | 3218100 | 7705900 | 1432900 |
1989-01-01 | 248239000 | 14251400 | 1646040 | 12605400 | 21500 | 94500 | 578330 | 951710 | 3168200 | 7872400 | 1564800 |
1990-01-01 | 248709873 | 14475600 | 1820130 | 12655500 | 23440 | 102560 | 639270 | 1054860 | 3073900 | 7945700 | 1635900 |
1991-01-01 | 252177000 | 14872900 | 1911770 | 12961100 | 24700 | 106590 | 687730 | 1092740 | 3157200 | 8142200 | 1661700 |
1992-01-01 | 255082000 | 14438200 | 1932270 | 12505900 | 23760 | 109060 | 672480 | 1126970 | 2979900 | 7915200 | 1610800 |
1993-01-01 | 257908000 | 14144800 | 1926020 | 12218800 | 24530 | 106010 | 659870 | 1135610 | 2834800 | 7820900 | 1563100 |
1994-01-01 | 260341000 | 13989500 | 1857670 | 12131900 | 23330 | 102220 | 618950 | 1113180 | 2712800 | 7879800 | 1539300 |
1995-01-01 | 262755000 | 13862700 | 1798790 | 12063900 | 21610 | 97470 | 580510 | 1099210 | 2593800 | 7997700 | 1472400 |
1996-01-01 | 265228572 | 13493863 | 1688540 | 11805300 | 19650 | 96250 | 535590 | 1037050 | 2506400 | 7904700 | 1394200 |
1997-01-01 | 267637000 | 13194571 | 1634770 | 11558175 | 18208 | 96153 | 498534 | 1023201 | 2460526 | 7743760 | 1354189 |
1998-01-01 | 270296000 | 12475634 | 1531044 | 10944590 | 16914 | 93103 | 446625 | 974402 | 2329950 | 7373886 | 1240754 |
1999-01-01 | 272690813 | 11634378 | 1426044 | 10208334 | 15522 | 89411 | 409371 | 911740 | 2100739 | 6955520 | 1152075 |
2000-01-01 | 281421906 | 11608072 | 1425486 | 10182586 | 15586 | 90178 | 408016 | 911706 | 2050992 | 6971590 | 1160002 |
2001-01-01 | 285317559 | 11876669 | 1439480 | 10437480 | 16037 | 90863 | 423557 | 909023 | 2116531 | 7092267 | 1228391 |
2002-01-01 | 287973924 | 11878954 | 1423677 | 10455277 | 16229 | 95235 | 420806 | 891407 | 2151252 | 7057370 | 1246646 |
2003-01-01 | 290690788 | 11826538 | 1383676 | 10442862 | 16528 | 93883 | 414235 | 859030 | 2154834 | 7026802 | 1261226 |
2004-01-01 | 293656842 | 11679474 | 1360088 | 10319386 | 16148 | 95089 | 401470 | 847381 | 2144446 | 6937089 | 1237851 |
2005-01-01 | 296507061 | 11565499 | 1390745 | 10174754 | 16740 | 94347 | 417438 | 862220 | 2155448 | 6783447 | 1235859 |
2006-01-01 | 299398484 | 11401511 | 1418043 | 9983568 | 17030 | 92757 | 447403 | 860853 | 2183746 | 6607013 | 1192809 |
2007-01-01 | 301621157 | 11251828 | 1408337 | 9843481 | 16929 | 90427 | 445125 | 855856 | 2176140 | 6568572 | 1095769 |
2008-01-01 | 304374846 | 11160543 | 1392628 | 9767915 | 16442 | 90479 | 443574 | 842134 | 2228474 | 6588046 | 958629 |
2009-01-01 | 307006550 | 10762956 | 1325896 | 9337060 | 15399 | 89241 | 408742 | 812514 | 2203313 | 6338095 | 795652 |
2010-01-01 | 309330219 | 10363873 | 1251248 | 9112625 | 14772 | 85593 | 369089 | 781844 | 2168457 | 6204601 | 739565 |
2011-01-01 | 311587816 | 10258774 | 1206031 | 9052743 | 14661 | 84175 | 354772 | 752423 | 2185140 | 6151095 | 716508 |
2012-01-01 | 313873685 | 10219059 | 1217067 | 9001992 | 14866 | 85141 | 355051 | 762009 | 2109932 | 6168874 | 723186 |
2013-01-01 | 316497531 | 9850445 | 1199684 | 8650761 | 14319 | 82109 | 345095 | 726575 | 1931835 | 6018632 | 700294 |
2014-01-01 | 318857056 | 9475816 | 1197987 | 8277829 | 14249 | 84041 | 325802 | 741291 | 1729806 | 5858496 | 689527 |
Step 7. Delete the Total column
crime = crime.drop(['Total'], axis=1)
Step 8. Group the year by decades and sum the values
Pay attention to the Population column number, summing this column is a mistake
crime.groupby((crime['Year'].dt.year // 10) * 10).sum()
Population | Violent | Property | Murder | Forcible_Rape | Robbery | Aggravated_assault | Burglary | Larceny_Theft | Vehicle_Theft | |
---|---|---|---|---|---|---|---|---|---|---|
Year | ||||||||||
1960 | 1915053175 | 4134930 | 45160900 | 106180 | 236720 | 1633510 | 2158520 | 13321100 | 26547700 | 5292100 |
1970 | 2121193298 | 9607930 | 91383800 | 192230 | 554570 | 4159020 | 4702120 | 28486000 | 53157800 | 9739900 |
1980 | 2371370069 | 14074328 | 117048900 | 206439 | 865639 | 5383109 | 7619130 | 33073494 | 72040253 | 11935411 |
1990 | 2612825258 | 17527048 | 119053499 | 211664 | 998827 | 5748930 | 10568963 | 26750015 | 77679366 | 14624418 |
2000 | 2947969117 | 13968056 | 100944369 | 163068 | 922499 | 4230366 | 8652124 | 21565176 | 67970291 | 11412834 |
2010 | 1570146307 | 6072017 | 44095950 | 72867 | 421059 | 1749809 | 3764142 | 10125170 | 30401698 | 3569080 |
Step 9. What is the most dangerous decade to live in the US?
crime.groupby((crime['Year'].dt.year // 10) * 10).sum().\
drop(['Population'], axis=1).sum(axis=1).sort_values(ascending=False).head(1)
Year
1990 273162730
dtype: int64
Student Alcohol Consumption Dateset
Introduction:
This time you will download a dataset from the UCI.
Step 1. Import the necessary libraries
import pandas as pd
Step 2. Import the dataset from this address.
Step 3. Assign it to a variable called df.
url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/04_Apply/Students_Alcohol_Consumption/student-mat.csv'
df = pd.read_csv(url)
Step 4. For the purpose of this exercise slice the dataframe from ‘school’ until the ‘guardian’ column
df = df.loc[:, 'school':'guardian']
Step 5. Create a lambda function that will capitalize strings.
cap_fun = lambda x : x.capitalize()
Step 6. Capitalize both Mjob and Fjob
df['Mjob'].apply(cap_fun)
df['Fjob'].apply(cap_fun)
0 Teacher
1 Other
2 Other
3 Services
4 Other
...
390 Services
391 Services
392 Other
393 Other
394 At_home
Name: Fjob, Length: 395, dtype: object
Step 7. Print the last elements of the data set.
df.tail()
school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | reason | guardian | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
390 | MS | M | 20 | U | LE3 | A | 2 | 2 | Services | Services | course | other |
391 | MS | M | 17 | U | LE3 | T | 3 | 1 | Services | Services | course | mother |
392 | MS | M | 21 | R | GT3 | T | 1 | 1 | Other | Other | course | other |
393 | MS | M | 18 | R | LE3 | T | 3 | 2 | Services | Other | course | mother |
394 | MS | M | 19 | U | LE3 | T | 1 | 1 | Other | At_home | course | father |
Step 8. Did you notice the original dataframe is still lowercase? Why is that? Fix it and capitalize Mjob and Fjob.
df['Mjob'] = df['Mjob'].apply(cap_fun)
df['Fjob'] = df['Fjob'].apply(cap_fun)
Step 9. Create a function called majority that returns a boolean value to a new column called legal_drinker (Consider majority as older than 17 years old)
majority = lambda x: True if(x > 17) else False
df['legal_drinker'] = df['age'].apply(majority)
Step 10. Multiply every number of the dataset by 10.
I know this makes no sense, don’t forget it is just an exercise
df.apply(lambda x : x*10)
school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | reason | guardian | legal_drinker | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | GPGPGPGPGPGPGPGPGPGP | FFFFFFFFFF | 180 | UUUUUUUUUU | GT3GT3GT3GT3GT3GT3GT3GT3GT3GT3 | AAAAAAAAAA | 40 | 40 | At_homeAt_homeAt_homeAt_homeAt_homeAt_homeAt_h... | TeacherTeacherTeacherTeacherTeacherTeacherTeac... | coursecoursecoursecoursecoursecoursecoursecour... | mothermothermothermothermothermothermothermoth... | 10 |
1 | GPGPGPGPGPGPGPGPGPGP | FFFFFFFFFF | 170 | UUUUUUUUUU | GT3GT3GT3GT3GT3GT3GT3GT3GT3GT3 | TTTTTTTTTT | 10 | 10 | At_homeAt_homeAt_homeAt_homeAt_homeAt_homeAt_h... | OtherOtherOtherOtherOtherOtherOtherOtherOtherO... | coursecoursecoursecoursecoursecoursecoursecour... | fatherfatherfatherfatherfatherfatherfatherfath... | 0 |
2 | GPGPGPGPGPGPGPGPGPGP | FFFFFFFFFF | 150 | UUUUUUUUUU | LE3LE3LE3LE3LE3LE3LE3LE3LE3LE3 | TTTTTTTTTT | 10 | 10 | At_homeAt_homeAt_homeAt_homeAt_homeAt_homeAt_h... | OtherOtherOtherOtherOtherOtherOtherOtherOtherO... | otherotherotherotherotherotherotherotherothero... | mothermothermothermothermothermothermothermoth... | 0 |
3 | GPGPGPGPGPGPGPGPGPGP | FFFFFFFFFF | 150 | UUUUUUUUUU | GT3GT3GT3GT3GT3GT3GT3GT3GT3GT3 | TTTTTTTTTT | 40 | 20 | HealthHealthHealthHealthHealthHealthHealthHeal... | ServicesServicesServicesServicesServicesServic... | homehomehomehomehomehomehomehomehomehome | mothermothermothermothermothermothermothermoth... | 0 |
4 | GPGPGPGPGPGPGPGPGPGP | FFFFFFFFFF | 160 | UUUUUUUUUU | GT3GT3GT3GT3GT3GT3GT3GT3GT3GT3 | TTTTTTTTTT | 30 | 30 | OtherOtherOtherOtherOtherOtherOtherOtherOtherO... | OtherOtherOtherOtherOtherOtherOtherOtherOtherO... | homehomehomehomehomehomehomehomehomehome | fatherfatherfatherfatherfatherfatherfatherfath... | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
390 | MSMSMSMSMSMSMSMSMSMS | MMMMMMMMMM | 200 | UUUUUUUUUU | LE3LE3LE3LE3LE3LE3LE3LE3LE3LE3 | AAAAAAAAAA | 20 | 20 | ServicesServicesServicesServicesServicesServic... | ServicesServicesServicesServicesServicesServic... | coursecoursecoursecoursecoursecoursecoursecour... | otherotherotherotherotherotherotherotherothero... | 10 |
391 | MSMSMSMSMSMSMSMSMSMS | MMMMMMMMMM | 170 | UUUUUUUUUU | LE3LE3LE3LE3LE3LE3LE3LE3LE3LE3 | TTTTTTTTTT | 30 | 10 | ServicesServicesServicesServicesServicesServic... | ServicesServicesServicesServicesServicesServic... | coursecoursecoursecoursecoursecoursecoursecour... | mothermothermothermothermothermothermothermoth... | 0 |
392 | MSMSMSMSMSMSMSMSMSMS | MMMMMMMMMM | 210 | RRRRRRRRRR | GT3GT3GT3GT3GT3GT3GT3GT3GT3GT3 | TTTTTTTTTT | 10 | 10 | OtherOtherOtherOtherOtherOtherOtherOtherOtherO... | OtherOtherOtherOtherOtherOtherOtherOtherOtherO... | coursecoursecoursecoursecoursecoursecoursecour... | otherotherotherotherotherotherotherotherothero... | 10 |
393 | MSMSMSMSMSMSMSMSMSMS | MMMMMMMMMM | 180 | RRRRRRRRRR | LE3LE3LE3LE3LE3LE3LE3LE3LE3LE3 | TTTTTTTTTT | 30 | 20 | ServicesServicesServicesServicesServicesServic... | OtherOtherOtherOtherOtherOtherOtherOtherOtherO... | coursecoursecoursecoursecoursecoursecoursecour... | mothermothermothermothermothermothermothermoth... | 10 |
394 | MSMSMSMSMSMSMSMSMSMS | MMMMMMMMMM | 190 | UUUUUUUUUU | LE3LE3LE3LE3LE3LE3LE3LE3LE3LE3 | TTTTTTTTTT | 10 | 10 | OtherOtherOtherOtherOtherOtherOtherOtherOtherO... | At_homeAt_homeAt_homeAt_homeAt_homeAt_homeAt_h... | coursecoursecoursecoursecoursecoursecoursecour... | fatherfatherfatherfatherfatherfatherfatherfath... | 10 |
395 rows × 13 columns
Leave a comment