
This is going to be a short post, and one I find personally interesting. As a data scientist and avid Dota 2 player, what could be better than doing data analysis on my own Dota 2 matches? In this post, I use the API from opendota.com. The API is free to use, at least for pulling your personal Dota 2 data, which I assume is small enough to stay within the free-tier limits. For data collection I use requests, and for data cleaning I use pandas.

Import the necessary libraries

import pandas as pd
import numpy as np
import requests

Check the status of a test call, just to make sure the API is reachable.

r = requests.get('https://api.opendota.com/api')
r.status_code

If it returns 200, the API was reached successfully. Now put in your personal Dota 2 ID. You can find it on your OpenDota profile or in the game client.

Call the Dota 2 player matches API

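myDota2ID = '296360583'

r = requests.get('https://api.opendota.com/api/players/{}/matches'.format(myDota2ID))

jsondata = pd.json_normalize(r.json())
# start_time arrives as Unix epoch seconds; convert it so it shows as a datetime below
jsondata['start_time'] = pd.to_datetime(jsondata['start_time'], unit='s')
jsondata.sample(5)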
        match_id  player_slot  radiant_win  duration  game_mode  lobby_type  hero_id           start_time  version  kills  deaths  assists  skill  average_rank  leaver_status  party_size
 157  6736613083            1         True      1967         22           0      119  2022-09-02 10:17:16      NaN      5       8       23    NaN          62.0              0         1.0
2010  4751540797          130         True       533         22           0      119  2019-05-14 17:03:25     21.0      1       1        3    1.0           NaN              3         5.0
 284  6646342814          132        False      2635         22           0       96  2022-07-03 18:00:18     21.0      5       6       24    NaN           NaN              0         1.0
1569  5296101869            1         True      2232         22           0      128  2020-03-16 15:02:32     21.0      6       5       21    1.0           NaN              0         4.0
2504  3889956318          132        False      2760         22           0       68  2018-05-14 14:57:06     21.0      5       9       14    NaN           NaN              0         5.0
jsondata.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3081 entries, 0 to 3080
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   match_id       3081 non-null   int64         
 1   player_slot    3081 non-null   int64         
 2   radiant_win    3081 non-null   bool          
 3   duration       3081 non-null   int64         
 4   game_mode      3081 non-null   int64         
 5   lobby_type     3081 non-null   int64         
 6   hero_id        3081 non-null   int64         
 7   start_time     3081 non-null   datetime64[ns]
 8   version        2619 non-null   float64       
 9   kills          3081 non-null   int64         
 10  deaths         3081 non-null   int64         
 11  assists        3081 non-null   int64         
 12  skill          1483 non-null   float64       
 13  average_rank   282 non-null    float64       
 14  leaver_status  3081 non-null   int64         
 15  party_size     2543 non-null   float64       
dtypes: bool(1), datetime64[ns](1), float64(4), int64(10)
memory usage: 364.2 KB
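
One thing this table does not show directly is whether I won each match, but it can be derived. Here is a minimal sketch, assuming the OpenDota convention that player_slot values 0 to 127 are Radiant and 128 to 255 are Dire. The `sample` frame and the `add_win_column` helper are stand-ins I made up for illustration; in practice you would pass `jsondata` instead.

```python
import pandas as pd

# Stand-in rows shaped like the player matches response (values are illustrative).
sample = pd.DataFrame({
    "match_id": [1, 2, 3],
    "player_slot": [1, 130, 132],
    "radiant_win": [True, True, False],
})

def add_win_column(df):
    """Return a copy with is_radiant and win columns added."""
    out = df.copy()
    # Assumption: slots below 128 are on the Radiant side.
    out["is_radiant"] = out["player_slot"] < 128
    # The player won if their side matches the winning side.
    out["win"] = out["is_radiant"] == out["radiant_win"]
    return out

result = add_win_column(sample)
win_rate = result["win"].mean()  # share of matches won
```

With the real data, `add_win_column(jsondata)["win"].mean()` would give an overall win rate.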

So there you go: a preview of my personal Dota 2 matches. Of course, you could gather more data by calling the match API for each match ID in this list. The match details are far more granular: each response covers all 10 players in the game, and each player has their own stats.

You could try to fetch every match, but beware: with thousands of matches at one request each, it is going to take a long time.

Get the match details for every match ID in the personal data

matchlist = []
for match in jsondata['match_id']:
    r = requests.get('https://api.opendota.com/api/matches/{}'.format(match))
    matchlist.append(r.json())
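
Because the loop above fires thousands of requests back to back, some of them may be rejected or fail along the way. Below is a rough sketch of a gentler variant that pauses between calls and keeps track of the IDs that did not return 200. The `fetch_match_details` name, the `get` parameter, and the one-second default pause are my own choices for illustration, not anything OpenDota prescribes; check the current rate limits before picking a pause.

```python
import time

def fetch_match_details(match_ids, get, pause_seconds=1.0):
    """Fetch match details one ID at a time, pausing between calls.

    `get` is any callable that takes a URL and returns an object with
    `status_code` and `json()`, for example `requests.get`.
    """
    matches, failed = [], []
    for match_id in match_ids:
        response = get("https://api.opendota.com/api/matches/{}".format(match_id))
        if response.status_code == 200:
            matches.append(response.json())
        else:
            # Keep the ID so it can be retried later instead of storing an error body.
            failed.append(match_id)
        time.sleep(pause_seconds)
    return matches, failed
```

It would be called as `matchlist, failed = fetch_match_details(jsondata['match_id'], requests.get)`, and the IDs in `failed` can be passed in again later.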

pd.json_normalize(matchlist[0]).columns
Index(['match_id', 'barracks_status_dire', 'barracks_status_radiant', 'chat',
       'cluster', 'cosmetics', 'dire_score', 'dire_team_id', 'draft_timings',
       'duration', 'engine', 'first_blood_time', 'game_mode', 'human_players',
       'leagueid', 'lobby_type', 'match_seq_num', 'negative_votes',
       'objectives', 'picks_bans', 'positive_votes', 'radiant_gold_adv',
       'radiant_score', 'radiant_team_id', 'radiant_win', 'radiant_xp_adv',
       'skill', 'start_time', 'teamfights', 'tower_status_dire',
       'tower_status_radiant', 'version', 'replay_salt', 'series_id',
       'series_type', 'players', 'patch', 'region', 'replay_url'],
      dtype='object')
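
Note that the players column holds a nested list with one entry per player, so it stays packed inside a single cell. A small sketch of how it could be unpacked into one row per player with `pd.json_normalize` follows. The `match` dictionary is a made-up stand-in with only a few fields, and the `match.` prefix is my own choice to keep the match-level columns from colliding with player-level ones.

```python
import pandas as pd

# Tiny stand-in for one match details response (values are made up).
match = {
    "match_id": 1,
    "radiant_win": True,
    "players": [
        {"player_slot": 0, "hero_id": 119, "kills": 5},
        {"player_slot": 128, "hero_id": 96, "kills": 2},
    ],
}

# One row per player; match-level fields are repeated on every row.
players_df = pd.json_normalize(
    match,
    record_path="players",
    meta=["match_id", "radiant_win"],
    meta_prefix="match.",
)
```

The same call also accepts the whole `matchlist`, which would give one long table of player rows across all matches.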

Create the DataFrame by normalizing the JSON data from matchlist, then save it to a .csv file.

# Normalize every match dictionary in one call instead of growing a DataFrame with repeated pd.concat
matches_df = pd.json_normalize(matchlist)

# index=False leaves the row index out of the file
matches_df.to_csv('Yourdataname.csv', index=False)
