
There are hundreds of zettabytes of data on the internet, but most of it is not publicly accessible. Today, I will share where to find open public datasets.

1. Kaggle

Kaggle

Kaggle offers far more than a large collection of datasets: it also has a great community, tutorials, competitions, and, more recently, its own hosted notebooks.

2. UCI Machine Learning Repository

UCIML Repo

The UCI Machine Learning Repository hosts contributed datasets in varying levels of cleanliness. It’s one of the classics, and you can download datasets without registering. Best of all, many of the datasets are fairly clean, which makes the repository a good fit for beginners who are just starting to practice.
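
If you are just starting out, a quick first step after downloading a dataset is to check how many rows it has and which columns it contains. Here is a minimal sketch, assuming you saved the dataset as a CSV file with a header row. The function name `summarize_csv` and the `label_column` parameter are my own illustrative choices, not part of any site listed here.

```python
import csv
from collections import Counter


def summarize_csv(path, label_column=None):
    """Summarize a local CSV file that has a header row.

    Hypothetical helper for illustration only: returns the row count,
    the column names, and (optionally) how often each value appears
    in one chosen column.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)                # read every data row into memory
        columns = reader.fieldnames or []  # header names; empty list for an empty file

    summary = {"rows": len(rows), "columns": columns}
    if label_column is not None:
        # Count occurrences of each value, e.g. class labels
        summary["label_counts"] = Counter(row[label_column] for row in rows)
    return summary
```

For example, `summarize_csv("my_dataset.csv", label_column="species")` would report the row count, the column names, and the number of rows per species, where both the file name and the column name are placeholders for whatever you downloaded. This reads the whole file into memory, so it is only meant for small practice datasets.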

3. Google Dataset Search

Of course, I must include the boss of data, Google. Google Dataset Search works much like Google Scholar, which students and scientists use for research and citations: it lets you find datasets wherever they are hosted, whether that’s a publisher’s site, a digital library, or an author’s web page. It is a good choice if you are curious about the many kinds of data out there.

4. VisualData.io

Visual Data

If you want to practice machine vision and neural networks, Visual Data is one of the places I recommend for finding image datasets.

5. Awesome Public Datasets

Github

Awesome Public Datasets is a GitHub repository of popular and emerging public data sources, organized by topic. They are collected and tidied from many different sources. To support the maintainers, consider giving the repository a star for their amazing effort.

6. Data is Plural

Data is Plural is a weekly newsletter of useful and curious datasets. You can find a huge archive of past datasets in their Google Doc. What I find interesting about it is how up to date it is and how many thousands of entries you can find. It also helps you keep up with world news.

7. Data World

data.world

Data World has started to rival Kaggle as a community-driven dataset resource. The domains it covers are really interesting, such as open public data and government data. The site also contains data that is really hard to find elsewhere. In particular, healthcare is one of the more difficult fields to get publicly available data from (due to privacy concerns).
