Day 7 Algorit.ma : Capstone Project
Fraud Prediction
Day 6: here I share my notes on the in-class notebook. For further examples, see https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
Day 5: here I share my notes on the in-class notebook. For further examples, see https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
Day 4: here I share my notes on the in-class notebook. For further examples, see https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
Day 3: here I share my notes on the in-class notebook. For further examples, see https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
Day 2: here I share my notes on the in-class notebook. For further examples, see https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
Day 1: here I share my notes on the in-class notebook. For further examples, see https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
This post (which I hope to turn into a series) will be my personal notes and documentation of the data science bootcamp sessions from Algorit.ma. Please note ...
This is going to be a short post. This is really interesting for me personally. As a Data Scientist and avid Dota 2 player, what could be better than doing d...
A continuation of my practice with the Pandas exercises from guisapmora. This one is interesting because it covers basic visualization exercises in Matplotlib.
The Pandas library has become the “must-install” library for data manipulation in Python and is widely used by data scientists and analysts. Pandas provides a...
A continuation of my practice with the Pandas exercises from guisapmora.
In this exercise we are going to use a dataset from the internet to make things easier. You can download the exercise from here. I was just bored and kept tryi...
The latest version of Python was released last week, on 24 October 2022. The 3.11 changelog consists of a lot of bug fixes, improvements, and additional...
The surge of available data we can find on the internet is insane. With this surge, data analytics has become a hugely important part of the way organization...
Introduction
Data imbalance usually reflects an unequal distribution of classes within a dataset. In class imbalance, one trains on a dataset that contains a large number...
Creating a visualization may not be as easy as it looks. Some visualizations may look cool but fail to convey what they mean. Imagine after a hard and l...
After I reviewed my knowledge of exploratory data analysis (EDA) here, I wondered whether there is a new way to understand your dataset more easil...
Exercise from Jose Portilla’s Python for Data Science Bootcamp.
Another reference and shared post from https://www.mygreatlearning.com/blog/label-encoding-in-python/
Today, I will try exploratory data analysis and regression with insurance data from Kaggle. Let’s take a look.
This week I will dedicate my time to solving all the exercises from Jose Portilla’s Python for Data Science Bootcamp.
Most data nowadays is huge and massive. Datasets often come with many irrelevant features that do not contribute much to the accuracy of your predictive mode...
In practice, real datasets contain categorical values. So what is the difference between an ordinary string value and a categorical value? Well, some...
Machine learning models can’t process non-numeric values. So how do you process image or text data? Before you train on your image or text data, you need to transform th...
After rescaling or normalizing the data, there is another way to change its distribution: transformation. There are 3 different ways to transform...
Numerical data is already digestible by machine learning models or mathematical formulas. But that doesn’t mean it no longer needs feature engineering or preproces...
Missing values in your data are pretty common in real life. In fact, the chance that at least one data point is missing increases as the data set size increase...
In this data project we will focus on exploratory data analysis of stock prices. Keep in mind, this project is just meant to practice your visualization and ...
For this capstone project we will be analyzing some 911 call data from Kaggle. The data contains the following fields:
Today I will complete the data visualization with Pandas exercise. If you want to solve it all by yourself, you can download the notebook file here and sample ...
Today I will complete the Matplotlib exercise. If you want to solve it all by yourself, you can download the notebook file here.
Today I will complete the Pandas exercise using SF Salaries. If you want to solve it all by yourself, you can download the notebook file here and the dataset here.
Today I will complete the Pandas exercise using Ecommerce Purchase. If you want to solve it all by yourself, you can download the notebook file here and dataset h...
Today I will complete the NumPy exercise. If you want to solve it all by yourself, you can download the notebook file here.
This week I will dedicate my time to solving all the exercises from Jose Portilla’s Python for Data Science Bootcamp. Today I will complete some exercises from Pytho...
The plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, fina...
Multiplot grids are general types of plots that let you map plot types to rows and columns of a grid; this helps you create similar plots separated by fe...
After discussing basic visualization with Matplotlib, let’s now try another, more attractive visualization library called Seaborn. Seaborn is a Python dat...
When it comes to data visualization, the first and the most critical step is to select the correct visualization for the data that you want to present. With ...
Matplotlib is the most basic data visualization library for Python. It was created to try to replicate MATLAB’s (another programming language) plotting capab...
Continuing from the last post, let’s look at more of the features in pandas.
Next, let’s discuss Pandas. Before the Pandas library was introduced, Python was mainly used for preparing and munging data. After ...
Let’s continue with NumPy. NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra and matrices...
Before we dive into Pandas, NumPy and Matplotlib, let’s first remind ourselves of the basics of Python. I won’t cover everything in Python because it would take too much t...
First of all, why use Python for data science? According to recent surveys by KDnuggets, Python is the preferred programming language for data scientists. ...
Morning, folks, hope you’re all awake. To fill my idle time on campus, today I will start posting about algorithms and data structures. ...
Another midnight post -_-. What should we talk about? Since ML is no fun without illustrations ( ͡° ͜ʖ ͡°), let’s talk about illustrations that can...
A midnight post, folks, while I have nothing to do. I figured I’d continue discussing ML like yesterday ( ͡° ͜ʖ ͡°). Pandas is a Python library that is commonly...
Machine learning is the study of software that uses past experience to make future decisions. The basic goal of machine learnin...
Natural Language Processing (NLP) is broadly defined as the automatic manipulation of natural language, like speech and text. Natural language is primarily ...
The performance of machine learning algorithms can degrade with too many input variables. Having a large number of dimensions in the feature space can mean t...
Clustering is a technique widely used to find groups of observations (clusters) that share similar characteristics. This process is not driven by a specific ...
The support vector machine is a generalization of a classifier called maximal margin classifier. The maximal margin classifier is simple, but it cannot be ap...
Bagging is an ensemble algorithm that fits multiple models on different subsets of a training dataset, then combines the predictions from all models. Random ...
Decision trees are a very popular machine learning algorithm. They are popular for a variety of reasons, their interpretability probably being their most i...
K Nearest Neighbours (KNN) works by choosing the best number $k$ of neighbours. A neighbour, by definition, is a person living near or next door to the speaker or person ...
Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. It is a statistical machine learning algorithm th...
Linear regression is useful for finding the relationship between two continuous variables. Linear regression is a linear model, a model that creates a linear rel...
OK, now after a one-day break and dilly-dallying over the theory of machine learning and evaluation metrics (I kind of regret covering the theory first because it t...
A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its dis...
Clustering is an important part of the machine learning pipeline for business or scientific enterprises utilizing data science. As the name suggests, it help...
A regression task is the prediction of the state of an outcome variable at a particular timepoint with the help of other correlated independent variables. The ...
Classifier performance depends greatly on the characteristics of the data to be classified. There is no single classifier that works best on all given proble...
The way to know how good your model is is to actually try it on new or different cases. But if your model is not doing as well as expected, surely you will hesitate ...
At some level, running machine learning systems at scale is challenging for several reasons. The systems issues are often misunderstood. Although best prac...
… Continued from the last post
There are many different types of machine learning systems. At a high level, machine learning is simply the study of teaching a computer program or algorithm ...
There are hundreds of zettabytes of data available on the internet, but most of it is not publicly accessible. Today, I will share where to find open public data...
There are many different kinds of data types. In this blog, I will explain these data types based on the most common understanding in data science. Specifically i...
In this post we will build a Multi-Layer Perceptron model to try to classify handwritten digits using TensorFlow (a very famous example).
Welcome to the NLP Project for this section of the course. In this NLP project you will be attempting to classify Yelp Reviews into 1 star or 5 star categori...
Welcome to the code notebook for Recommender Systems with Python. In this lecture we will develop basic recommendation systems using Python and pandas. There...
GPT (Generative Pre-trained Transformer) is a type of artificial intelligence model developed by OpenAI that can be used for tasks such as language translat...
Now let’s talk about the next DPC season and all the drama around it. Of course, a lot of things are happening, so to start with, congratulations to Tundra Es...
For the second post, I am going to show you the configuration of my personal Dota 2 settings. It is going to be a little bit of a long explanation ...
It’s been a long time since I wanted to make content about Dota. I like to do random analyses about Dota sometimes. I hope in this series of posts I can shar...
Today I will start my journey by gathering from the internet what data science is. So what is data science? Why is data science so popular be...
Short blog post. I hope in the future I can try to be productive. Recently, I have felt bored with the Covid-19 situation. So I hope in the next few days I can ma...
• February 23, 1996 • By John Perry
Syntax highlighting is a feature that displays source code in different colors and fonts according to the category of terms. This feature facilitates writin...
One simple way to think about data is whether it is structured or not. First of all, not all data is created equal. Some data is s...
After finding a reason and the motivation to start learning about data science, let’s talk about the tools for practicing data science. Machine learning tools m...
Well, this post is going to be about my final year in the Netherlands. A bit of drama here and there, but let’s see what happens.
It has been a long break since the previous post. I’ve been busy with my new job (more like training) recently, so I might update this blog a little bit next year ...
It’s been a while since I wrote a blog post. Apparently my last post was almost 8 months ago. But now I have written a farewell post to the Netherlands because I am g...
So for the second post, I am going to show you what is the configuration of my personal Dota2 settings. It is going to be a little bit of a long explanation ...
It’s been a long time since I wanted to make content about Dota. I like to do random analyses about Dota sometimes. I hope in this series of posts I can shar...
A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its dis...
Clustering is an important part of the machine learning pipeline for business or scientific enterprises utilizing data science. As the name suggests, it help...
A regression task is the prediction of the state of an outcome variable at a particular timepoint with the help of other correlated independent variables. The ...
Classifier performance depends greatly on the characteristics of the data to be classified. There is no single classifier that works best on all given proble...
Day 4, here I will share my notes from the Inclass notebook. For further examples, you can check out https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
Today, I will try Exploratory Data Analysis and regression with insurance data from Kaggle. Let’s take a look.
Linear regression is useful for finding the relationship between two continuous variables. Linear regression is a linear model, a model that creates a linear rel...
A regression task is the prediction of the state of an outcome variable at a particular timepoint with the help of other correlated independent variables. The ...
After rescaling or normalizing the data, there is another way to change its distribution: transformation. There are 3 different ways to transform...
For this capstone project we will be analyzing some 911 call data from Kaggle. The data contains the following fields:
Today I will be completing the NumPy exercise. If you want to solve it all by yourself, you can download the notebook file here
Let’s continue with NumPy. NumPy is a Python library used for working with arrays. It also has functions for working in the domain of linear algebra and matrices...
Day 2, here I will share my notes from the Inclass notebook. For further examples, you can check out https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
The continuity of my practice on Pandas exercise from guisapmora. This one is interesting because it covers the basic exercise of visualization in Matplotlib.
The continuity of my practice on Pandas exercise from guisapmora. This one is interesting because it covers the basic exercise of visualization in Matplotlib.
Creating a visualization may not be as easy as it looks. Some visualizations may look cool but fail to convey what they mean. Imagine after a hard and l...
This is going to be a short post. This is really interesting for me personally. As a Data Scientist and avid Dota 2 player, what could be better than doing d...
Well, now let’s talk about the next season of the DPC and all the drama around it. Of course, a lot of things are happening, so to start with, congratulations to Tundra Es...
So for the second post, I am going to show you the configuration of my personal Dota 2 settings. It is going to be a little bit of a long explanation ...
It’s been a long time since I wanted to make content about Dota. I like to do random analyses about Dota sometimes. I hope in this series of posts I can shar...
Day 2, here I will share my notes from the Inclass notebook. For further examples, you can check out https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
Day 1, here I will share my notes from the Inclass notebook. For further examples, you can check out https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
This is a quick post on how to create a shortcut for Jupyter Notebook. In this case, you need to connect your PATH of your Python Conda. Here’s how:
Well, this post (I hope I can make it a series) will be my personal notes and documentation of the data science bootcamp sessions from Algorit.ma. Please note ...
In this data project we will focus on exploratory data analysis of stock prices. Keep in mind, this project is just meant to practice your visualization and ...
Multiplot grids are general types of plots that allow you to map plot types to rows and columns of a grid; this helps you create similar plots separated by fe...
After discussing basic visualization with Matplotlib, now let’s try another but more attractive visualization library called Seaborn. Seaborn is a Python dat...
You can import the library:
In this post we will build out a Multi Layer Perceptron model to try to classify handwritten digits using TensorFlow (a very famous example).
Artificial Neural Network (ANN) is a computational model that is inspired by the way biological neural networks in the human brain process information. Artif...
Finding information about boarding houses (kost) is a must for anyone living away from home. Continuing your education or working in another city is certainly no longer ...
There are many different kinds of data types. In this blog, I will explain these data types based on the most common understanding in Data Science. Specifically I...
One of the simple ways to think about data is whether it is structured or not. First of all, not all data is created equal. Some data is s...
Broadly speaking, you’ll get a TOEFL independent writing question based on one of the following styles:
TOEFL Speaking Question 1 (Opinion about something)
Broadly speaking, you’ll get a TOEFL independent writing question based on one of the following styles:
TOEFL Speaking Question 1 (Opinion about something)
Decision trees are a very popular machine learning algorithm. They are popular for a variety of reasons, their interpretability probably being their most i...
Exercise from Jose Portilla Python for Data Science Bootcamp.
Bagging is an ensemble algorithm that fits multiple models on different subsets of a training dataset, then combines the predictions from all models. Random ...
Exercise from Jose Portilla Python for Data Science Bootcamp.
Welcome to the NLP Project for this section of the course. In this NLP project you will be attempting to classify Yelp Reviews into 1 star or 5 star categori...
Welcome to the code notebook for Recommender Systems with Python. In this lecture we will develop basic recommendation systems using Python and pandas. There...
You can import the library:
Artificial Neural Network (ANN) is a computational model that is inspired by the way biological neural networks in the human brain process information. Artif...
Day 4, here I will share my notes from the Inclass notebook. For further examples, you can check out https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
Day 3, here I will share my notes from the Inclass notebook. For further examples, you can check out https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...
Morning, folks, I hope you are all awake. To fill my idle time on campus, today I will start posting about algorithms and data structures. ...
Finding information about boarding houses (kost) is a must for anyone living away from home. Continuing your education or working in another city is certainly no longer ...
After finding a reason and motivation to start learning about Data Science, let’s talk about the tools for practicing Data Science. Machine learning tools m...
There are hundreds of zettabytes of data available on the internet, but most of it is not publicly accessible. Today, I will share where to find open public data...
There are hundreds of zettabytes of data available on the internet, but most of it is not publicly accessible. Today, I will share where to find open public data...
First of all, why use Python for Data Science? According to recent surveys by KDnuggets, Python is the preferred programming language for data scientists. ...
To know how good your model is, you have to actually try it on new or different cases. But if your model is not doing as well as expected, surely you will hesitate ...
TOEFL Speaking Question 1 (Opinion about something)
Broadly speaking, you’ll get a TOEFL independent writing question based on one of the following styles:
Ok, now after a 1-day break and dilly-dallying over the theory of Machine Learning and Evaluation Metrics (I kind of regret telling the theory first because it t...
The plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, fina...
Linear regression is useful for finding the relationship between two continuous variables. Linear regression is a linear model, a model that creates a linear rel...
Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. It is a statistical machine learning algorithm th...
K Nearest Neighbour (KNN) works by choosing the best number of neighbours, $k$. A neighbour, by definition, is a person living near or next door to the speaker or person ...
Bagging is an ensemble algorithm that fits multiple models on different subsets of a training dataset, then combines the predictions from all models. Random ...
Exercise from Jose Portilla Python for Data Science Bootcamp.
Exercise from Jose Portilla Python for Data Science Bootcamp.
The support vector machine is a generalization of a classifier called the maximal margin classifier. The maximal margin classifier is simple, but it cannot be ap...
Exercise from Jose Portilla Python for Data Science Bootcamp.
Clustering is a technique widely used to find groups of observations (clusters) that share similar characteristics. This process is not driven by a specific ...
Welcome to the NLP Project for this section of the course. In this NLP project you will be attempting to classify Yelp Reviews into 1 star or 5 star categori...
Natural Language Processing (NLP) is broadly defined as the automatic manipulation of natural language, like speech and text. Natural language is primarily ...
Let’s learn how to use Spark with Python by using the pyspark library! Make sure to view the video lecture explaining Spark and RDDs before continuing on wit...
Let’s learn how to use Spark with Python by using the pyspark library! Make sure to view the video lecture explaining Spark and RDDs before continuing on wit...
Artificial Neural Network (ANN) is a computational model that is inspired by the way biological neural networks in the human brain process information. Artif...
In this post we will build out a Multi Layer Perceptron model to try to classify handwritten digits using TensorFlow (a very famous example).
Hello guys, it’s been 3 months since my last post in Machine Learning. I’ll admit that I am a little bit rusty nowadays. Because of my interviews in some com...
Hello guys, it’s been 3 months since my last post in Machine Learning. I’ll admit that I am a little bit rusty nowadays. Because of my interviews in some com...
The surge of available data we can find on the internet is insane. With this surge, data analytics has become a hugely important part of the way organization...
Well, now let’s talk about the next season of the DPC and all the drama around it. Of course, a lot of things are happening, so to start with, congratulations to Tundra Es...
GPT (Generative Pre-trained Transformer) is a type of artificial intelligence model developed by OpenAI that can be used for tasks such as language translat...
This is a quick post on how to create a shortcut for Jupyter Notebook. In this case, you need to connect your PATH of your Python Conda. Here’s how:
Day 6, here I will share my notes from the Inclass notebook. For further examples, you can check out https://github.com/Saltfarmer/Algoritma-BFLP-DS-Audit/tree/ma...