Recent posts

Dimensionality Reduction

2 minute read

The performance of machine learning algorithms can degrade with too many input variables. Having a large number of dimensions in the feature space can mean t...

K Means

2 minute read

Clustering is a technique widely used to find groups of observations (clusters) that share similar characteristics. This process is not driven by a specific ...

K-Means Exercise

7 minute read

Exercise from Jose Portilla's Python for Data Science Bootcamp.

Support Vector Machine

5 minute read

The support vector machine is a generalization of a classifier called the maximal margin classifier. The maximal margin classifier is simple, but it cannot be ap...

Random Forest

2 minute read

Bagging is an ensemble algorithm that fits multiple models on different subsets of a training dataset, then combines the predictions from all models. Random ...

Decision Tree

6 minute read

Decision trees are a very popular machine learning algorithm. They are popular for a variety of reasons, their interpretability being probably their most i...

Categorical Encoding 2

11 minute read

Another reference and shared post from https://www.mygreatlearning.com/blog/label-encoding-in-python/

K Nearest Neighbour

6 minute read

K Nearest Neighbour (KNN) works by choosing the best number $k$ of neighbours. A neighbour, by definition, is a person living near or next door to the speaker or person ...