In this blog post, we will explore a relatively recent dimensionality reduction technique in Machine Learning known as Uniform Manifold Approximation and Projection (UMAP). UMAP is a non-linear dimensionality reduction algorithm that is very effective for visualizing high-dimensional datasets. It often yields clearer, better-separated visualizations than linear techniques such as PCA, and it tends to preserve more of the global structure of the data than t-SNE while running noticeably faster.
Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualization similarly to t-SNE, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data:

1. The data is uniformly distributed on a Riemannian manifold.
2. The Riemannian metric is locally constant (or can be approximated as such).
3. The manifold is locally connected.

From these assumptions, UMAP models the data's manifold as a fuzzy topological structure and then searches for a low-dimensional embedding whose fuzzy topological structure is as close as possible to it.
In Python, UMAP is available through the umap-learn library, which can be installed using pip:
!pip install umap-learn
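To confirm that the installation worked, a quick sanity check such as the one below can be run. This snippet is my own addition and simply assumes a standard Python 3.8+ environment; the version number printed will depend on what pip installed:

import umap  # the package is installed as umap-learn but imported as umap
from importlib.metadata import version

print(version("umap-learn"))  # installed version of the library
print(umap.UMAP())            # a UMAP estimator constructed with default settings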
Let's apply the UMAP algorithm on the popular Iris dataset:
import umap
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load Iris dataset
data, labels = load_iris(return_X_y=True)

# Apply UMAP and reduce the data dimensions
reducer = umap.UMAP()
embedding = reducer.fit_transform(data)

# Plot the data
plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap='Spectral', s=5)
plt.gca().set_aspect('equal', 'datalim')
plt.title('UMAP projection of the Iris dataset', fontsize=12)
plt.show()
In the above snippet, UMAP is initialized with its default hyperparameters and the fit_transform method is called on our data, reducing the 4-dimensional Iris features to a 2-dimensional embedding. The results are then plotted using matplotlib.
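The defaults are usually a reasonable starting point, but the projection can be shaped with a handful of hyperparameters. The sketch below is an illustration of how they might be adjusted; the specific values are arbitrary choices for demonstration, not tuned settings:

import umap
from sklearn.datasets import load_iris

data, labels = load_iris(return_X_y=True)

# n_neighbors balances local versus global structure,
# min_dist controls how tightly points are packed in the embedding,
# n_components sets the output dimensionality,
# and random_state makes the result reproducible.
reducer = umap.UMAP(n_neighbors=10, min_dist=0.3, n_components=2,
                    metric='euclidean', random_state=42)
embedding = reducer.fit_transform(data)
print(embedding.shape)  # (150, 2)

Smaller n_neighbors values emphasize fine local detail, while larger values pull the embedding toward the broader structure of the dataset.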
UMAP is a powerful dimensionality reduction tool that stands out among competitors such as t-SNE and PCA for its balance of speed, embedding quality, and versatility. It scales to large datasets and typically produces visualizations in which meaningful clusters are clearly separated, making it a strong default choice for exploratory data visualization.
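As one illustration of that versatility, UMAP follows the scikit-learn estimator API, so it can serve as a non-linear preprocessing step rather than only a plotting tool. The following sketch is my own example, assuming scikit-learn is installed alongside umap-learn; the parameter values are illustrative, not tuned:

import umap
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

data, labels = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, random_state=42)

# UMAP reduces the features before a simple k-NN classifier.
pipeline = make_pipeline(
    umap.UMAP(n_components=3, n_neighbors=15, random_state=42),
    KNeighborsClassifier(n_neighbors=5),
)
pipeline.fit(X_train, y_train)
print("Test accuracy:", pipeline.score(X_test, y_test))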