ShaRP: Shape-Regularized Multidimensional Projections
Problem
Multidimensional projections are well-established methods for visualizing large datasets in which every observation has tens to hundreds of dimensions. Many such algorithms exist, such as PCA, t-SNE, UMAP, or our own Neural Network Projection.
However, most projection algorithms do not control the shapes of the clusters they generate. Take, for example, the well-known MNIST dataset. The image below shows the projections created by an autoencoder (AE, a) and by SSNP (b). The shapes formed by same-color points (data points that have the same label and are thus similar) are largely arbitrary. This can mislead the user into inferring a specific distribution of these points where none exists.
data:image/s3,"s3://crabby-images/c2c25/c2c257c3e0a8b130fcab1261e961bf276dcd1fb3" alt=""
Solution
The ShaRP method we propose solves this problem simply and efficiently. It extends SSNP (which is itself an autoencoder with an additional label-based cost) to force same-label points to follow a distribution specified by the user. In the image above, we force this distribution to be Gaussian, so same-label points form roughly 'circular' clusters.
Much more shape control is possible. The image below shows the same MNIST dataset, now forcing the same-label points to group into squares. This helps, for example, when annotating point clusters with (rectangular) image thumbnails.
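To make the structure of such an objective concrete, the sketch below combines three terms: a reconstruction error (the autoencoder part), a label-based cost (the SSNP part), and a regularizer that pulls the 2D codes toward a Gaussian. This is only an illustration under stated assumptions, not the ShaRP implementation. The function names, the weights `w_cls` and `w_shape`, and the use of a VAE-style KL divergence as the shape term are all hypothetical stand-ins; the actual regularizer used by ShaRP is described in the paper and the source code.

```python
import math

def mse(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def cross_entropy(probs, label):
    """Negative log-probability the classifier assigns to the true label."""
    return -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)

def gaussian_kl(mu, log_var):
    """KL(q || p) for q = N(mu, diag(exp(log_var))) and p = N(0, I).

    Hypothetical stand-in for a shape term that favors Gaussian clusters.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def total_loss(x, x_hat, probs, label, mu, log_var, w_cls=1.0, w_shape=0.1):
    """Weighted sum of reconstruction, label, and shape terms (weights assumed)."""
    return (mse(x, x_hat)
            + w_cls * cross_entropy(probs, label)
            + w_shape * gaussian_kl(mu, log_var))

if __name__ == "__main__":
    # Toy values: a 3D input, its reconstruction, class probabilities, 2D code.
    loss = total_loss(x=[0.2, 0.5, 0.9], x_hat=[0.25, 0.4, 0.8],
                      probs=[0.7, 0.2, 0.1], label=0,
                      mu=[0.3, -0.1], log_var=[0.0, 0.0])
    print(f"total loss: {loss:.4f}")
```

In this sketch, raising `w_shape` trades reconstruction and label accuracy for closer adherence to the target distribution; swapping in a different shape term would change the target shape without touching the other two terms.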
data:image/s3,"s3://crabby-images/58ff5/58ff5da85867f25985aec1e12538d18b4f011b51" alt=""
The image below shows three datasets (MNIST, HAR, Reuters, from left to right) projected by ShaRP to create triangular clusters.
data:image/s3,"s3://crabby-images/2cd08/2cd08749419ca446bb7efcaa6642021ca76315cc" alt=""
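The Gaussian, square, and triangular targets mentioned above can each be thought of as a 2D distribution from which same-label points are drawn. The following minimal sketch samples points from such distributions around a cluster center, using only the Python standard library. It is meant to illustrate what "a distribution given by the user" could look like, not how ShaRP is trained; the names `sample_cluster`, `SAMPLERS`, and the `scale` parameter are assumptions made for this example.

```python
import math
import random

def sample_gaussian(center, scale, rng):
    """Isotropic Gaussian around the center: yields round clusters."""
    return (rng.gauss(center[0], scale), rng.gauss(center[1], scale))

def sample_square(center, scale, rng):
    """Uniform over an axis-aligned square with half-side `scale`."""
    return (center[0] + rng.uniform(-scale, scale),
            center[1] + rng.uniform(-scale, scale))

def sample_triangle(center, scale, rng):
    """Uniform over an equilateral triangle with circumradius `scale`."""
    angles = (math.pi / 2, math.pi / 2 + 2 * math.pi / 3,
              math.pi / 2 + 4 * math.pi / 3)
    verts = [(center[0] + scale * math.cos(a),
              center[1] + scale * math.sin(a)) for a in angles]
    # Draw barycentric weights; reflect points that fall outside the triangle.
    u, v = rng.random(), rng.random()
    if u + v > 1.0:
        u, v = 1.0 - u, 1.0 - v
    w = 1.0 - u - v
    return (u * verts[0][0] + v * verts[1][0] + w * verts[2][0],
            u * verts[0][1] + v * verts[1][1] + w * verts[2][1])

# Hypothetical registry mapping a shape name to its sampler.
SAMPLERS = {"gaussian": sample_gaussian,
            "square": sample_square,
            "triangle": sample_triangle}

def sample_cluster(shape, center, scale, n, seed=0):
    """Draw n 2D points for one label from the chosen shape distribution."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [SAMPLERS[shape](center, scale, rng) for _ in range(n)]

if __name__ == "__main__":
    for shape in SAMPLERS:
        pts = sample_cluster(shape, center=(0.0, 0.0), scale=1.0, n=5)
        print(shape, [(round(x, 2), round(y, 2)) for x, y in pts])
```

Plotting the output of `sample_cluster` for each shape, with one center per label, gives a rough picture of the cluster outlines a shape-regularized projection aims for.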
Results
Why is this useful? Consider the image below, which compares ShaRP (favoring round clusters) with t-SNE, UMAP, and SSNP on three datasets. ShaRP produces images that are easier to understand and less shape-biased.
data:image/s3,"s3://crabby-images/c0ae5/c0ae544e6a9a33088b94ee66f8618d685039740a" alt=""
Implementation
ShaRP is implemented in Python. The full source code is available here.
Publications
ShaRP: Shape-Regularized Multidimensional Projections. A. Machado, M. Behrisch, A. Telea. Proc. EuroVA 2023 (Best paper award)