Self-Supervised Neural Projection (SSNP)

Problem

Multidimensional projections are the methods of choice for depicting large, high-dimensional datasets. Dozens of such methods exist. So, which is the best? We evaluated over 40 such methods quantitatively and concluded that there is no clear winner: speed, quality, stability, out-of-sample ability, ease of use, and implementation simplicity tend to trade off against one another.

Recently, we proposed NNP, a projection method that uses deep learning to achieve all of the above features. Nearly. NNP is supervised, so it requires a precomputed training projection to learn from, which costs effort and attention to generate.

Solution

We take a different road here. We augment a classical autoencoder architecture with a cost term based on point labels, which are either supplied with the data or computed by clustering. This removes the need for a training projection but keeps all other desirable aspects of NNP.
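To make the idea of a label-based cost concrete, here is a minimal sketch of how such a cost could be combined with the usual autoencoder reconstruction error for a single data point. It is an illustration under assumptions, not the SSNP implementation: the choice of mean squared error and cross-entropy, the function names, and the `label_weight` parameter are all hypothetical and do not come from the text above.

```python
import math

def reconstruction_loss(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def label_loss(class_probs, label, eps=1e-12):
    """Cross-entropy of predicted class probabilities against one integer label."""
    return -math.log(max(class_probs[label], eps))

def joint_loss(x, x_hat, class_probs, label, label_weight=1.0):
    """Assumed combined cost: reconstruction term plus a weighted label term."""
    return reconstruction_loss(x, x_hat) + label_weight * label_loss(class_probs, label)

# One sample: a 4-D input, its reconstruction, and two candidate 3-class predictions.
x = [0.2, 0.8, 0.5, 0.1]
x_hat = [0.2, 0.6, 0.5, 0.3]
probs_good = [0.9, 0.05, 0.05]  # most mass on the correct class
probs_bad = [0.1, 0.8, 0.1]     # most mass on a wrong class
label = 0  # a ground-truth class or a cluster index used as a pseudo-label

loss_good = joint_loss(x, x_hat, probs_good, label)
loss_bad = joint_loss(x, x_hat, probs_bad, label)
print(loss_good < loss_bad)
```

With identical reconstructions, the sample whose predicted class disagrees with its label is penalized more. In this sketch, that extra term is what lets labels influence the learned representation, whether they are supplied with the data or come from clustering.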

Results

The image below compares SSNP's results, obtained with agglomerative clustering (Agg), K-means clustering (Km), and ground-truth labels (GT), to those of three state-of-the-art methods: t-SNE, UMAP, and autoencoders. We see that SSNP's results are better than those of autoencoders. Compared to t-SNE and UMAP, SSNP is much faster, simpler to implement and use, and deterministic.

Performance

The graph below shows SSNP's performance compared to NNP, t-SNE, autoencoders, and UMAP. Our method is as fast as NNP and autoencoders (but with higher quality; see the previous image) and orders of magnitude faster than t-SNE and UMAP.