Thesne: Stable t-SNE method for projecting high-dimensional data
Context
Projection is a generic name for creating 2D or 3D scatterplots representing data having tens or even thousands of dimensions. Projections are useful in unsupervised machine learning (detecting data clusters) and also in supervised machine learning (assessing the effectiveness of a classification method).
For static high-dimensional data, t-SNE is arguably one of the best-known and most widely used projection algorithms. Unlike other methods, it can stretch and compress high-dimensional distances so that data clusters are easily recognizable in 2D or 3D. However, naively applying t-SNE to time-dependent data causes major problems, such as unstable projections even when the high-dimensional data changes slowly.
Solution: Thesne
We adapted t-SNE so that it can project time-dependent data stably. Simply put, we aim not only to keep similar data points close in the projection (as t-SNE does), but also to make data points that change slowly in the high-dimensional space move slowly in the projection.
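The idea above can be illustrated with a minimal NumPy sketch of a combined cost: the usual t-SNE cost at each time step, plus a penalty on how far each point moves between consecutive time steps. This is not Thesne's API. The function names, the weight `lam`, and the quadratic form and `1 / (2n)` scaling of the penalty are assumptions made for illustration, and the high-dimensional affinity matrices `Ps` are taken as given rather than computed from a perplexity.

```python
import numpy as np


def student_t_affinities(Y):
    """Normalized Student-t affinities Q between projected points, as in t-SNE."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(W, 0.0)  # a point is not its own neighbor
    return W / W.sum()


def kl_cost(P, Y, eps=1e-12):
    """Static t-SNE cost: KL divergence between affinities P and Q(Y)."""
    Q = student_t_affinities(Y)
    mask = P > 0  # terms with P == 0 contribute nothing
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))


def movement_penalty(Ys):
    """Total squared displacement of every point between consecutive time steps."""
    Ys = np.asarray(Ys)  # shape: (time steps, points, projection dims)
    return float(np.sum((Ys[1:] - Ys[:-1]) ** 2))


def dynamic_cost(Ps, Ys, lam):
    """Sum of per-step t-SNE costs plus a weighted movement penalty (assumed form)."""
    n = Ys[0].shape[0]
    static = sum(kl_cost(P, Y) for P, Y in zip(Ps, Ys))
    return static + lam / (2.0 * n) * movement_penalty(Ys)
```

With `lam = 0` this reduces to projecting each time step independently. Larger values of `lam` trade some per-step cluster quality for smoother motion of the points across time.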
Results
The images below compare Thesne (right columns in the two figures) with standard t-SNE (left columns). The left image shows a simple time-dependent Gaussian dataset. The right image shows a more complex SVHN dataset. In both cases, Thesne preserves the absolute position of data clusters (modulo their change in time). In contrast, t-SNE exhibits random and confusing jumps between time steps.
References
P. Rauber, A. Falcao, A. Telea. Visualizing Time-Dependent Data Using Dynamic t-SNE. Proc. EuroVis (short papers), 2016. Honorable Mention Paper Award.
Implementation
Code for Thesne is available here.