Interactive exploration of high-dimensional data for classifier design
Context
Designing classifiers is a key goal of machine learning. However, despite the age of the field, the practical tools for doing this are quite limited: one typically splits labeled data into training and test sets, chooses a classification technique based on prior experience (or, sometimes, hearsay), then trains and tests the classifier and hopes for the best. When results are unsatisfactory, one must explore, largely blindly, the very high-dimensional space spanned by classification techniques, their hyperparameters, and the training/test split. This costs a great deal of time and causes just as much frustration.
Featured tool
To help classifier engineers, we designed featured. This tool offers the following functions:
- import high-dimensional (labeled) data in a generic (table) form
- integrate most major classification algorithms (e.g. KNN, RFC, SVM, LVQ, OPF, LR) via an intuitive GUI
- allow users to set up training, testing, and cross-validation with a few clicks
- show the data, including the ground-truth and inferred labels, via multidimensional projections
- allow one to select subsets of data points and determine which features (dimensions) discriminate best between them
- show, also via multidimensional projections, which dimensions contribute to the inference of each class
All in all, using featured, one can conduct classifier engineering easily, intuitively, and with practically no programming.
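For readers who want to see what the train/test/cross-validate loop amounts to outside a GUI, the following is a minimal sketch in plain Python. It is not featured's implementation: the helper names `knn_predict` and `cross_validate`, the choice of KNN with Euclidean distance, the fold scheme, and the synthetic table are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training rows (Euclidean)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def cross_validate(X, y, folds=5, k=3, seed=0):
    """Return the mean accuracy over `folds` disjoint test folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    accuracies = []
    for f in range(folds):
        test = idx[f::folds]              # every folds-th shuffled row
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                      for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds

# Synthetic table (assumption): two well-separated classes in 4 dimensions.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(30)] + \
    [[rng.gauss(6, 1) for _ in range(4)] for _ in range(30)]
y = ["benign"] * 30 + ["malignant"] * 30

accuracy = cross_validate(X, y)
print(f"mean cross-validated accuracy: {accuracy:.2f}")
```

Changing `k`, `folds`, or the classifier itself means editing and rerunning code like this by hand, which is the kind of blind iteration the tool is meant to replace with interactive exploration.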
Examples
The image below shows the main views of featured:
- observation view: shows all data samples (in this case, dermatoscopic images that we want to classify into benign and malignant)
- feature view: shows all data features (dimensions). Users can select feature subsets to generate classifiers and/or do unsupervised learning
- projection view: shows the data using dimensionality reduction. Similar observations naturally cluster here. Also shows how a trained classifier scores on the available data
- group view: shows groups of observations that users create interactively in the projection view. This lets one quickly examine why observation groups differ
- feature scoring view: given two observation groups in the group view, shows how much each feature is responsible for the difference between them
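To make the idea behind the feature scoring view concrete, here is a small sketch that ranks features by how strongly they separate two groups of observations. The scoring rule, a Fisher-style ratio of the squared difference of group means to the sum of group variances, is an assumption chosen for illustration; the text above does not state which measure featured uses. The function name `feature_scores` and the two toy groups are likewise hypothetical.

```python
from statistics import mean, pvariance

def feature_scores(group_a, group_b):
    """Score each feature by a Fisher-style ratio:
    squared difference of group means over the sum of group variances."""
    n_features = len(group_a[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row in group_a]
        b = [row[j] for row in group_b]
        gap = (mean(a) - mean(b)) ** 2
        spread = pvariance(a) + pvariance(b)
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Feature is constant within both groups.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores

# Hypothetical groups: feature 0 separates them, feature 1 does not.
group_a = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0]]
group_b = [[3.0, 4.0], [3.2, 5.0], [2.8, 3.0]]

scores = feature_scores(group_a, group_b)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)
```

Here feature 0 ranks first because its group means differ while its within-group spread is small; feature 1 has identical means in both groups and scores zero.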
Implementation
featured is available here.
Projects
featured was used in several real-world projects for classifier design. We acknowledge the support of ANCS Romania (MelanoImage grant) and Philips Research (NL).
References
J. Kustra, A. Telea. Visual Analytics for Classifier Construction and Evaluation for Medical Data. Data Science for Healthcare, Springer, 2018
P. Rauber, A. Falcao, A. Telea. Projections as Visual Aids for Classification System Design. Information Visualization, 2017
P. Rauber, R. da Silva, S. Feringa, M. Celebi, A. Falcao, A. Telea. Interactive Image Feature Selection Aided by Dimensionality Reduction. Proc. EuroVA, 2015