Interactive exploration of high-dimensional data for classifier design

Context

Designing classifiers is a key goal of machine learning. However, despite the age of the field, the practical tools for doing this are quite limited: one typically splits (labeled) data into training and test sets; chooses a classifier technique based on prior experience (or, sometimes, hearsay); then trains and tests the classifier, and hopes for the best. When results are suboptimal, one faces the challenge of exploring, largely blindly, the very high-dimensional space spanned by classifier techniques, their hyperparameters, and the choice of training/test split. This costs large amounts of time and generates equally large amounts of frustration.
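For concreteness, the manual workflow just described can be sketched in plain Python. This is a minimal illustration, not part of featured: it assumes the labeled data is a list of (feature vector, label) pairs, picks a k-nearest-neighbor (KNN) classifier as the technique, and uses k-fold cross-validation in place of a single train/test split. The names knn_predict and cross_validate are hypothetical.

```python
import random
from collections import Counter
from math import dist


def knn_predict(train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training samples."""
    # train is a list of (feature_vector, label) pairs; distance is Euclidean.
    neighbors = sorted(train, key=lambda sample: dist(sample[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def cross_validate(data, folds=5, k=3, seed=0):
    """Return the k-fold cross-validated accuracy of a KNN classifier on data."""
    samples = list(data)
    random.Random(seed).shuffle(samples)  # reproducible random split
    parts = [samples[i::folds] for i in range(folds)]  # near-equal folds
    correct = 0
    for i, test in enumerate(parts):
        # Train on every fold except the i-th, then test on the i-th.
        train = [s for j, part in enumerate(parts) if j != i for s in part]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(samples)


# Toy data: two well-separated groups of 2D observations.
data = ([((0.1 * i, 0.0), "benign") for i in range(10)]
        + [((10.0 + 0.1 * i, 10.0), "malignant") for i in range(10)])
print(cross_validate(data))  # prints 1.0
```

Every choice hard-coded here (the classifier, k, the number of folds, the random seed of the split) is one coordinate of the search space mentioned above; changing any of them can change the resulting accuracy.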

Featured tool

To help classifier engineers, we designed featured. This tool offers the following functions:

import high-dimensional (labeled) data in a generic (table) form
integrate most major classification algorithms (e.g. KNN, RFC, SVM, LVQ, OPF, LR) via an intuitive GUI
allow users to set up training, testing, and cross-validation with a few clicks
show the data, including the ground-truth and inferred labels, via multidimensional projections
allow one to select subsets of data points and determine which features (dimensions) are most discriminant between them
show, also via multidimensional projections, which dimensions contribute to the inference of each class
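As a rough sketch of the import and projection functions listed above, the Python below reads a CSV table (one row per observation, one column per feature, plus a label column) and maps it to 2D. The source does not say which projection techniques featured uses; principal component analysis (PCA), computed here by power iteration with deflation, is only an assumed stand-in, and read_table, pca_2d, and the column name "label" are hypothetical.

```python
import csv
import io
import random


def read_table(text, label_column="label"):
    """Parse CSV text (one row per observation) into feature vectors, labels, and feature names."""
    rows = list(csv.DictReader(io.StringIO(text)))
    names = [name for name in rows[0] if name != label_column]
    X = [[float(row[name]) for name in names] for row in rows]
    y = [row[label_column] for row in rows]
    return X, y, names


def pca_2d(X, iterations=200, seed=0):
    """Project the rows of X onto their top two principal components."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        # Power iteration converges to the dominant eigenvector of cov.
        v = [rng.random() for _ in range(d)]
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                raise ValueError("data has fewer than two directions of variance")
            v = [x / norm for x in w]
        # Eigenvalue via the Rayleigh quotient (v has unit length).
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        # Deflation: remove this component so the next pass finds the second one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
        components.append(v)
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


TABLE = """f1,f2,f3,label
0.0,0.2,0.1,benign
0.3,0.0,0.2,benign
0.1,0.4,0.0,benign
5.0,5.2,4.9,malignant
5.3,4.8,5.1,malignant
4.9,5.1,5.4,malignant
"""

X, y, names = read_table(TABLE)
for (p1, p2), label in zip(pca_2d(X), y):
    print(f"{label:10s} {p1:7.3f} {p2:7.3f}")
```

PCA is linear and easy to reason about; nonlinear projection techniques (for example t-SNE) typically preserve local neighborhoods better, at the cost of distorting global distances.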

All in all, using featured, one can conduct classifier engineering easily, intuitively, and with practically no programming.

Examples

The image below shows the main views of featured:

observation view: shows all data samples (in this case, dermatoscopic images that we want to classify as benign or malignant)
feature view: shows all data features (dimensions). Users can select feature subsets to generate classifiers and/or do unsupervised learning
projection view: shows the data using dimensionality reduction. Similar observations naturally cluster here. The view also shows how well a trained classifier performs on the available data
group view: shows groups of observations that users interactively select in the projection view. This allows one to quickly understand why several observation groups differ
feature scoring view: given two observation groups from the group view, shows how much each feature contributes to the difference between them
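The feature scoring view can be approximated, under assumptions, with a per-feature Fisher score: the squared difference of the two group means divided by the sum of the within-group variances. The source does not state which metric featured computes, so treat this as a hypothetical illustration; fisher_scores and the example groups are invented for the sketch.

```python
from statistics import fmean, pvariance


def fisher_scores(group_a, group_b, names):
    """Score each feature by how well it separates two groups (higher = more discriminant)."""
    scores = []
    for j, name in enumerate(names):
        a = [row[j] for row in group_a]
        b = [row[j] for row in group_b]
        gap = (fmean(a) - fmean(b)) ** 2       # squared distance between group means
        spread = pvariance(a) + pvariance(b)   # within-group variability
        if spread > 0:
            score = gap / spread
        else:
            # Constant feature in both groups: separating only if the means differ.
            score = float("inf") if gap > 0 else 0.0
        scores.append((name, score))
    # Most discriminant features first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Two hypothetical groups selected in the projection view (rows = observations).
group_a = [[1.0, 0.0, 3.0], [1.2, 1.0, 3.1], [0.8, 2.0, 2.9]]
group_b = [[5.0, 0.5, 3.0], [5.2, 1.5, 3.2], [4.8, 1.0, 2.8]]
for name, score in fisher_scores(group_a, group_b, ["f1", "f2", "f3"]):
    print(name, round(score, 2))
```

Only feature f1 has clearly different means across the two groups, so it should rank first; f2 and f3 overlap and score near zero. A univariate score like this ignores feature interactions, which is one reason to confirm it visually in the projection view.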

Implementation

featured is available here.

Projects

featured was used in several real-world projects for classifier design. We acknowledge the support of ANCS Romania (MelanoImage grant) and Philips Research (NL).

References

J. Kustra, A. Telea. Visual Analytics for Classifier Construction and Evaluation for Medical Data. Data Science for Healthcare, Springer, 2018.

P. Rauber, A. Falcao, A. Telea. Projections as Visual Aids for Classification System Design. Information Visualization, 2017.

P. Rauber, R. da Silva, S. Feringa, M. Celebi, A. Falcao, A. Telea. Interactive Image Feature Selection Aided by Dimensionality Reduction. Proc. EuroVA, 2015.