Additional material
This page lists code packages (libraries, toolkits, code snippets, repositories) that are useful both for understanding the material presented during the lectures and for carrying out (parts of) the practical assignment.
Important note: You do not need to become an expert in all of these software resources. They are a superset of what you need to be familiar with to complete the assignment well and pass the course with a good grade. Rather, we present this extensive set of resources as a knowledge base for students interested in gaining more in-depth knowledge of data visualization (with a focus on network and high-dimensional data), so as to
- better understand the topics presented during the lectures
- try out a wide range of visualization techniques
- compare different implementations of the same technique(s)
- compare the results of one's own implementation with those of state-of-the-art ones
- see the trade-off between toy systems (easy-to-use but limited) and professional ones (powerful but harder to learn)
- explore different designs (APIs, programming languages, philosophies) for the same task
- find additional datasets besides the ones discussed during the lectures and labs
- get further interested in data visualization!
Simple graph drawing
GraphViz
GraphViz is one of the simplest and oldest libraries for graph visualization. Its key added value is simplicity: It comes with a trivial file format for storing graphs and/or their drawings and a few simple-to-use tools to create customized drawings of such graphs. It includes the main layout algorithms discussed during the course (hierarchical, nested, force-directed). It is highly effective for learning how graph drawing works and for producing finely tuned drawings of relatively small graphs, and it can be used with virtually no coding.
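To give an impression of how simple the file format is, here is a minimal sketch that writes a small directed graph in GraphViz's DOT language. The helper name to_dot and the example node names are made up for this illustration; only the DOT syntax itself comes from GraphViz.

```python
def to_dot(edges, name="G"):
    """Serialize a list of (source, target) pairs as a directed graph in DOT."""
    lines = [f"digraph {name} {{"]
    # Optional styling applied to all nodes.
    lines.append("    node [shape=box];")
    for source, target in edges:
        # One line per edge; nodes are declared implicitly by being used.
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example graph.
edges = [
    ("lecture", "assignment"),
    ("lecture", "exam"),
    ("assignment", "grade"),
    ("exam", "grade"),
]
dot_text = to_dot(edges)
print(dot_text)
```

Assuming GraphViz is installed and the text is saved as graph.dot (a file name chosen here for illustration), running dot -Tpng graph.dot -o graph.png should produce a hierarchical drawing; replacing dot with neato switches to a spring-model layout.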
D3 Graph Layouts
D3 is likely the most famous library for creating data charts for the web. Among tons of other features, it contains algorithms for many graph layouts discussed in the course (force-directed, treemaps, bubble trees, and more). Coded in JavaScript, it allows one to quickly create interactive visualizations of small-to-medium-sized graphs (hundreds of elements). Learning D3 can be challenging for non-JS experts and connecting to real data sources requires some web-client-server engineering. For a quick try, see
- Force-directed layouts
- Force-directed layouts (another variant)
- Music visualization (demo of D3's bubble layout for interactive exploration of over 150K songs!)
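To see what a force-directed layout computes, independently of D3, the following is a toy sketch in plain Python. It uses Fruchterman-Reingold-style forces (all node pairs repel, linked nodes attract) with a step limit that shrinks over time. This is not D3's implementation; the function names, parameters, and the example graph are assumptions made for illustration only.

```python
import math
import random


def _add_force(pos, disp, a, b, strength):
    """Accumulate a force between a and b (positive pushes apart, negative pulls together)."""
    dx = pos[a][0] - pos[b][0]
    dy = pos[a][1] - pos[b][1]
    dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
    f = strength(dist)
    disp[a][0] += dx / dist * f
    disp[a][1] += dy / dist * f
    disp[b][0] -= dx / dist * f
    disp[b][1] -= dy / dist * f


def force_layout(nodes, edges, iterations=200, k=1.0, max_step=0.1, seed=0):
    """Toy spring embedder; k is the preferred edge length."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for t in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                _add_force(pos, disp, a, b, lambda d: k * k / d)
        # Attraction along every edge.
        for a, b in edges:
            _add_force(pos, disp, a, b, lambda d: -d * d / k)
        # Move each node along its net force, capped by a shrinking limit.
        limit = max_step * (1.0 - t / iterations)
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1])
            if length > 0.0:
                scale = min(length, limit) / length
                pos[n][0] += disp[n][0] * scale
                pos[n][1] += disp[n][1] * scale
    return pos


# Hypothetical example: a path graph a - b - c - d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
layout = force_layout(nodes, edges)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The all-pairs repulsion loop makes each iteration quadratic in the number of nodes, which is one reason why naive implementations like this one only suit small graphs.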
Advanced graph visualization
Gephi
Gephi is the next step in graph drawing after GraphViz. It scales to significantly larger graphs and adds tools for interactively exploring them. It is mainly centered on force-directed layout algorithms and is coded in Java.
yEd
yEd is a graph drawing application produced by yWorks, the company behind the yFiles library, which began as a start-up from Tuebingen University. Notably, the system offers a "layered layout" with many parametrization options. An online version is also available.
OGDF
OGDF is a self-contained C++ library for graph algorithms, in particular for (but not restricted to) automatic graph drawing. It offers sophisticated algorithms and data structures to use within your own applications or scientific projects. It also offers many layouts that other libraries do not provide; in particular, it covers the planar layouts.
Tulip framework
Tulip is the best visualization framework for large graphs and can be seen as a massive next step beyond Gephi. It contains hundreds of layout algorithms and endless configuration options for rendering and interacting with graphs of millions of nodes and edges. It can be used either via a GUI or via a powerful API offering full control, and can be deployed on the desktop, on mobile devices, or as a visualization server. Highly recommended for anyone who aims to build commercial-grade visualization solutions involving large graphs.
Kernel density bundling (KDEEB)
KDEEB is by now the standard algorithm for bundling general graphs. The code provided above is from the original paper, written in C# and OpenGL. Since then, many alternative KDEEB implementations have appeared, e.g.
- C++ implementation
- JS implementation (including live demo!)
- JS/D3 implementation
- CUDA implementation (fastest ever, many tunable parameters)
Cola.js - Constraint-Based Layout in the Browser
Cola.js is an open-source JavaScript library for arranging HTML5 documents and diagrams using constraint-based optimization techniques, including variations of force-directed layout.
Low-dimensional data visualization
Table lens in D3
Basic but easy-to-learn implementation of table lenses in D3.
Table lens in C++
A very powerful framework for table lensing (including multiple sort/group, treemapping, and more). Code likely to need cleanups for modern C++ compilers.
Parallel coordinates in D3
Simple demo of how to implement parallel coordinate plots (PCPs) in D3, including interaction.
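The geometry behind a PCP does not depend on D3: each dimension gets its own vertical axis, each value is rescaled to that axis, and each sample becomes a polyline across the axes. A minimal sketch of that mapping follows; the function name pcp_polylines and the sample data are hypothetical, and drawing the polylines is left to whichever plotting tool you prefer.

```python
def pcp_polylines(rows, width=1.0):
    """Map each data row to a polyline across evenly spaced vertical axes.

    Returns one list of (x, y) points per row, with y normalized to [0, 1]
    per dimension (min-max scaling).
    """
    n_dims = len(rows[0])
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    polylines = []
    for row in rows:
        points = []
        for d, value in enumerate(row):
            # Horizontal position of axis d.
            x = width * d / (n_dims - 1) if n_dims > 1 else 0.0
            # Vertical position: value rescaled to the range of its own axis.
            span = highs[d] - lows[d]
            y = (value - lows[d]) / span if span else 0.5
            points.append((x, y))
        polylines.append(points)
    return polylines


# Hypothetical samples with three attributes each.
rows = [(5.1, 3.5, 1.4), (7.0, 3.2, 4.7), (6.3, 3.3, 6.0)]
lines = pcp_polylines(rows)
for line in lines:
    print(line)
```

Interaction such as axis brushing then amounts to filtering the rows by a value range on one axis before computing or drawing the polylines.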
Parallel coordinates in Python
Advanced library for PCP from Facebook Research. Scales to tens of thousands of samples and tens of dimensions. Very easy to use, professional code. Highly recommended.
Scatterplot matrix in Python
Standard implementation of scatterplot matrices (SPLOMs) in Python using Matplotlib.
Scatterplot matrix in D3
SPLOMs with interactive selection in D3.
High-dimensional data visualization
Dimensionality reduction demo
No-coding, web-based, simple introduction to dimensionality reduction with real-world datasets and projection techniques. If you're new to dimensionality reduction, check this out first!
t-SNE toolkit
The original t-SNE implementation (in many programming languages), written and documented by its author.
t-SNE explained
A very readable blog post walking you step-by-step through the entire t-SNE theory and how to implement it!
How to use t-SNE effectively
A simple interactive webpage for trying out t-SNE with various parameters and understanding how to set its perplexity parameter.
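As a complement, here is a minimal sketch of what the perplexity parameter means numerically. For one point, t-SNE turns the squared distances to all other points into Gaussian neighbor probabilities with bandwidth sigma, and the perplexity of that distribution is 2 raised to its Shannon entropy, roughly an effective number of neighbors. The bandwidth is then searched so that the perplexity matches the user's setting. The function names, the search bounds, and the example distances are assumptions made for this illustration, not code from any t-SNE package.

```python
import math


def conditional_probabilities(sq_dists, sigma):
    """Gaussian neighbor probabilities p(j|i) from squared distances to the other points."""
    d_min = min(sq_dists)  # shifting by the minimum avoids underflow; the ratios are unchanged
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity = 2^H, with H the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, steps=60):
    """Bisection on a log scale for the sigma whose perplexity matches target."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probabilities(sq_dists, mid)) < target:
            lo = mid  # too few effective neighbors: widen the Gaussian
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical squared distances from one point to its five neighbors.
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
sigma = sigma_for_perplexity(sq_dists, target=3.0)
print(sigma, perplexity(conditional_probabilities(sq_dists, sigma)))
```

A very small sigma concentrates all probability on the closest neighbor (perplexity near 1), while a very large sigma spreads it evenly over all neighbors (perplexity near their count), which is why the perplexity setting controls how local or global the resulting embedding looks.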
Approximate Nearest Neighbors (ANN)
A complete, easy-to-use C++ library for performing nearest-neighbor search in n dimensions.
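For comparison, the exact brute-force search that such libraries are designed to speed up can be sketched in a few lines of Python. This is a baseline for checking the results of a faster method, not the API of the ANN library; the function name k_nearest and the example points are hypothetical.

```python
import heapq
import math


def k_nearest(points, query, k):
    """Exact k-nearest-neighbor search by scanning all points.

    Returns the indices of the k closest points, nearest first.
    Cost per query grows linearly with the number of points.
    """
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: math.dist(points[i], query)
    )


# Hypothetical 2D points; the same code works in any number of dimensions.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.9, 0.2)]
neighbors = k_nearest(points, query=(1.0, 0.1), k=2)
print(neighbors)
```

Projection techniques such as t-SNE need the nearest neighbors of every point, so this linear scan per query becomes quadratic overall; spatial search structures and approximate methods trade a little accuracy for a large speed-up.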
Autoencoders tutorial
A very readable but comprehensive tutorial on autoencoders (a type of dimensionality reduction), including Python source code.
Dimensionality reduction in scikit-learn
scikit-learn is one of the most used toolkits for data analysis and machine learning out there. The above link points to the part of scikit-learn that implements projections. Its learning curve can be steep, but once you are past it, it offers dozens of powerful algorithms for dimensionality reduction of virtually any kind of dataset!
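As a first taste, here is a minimal sketch that projects the Iris dataset (bundled with scikit-learn, 150 samples with 4 dimensions each) to 2D using PCA. The choice of dataset and technique is ours, purely for illustration, and the example assumes scikit-learn is installed.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load a small bundled dataset: one row per sample, one column per dimension.
X = load_iris().data

# Fit the projection and map every sample to 2D in one call.
projection = PCA(n_components=2).fit_transform(X)

# One 2D point per input sample, ready for a scatterplot.
print(projection.shape)
```

Other projection classes, such as TSNE, MDS, and Isomap in sklearn.manifold, also provide fit_transform, so trying a different technique is mostly a matter of swapping the class and its parameters.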
Projections benchmark
This is the largest-scale benchmark for comparing projection algorithms in existence. It covers 45 projection techniques, 19 datasets, and 6 quality metrics. If you are interested in understanding how algorithm X compares to algorithm Y, or in learning more about the plethora of projection algorithms overall, this is the place to start. For a shorter intro, read the paper.