Parallel Algorithms (WISM 459), 2021/2022
Teacher
Rob Bisseling.
Teaching assistant
Katharina Klein
Time and place
Every Wednesday from 10.00–12.45 hour on location at Utrecht Science Park:
Room BBG017 (Buys Ballot Building), except on Oct. 13, when we occupy room HFG611.
The format is 2x45 minutes of lectures (10.00–10.45 hour, 11.00–11.45 hour),
followed by a 45-minute laboratory class (12.00–12.45 hour),
which is an interactive session. In between we have 15-minute breaks.
We follow the current RIVM rules to fight the pandemic.
Last year's lectures were recorded, and they will be made available
on the Mastermath Vimeo server. If you miss a class because you have a cold or are otherwise
indisposed, you can view these recordings instead and contact me for interaction
by any of the other electronic means (email, Teams, Zoom).
But nothing beats a live lecture, with human interaction,
and you will get a lot more from class if you physically attend.
First lecture: September 15, 2021.
Midterm exam: Oct 20, 11.30–13.30 hour. Room EDU ALFA in the Educatorium.
Retake exam: Dec 15, 11.00–13.00 hour. Room Minnaert 206.
The exams will be open book.
Intended audience
This is an elective Master's course for students
of the Master's programme Mathematical Sciences
of Utrecht University,
part of the track Applied Mathematics, Complex Systems, and
Scientific Computing.
The course is recommended for students in Mathematics or Computer Science
who are interested in scientific computing
and would like to obtain a first `hands-on experience'
in parallel programming.
It is also highly suitable for the Honours master
programme in Theoretical Physics and Mathematical Sciences
and for students interested in Computational Physics.
The course is part of the Dutch national Master's programme
Mastermath and
hence is open to all interested Master's students in Mathematics
in the Netherlands.
Registration for everybody is through Mastermath.
Course aim
Students will learn how to design a parallel algorithm for a
problem from the area of scientific computing
and how to write a parallel program that
solves the problem.
Learning goals
After completion of the course, the student is able to
- design a parallel algorithm for a problem from the area of scientific computing or big data;
- analyse the cost of the algorithm in terms of computing time, communication time, and synchronisation time;
- write a parallel program based on an algorithm that solves the problem;
- write a report on the algorithm, its analysis, and the numerical experiments performed, in a concise and readable form;
- present the results in an oral presentation, in a clear manner, highlighting the most important ideas.
Credits
You get 8 ECTS credit points.
Contents
Today, parallel computers are appearing on our desktops. The advent of dual-core, quad-core, and octa-core computers and the expected increase in the number of cores in the coming years cause a major change in our approach to software, such as the mathematical software we use in scientific computations and in the emerging area of big data computations.
Parallel computers drastically increase our computational capabilities and thus enable us to model more realistically in many application areas. To make efficient use of parallel computers, it is necessary to reorganise the structure of our computational methods. In particular, attention must be paid to the division of work among the different processors solving a problem in parallel and to the communication between them. Suitable parallel algorithms and systems software are needed to realise the capabilities of parallel computers.
We will discuss extensively the most recent developments in the area of parallel computers, ranging from multicore laptop or desktop PCs to clusters of PCs connected by switching devices, to massively parallel computers with distributed memory such as our national supercomputer Snellius at SURF in Amsterdam, to be officially opened by our Queen on September 16, 2021.
The following subjects will be treated:
- Types of existing parallel computers.
- Principles of parallel computation: distributing the work evenly
and avoiding communication.
- The Bulk Synchronous Parallel (BSP) model as an idealised model
of a parallel computer.
- Use of BSPlib software as the basis for architecture-independent
programs.
- Parallel algorithms for the following problems: inner product computation, sorting, prime number sieving, LU decomposition, sparse matrix-vector multiplication, iterative solution of linear systems, graph matching.
- Analysis of the computation, communication, and synchronisation time requirements of these algorithms by the BSP model.
- Hands-on experience using your own laptop or the Snellius supercomputer.
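The cost analysis mentioned in the list above uses the standard BSP superstep formula (as in chapter 1 of PSC2), where w denotes the maximum local work of a processor, h the maximum number of data words any processor sends or receives, g the communication cost per data word, and l the global synchronisation cost:

```latex
% BSP cost of a single superstep:
T_{\text{superstep}} = w + h\,g + l
% The total cost of an algorithm is the sum over its supersteps:
T_{\text{algorithm}} = \sum_{i} \left( w_i + h_i\,g + l \right)
```

Both g and l are machine parameters, measured in flop time units by a benchmark such as the one treated in the first weeks of the course.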
Language
The course will be given in English.
All reading material will be in English.
Prerequisites
Introductory course in linear algebra. Some knowledge of algorithms. Knowledge of a modern programming language such as C, C++, C#, Python, or Java. Basic knowledge of C is helpful, as we will use this language in class, but we will organise the course so that those coming from another language have time to adapt to it.
There are plenty of books on C;
you may consider the book
Practical C Programming by Steve Oualline.
You can also use C++ in class, if you already know that language.
Some knowledge of the UNIX/Linux operating system is helpful.
Software
We will make use of the
MulticoreBSP for C software developed by Albert-Jan Yzelman, version 2.0.4 (March 2019).
This software runs on shared-memory multicore PCs, and you can also run
your program without any changes on distributed-memory machines
such as Snellius.
C++ programmers can use
Bulk, version 2.0.0 (March 27, 2020)
as a communications library, instead of MulticoreBSP for C.
It is a very modern library that can run on both shared-memory and distributed-memory machines,
and on hybrids of the two, without any change.
See
Bulk: a Modern C++ Interface for Bulk-Synchronous Parallel Programs by Jan-Willem Buurlage, Tom Bannink, and Rob H. Bisseling,
in Proceedings Euro-Par 2018, Lecture Notes in Computer Science, Vol. 11014, Springer, 2018, pp. 519–532.
Hardware
Recommended: Use your own device!
Please install the MulticoreBSP library for C (in most cases)
or Bulk for C++ (only if you are an experienced C++ user).
On Macs and Linux computers this is straightforward.
On Macs you have UNIX through your terminal.
On Windows machines you may want to use the Windows Subsystem for Linux (WSL).
For Windows, it is more difficult to get the software running, and
you may want to ask for assistance.
You will get access to the national supercomputer Snellius,
where you can
use BSPonMPI which will be installed by SURF,
or Bulk, both giving access to thousands of cores.
Examination
The examination is in the form of an initial assignment (20%), a final assignment (40%), a presentation on the final assignment (20%), and a midterm exam (20%). The two assignments are carried out at home.
A written report must be handed in for each assignment before the deadline. Students can work individually or in pairs (but not in larger teams) on the computer programs and can hand in a single report, provided each did a fair share of the work and can account for that. The contribution of each individual must then be stated briefly in the report. Each participant has to give an individual presentation on the chosen topic of the final assignment.
Presentations will be scheduled on 8 and 22 December 2021 (with additional
dates outside the regular schedule if needed).
The midterm exam must be passed with a grade of at least 5.0 (on a scale of 10).
All students should submit reports for the assignments
electronically in PDF format through the ELO system of Mastermath.
There is no further homework.
All students must use LaTeX for the assignments.
The midterm exam can be retaken near the end of the course (in December) by any course participant.
The maximum of the two exam grades counts as the midterm exam grade.
In case of illness, or for another urgent reason, the final presentation can be retaken
on an individual basis in January 2022.
In case of an insufficient final grade for the whole course (<=5), the participant
is entitled to a retake, which means that one assignment can be redone within six weeks
after the final grading of the course.
Literature
We closely follow the book
Parallel
Scientific Computation: A Structured Approach using BSP (second edition)
(PSC2),
by Rob H. Bisseling,
Oxford University Press,
September 2020. ISBN 9780198788355 (for the paperback version).
The book is available as paperback, hardcover, ebook, and also in Oxford Scholarship Online.
Supplementary material such as all slides accompanying
the book and the software package BSPedupack 2.0 with all the programs of the book are available on the supplementary material page.
Here you can find all the lecture slides of this course.
Weekly schedule and links
This year we will cover parts of chapters 1, 2, 4, 5
of the second edition.
Wednesday September 15, 2021
The Bulk Synchronous Parallel model
(PSC2 section 1.2)
Parallel Inner Product Computation
(PSC2 section 1.3)
12.00–12.45 hour. Laboratory class.
Try your luck on the theoretical exercise 1.3.
Interesting links
- Top 500 of supercomputers
in the world.
- SURF
in Amsterdam, home of the Snellius national supercomputer,
which we will use in our laboratory class and homework. It is a Lenovo machine.
Snellius will contain 76,832 AMD cores and 144 Nvidia A100 GPUs.
Its theoretical peak performance will be 14 Pflop/s.
Wednesday September 22, 2021.
BSPlib, the Bulk Synchronous Parallel library.
(PSC2 section 1.4)
Benchmarking
(PSC2 sections 1.5–1.7)
Laboratory class. Starting with BSPlib.
This is your first handson session with BSPlib.
Download the latest version of
BSPedupack,
my package of educational programs that
teaches how to use BSP.
Solve Exercise 1.3 from PSC2: try to run the benchmark,
exploring your first parallel environment: your own laptop.
Modify your program to send larger messages.
How does this influence g?
Start working on the first assignment, Exercise 1.7 from PSC2.
Design a parallel algorithm
for prime number generation.
Write your first parallel
program.
This program should generate all prime numbers below 10,000,000 in parallel.
Choosing the right distribution is the name of the game.
You should carry out the experiments of your first assignment
on your own parallel computer.
(This may be your laptop or desktop,
or a compute server of your own university.)
Snellius will only be available for the second assignment.
Hand in a report on Exercise 1.7 presenting your algorithm and the experimental results,
before the deadline Wednesday October 27, 10.00 hour.
Note that you can work in pairs.
Interesting links:
Wednesday September 29, 2021.
Parallel sample sort
(PSC2 section 1.8)
Program bspsort and bulk synchronous message passing
(PSC2 section 1.9)
Laboratory class. Discussion of the solution to Exercise 1.2.
Interesting links:
Wednesday October 6, 2021.
Sequential LU Decomposition
(PSC2 sections 2.1–2.2)
Parallel LU Decomposition
(PSC2 section 2.3)
Laboratory class. Discussion of report requirements
Wednesday October 13, 2021. Please note the exceptional location: Hans Freudenthal building, room 611.
TwoPhase Broadcasting
(PSC2 section 2.4)
High-performance LU decomposition
(PSC2 section 2.5)
Laboratory class. Discussion of the solution of the midterm exam from October 24, 2018.
We will discuss Questions 2, 3, and 4, which are representative of this year's
open-book exam. Please first try to solve the questions yourself.
Interesting links:
Wednesday October 20, 2021. Location: Educatorium Alfa
11.30–13.30 Midterm exam on Chapters 1 and 2.
The exact exam material will be announced.
The exam will be held on location. You may use the book, either as hardcopy
(easiest to use) or as ebook/OSO,
together with the accompanying slides and your own notes,
but no other materials or the internet.
Interesting links: previous exams from the archive. The exams from 2012 and 2013 cover only Chapter 1 of the material (60 minutes).
The 2017 exams cover Chapters 1 and 3 (90 minutes).
Please note that this year's exam (with open book)
will have no knowledge questions, which is usually the first question
of the exams up to 2019 below.
Wednesday October 27, 2021.
Sequential sparse matrix-vector multiplication
(PSC2 section 4.1)
Sparse matrices and their data structures
(PSC2 section 4.2)
Discussion of midterm exam
Wednesday November 3, 2021.
Parallel sparse matrix-vector multiplication
(PSC2 section 4.3)
Cartesian matrix distributions
(PSC2 section 4.4)
Laboratory class
Choose an assignment for your final project. Possibilities:
You can parallelise your own problem, or choose one of the exercises from the book PSC2.
For example:
- solve Exercise 1.5 on parallel sorting algorithms;
- solve Exercise 2.7 on Boolean matrix multiplication;
- solve Exercise 2.8 on Strassen matrix multiplication;
- solve Exercise 3.5 on the 3D Fast Fourier Transform;
- solve Exercise 3.8 on the Number Theoretic Transform;
- solve Exercise 3.10 on decimals of pi;
- solve Exercise 4.6 on connected components in a 3D image;
- solve Exercise 4.8 on optimal hypergraph partitioning;
- solve Exercise 4.10 on sparse matrix-vector multiplication in C++ using Bulk;
- solve Exercise 5.4 on the maximum independent set problem in graphs;
- solve Exercise 5.5 on solving the n-queens problem
on a chessboard;
- solve Exercise 5.6 on counting self-avoiding walks (SAWs)
on a graph or a 2D or 3D lattice;
- solve Exercise 5.7 on counting triangles in a social network graph;
- solve Exercise 5.8 on computing betweenness centrality on a graph;
- solve Exercise 5.9 on optimal bipartite graph matching;
- participate in the Huawei Hyper challenge.
Please try to discuss the chosen topic with the teacher before November 17, 2021 and register
the topic by sending an email to the teacher. You can work in pairs, but have to give individual presentations
on the topic. It is OK for teams to choose the same topic, but please try to be original in your approach.
Wednesday November 10, 2021.
Mondriaan sparse matrix distribution
(PSC2 section 4.5)
Mediumgrain method for sparse matrix distribution
(PSC2 section 4.6)
Laboratory class. Discussion of first assignment
Interesting links:
Wednesday November 17, 2021. Guest lecture Albert-Jan Yzelman (Huawei Research Zurich): "Algebraic programming"
The lecture (2x 45 min) will be followed by a question and answer session.
Albert-Jan Yzelman has a PhD from Utrecht University. He is co-author of the MulticoreBSP for C library that we use in class and of the Mondriaan package. One of his recent contributions to open-source software is an implementation of the GraphBLAS in C.
Abstract: Algebraic Programming annotates operations with algebraic relations explicitly, while simultaneously maintaining a data-centric view of the containers being operated on. Its systematic application allows automatic parallelisation, vectorisation, and fusion. Programmers, meanwhile, maintain an easy-to-understand view of their codes, with clear semantics that follows basic mathematics.
While the recent GraphBLAS standard promotes the use of generalised linear algebra for graph algorithms specifically, the same concepts could play a role for general-purpose computations. By involving both generic programming techniques and MLIR, we furthermore transcend the boundaries of classical library approaches by, for example, allowing automated optimisations across operations.
Finally, performance semantics attached to algebraic operations allows the systematic performance characterisation of workloads, guides good algorithm design, and may guide automatic compile-time and run-time optimisations.
Interesting links
Wednesday November 24, 2021.
Laplacian matrices
(PSC2 section 4.9)
Sequential graph matching
(PSC2 sections 5.1–5.2)
Exercise class
Discuss your topic for the final project with the teachers.
Wednesday December 1, 2021.
Suitors and sorting
(PSC2 section 5.3)
Parallel graph matching
(PSC2 section 5.4)
Wednesday December 8, 2021.
Presentations final project
Wednesday December 15, 2021. Location BBG020
11.00–13.00 Retake midterm exam
Wednesday December 22, 2021.
More presentations final project
Deadline second assignment Monday January 17, 2022, 12.00 hour.
Hand in a report (through the Mastermath ELO, in PDF) before the deadline.
Only one report per team. No need to send in duplicates.
Please use the batch system of Snellius and not the interactive system
for test runs of your parallel program.
The interactive system is only for development.
Running your parallel program on hundreds of processor cores
will give you a bonus.
Please fill out the Mastermath digital evaluation form,
which will be sent to you around the deadline.
Frequently asked questions
Last update of this page: November 23, 2021
Back to homepage of Rob Bisseling