Parallel Algorithms (WISM 459), 2022/2023
Teacher
Rob Bisseling.
Teaching assistant
TBA
Time and place
Every Wednesday from 10.0012.45 hour on location at Utrecht Science Park:
Room HFG611.
The format is 2x45 minutes lectures (1010.45 hour, 1111.45 hour),
followed by 45 minutes laboratory class (1212.45 hour),
which is an interactive session. In between we have 15 minute breaks.
I have made a set of 26 videos of length 1020 minutes, which contain
the most important parts of my lectures.
The videos also have interactive questions and final questions with
solutions given in a separate document (as a PDF file).
The videos will become available through YouTube
and a Utrecht University video repository.
If you miss a class because you have a cold, or are otherwise
ill disposed you can view these recordings instead,
and contact me for interaction
by any of the other electronic means (email, Teams).
But nothing beats a live lecture, with human interaction,
and important practicals,
and you will get a lot more from class if you physically attend.
First lecture: September 14, 2022.
Midterm exam: Oct 26, 2022, 10.0012.00 hour. Room EDU ALFA in the Educatorium.
Retake exam: Dec 21, 2022, 10.0012.00 hour. Room Ruppert Paars.
The exams will be closed book.
Intended audience
This is a Master's course which is an elective course for students
of the Master's programme Mathematical Sciences
of Utrecht University,
part of the track Applied Mathematics, Complex Systems, and
Scientific Computing.
The course is recommended for students in Mathematics or Computer Science
who are interested in scientific computing
and would like to obtain a first `handson experience'
in parallel programming.
It is also highly suitable for the Honours master
programme in Theoretical Physics and Mathematical Sciences
and for students interested in Computational Physics.
The course is part of the Dutch National Master
Mastermath and
hence is open for all interested master students in Mathematics
in the Netherlands.
Registration for everybody is through Mastermath.
Course aim
Students will learn how to design a parallel algorithm for a
problem from the area of scientific computing
and how to write a parallel program that
solves the problem.
Learning goals
After completion of the course, the student is able to

design a parallel algorithm for a problem from the area of scientific computing or big data;

analyse the cost of the algorithm in terms of computing time, communication time, and synchronisation time;
 write a parallel program based on an algorithm that solves the problem;
 write a report on the algorithm, its analysis, and numerical experiments performed, in a concise and readable form,
 present the results in an oral presentation, in a clear manner, highlighting the most important ideas.
Credits
You get 8 ECTS credit points.
Contents
Today, parallel computers are appearing on our desktops. The advent of dualcore, quadcore, and octacore computers and the expected increase in the number of cores in the coming years causes a major change in our approach to software, such as the mathematical software we use in scientific computations and in the emerging area of big data computations.
Parallel computers drastically increase our computational capabilities and thus enable us to model more realistically in many application areas. To make efficient use of parallel computers, it is necessary to reorganise the structure of our computational methods. In particular, attention must be paid to the division of work among the different processors solving a problem in parallel and to the communication between them. Suitable parallel algorithms and systems software are needed to realise the capabilities of parallel computers.
We will discuss extensively the most recent developments in the area of parallel computers, ranging from multicore laptop or desktop PCs to clusters of PCs connected by switching devices, to massively parallel computers with distributed memory such as our national supercomputer Snellius at SURF in Amsterdam, which was officially opened by our Queen on September 16, 2021.
The following subjects will be treated:
 Types of existing parallel computers.
 Principles of parallel computation: distributing the work evenly
and avoiding communication.
 The Bulk Synchronous Parallel (BSP) model as an idealised model
of a parallel computer.
 Use of BSPlib software as the basis for architectureindependent
programs.

Parallel algorithms for the following problems: inner product computation, sorting, prime number sieving, LU decomposition, sparse matrixvector multiplication, iterative solution of linear systems, graph matching.
 Analysis of the computation, communication, and synchronisation time requirements of these algorithms by the BSP model.
 Handson experience using your own laptop or the Snellius supercomputer.
Language
The course will be given in English.
All reading material will be in English.
Prerequisites
Introductory course in linear algebra. Some knowledge of algorithms.
Knowledge of a modern programming language such as C, C++, C#, Python, or Java. Basic knowledge of C is helpful, as we will use this language in class, but we will organise the course in such a way that there will be time to adapt to it for those coming from another language. For those with little or no experience with C, this course is an opportunity to learn C, which is much faster than Python and hence commonly used in high performance computing.
There are plenty of books on C;
you may consider the book
Practical C Programming by Steve Oualline.
You can also use C++ in class, if you already know that language.
Some knowledge of the UNIX/Linux operating system is helpful.
Software
We will make use of the
MulticoreBSP for C software developed by AlbertJan Yzelman, version 2.0.4 (March 2019).
This software runs on sharedmemory multicore PCs, and you can also run
your program without any changes on distributedmemory machines
such as Snellius.
C++ programmers can use
Bulk, version 3.0.0 (August 19, 2021)
as a communications library, instead of MulticoreBSP for C.
It is a very modern library that can run on both sharedmemory and distributedmemory machines,
and on hybrids of the two, without any change.
See
Bulk: a Modern C++ Interface for BulkSynchronous Parallel Programs by JanWillem Buurlage, Tom Bannink, and Rob H. Bisseling,
in Proceedings EuroPar 2018, Lecture Notes in Computer Science, Vol. 11014, Springer, 2018, pp. 519532.
Published version.
Hardware
Recommended: Use your own device!
Please install the MulticoreBSP library for C (in most cases)
or Bulk for C++ (only if you are an experienced C++ user).
On Macs and Linux computers this is straightforward.
On Macs you have UNIX through your terminal.
On Windows machines you may want to use the Windows Subsystem for Linux (WSL).
For Windows, it is more difficult to get the software running and
you may want to ask assistance.
You will get access to the national supercomputer Snellius,
where you can
use BSPonMPI which has been installed by SURF,
or Bulk, both giving access to thousands of cores.
Examination
The examination is in the form of an initial assignment (20%), a final assignment (40%), a presentation on the final assignment (20%), and a midterm exam (20%). The two assignments are carried out at home.
A written report must be returned for each assignment before the deadline. Students can work individually or in pairs (but not in larger teams) on the computer programs and can hand in a single report, provided each did a fair share of the work and can account for that. The contribution of each individual must then be stated briefly in the report. Each participant has to give an individual presentation on the chosen topic of the final assignment.
Presentations will be scheduled on November 30, December 7 and 14, 2022.
The midterm exam must be passed with a grade of 5.0 (on a scale of 10).
All students should submit reports for the assignments
electronically in PDF format through the ELO system of Mastermath.
There is no further homework.
All students must use LaTeX for the assignments.
The midterm exam can be retaken near the end of the course (in December) by any course participant.
The maximum of the two exam grades counts as the midterm exam grade.
In case of illness, or for another urgent reason, the final presentation can be retaken
on an individual basis in January 2023.
In case of an insufficient final grade for the whole course (<=5), the participant
is entitled to a retake, which means that one assignment can be redone within six weeks
after the final grading of the course.
Literature
We closely follow the book
Parallel
Scientific Computation: A Structured Approach using BSP (second edition)
(PSC2),
by Rob H. Bisseling,
Oxford University Press,
September 2020. ISBN 9780198788355 (for the paperback version).
The book is available as paperback, hardback, ebook, and also in Oxford Scholarship Online.
Bol.com has a small stock of the paperback version, and if you order early
this will be quicker and probably cheaper. (At some point they will have to restock.)
Supplementary material such as all slides accompanying
the book and the software package BSPedupack 2.0 with all the programs of the book are available on the supplementary material page.
Here you can find all the lecture slides of this course and also pointers to the videos
when they appear.
Weekly schedule and links
This year we will cover parts of chapters 1, 2, 4, 5
of the second edition (PSC2).
Wednesday September 14, 2022
The Bulk Synchronous Parallel model
(PSC2 section 1.2)
Parallel Inner Product Computation
(PSC2 section 1.3)
Laboratory class.
Try your luck on the theoretical exercise 1.2.
Interesting links
 Top 500 of supercomputers
in the world.
 SURF
in Amsterdam, home of the Snellius national supercomputer
which we will use in our laboratory class and homework. It is a Lenovo machine.
Snellius contains 73,728 AMD cores en 144 Nvidia A100 GPUs.
Its theoretical peak performance for the AMD cores is 3 Pflop/s,
which will be extended in the coming years to 14 Pflop/s.
Snellius ranks number 338 on the June 2022 Top500 list of supercomputers.
 Wiki on Snellius
Wednesday September 21, 2022.
BSPlib, the Bulk Synchronous Parallel library.
(PSC2 section 1.4)
Benchmarking
(PSC2 sections 1.51.7)
Laboratory class. Starting with BSPlib.
This is your first handson session with BSPlib.
Download the latest version of
BSPedupack,
my package of educational programs that
teaches how to use BSP.
Solve Exercise 1.3 from PSC2: try to run the benchmark,
exploring your first parallel environment: your own laptop.
Modify your program to send larger messages.
How does this influence g?
Start working on the first assignment, Exercise 1.7 from PSC2.
Design a parallel algorithm
for prime number generation.
Write your first parallel
program.
This program should generate all prime numbers below 10,000,000 in parallel.
Choosing the right distribution is the name of the game.
You should carry out the experiments of your first assignment
on your own parallel computer.
(This may be your laptop or desktop,
or a compute server of your own university.)
Snellius will only be available for the second assignment.
Hand in a report on Exercise 1.7 presenting your algorithm and the experimental results,
before the deadline Wednesday November 2, 10.00 hour.
Note that you can work in pairs.
Interesting links:
Wednesday September 28, 2022.
Parallel sample sort
(PSC2 section 1.8)
Program bspsort and bulk synchronous message passing
(PSC2 section 1.9)
Laboratory class. Discussion of Solution Exercise 1.2
Interesting links:
Wednesday October 5, 2022.
Sequential LU Decomposition
(PSC2 sections 2.12.2)
Parallel LU Decomposition
(PSC2 section 2.3)
Laboratory class. Discussion of report requirements
Wednesday October 12, 2022.
TwoPhase Broadcasting
(PSC2 section 2.4)
Highperformance LU decomposition
(PSC2 section 2.5)
Laboratory class.
Work on your assignment.
Interesting links:
Wednesday October 19, 2022.
Sequential sparse matrix–vector multiplication
(PSC2 section 4.1)
Sparse matrices and their data structures
(PSC2 section 4.2)
Laboratory class. Discussion of solution midterm exam from October 24, 2018
We will discuss all questions, which are representative for this year's
closedbook exam. Please try first to solve the questions yourself.
Wednesday October 26, 2022. Location: Educatorium Alfa
10.0012.00 Midterm exam on Chapters 1 and 2.
The exact exam material will be announced in the study guide.
Good material to practice are also the final questions at the end of my videos.
The solutions are provided in a separate PDF document.
Interesting links: previous exams from the archive. Exams from 2012, 2013 are only on Chapter 1 of the material (60 minutes).
Exams 2017 on Chapter 1, 3 (90 minutes).
Wednesday November 2, 2022.
Parallel sparse matrixvector multiplication
(PSC2 section 4.3)
Cartesian matrix distributions
(PSC2 section 4.4)
Laboratory class
Choose an assignment for your final project. Possibilities:
You can parallelise your own problem, or choose one of the exercises from the book PSC2.
For example:
 solve Exercise 1.5 on parallel sorting algorithms;
 solve Exercise 2.7 on boolean matrix multiplication;
 solve Exercise 2.8 on Strassen matrix multiplication;
 solve Exercise 3.5 on 3D Fast Fourier Transform;
 solve Exercise 3.8 on the Number Theoretic Transform;
 solve Exercise 3.10 on decimals of pi;
 solve Exercise 4.6 on connected components in a 3D image;
 solve Exercise 4.8 on optimal hypergraph partitioning;
 solve Exercise 4.10 on sparse matrixvector multiplication in C++ using Bulk;
 solve Exercise 5.4 on the maximum independent set problem in graphs;
 solve Exercise 5.5 on solving the nqueens problem
on a chessboard;
 solve Exercise 5.6 on counting selfavoiding walks (SAWs)
on a graph or a 2D or 3D lattice;
 solve Exercise 5.7 on counting triangles in a social network graph;
 solve Exercise 5.8 on computing betweenness centrality on a graph;
 solve Exercise 5.9 on optimal bipartite graph matching.
Please try to discuss the chosen topic with the teacher before November 16, 2022 and register
the topic by sending an email to the teacher. You can work in pairs, but have to give individual presentations
on the topic. It is OK for teams to choose the same topic, but please try to be original in your approach.
Wednesday November 9, 2022.
Mondriaan sparse matrix distribution
(PSC2 section 4.5)
Mediumgrain method for sparse matrix distribution
(PSC2 section 4.6)
Laboratory class. Discussion of the midterm exam
Interesting links:
Wednesday November 16, 2022
Laplacian matrices
(PSC2 section 4.9)
Sequential graph matching
(PSC2 sections 5.15.2)
Exercise class. Discussion of the first assignment.
Wednesday November 23, 2022.
Suitors and sorting
(PSC2 section 5.3)
Parallel graph matching
(PSC2 section 5.4)
Laboratory class
Wednesday November 30, 2022.
Presentations final project
Laboratory class
Wednesday December 7, 2022.
Presentations final project
Laboratory class
Wednesday December 14, 2022.
Presentations final project
Wednesday December 21, 2022. Location Ruppert Paars
10.0012.00 Retake midterm exam
Deadline second assignment Monday January 16, 2023, 12.00 hour.
Hand in a report (through the Mastermath ELO, in PDF) before the deadline.
Only one report per team. No need to send in duplicates.
Please use the batch system of Snellius and not the interactive system
for test runs of your parallel program.
The interactive system is only for development.
Running your parallel program on hundreds of processor cores
will give you a bonus.
Please fill out the Mastermath digital evaluation form,
which will be sent to you around the deadline.
Frequently asked questions
Last update of this page: September 14, 2022
Back to homepage of Rob Bisseling