User's guide MondriaanOpt
This page is continuously being improved and updated; therefore, a more recent version may be obtained online. This offline version is bundled with the software for your convenience.
Whereas Mondriaan uses heuristics to obtain good partitionings for sparse matrix-vector multiplication for any number of processors, MondriaanOpt calculates an optimal solution to this partitioning problem with 2 processors. More precisely, it calculates a partitioning with minimum communication volume among all solutions that obey the imbalance constraint.
A database with already solved problems with use of MondriaanOpt can be found online.
How to install MondriaanOpt
MondriaanOpt comes packaged with the Mondriaan software. Refer to this page for
instructions on using Mondriaan. MondriaanOpt is automatically compiled when you compile Mondriaan. The executable
is then available at tools/MondriaanOpt
.
How to run MondriaanOpt
The MondriaanOpt program has the following interface:
% ./tools/MondriaanOpt matrix [P [eps]] [options]
One, two or three parameters may be passed, after which further options may be given. Either [eps], -e or -k must be passed, and it is advised to pass -v (see options). Take note that while MondriaanOpt may be called with the same parameters as Mondriaan, the actual problem being solved may be slightly different.
Some equivalent examples are:
% ./tools/MondriaanOpt tests/arc130.mtx 2 0.03 -v 17
% ./tools/MondriaanOpt tests/arc130.mtx -e 0.03 -v 17
% ./tools/MondriaanOpt tests/arc130.mtx -k 660 -v 17
The above examples partition the arc130.mtx
matrix (Matrix Market file format)
for 2 processors with at most 3% load imbalance, knowing that solutions must exist with
volume at most 17. The matrix should be the full relative path; in the above example
output is saved in the Mondriaan tests folder (../tests/
).
Output
The MondriaanOpt
tool yields, after a successful run on an input matrix,
various output files. All possible output files are described below. Typically,
the output filenames are that of the input matrix filename, modified with a small
descriptor and the number of parts (=2).
Formats with free nonzeros
All assigned nonzeros are assigned a processor-number to be assigned to, either 1 or 2.
All free nonzeros will be assigned index 3.
(Free nonzeros are nonzeros that are not assigned to a processor because assigning
it to either one will not influence communication volume.)
To stress the potential presence of free nonzeros, the number of processors (2) in the
filename is followed by a suffix f
.
Processor indices (-I2f
)
The MondriaanOpt
program writes the processor indices of each
nonzero to the Matrix Market file input-2f.mtx
where the value of each
nonzero is replaced by the processor index to which the nonzero has been assigned.
Graphical output (-2f.svg
)
If the option -svg
is given, at the end of the algorithm an SVG graphic is written to the file input-2f.svg
,
containing a visualisation of the partitioning.
Formats without free nonzeros
Here, the free nonzeros of a partitioning are distributed among the two processors in
such a way that load imbalance is kept at a minimum.
Note that whenever we write P
for the number of processors below, it implicitly equals 2.
Distributed matrix (-P2
)
The MondriaanOpt
program
writes the distributed matrix to a file called input-P2
,
where input
is the name of the input matrix.
We use an adapted Matrix Market format, with this structure:
%%MatrixMarket distributed-matrix coordinate real general
( this should be 0 )
m n nnz P
Pstart[0]
...
...
...
Pstart[P]
( this should be nnz )
A.i[0] A.j[0] A.value[0]
...
...
...
A.i[nnz-1] A.j[nnz-1] A.value[nnz-1]
Here, Pstart[k]
points to the start of the nonzeroes
of processor k.
Processor indices (-I2
)
The MondriaanOpt
program
also writes the processor indices of each nonzero to the Matrix Market file input-I2
where the value of each nonzero is replaced by the processor index to which
the nonzero has been assigned. The order of the nonzeroes is exactly that of the distributed matrix (-P2
).
Cartesian submatrices (-C2
)
The program writes the row index sets I(q)
and column index sets J(q) of the Cartesian submatrix I(q) x J(q)
for the processors q=1,...,P to the file called input-C2
.
This file is additional information, useful e.g. for visualisation,
and you may not need it.
Graphical output (-2.svg
)
If the option -svg
is given, at the end of the algorithm an SVG graphic is written to the file input-2.svg
,
containing a visualisation of the partitioning.
Output to stdout
/stderr
In a successful run, at the end of execution general statistics are written to stdout
.
Also, during such a run, every 2^23 = 8388608
iterations the current depth in the tree is written to stderr
, in the format `current depth`/`maximum depth`
.
Last but not least, every time a new solution is found which improves on the previous solution regarding total volume, a message is written to stderr
reporting the newly found volume and load distribution in the format `load P0`, `load P1`, `load Free`
.
Program options
The MondriaanOpt program has the following interface:
% ./tools/MondriaanOpt matrix [P [eps]] [options]
One, two or three parameters may be passed, after which further options may be given. An overview of the available parameters and options is given below.
Parameters
Parameter | Name | Description |
---|---|---|
matrix |
Matrix file | Required. The input matrix file in Matrix Market (.mtx) format |
P |
Number of processors | Present for consistency with other Mondriaan* commands. This parameter, if given, must be equal to 2. |
eps |
Load imbalance | The maximum allowed load imbalance |
Options
Apart from the matrix, at least one of [eps], -e or -k must be given, defining the maximum allowed load imbalance.
Option | Value | Description |
---|---|---|
-v | Volume |
Recommended. The starting upper bound volume.
This defaults to min(m,n)+1 , with m and n denoting the dimensions of the matrix to be partitioned.
While this is a valid upper bound, you may wish to pass a tighter upper bound to reduce computing time.
|
-e | Load imbalance | The maximum allowed load imbalance |
-k | Number of nonzeros | The maximum allowed number of nonzeros per part |
-t | Seconds | Max running time in seconds |
-h | None | Show help |
-svg | None | Write visualisations of the partitioning to .svg files |
Difference in imbalance constraints
While the command line interface of MondriaanOpt can be used just as Mondriaan, there is a subtle difference in the problem being solved in these two.
More precisely, with N
being the total number of nonzeros, p
being the total number of processors (which equals 2) and
load
the number of nonzeros assigned to a processor, compare:
- the imbalance constraint
load <= (1+epsilon) (N/p)
which is used in Mondriaan (In the code, this amounts tofloor( ((1+epsilon)*N)/p )
.), and - the imbalance constraint
load <= (1+epsilon) ceil(N/p)
which is used in MondriaanOpt.
As p=2
, this difference may only be of importance whenever N
is odd.
In [1,p.2] it is explained that this different choice was made to ensure feasibility of the problem, even if epsilon=0
.
Using MondriaanOpt in MATLAB
For more information about MATLAB usage, please see the Mondriaan MATLAB guide.
MondriaanOpt is available in Matlab using the MatlabMondriaanOpt
MEX routine. Example matlab files
are given as mondriaanOpt.m
and mondriaanOptPlot.m
.
References
[1] An exact algorithm for sparse matrix bipartitioning, Daniel M. Pelt and Rob H. Bisseling, Journal of Parallel and Distributed Computing, 85 (2015) pp. 79-90.
Last updated: September 7, 2017.
November 4, 2016 by Marco van Oort,
September 7, 2017 by Marco van Oort.
To
the Mondriaan package home page.