Parallel computing
pmatmat functions
- lazylinop.wip.parallel.pmatmat(L, method='thread', nworkers=None, use_matvec=False, **kwargs)
Builds the parallelized LazyLinOp version of L.
L's matmat / __matmul__ is parallelized according to the selected method. To compute L @ A in parallel, the columns of A are "evenly" assigned to several parallel/concurrent workers. Each worker computes its part of the product.
Note that L.H / L.T multiplication is also parallelized in the same way.
For a better understanding and efficient use please see the pmatmat notebook.
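To picture how the columns are split, here is a minimal thread-based sketch of the idea (an illustration only, assuming NumPy operands; it is not the library's actual implementation):

# Sketch of the column-splitting idea behind pmatmat (illustration only,
# not the actual lazylinop implementation).
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_matmat(L, A, nworkers=4):
    # Assign the column indices of A "evenly" to the workers.
    col_blocks = np.array_split(np.arange(A.shape[1]), nworkers)
    out = np.empty((L.shape[0], A.shape[1]), dtype=A.dtype)

    def work(cols):
        # Each worker computes its part of the product.
        out[:, cols] = L @ A[:, cols]

    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        list(pool.map(work, col_blocks))  # wait for all workers
    return out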
- Args:
- L:
LazyLinOp
The operator to parallelize.
- method:
str
'thread': see pmatmat_multithread()
'process': see pmatmat_multiprocess()
'mpi': see pmatmat_mpi()
- nworkers:
int
The number of workers used for parallelization. By default, this is the number of CPUs/threads available on the system (os.cpu_count()). This parameter is ignored for the 'mpi' method (it is fixed externally by the mpiexec/mpirun command).
- use_matvec:
bool
If True, the matvec function of L is used for parallelization; otherwise only matmat is used (default). For use_matvec=True, a parallelized matmat is automatically defined using L's pre-defined matvec. Each worker makes a series of sequential calls to matvec, in parallel with (or concurrently to) the other workers, on the columns it has been assigned (see the sketch after this list).
Warning
use_matvec does not go without consequence on computing performance. In most cases use_matvec=False should be more efficient; this is the default.
- **kwargs:
unpacked dict
Specialized arguments corresponding to the method used.
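As an illustration of what use_matvec changes, a worker's task under the two settings might look as follows (a sketch only, reusing the column-block splitting pictured above; not the library's actual code):

# use_matvec=False (default): one matmat/__matmul__ call on the whole block.
def work_matmat(L, A, out, cols):
    out[:, cols] = L @ A[:, cols]

# use_matvec=True: a series of sequential matvec calls, one per assigned column.
def work_matvec(L, A, out, cols):
    for j in cols:
        out[:, j] = L.matvec(A[:, j])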
Warning
To avoid CPU oversubscription, it can be useful to disable multithreading of underlying libraries. For example, NumPy multithreading is disabled by setting the OMP_NUM_THREADS, MKL_NUM_THREADS or OPENBLAS_NUM_THREADS environment variables to '1' (it must be done before importing NumPy). An alternative to environment variables is to use the threadpoolctl or numthreads libraries.
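For instance, a hedged sketch of both options (threadpoolctl is a third-party package):

# Option 1: environment variables, set before importing NumPy.
import os
os.environ['OMP_NUM_THREADS'] = '1'
import numpy as np

# Option 2: threadpoolctl, which limits BLAS/OpenMP threads at runtime.
from threadpoolctl import threadpool_limits
with threadpool_limits(limits=1):
    pass  # run the parallelized multiplication here, e.g. pL @ A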
SciPy matrices
The parallelization of matmat only supports NumPy array operands. A way to handle L @ S, with S a SciPy matrix, is to:
first, convert S to a LazyLinOp SL (see aslazylinop()),
second, parallelize the LazyLinOp product L @ SL: pmatmat(L @ SL).
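A sketch of this workaround (the SciPy matrix S below is an assumption for illustration):

import numpy as np
from scipy.sparse import random as sprandom
from lazylinop import aslazylinop
from lazylinop.wip.parallel import pmatmat

L = aslazylinop(np.random.rand(15, 20))
S = sprandom(20, 32, density=0.1, format='csr')  # SciPy matrix operand
SL = aslazylinop(S)        # first, convert S to a LazyLinOp
pLS = pmatmat(L @ SL)      # second, parallelize the LazyLinOp product
Y = pLS @ np.eye(32)       # multiply by a NumPy operand to get L @ S densely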
- Returns:
a LazyLinOp that is able to compute the product L @ A in parallel according to the chosen method.
Warning
The example below is not efficient; its only point is to illustrate basic use of the function. For a better understanding and efficient use, please see the pmatmat notebook.
- Example:
>>> from lazylinop import aslazylinop
>>> from lazylinop.wip.parallel import pmatmat
>>> shape = (15, 20)
>>> import numpy as np
>>> M = np.random.rand(*shape)
>>> A = np.random.rand(shape[1], 32)
>>> L = aslazylinop(M)
>>> pL = pmatmat(L)
>>> # pL matmat is parallelized using the default thread method
>>> LA = L @ A  # sequential mul
>>> pLA = pL @ A  # parallel mul
>>> np.allclose(pLA, LA)
True
>>> np.allclose(pLA, M @ A)
True
- lazylinop.wip.parallel.pmatmat_multithread(L, nworkers=None, use_matvec=False)
Implements pmatmat() using Python threads (threading package) for parallelization.
Warning
Please note that the GIL can significantly slow down the computation depending on how L.matmat/matvec is implemented. In case GIL issues cannot be avoided, you might use the process-based methods of pmatmat().
- Returns:
A thread-parallelized LazyLinOp.
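For example, if L.matmat/matvec is pure Python and GIL-bound, switching to a process-based method may help (a usage sketch):

import numpy as np
from lazylinop import aslazylinop
from lazylinop.wip.parallel import pmatmat

L = aslazylinop(np.random.rand(15, 20))
pL = pmatmat(L, method='process')  # process-based instead of thread-based
A = np.random.rand(20, 32)
np.allclose(pL @ A, L @ A)  # True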
- lazylinop.wip.parallel.pmatmat_multiprocess(L, nworkers=None, use_matvec=False, procs_created_once=False)
Implements pmatmat() using Python multiprocessing.
- Args:
- L:
see pmatmat()
- nworkers:
see pmatmat()
- use_matvec:
see pmatmat()
- procs_created_once: (bool)
If True, processes are created beforehand and then consume/compute multiplications on the fly. Otherwise (False) processes are created each time a multiplication is calculated.
procs_created_once=True hardly outperforms the False case, because it needs IPC (Inter-Process Communication) to send slices of A (if we compute L @ A) to the processes created beforehand. In the procs_created_once=False case, A is auto-shared by process forking (without any copy), but the resulting multiplication slices still imply a need for IPC.
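A usage sketch of the two settings (timings depend heavily on the operator and operand sizes):

import numpy as np
from lazylinop import aslazylinop
from lazylinop.wip.parallel import pmatmat_multiprocess

L = aslazylinop(np.random.rand(15, 20))
A = np.random.rand(20, 64)

pL_fork = pmatmat_multiprocess(L, nworkers=4, procs_created_once=False)  # default
pL_pool = pmatmat_multiprocess(L, nworkers=4, procs_created_once=True)   # persistent workers
np.allclose(pL_fork @ A, pL_pool @ A)  # True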
- lazylinop.wip.parallel.pmatmat_mpi(L, use_matvec=False, scatter_A=False)
MPILop operator
- class lazylinop.wip.parallel.MPILop(*args, **kwargs)
Implements a LazyLinOp based on MPI to compute the multiplication (matmat and rmatmat).
This operator basically distributes slices of the underlying matrix to MPI processes that compute their part of the multiplication; then the result slices are aggregated by the rank-0 MPI process.
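The distribution pattern can be pictured with a small mpi4py sketch (an illustration of the idea only, not MPILop's actual implementation):

# Conceptual sketch of the MPILop distribution pattern (mpi4py).
# Run with e.g.: mpiexec -n 4 python this_script.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    M = np.random.rand(16, 20)                    # underlying matrix (rank 0 only)
    row_slices = np.array_split(M, size, axis=0)  # slices of the matrix
    A = np.random.rand(20, 8)                     # multiplication operand
else:
    row_slices, A = None, None

M_local = comm.scatter(row_slices, root=0)  # each process gets its slice
A = comm.bcast(A, root=0)                   # broadcast the operand (cf. bcast_op)
Y_local = M_local @ A                       # each process computes its part
Y_slices = comm.gather(Y_local, root=0)     # rank 0 aggregates the result slices
if rank == 0:
    Y = np.vstack(Y_slices)                 # full product M @ A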
Note
This operator is not to be confused with pmatmat_mpi(). Besides, the function pmatmat_mpi should not be used to further parallelize an MPILop.
See also: pmatmat().
MPILop constructor.
- Args:
- shape:
the shape of the operator (it must match the mat_npz array; it is needed because only the rank-0 MPI process reads the file).
- mat_npz:
the file (in npz format) to load the array from.
- mat_dtype:
the dtype corresponding to mat_npz (it is needed because only the rank-0 MPI process reads the file).
- bcast_op:
True (default) to broadcast the multiplication operand of the operator to all MPI processes (from the rank-0 process), False to let each process load it locally. Typically, if only the rank-0 process gets the operand (for example through a prompt that a user fills with information defining the operand), use bcast_op=True; otherwise, if all processes have access to the operand, there is no need to broadcast it.
Note
This operator should be used with OpenMP disabled, or it won't be efficient to compute the distributed multiplication: mixing multithreading (from NumPy) with multiprocessing is often counterproductive.
- Example:
>>> # disable OpenMP before all things
>>> from os import environ
>>> environ['OMP_NUM_THREADS'] = '1'
>>> from lazylinop.wip import MPILop
>>> shape = (15, 20)
>>> import numpy as np
>>> M = np.random.rand(*shape)
>>> np.savez('M.npz', M)
>>> MPILop(shape, 'M.npz', np.dtype('float64'), bcast_op=False)
<15x20 MPILop with unspecified dtype>
See also bm_np_mul.py and bm_mpilop.py for valuable examples and 2023-07-04 – MPILop.
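As a complement, a hedged sketch of using such an operator in a script launched with mpiexec (it reuses M.npz from the example above; the import follows the documented class path, and the exact result handling on non-zero ranks may differ):

# Sketch only: run with e.g. mpiexec -n 4 python script.py
from os import environ
environ['OMP_NUM_THREADS'] = '1'  # disable OpenMP/NumPy threading first
import numpy as np
from lazylinop.wip.parallel import MPILop

lop = MPILop((15, 20), 'M.npz', np.dtype('float64'), bcast_op=False)
# Same operand built on every process since bcast_op=False.
A = np.arange(20 * 8, dtype='float64').reshape(20, 8)
Y = lop @ A  # distributed matmat; result slices aggregated by the rank-0 process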