Parallel computing
pmatmat functions
- lazylinop.wip.parallel.pmatmat(L, method='thread', nworkers=None, use_matvec=False, **kwargs)
Builds the parallelized LazyLinOp version of L. L's matmat/__matmul__ is parallelized according to the selected method. To compute L @ A in parallel, the columns of A are "evenly" assigned to several parallel/concurrent workers. Each worker computes its part of the product.
Note that L.H/L.T multiplication is also parallelized in the same way.
For a better understanding and efficient use, please see the pmatmat notebook.
- Args:
- L:
LazyLinOp. The operator to parallelize.
- method:
str. One of:
'thread': see pmatmat_multithread()
'process': see pmatmat_multiprocess()
'mpi': see pmatmat_mpi()
- nworkers:
int. The number of workers used for parallelization. By default, this is the number of CPUs/threads available on the system (os.cpu_count()). This parameter is ignored for the 'mpi' method (the number of workers is fixed externally by the mpiexec/mpirun command).
- use_matvec:
bool. If True, the matvec function of L is used for parallelization; otherwise only matmat is used (default).
For use_matvec=True, a parallelized matmat is automatically defined using L's pre-defined matvec. Each worker makes a series of sequential calls to matvec, in parallel with (or concurrently to) the other workers, on the columns it has been assigned (see the sketch after this list).
Warning
use_matvec does not come without consequences for computing performance. In most cases use_matvec=False should be more efficient; this is the default.
- **kwargs:
unpacked dict. Specialized arguments corresponding to the method used.
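As mentioned in the use_matvec description, here is a minimal sketch comparing the default matmat-based parallelization with the matvec-based one (the shapes and random operands are only illustrative):
>>> import numpy as np
>>> from lazylinop import aslazylinop
>>> from lazylinop.wip.parallel import pmatmat
>>> L = aslazylinop(np.random.rand(15, 20))
>>> A = np.random.rand(20, 32)
>>> pL_mm = pmatmat(L)                   # default: workers call matmat on column blocks
>>> pL_mv = pmatmat(L, use_matvec=True)  # workers call matvec column by column
>>> np.allclose(pL_mm @ A, L @ A) and np.allclose(pL_mv @ A, L @ A)
True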
Warning
To avoid CPU oversubscription, it can be useful to disable the multithreading of underlying libraries. For example, NumPy multithreading is disabled by setting the OMP_NUM_THREADS, MKL_NUM_THREADS or OPENBLAS_NUM_THREADS environment variable to '1' (this must be done before importing NumPy). An alternative to environment variables is to use the threadpoolctl or numthreads libraries.
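As an illustration of the threadpoolctl alternative, the hedged sketch below limits the underlying BLAS/OpenMP thread pools to one thread around the parallel product (the operator and operands reuse the example given further below and are only illustrative):
>>> import numpy as np
>>> from threadpoolctl import threadpool_limits
>>> from lazylinop import aslazylinop
>>> from lazylinop.wip.parallel import pmatmat
>>> L = aslazylinop(np.random.rand(15, 20))
>>> A = np.random.rand(20, 32)
>>> pL = pmatmat(L)
>>> with threadpool_limits(limits=1):  # single-threaded BLAS inside the block
...     pLA = pL @ A
...
>>> np.allclose(pLA, L @ A)
True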
SciPy matrices
The parallelization of matmat only supports NumPy array operands. A way to handle L @ S, with S a SciPy matrix, is to:
first, convert S to a LazyLinOp SL (see aslazylinop()),
second, parallelize the LazyLinOp product L @ SL: pmatmat(L @ SL).
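A minimal sketch of these two steps, assuming a sparse CSR operand S (the shapes, the density and the final densification through an identity matrix are only illustrative):
>>> import numpy as np
>>> from scipy.sparse import random as sprandom
>>> from lazylinop import aslazylinop
>>> from lazylinop.wip.parallel import pmatmat
>>> L = aslazylinop(np.random.rand(15, 20))
>>> S = sprandom(20, 32, density=0.2, format='csr')
>>> SL = aslazylinop(S)        # step 1: wrap the SciPy matrix as a LazyLinOp
>>> pLS = pmatmat(L @ SL)      # step 2: parallelize the lazy product
>>> np.allclose(pLS @ np.eye(32), L @ S.toarray())
True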
- Returns:
a LazyLinOp that is able to compute the product L @ A in parallel according to the chosen method.
Warning
The example below is not efficient. The only point is to illustrate how to basically use the function. For a better understanding and efficient use please see the pmatmat notebook.
- Example:
>>> from lazylinop import aslazylinop
>>> from lazylinop.wip.parallel import pmatmat
>>> shape = (15, 20)
>>> import numpy as np
>>> M = np.random.rand(*shape)
>>> A = np.random.rand(shape[1], 32)
>>> L = aslazylinop(M)
>>> pL = pmatmat(L)
>>> # pL matmat is parallelized using the default thread method
>>> LA = L @ A  # sequential mul
>>> pLA = pL @ A  # parallel mul
>>> np.allclose(pLA, LA)
True
>>> np.allclose(pLA, M @ A)
True
- lazylinop.wip.parallel.pmatmat_multithread(L, nworkers=None, use_matvec=False)
Implements pmatmat() using Python threads (threading package) for parallelization.
Warning
Please note that the GIL can significantly slow down the computation, depending on how L.matmat/matvec is implemented. In case GIL issues cannot be avoided, you might use the process-based methods of pmatmat().
- Returns:
A thread-parallelized LazyLinOp.
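A minimal sketch of calling pmatmat_multithread directly (equivalent to pmatmat(L, method='thread') as documented above); nworkers=2 is an arbitrary choice:
>>> import numpy as np
>>> from lazylinop import aslazylinop
>>> from lazylinop.wip.parallel import pmatmat_multithread
>>> L = aslazylinop(np.random.rand(15, 20))
>>> A = np.random.rand(20, 32)
>>> pL = pmatmat_multithread(L, nworkers=2)  # two Python threads share the columns of A
>>> np.allclose(pL @ A, L @ A)
True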
- lazylinop.wip.parallel.pmatmat_multiprocess(L, nworkers=None, use_matvec=False, procs_created_once=False)
Implements pmatmat() using Python multiprocessing.
- Args:
- L:
see pmatmat()
- nworkers:
see pmatmat()
- use_matvec:
see pmatmat()
- procs_created_once: (bool)
If True, processes are created beforehand and then consume/compute multiplications on the fly. Otherwise (False), processes are created each time a multiplication is computed.
procs_created_once=True hardly outperforms the False case, because it needs IPC (Inter-Process Communication) to send slices of A (if we compute L @ A) to the processes created beforehand. In the procs_created_once=False case, A is auto-shared by process forking (without any copy), but the resulting slices of the multiplication still require IPC.
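A minimal sketch of calling pmatmat_multiprocess directly; procs_created_once=True is used here only to illustrate the parameter (as explained above, it is not expected to outperform the default):
>>> import numpy as np
>>> from lazylinop import aslazylinop
>>> from lazylinop.wip.parallel import pmatmat_multiprocess
>>> L = aslazylinop(np.random.rand(15, 20))
>>> A = np.random.rand(20, 32)
>>> pL = pmatmat_multiprocess(L, nworkers=2, procs_created_once=True)
>>> np.allclose(pL @ A, L @ A)
True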
- lazylinop.wip.parallel.pmatmat_mpi(L, use_matvec=False, scatter_A=False)
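No usage details are given here; the rough sketch below is only inferred from the pmatmat() documentation above (method='mpi', worker count fixed externally by mpiexec/mpirun). The script name is illustrative, and how the result is distributed across ranks is not specified in this section.
# hypothetical script pmatmat_mpi_demo.py, launched with: mpiexec -n 4 python pmatmat_mpi_demo.py
import numpy as np
from lazylinop import aslazylinop
from lazylinop.wip.parallel import pmatmat

L = aslazylinop(np.random.rand(15, 20))
A = np.random.rand(20, 32)
pL = pmatmat(L, method='mpi')  # nworkers is ignored: the worker count comes from mpiexec/mpirun
pLA = pL @ A                   # parallel product computed by the MPI processes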
MPILop operator
- class lazylinop.wip.parallel.MPILop(*args, **kwargs)
Implements a LazyLinOp based on MPI to compute the multiplication (matmat and rmatmat).
This operator basically distributes slices of the underlying matrix to MPI processes that compute their part of the multiplication; the multiplication result slices are then aggregated by the rank-0 MPI process.
Note
This operator is not to be confused with pmatmat_mpi(). Besides, the function pmatmat_mpi should not be used to further parallelize an MPILop.
See also: pmatmat().
MPILop constructor.
- Args:
- shape:
the shape of the operator (it must match the mat_npz array; it is needed because only the rank-0 MPI process will read the file).
- mat_npz:
the file (in npz format) to load the array from.
- mat_dtype:
the dtype corresponding to mat_npz (it is needed because only the rank-0 MPI process will read the file).
- bcast_op:
True (default) to broadcast the multiplication operand of the operator to all MPI processes (from the rank-0 process), and False to load it locally for each process. Typically, if only the rank-0 process gets the operand (for example, through a prompt that a user fills with any information that defines the operand), you'll use bcast_op=True. Otherwise, if all processes have access to the operand, there is no need to broadcast it.
Note
This operator should be used with OpenMP disabled (as in the example below), otherwise the distributed multiplication won't be computed efficiently: mixing multithreading (from NumPy) with multiprocessing is often counterproductive.
- Example:
>>> # disable OpenMP before all things
>>> from os import environ
>>> environ['OMP_NUM_THREADS'] = '1'
>>> from lazylinop.wip import MPILop
>>> shape = (15, 20)
>>> import numpy as np
>>> M = np.random.rand(*shape)
>>> np.savez('M.npz', M)
>>> MPILop(shape, 'M.npz', np.dtype('float64'), bcast_op=False)
<15x20 MPILop with unspecified dtype>
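Continuing this example, the hedged sketch below actually uses the operator in a product; that the aggregated result is held by the rank-0 process is inferred from the class description above, and the operand A is introduced here for illustration only (run under mpiexec/mpirun for a truly distributed computation):
>>> A = np.random.rand(shape[1], 32)
>>> lop = MPILop(shape, 'M.npz', np.dtype('float64'), bcast_op=False)
>>> LA = lop @ A  # distributed matmat; result slices aggregated by the rank-0 process
>>> LA.shape
(15, 32)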
See also bm_np_mul.py and bm_mpilop.py for valuable examples, and 2023-07-04 – MPILop.