Butterfly module#

Preamble#

This module gathers utilities to manipulate and create a particular family of LazyLinOp associated to matrices with the so-called butterfly structure, i.e., which are expressed as a product of a few compatible Kronecker-Sparse factors. Classical examples of such operators are the Hadamard transform and the Discrete-Fourier-Transform.

Construction of a butterfly LazyLinOp from a given array.

The main function of the module is ksd() (Kronecker-sparse decomposition), which computes a near best approximation of a given array or of a LazyLinOp by an operator with a prescribed butterfly structure [1].

Example: Consider an \(N\times N\) Hadamard matrix (with \(N\) a power of two):

>>> import scipy as sp
>>> H = sp.linalg.hadamard(N)

Turning this NumPy array into an efficient butterfly implementation requires a simple call L = ksd(H, chain) where chain is a Chain object specifying the nature of the Kronecker-sparse factors involved in the decomposition. The classical chain associated to the Hadamard matrix is square-dyadic, and can be retrieved as follows:

>>> chain = Chain.square_dyadic(H.shape)
>>> L = ksd(H, chain)

It is also possible to use other chains (see below), leading to implementations with different tradeoffs between memory consumption, energy consumption and computational speed.

>>> chain2 = Chain.monarch(H.shape, ...)
>>> L2 = ksd(H, chain2)

Construction of individual or multiple Kronecker-sparse factors.

Under the hood, the operator L resulting from a call to ksd() is a (lazy) product of Kronecker-sparse factors, each of them benefitting from an optimized implementation on various backends via the function ksm() (Kronecker-sparse multiplication operator). A single Kronecker-sparse factor is a structure determined by a 4D array \(T\) of shape (a, b, c, d), and is fully determined as L = ksm(T) given a 4-dimensional array of values \(T_{i,j,k,l}\) of size \(a\times b\times c\times d\). The shape of L is then \(abd\times acd\), and converting it to an array would yield a structured sparse matrix with support (the set of possible nonzero entries) \(I_a\otimes 1_{b\times c}\otimes I_d\). See [1] and the documentation of ksm() for further details and illustrations.

Implementation parameters.

Options of ksm() (and of ksd()) notably allow to use either the CPU or the GPU with various implementation parameters, or to directly build a lazy product of Kronecker-sparse factors L = ksm([T1, ..., Tn], ...) to minimize memory movements between CPU and GPU when computing L @ x.

Saving and loading butterfly operators.

The LazyLinOp resulting from a call to ksm() or ksd() can be saved to disk (as a .json file) and load using:

>>> lazylinop.butterfly.save(L, 'filename')
>>> L = lazylinop.butterfly.load('filename')

Plot butterfly operators.

The LazyLinOp resulting from a call to ksm() or ksd() can be plotted or saved to disk (as a .png file and a .svg file) using:

>>> N = 16
>>> L = lazylinop.butterfly.dft(N)
>>> lazylinop.butterfly.plot(L)
>>> lazylinop.butterfly.plot(L, "dft_16x16")

Pre-built Discrete-Fourier-Transform.

We provide dft() a pre-built LazyLinOp L with the Butterfly structure corresponding to the Discrete-Fourier-Transform. The L corresponding to a DFT of a signal of size \(256\) with \(8\) factors (square-dyadic decomposition) is given by:

>>> L = lazylinop.butterfly.dft(256)

Pre-built Fast-Walsh-Hadamard-Transform.

We provide fwht() a pre-built LazyLinOp L with the Butterfly structure corresponding to the Fast-Walsh-Hadamard-Transform (FWHT). The L corresponding to a FWHT of a signal of size \(256\) with \(8\) factors (square-dyadic decomposition) is given by:

>>> L = lazylinop.butterfly.fwht(256)

Chains and their manipulation.

A chain encodes the structure of the Butterfly operator L and corresponds to a sequence of tuples

\[\begin{equation} \left(\left(a_l,~b_l,~c_l,~d_l\right)\right)_{l=1}^n \end{equation}\]

such that Kronecker-sparse operators \(L_l\) with the corresponding structure have compatible sizes, i.e. such that the product \(L = L_1\cdots L_n\) is well-defined. The shape of \(L\) is given by the attribute chain.shape. The concatenation of compatible chains is implemented via chain = chain1 @ chain2. Specifying it explicitly is simple:

>>> chain = Chain([(a1, b1, c1, d1), ..., (an, bn, cn, dn)])

The most standard chain is probably the so-called square-dyadic chain, which is behind usual fast implementations of the Hadamard transform and of DFTs. It is only defined when \(\mathtt{shape}=\left(N,~N\right)\) with \(N\) a power of two as:

>>> sd_chain = Chain.square_dyadic(shape)

Another common chain with arbitrary non-prime shape is Monarch chain with two factors.

>>> chain = Chain.monarch(shape, ...)

You can visualize the structure of all Kronecker-sparse factors in these chains as follows:

>>> sd_chain.plot()
Output file of ``sd_chain.plot("square_dyadic")``.
>>> chain.plot()
Output file of ``chain.plot("monarch")``.

and anticipate the memory footprint of the corresponding operators:

>>> sd_chain = Chain.square_dyadic((32, 32))
>>> sd_chain.mem(np.float32)
1280
>>> chain = Chain.monarch((32, 32))
>>> chain.mem(np.float32)
1536

As you can see, the memory footprint of a monarch chain is higher than that of a square-dyadic one, but when it fits in the memory of a GPUs it can lead to a faster implementation.

Chainability.

This above chains satisfy a “chainability” property, which is crucial to ensure that ksd() can be run, with approximation guarantees [1]. Chainability can be checked using the attribute chain.chainable, and ksd() will generate an error if the provided chain is not chainable.

References#

[1] Butterfly Factorization with Error Guarantees. Leon Zheng, Quoc-Tung Le, Elisa Riccietti, and Remi Gribonval https://hal.science/hal-04763712v1/document

Butterfly construction#

  1. lazylinop.butterfly.ksd()

  2. lazylinop.butterfly.ksm()

  3. lazylinop.butterfly.fuse()

  4. lazylinop.butterfly.dft()

  5. lazylinop.butterfly.fwht()

  6. lazylinop.butterfly.hadamard()

lazylinop.butterfly.ksd(A, chain, ortho=True, order='l2r', svd_backend=None, **kwargs)#

Returns a LazyLinOp corresponding to the (often called “butterfly”) factorization of A into Kronecker-sparse factors with sparsity patterns determined by a chainable instance chain of Chain.

L = ksd(...) returns a LazyLinOp corresponding to the factorization of A where L = ksm(...) @ ksm(...) @ ... @ ksm(...).

Note

L.ks_values contains the ks_values of the factorization (see ksm() for more details).

As an example, the DFT matrix is factorized as follows:

M = 16
# Build DFT matrix using lazylinop.signal.fft function.
from lazylinop.signal import fft
V = fft(M).toarray()
# Use square-dyadic decomposition.
sd_chain = Chain.square dyadic(V.shape)
# Multiply the DFT matrix with the bit-reversal permutation matrix.
from lazylinop.basicops import bitrev
P = bitrev(M)
L = ksd(V @ P.T, chain=sd_chain)
Args:
A: np.ndarray, cp.array, torch.Tensor or LazyLinOp

Matrix or LazyLinOp to factorize. when A is a LazyLinOp small it is often faster to perform ksd(A.toarray(), ...) than ksd(A, ...) if the dense array A.toarray() fits in memory.

chain: Chain

Chainable instance of the Chain class. See Chain documentation for more details.

ortho: bool, optional

Whether to use orthonormalisation or not during the algorithm, see [1] for more details. Default is True.

order: str, optional

Determines in which order partial factorizations are performed, see [1] for more details.

  • 'l2r' Left-to-right decomposition (default).

  • 'balanced'

svd_backend: str, optional

See documentation of linalg.svds() for more details.

Kwargs:

Additional arguments ksm_backend, ksm_params to pass to ksm() function. See ksm() for more details.

Returns:

L is a LazyLinOp that corresponds to the product of chain.n_patterns LazyLinOp Kronecker-sparse factors.

The namespace and device of the ks_values of all factors are determined as follows:

  • If A is an array (or aslazylinop(array)) then its namespace and device are used

  • otherwize, svd_backend determines the namespace and device

See also

References:

[1] Butterfly Factorization with Error Guarantees. Léon Zheng, Quoc-Tung Le, Elisa Riccietti, and Rémi Gribonval https://hal.science/hal-04763712v1/document

Examples:
>>> import numpy as np
>>> from lazylinop.butterfly import Chain, ksd
>>> from lazylinop.basicops import bitrev
>>> from lazylinop.signal import fft
>>> N = 256
>>> M = N
>>> V = fft(M).toarray()
>>> chain = Chain.square_dyadic(V.shape)
>>> # Use bit reversal permutations matrix.
>>> P = bitrev(N)
>>> approx = (ksd(V @ P.T, chain) @ P).toarray()
>>> error = np.linalg.norm(V - approx) / np.linalg.norm(V)
>>> np.allclose(error, 0.0)
True
lazylinop.butterfly.ksm(ks_values, params=None, backend='xp')#

Returns a LazyLinOp L for Kronecker Sparse Matrix Multiplication (KSMM see [1]). The sparsity pattern (or support) of a Kronecker-Sparse factor is defined as \(I_a\otimes 1_{b,c}\otimes I_d\) while its values are given by a 4D np.ndarray of shape (a, b, c, d).

The shape of L is \(\left(abd,~acd\right)\).

To fill a ks_values and its Kronecker-Sparse factor M:

M = np.zeros((a * b * d, a * c * d), dtype=np.float32)
ks_values = np.empty((a, b, c, d), dtype=M.dtype)
for i in range(a):
    for j in range(b):
        for k in range(c):
            for l in range(d):
                tmp = np.random.randn()
                ks_values[i, j, k, l] = tmp
                M[i * b * d + j * d + l,
                  i * c * d + k * d + l] = tmp

For a = 3, b = 3, c = 5, d = 5 we have the following pattern.

_images/abcd.svg

Note

You can access the ks_values of L = ksm(...) using L.ks_values.

With OpenCL and CUDA backend, L @ X will implicitly cast X to:

  • match the dtype of L.ks_values

  • be of contiguous type

This can incur a loss of performance, as-well-as a loss of precision if the dtype of X was initially of higher precision than that of L.ks_values.

Args:
ks_values: CuPy/NumPy arrays, torch tensors or list of arrays

It could be:

  • A 4D array of values of the Kronecker-Sparse factor.

  • List of values of the Kronecker-Sparse factors. The length of the list corresponds to the number of Kronecker-Sparse factors.

The dtype of each ks_values is either 'float16', 'float32', 'float64', 'complex64' or 'complex128'. See code above for details on the expected indexing of ks_values.

backend: optional

The available backends depend on the namespace (see array-api-compat for more details) of the ks_values. By default, use a namespace-based implementation.

  • For torch namespace:

    • backend='ksmm' to run the algorithm of [1]. It uses a CUDA device determined by ks_values and relies on cp.RawModule.

  • For cupy namespace:

    • backend='ksmm' to run the algorithm of [1]. It uses a CUDA device determined by ks_values (must be on GPU) and relies on cp.RawModule, __cuda_array_interface__, DLPack.

    • backend='cupyx' uses the cupyx.scipy.sparse.csr_matrix function.

  • For numpy namespace:

    • backend='scipy' uses the SciPy sparse functions scipy.sparse.block_diag and scipy.sparse.csr_matrix.

    • backend=(platform, device) to use OpenCL to run the algorithm of [1].

      • (None, None) uses the first platform and device.

      • (None, 'cpu') use the first platform and CPU device.

      • (None, 'gpu') use the first platform and GPU device.

      Please consider the following piece of code for advanced choices:

      import pyopencl as cl
      # Get your favorite platform.
      platform = cl.get_platforms()[pl_id]
      # To get your favorite CPU device.
      device = platform.get_devices(device_type=cl.device_type.CPU)[dev_id]
      # To get your favorite GPU device.
      device = platform.get_devices(device_type=cl.device_type.GPU)[dev_id]
      

      To check platforms and devices of your system you can also run the command line clinfo -a. See PyOpenCL documentation for more details.

    • backend=pycuda.driver.Device(id) uses a CUDA device to run the algorithm of [1]. See PyCUDA documentation for more details.

params: tuple or list of tuple, optional

Argument params only works for OpenCL and CUDA backends. It could be:

  • A tuple params of tuples where params[0] and params[1] expect a tuple of six elements (TILEX, TILEK, TILEY, TX, TY, VSIZE) (see [1] for more details). If (None, None) (default), the choice of hyper-parameters for multiplication L @ X and the multiplication L.H @ X is automatic. Because we did not run a fine-tuning for all the possible \(\left(a,~b,~c,~d\right)\) and \(\left(a,~c,~b,~d\right)\) tuples, automatic does not always correspond to the best choice.

  • List of tuple of length the number of factors. params[i][0] and params[i][1] expect a tuple of six elements (TILEX, TILEK, TILEY, TX, TY, VSIZE) (see [1] for more details). If None (default), the choice of hyper-parameters for multiplication L @ X and the multiplication L.H @ X is automatic. Because we did not run a fine-tuning for all the possible \(\left(a,~b,~c,~d\right)\) and \(\left(a,~c,~b,~d\right)\) tuples, automatic does not always correspond to the best choice.

List of assertions the tuple (TILEX, TILEK, TILEY, TX, TY, VSIZE) must satisfy:

  • TILEX = X * TX

  • TILEY = Y * TY

  • batch size % TILEX == 0 for performance reason. Consider zero-padding of the batch.

  • TILEX < batch size

  • TILEK <= c and c % TILEK == 0 for performance reason.

  • TILEX > TILEK and TILEY > TILEK

  • (VSIZE * X * Y) % TILEX == 0

  • TILEK % strideInput == 0

  • (VSIZE * X * Y) % TILEK == 0

  • TILEY % strideValues == 0

  • TILEY <= b

  • (b * d) % (d * TILEY) == 0

  • ks_values.dtype.itemsize * 2 * (TILEY * TILEK + TILEK * TILEX) < smem

where smem is the shared memory of the hardware used to compute, VSIZE ranges from \(1\) to \(4\), strideValues = VSIZE * X * Y / TILEK and strideInput = VSIZE * X * Y / TILEX.

Returns:

A LazyLinOp instance L for Kronecker Sparse Matrix Multiplication (KSMM). You can access the ks_values of L = ksm(...) using L.ks_values.

Examples:
>>> from lazylinop.butterfly.ksm import ksm
>>> import numpy as np
>>> a, b, c, d = 2, 4, 4, 2
>>> ks_values = np.full((a, b, c, d), 1.0, dtype=np.float32)
>>> L = ksm(ks_values)
>>> L.toarray(dtype='float')
array([[1., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 1.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 1.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 1.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 1.]])
>>> # List of Kronecker-Sparse factors.
>>> a1, b1, c1, d1 = 2, 4, 3, 3
>>> ks_values1 = np.full((a1, b1, c1, d1), 1.0, dtype=np.float32)
>>> a2, b2, c2, d2 = 3, 3, 5, 2
>>> ks_values2 = np.full((a2, b2, c2, d2), 1.0, dtype=np.float32)
>>> L = ksm(ks_values1) @ ksm(ks_values2)
>>> M = ksm([ks_values1, ks_values2])
>>> np.allclose(L.toarray(dtype='float'), M.toarray(dtype='float'))
True

References:

[1] Fast inference with Kronecker-sparse matrices. Antoine Gonon and Léon Zheng and Pascal Carrivain and Quoc-Tung Le https://arxiv.org/abs/2405.15013

lazylinop.butterfly.fuse(t1, t2)#

Fuse two 4D arrays of ks_values t1, t2 of shape \(\left(a_1,~b_1,~c_1,~d_1\right)\) and \(\left(a_2,~b_2,~c_2,~d_2\right)\). The resulting 4D array of ks_values t is of shape \(\left(a_1,~\frac{b_1d_1}{d_2},~\frac{a_2c_2}{a_1},~d_2\right)\) and satisfies (ksm(t1) @ ksm(t2)).toarray() == ksm(t).toarray().

Args:
t1, t2: 4D arrays
  • First ks_values of shape \(\left(a_1,~b_1,~c_1,~d_1\right)\).

  • Second ks_values of shape \(\left(a_2,~b_2,~c_2,~d_2\right)\).

The dimensions must satisfy:

  • \(a_1c_1d_1=a_2b_2d_2\)

  • \(a_1\) divides \(a_2\)

  • \(d_2\) divides \(d_1\)

otherwize an Exception is returned.

Returns:

The resulting ks_values t is a 4D array of shape \(\left(a_1,~\frac{b_1d_1}{d_2},~\frac{a_2c_2}{a_1},~d_2\right)\).

See also

Examples:
>>> import numpy as np
>>> from lazylinop.butterfly import ksm, fuse
>>> a1, b1, c1, d1 = 2, 2, 2, 4
>>> a2, b2, c2, d2 = 4, 2, 2, 2
>>> v1 = np.random.randn(a1, b1, c1, d1)
>>> v2 = np.random.randn(a2, b2, c2, d2)
>>> v = fuse(v1, v2)
>>> v.shape
(2, 4, 4, 2)
>>> L = ksm(v)
>>> L1 = ksm(v1)
>>> L2 = ksm(v2)
>>> x = np.random.randn(L.shape[1])
>>> np.allclose(L @ x, L1 @ L2 @ x)
True
lazylinop.butterfly.dft(N, backend='numpy', dtype='complex64', device='cpu')#

Return a LazyLinOp L with the Butterfly structure corresponding to the Discrete-Fourier-Transform (DFT).

Shape of L is \(\left(N,~N\right)\) where \(N=2^n\) must be a power of two.

The number of factors \(n\) of the square-dyadic decomposition is given by \(n=\log_2\left(N\right)\)

Infer the namespace (see array-api-compat for more details) of L.ks_values from dtype and device arguments. By default, namespace of L.ks_values is numpy and dtype is 'complex64'.

Args:
N: int

Size of the DFT. \(N\) must be a power of two.

backend: str, tuple or pycuda.driver.Device, optional

See ksm() for more details.

dtype: str, optional

It could be either 'complex64' (default) or 'complex128'.

Returns:

LazyLinOp L corresponding to the DFT.

Examples:
>>> import numpy as np
>>> from lazylinop.butterfly import dft as bdft
>>> from lazylinop.signal import dft as sdft
>>> N = 2 ** 7
>>> x = np.random.randn(N).astype('float32')
>>> y = bdft(N) @ x
>>> z = sdft(N) @ x
>>> np.allclose(y, z)
True

References:

[1] Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations. Dao T, Gu A, Eichhorn M, Rudra A, Re C. Proc Mach Learn Res. 2019 Jun;97:1517-1527. PMID: 31777847; PMCID: PMC6879380.

lazylinop.butterfly.fwht(N, backend='numpy', dtype='float64', device='cpu')#

Return a LazyLinOp L with the Butterfly structure corresponding to the Fast-Walsh-Hadamard-Transform (FWHT).

Shape of L is \(\left(N,~N\right)\) where \(N=2^n\) must be a power of two.

L is orthogonal and symmetric and the inverse WHT operator is L.T = L.

The number of factors \(n\) of the square-dyadic decomposition is given by \(n=\log_2\left(N\right)\)

Infer the namespace (see array-api-compat for more details) of L.ks_values from dtype and device arguments. By default, namespace of L.ks_values is numpy and dtype is 'float64'.

Args:
N: int

Size of the FWHT. \(N\) must be a power of two.

backend: str, tuple or pycuda.driver.Device, optional

See ksm() for more details.

dtype: str, optional

The dtype of Hadamard matrix elements. The defaut value is ‘float64’.

device: optional

The device of Hadamard matrix elements. The default value is 'cpu'.

Returns:

LazyLinOp L corresponding to the FWHT. L is equivalent to hadamard(N, backend, dtype) / sqrt(N).

Examples:
>>> import numpy as np
>>> from lazylinop.butterfly import fwht as bfwht
>>> from lazylinop.signal import fwht as sfwht
>>> N = 2 ** 5
>>> x = np.random.randn(N).astype('float64')
>>> y = bfwht(N) @ x
>>> z = sfwht(N) @ x
>>> np.allclose(y, z)
True

See also

hadamard()

lazylinop.butterfly.hadamard(N, backend='numpy', dtype='float64', device='cpu')#

Return a LazyLinOp L with the Butterfly structure corresponding to the Hadamard matrix.

Shape of L is \(\left(N,~N\right)\) where \(N=2^n\) must be a power of two.

L is not orthogonal and its inverse is L.T / N. L is symmetric.

The number of factors \(n\) of the square-dyadic decomposition is given by \(n=\log_2\left(N\right)\).

Infer the namespace (see array-api-compat for more details) of L.ks_values from dtype and device arguments. By default, namespace of L.ks_values is numpy and dtype is 'float64'.

Args:
N: int

Size of the Hadamard matrix. \(N\) must be a power of two.

backend: str, tuple or pycuda.driver.Device, optional

See ksm() for more details.

dtype: str, optional

The dtype of Hadamard matrix elements. The defaut value is ‘float64’.

device: optional

The device of Hadamard matrix elements. The default value is 'cpu'.

Returns:

LazyLinOp L corresponding to the Hadamard matrix.

Examples:
>>> import numpy as np
>>> import scipy as sp
>>> from lazylinop.butterfly import hadamard
>>> N = 2 ** 5
>>> x = np.random.randn(N).astype('float64')
>>> y = hadamard(N) @ x
>>> z = sp.linalg.hadamard(N) @ x
>>> np.allclose(y, z)
True

See also

Chain class#

class lazylinop.butterfly.Chain(ks_patterns)#

A Chain instance is built by calling the constructor Chain(ks_patterns).

Args:
ks_patterns: list

List of tuples \(((a_l,~b_l,~c_l,~d_l))_{l=1}^n\) each being called a pattern. The tuples \(((a_l,~b_l,~c_l,~d_l))_{l=1}^n\) must satisfy \(a_lc_ld_l=a_{l+1}b_{l+1}d_{l+1}\).

Attributes:
ks_patterns: list

List of patterns \((a_i,~b_i,~c_i,~d_i)\).

n_patterns: int

Equal to len(ks_patterns).

shape: tuple

shape is equal to \((a_1b_1d_1,~a_nc_nd_n)\) with n = n_patterns.

chainable: bool

True if for each \(l\):

  • and \(a_l\) divides \(a_{l+1}\)

  • and \(d_{l+1}\) divides \(d_l\)

See [1] for more details.

Return:

chain with ks_patterns, n_patterns, shape and chainable attributes.

Examples:
>>> from lazylinop.butterfly import Chain
>>> chain = Chain([(2, 1, 1, 2), (2, 1, 1, 2)])
>>> chain.ks_patterns
[(2, 1, 1, 2), (2, 1, 1, 2)]
>>> # Concatenation of two chains.
>>> chain1 = Chain([(1, 2, 2, 4)])
>>> chain2 = Chain([(4, 2, 2, 1)])
>>> chain = chain1 @ chain2
>>> chain.shape
(8, 8)
>>> chain.n_patterns
2
>>> chain.ks_patterns
[(1, 2, 2, 4), (4, 2, 2, 1)]

References:

[1] Butterfly Factorization with Error Guarantees. Lu00E9on Zheng, Quoc-Tung Le, Elisa Riccietti, and Ru00E9mi Gribonval https://hal.science/hal-04763712v1/document

lazylinop.butterfly.Chain.square_dyadic(shape)#

Build a square-dyadic chain from shape.

shape must satisfy shape[0] = shape[1] = N with \(N=2^n\) a power of two. Number of ks_patterns is equal to \(n\). The l-th pattern is given by (2 ** (l - 1), 2, 2, shape[0] // 2 ** l) where 1 <= l <= n.

We can draw the square-dyadic decomposition for \(N=16\):

_images/square_dyadic.svg

using:

sq_chain = Chain.square_dyadic((16, 16))
sq_chain.plot()
Args:
shape: tuple

Shape of the input matrix must be \((N,~N)\) with \(N=2^n\).

lazylinop.butterfly.Chain.monarch(shape, p=None, q=None)#

Build a Monarch chain \(((1,~p,~q,~\frac{M}{p}),~(q,~\frac{M}{p},~\frac{N}{q},~1))\) from shape. \(p\) must divide \(M\) and \(q\) must divide \(N\). See [1] for more details.

Args:
shape: tuple

Shape of the input matrix \((M,~N)\). \(M\) and \(N\) must not be prime numbers.

p, q: int, optional

\(p\) must divide \(M\) and \(q\) must divide \(N\). If p (resp. q) is None, p (resp. q) is chosen such \(p\simeq\sqrt{m}\) (resp. \(q\simeq\sqrt{n}\)).

Examples:
>>> from lazylinop.butterfly.chain import Chain
>>> M, N = 21, 16
>>> chain = Chain.monarch((M, N))
>>> chain.ks_patterns
[(1, 3, 4, 7), (4, 7, 4, 1)]
>>> M, N = 12, 25
>>> chain = Chain.monarch((M, N))
>>> chain.ks_patterns
[(1, 4, 5, 3), (5, 3, 5, 1)]
>>> M, N = 12, 16
>>> chain = Chain.monarch((M, N))
>>> chain.ks_patterns
[(1, 4, 4, 3), (4, 3, 4, 1)]
>>> chain = Chain.monarch((M, N), p=2, q=4)
>>> chain.ks_patterns
[(1, 2, 4, 6), (4, 6, 4, 1)]

References:

[1] Monarch: Expressive structured matrices for efficient and accurate training. In International Conference on Machine Learning, pages 4690-4721. PMLR, 2022. T. Dao, B. Chen, N. S. Sohomi, A. D. Desai, M. Poli, J. Grogan, A. Liu, A. Rao, A. Rudra, and C. Ru00E9.

lazylinop.butterfly.Chain.wip_non_redundant(shape)#

This class method is still work-in-progress.

Build a non redundant chain from shape using Lemma 4.25 from [1]. The function uses a prime factorization \((q_l)_{l=1}^L\) of M = shape[0] and a prime factorization \((p_l)_{l=1}^L\) of N = shape[0] where \(L\) is the length of the prime factorization. If the two factorization do not have the same length, merge the smallest elements of the larger factorization. Raise an Exception if shape[0] > 2 and shape[1] > 2 are prime numbers.

Args:
shape: tuple

Shape of the input matrix, expect a tuple \((M,~N)\). M is equal to the number of rows \(a_1b_1d_1\) of the first factor while N is equal to the number of columns \(a_nc_nd_n\) of the last factor.

Examples:
>>> from lazylinop.butterfly import Chain
>>> sq_chain = Chain.square_dyadic((8, 8))
>>> sq_chain.ks_patterns
[(1, 2, 2, 4), (2, 2, 2, 2), (4, 2, 2, 1)]
>>> nr_chain = Chain.wip_non_redundant((8, 8))
>>> nr_chain.ks_patterns
[(1, 2, 2, 4), (2, 2, 2, 2), (4, 2, 2, 1)]

Chain methods#

lazylinop.butterfly.Chain.mem(self, dtype)#

Return the memory in bytes of the ks_values of dtype dtype to be created corresponding to self.

Args:
dtype: str

dtype of the ks_values.

Examples:
>>> import numpy as np
>>> from lazylinop.butterfly import Chain
>>> sd_chain = Chain.square_dyadic((32, 32))
>>> sd_chain.ks_patterns
[(1, 2, 2, 16), (2, 2, 2, 8), (4, 2, 2, 4), (8, 2, 2, 2), (16, 2, 2, 1)]
>>> sd_chain.mem(np.float32)
1280
>>> chain = Chain.monarch((32, 32))
>>> chain.ks_patterns
[(1, 4, 4, 8), (4, 8, 8, 1)]
>>> chain.mem(np.float32)
1536
lazylinop.butterfly.Chain.plot(self, name=None)#

Plot self.ks_patterns. The colors are randomly chosen. Matplotlib package must be installed.

Args:
name: str

Save the plot in both PNG file name + '.png' and SVG file name + '.svg'. Default value is None (it only draws the self.ks_patterns).

Examples:
>>> from lazylinop.butterfly import Chain
>>> sq_chain = Chain.square_dyadic((32, 32))
>>> sq_chain.plot("square_dyadic")
>>> chain = Chain.monarch((32, 32))
>>> chain.plot("monarch")

I/O (load/save)#

  1. lazylinop.butterfly.load()

  2. lazylinop.butterfly.save()

lazylinop.butterfly.load(name)#

Load the LazyLinOp L from file name.json. The file name.json has been created by save().

Args:
name: str

Name of the .json file where to load L.

Returns:

L is a LazyLinOp that corresponds to the product of n_patterns LazyLinOp each one returned by ksm(). If file does not exist, return None.

See also

Examples:
>>> import scipy as sp
>>> import numpy as np
>>> from lazylinop.butterfly import Chain, ksd, load, save
>>> H = sp.linalg.hadamard(8)
>>> x = np.random.randn(8)
>>> chain = Chain.square_dyadic(H.shape)
>>> A = ksd(H, chain)
>>> save(A, "hadamard_8x8")
>>> A_ = load("hadamard_8x8")
>>> y = A @ x
>>> y_ = A_ @ x
>>> np.allclose(y, y_)
True
lazylinop.butterfly.save(L, name)#

Save the instance L of LazyLinOp returned by L = ksm(...) or L = ksd(...) function. Save the result of the factorization in a json file name + '.json'.

Args:
L: LazyLinOp

The LazyLinOp L to save.

name: str

Name of the file.

See also

Examples:
>>> import scipy as sp
>>> import numpy as np
>>> from lazylinop.butterfly import Chain, ksd, load, save
>>> H = sp.linalg.hadamard(8)
>>> x = np.random.randn(8)
>>> chain = Chain.square_dyadic(H.shape)
>>> L = ksd(H, chain)
>>> save(L, "hadamard_8x8")
>>> L_ = load("hadamard_8x8")
>>> y = L @ x
>>> y_ = L_ @ x
>>> np.allclose(y, y_)
True

Plot#

  1. lazylinop.butterfly.plot()

lazylinop.butterfly.plot(L, name=None)#

Plot L.ks_values on a logarithmic scale. If ks_values is complex plot xp.sqrt(ks_values * xp.conj(ks_values)). Matplotlib package must be installed.

Args:
name: str

Save the plot in both PNG file name + '.png' and SVG file name + '.svg'. Default value is None (it only draws the L.ks_values).

Examples:
>>> from lazylinop.butterfly import dft
>>> from lazylinop.butterfly import plot
>>> N = 16
>>> L = dft(N)
>>> plot(L)