Previous notebooks already addressed a part of the pyfaust's functionalities however pyfaust.fact
which is a central module was not covered. It is mainly dedicated to the algorithms that (approximately) factorize a dense matrix into a Faust object.
This is the subject covered (partly) in this notebook.
You will see how to launch the algorithms easily, how to feed them with your own complex set of parameters, how to run them for only few steps when possible, how to use a variant or another, how to enable debugging information, etc.
NOTE: the notebook is made to be executed sequentially, otherwise, skipping some cells, you would end up on an import error.
1. The Hierarchical PALM4MSA Algorithm
1.1 Generating a Hadamard Matrix
1.2 Factorizing a Hadamard Matrix
1.2.1 Defining the Constraints
1.2.2 Setting the Rest of the Parameters and Running the Algorithm
1.2.3 Using Projectors instead of ConstraintList-s
2. The GFT Algorithm
2.0 The pygsp Toolbox
2.1 The Truncated Jacobi Algorithm
%matplotlib inline
import matplotlib.pyplot as plt
from pyfaust import wht
from numpy.linalg import norm
from numpy.random import rand
# generate a Hadamard Faust of size 32x32
FH = wht(32, normed=False) # normed=False is to avoid column normalization
H = FH.toarray() # the dense matrix version
FH.imshow()
plt.show()
from numpy import count_nonzero
count_nonzero(H)
1024
All the factors are the same, let's print the first one.
plt.imshow(FH.factors(0).toarray())
plt.show()
You might want to verify if $H$ is a proper Hadamard matrix.
from numpy import matrix, array
import numpy as np
is_hadamard = True
for _H in [H, H.T]:
is_hadamard = is_hadamard and array([ (_H[i,:].dot(_H[j,:]) == 0).all() \
for i in range(0, _H.shape[0]-1) \
for j in range(i+1, _H.shape[0]) ]).all()
is_hadamard = is_hadamard and (np.where(abs(H)==1, H, 0) == H).all()
is_hadamard
True
The ugly code above basically verifies that all column or row vectors are mutually orthogonal and made only of -1, 0 and 1 coefficients, that is exactly the Hadamard matrix definition. The response is yes, so we can go ahead.
Let's begin by factorizing $H$ the easy way with the automatic parametrization provided by pyfaust.
from pyfaust import faust_fact
FH2 = faust_fact(H, 'hadamard')
print(FH2)
FH2.imshow(name='FH2')
Faust size 32x32, density 0.3125, nnz_sum 320, 5 factor(s): - FACTOR 0 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 1 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 2 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 3 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 4 (real) SPARSE, size 32x32, density 0.0625, nnz 64
Interestingly, the FH2's factors are not the same as FH's but the resulting numpy array is exactly the same. Just to be sure let's verify the relative error against H.
print('err=', (FH2-H).norm()/norm(H))
err= 1.3080505198934866e-15
Good! The argument 'hadamard'
of faust_fact
is a shorthand to a set of parameters fitting the factorization of Hadamard matrices.
Speaking about shorthand, faust_fact
is an alias of pyfaust.fact.hierarchical.
This function implements the hiearchical factorization based on PALM4MSA. For details about the theory behind these algorithms you can read Luc Le Magoarou's thesis manuscript.
We won't go into further details nevertheless it's important to see how to define the parameters manually in order to proceed to your own custom factorizations. For that purpose we need to describe approximately the algorithm but please keep in mind this is just an insight (for instance, the norms below are voluntarily not defined).
Basically, the hierarchical algorithm works like this:
The explanation above is eluding something essential: the sparsity constraints. Indeed, the purpose of the algorithm is not only to decompose a matrix but moreover to enhance its sparsity, as hinted by the FAµST acronym: Flexible Approximate Multi-layer Sparse Transform.
So you'll need to feed the algorithm with sparsity constraints. In fact, you'll define one pair of constraints per iteration, the first is for $S_{i}$ (the main factor) and the second for $R_i$ (the residual).
The pyfaust API is here to help you define the constraints in one shot but you can if you want define constraints one by one as we'll see later.
Let's unveil the factorization constraints to decompose the Hadamard matrix $H \in \mathbb R^{n \times n}$.
Let's go back to the code and define our constraints as lists (provided by the module pyfaust.factparams
):
from pyfaust.factparams import ConstraintList
n = int(H.shape[0])
S_constraints = ConstraintList('splincol', 2, n, n,
'splincol', 2, n, n,
'splincol', 2, n, n,
'splincol', 2, n, n)
R_constraints = ConstraintList('splincol', int(n/2), n, n,
'splincol', int(n/4), n, n,
'splincol', int(n/8), n, n,
'splincol', int(n/16), n, n)
The 'splincol'
constraint used here is for defining the maximum number of nonzeros elements to respect for any row or column of the considered matrix (here $S_i$ or $R_i$).
Looking at S_constraints
initialization, we see in the first line 'splincol', 2, n, n
; the value 2 means we target 2 nonzeros per-column and per-row, the next arguments define the size of the matrix to constrain (its number of rows and columns).
So in the example of S_constraints
all the targeted matrices have 2n nonzeros.
More generally in pyfaust, the constraints are defined in terms of norms, in our case of Hadamard factorization we used 0-norm (which is not actually a norm) as follows:
Enough of explanations! I let you decrypt why R_constraints
correspond likewise to (C2).
But wait a minute, what would happen if we had a set of one hundred constraints? Don't worry! The pyfaust API allows to alternatively set the constraints one by one:
from pyfaust.factparams import ConstraintInt
S_constraints = [ConstraintInt('splincol', n, n, 2) for i in range(0,4)]
R_constraints = [ConstraintInt('splincol', n, n, int(n/2**i)) for i in range(1,5)]
S_constraints
and R_constraints
are still the exact same set of constraints as before.
ConstraintInt is a family of integer constraints. More globally, there is a hierarchy of classes whose parent class is ConstraintGeneric where you'll find other kind of constraints ; ConstraintMat, ConstraintReal.
The table here can give you an idea of each constraint definition. Most constraints are associated to a proximal operator which projects the matrix into the set of matrices defined by the constraint or at least it tries to (because it's not necessarily possible).
In the latest version of FAµST it's possible to project
matrices from the wrappers themselves, let's try one ConstraintInt:
from pyfaust.factparams import ConstraintInt
A = rand(10,10)
A_ = ConstraintInt('sp', 10, 10, 18).project(A)
print("A nnz:", count_nonzero(A), "A_ nnz:", count_nonzero(A_))
A nnz: 100 A_ nnz: 18
We asked for a sparsity of 18 nonzeros and that's what we got after calling project
.
The function project
can help you debug a factorization or even understand exactly how a prox works for a specific constraint and matrix.
Another API doc link completes the definition of constraints : ConstraintName
If you find it complicated to parameterize your own set of constraints, well, you are not the only one! You may be happy to know that it is one of the priorities of the development team to provide a simplified API for the factorization algorithms in an upcoming release.
A simplification is already available with the pyfaust.proj
package for which you might read the following specific notebook: Using The FAµST Projectors API. An example of the same configuration of constraints but the projector way is introduced in 1.2.3.
OK, let's continue defining the algorithm parameters, the constraints are the most part of it. You have optional parameters too, but one key parameter is the stopping criterion of the PALM4MSA algorithm. You have to define two stopping criteria, one for the local optimization and another for the global optimization.
from pyfaust.factparams import StoppingCriterion
loc_stop = StoppingCriterion(num_its=30)
glob_stop = StoppingCriterion(num_its=30)
The type of stopping criterion used here is the number of iterations of PALM4MSA, that is the number of times all the current factors ($R_i$ included) are updated either for the local or the global optimization. So, with the initialization of loc_stop
and glob_stop
above, if you ask J factors (through the number of constraints you set) you'd count a total of $(30+30)*(J-1)$ iterations of PALM4MSA.
Ok, I think we're done with the algorithm parameters, we can pack them into one object and give it to faust_fact
.
from pyfaust.factparams import ParamsHierarchical
params = ParamsHierarchical(S_constraints, R_constraints, loc_stop, glob_stop,
is_update_way_R2L=True) # the argument order matters!
# launch the factorization
FH3 = faust_fact(H, params)
FH3
Faust size 32x32, density 0.3125, nnz_sum 320, 5 factor(s): - FACTOR 0 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 1 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 2 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 3 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 4 (real) SPARSE, size 32x32, density 0.0625, nnz 64
You might wonder what is the boolean is_update_way_R2L
. Its role is to define if the PALM4MSA algorithm will update the $S_i$ and last $R_i$ factors from the left to right (if False
) or toward the opposite direction (if True
). It does change the results of factorization!
I must mention that there are several other optional arguments you'd want to play with for configuring, just dive into the documentation.
You also might call the PALM4MSA algorithm directly (not the hierarchical algorithm which makes use of it), the function is here. You can theoretically reproduce the hierarchical algorithm step by step calling palm4msa()
by yourself.
Well, there would be a lot to say and show about these two pyfaust algorithms and their parameters but at least with this example, I hope, you got an insight of how it works.
Now that we know how to factorize a matrix, let's show in this short section how to use a more handy pyfaust API component than ConstraintList
to define a set of projectors (or proximity operators) instead of constraints. Indeed when a Constraint*
object is defined behind the scene a projector is used to compute the matrix image (with respect to a certain structure, e.g. the sparsity of the matrix). So why don't we directly define projectors? That is the thing, there is no reason not to do that, moreover it's simpler! So let's see how to define the Hadamard matrix factorization set of projectors and run again the algorithm.
from pyfaust.proj import *
S_projs = [ splincol(H.shape, 2) for i in range(0,4) ]
R_projs = [ splincol(H.shape, int(n/2**i)) for i in range(1,5) ]
params = ParamsHierarchical(S_projs, R_projs, loc_stop, glob_stop,
is_update_way_R2L=True) # the argument order matters!
# launch the factorization
FH4 = faust_fact(H, params)
FH4
Faust size 32x32, density 0.3125, nnz_sum 320, 5 factor(s): - FACTOR 0 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 1 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 2 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 3 (real) SPARSE, size 32x32, density 0.0625, nnz 64 - FACTOR 4 (real) SPARSE, size 32x32, density 0.0625, nnz 64
As you see the factorization gives exactly the same Faust
as the one before (with ConstraintList
instead of pyfaust.proj
list). It could seem to be just a syntactic detail here to use the pyfaust.proj
package instead of ConstraintList
but it is more than that, projectors like splincol
are functors and can be used easily to project a matrix.
Again, I can only advise you to read the dedicated notebook about projectors if you want to go further into details about projectors and discover the collection of the projectors available in the pyfaust API.
Since version 2.4, FAµST is able to compute Fast Graph Fourier Transforms. Indeed, FAµST includes the Truncated Jabcobi algorithm for this goal. It allows to diagonalize symmetric positive definite matrices.
It is implemented in C++ and a Python wrapper makes it available from pyfaust.
Here again the theory isn't what matters but feel free to take a look at the following papers:
Le Magoarou L., Gribonval R., Tremblay N., “Approximate fast graph Fourier transforms via multi-layer sparse approximation“,Transactions on Signal and Information Processing over Networks
Le Magoarou L. and Gribonval R., "Are there approximate Fast Fourier Transforms on graphs ?", ICASSP, 2016
Since we're going to work with graph Laplacians it would be a pity not to plot things. The pygsp toolbox provides what you need to easily obtain Laplacians of different graph families. Please install the package with pip (or maybe conda).
Let's plot the Minnesota graph (on the right) and its associated Laplacian matrix (on the left).
from pygsp.graphs import minnesota
import matplotlib.pyplot as plt
G = minnesota.Minnesota()
fig, axes = plt.subplots(1, 2)
_ = axes[0].spy(G.L, markersize=2)
_ = G.plot(ax=axes[1])
plt.show()
The goal of FGFT algorithms is to obtain an orthonormal Faust that nearly diagonalizes a symmetric matrix. In terms of graph signal processing, the considered symmetric matrix is typically a graph Laplacian $L$, and the Faust is expected to approximate the graph Fourier basis, i.e., the basis of eigenvectors of $L$. Given $L$, the algorithm outputs an orthonormal Faust $\hat U$ and a diagonal factor $\hat D$ such that: $L \approx \hat U \hat D \hat U^T$.
Let's run the truncated Jacobi algorithm (for eigen decomposition, hence the name eigtj
) on the Minnesota Laplacian example. You'll see it's really straightforward!
import matplotlib.pyplot as plt
from pyfaust.fact import eigtj
from numpy import diag, count_nonzero
L = G.L.toarray()
Dhat, Uhat = eigtj(L, nGivens=L.shape[0]*5)
Uhat.factors(range(0,min(10,Uhat.numfactors()))).imshow(name='Uhat0-9') # For the sake of readability
# only the first nine factors of Uhat
# are displayed, as you can see they are
# sparse (actually too sparse to see the nonzeros)
# and Dhat is a diagonal matrix.
Dhat = diag(Dhat)
_a = plt.matshow(Dhat)
title = _a.axes.set_title('Dhat\n')
plt.show()
Write some code to check the first factor of $\hat U $ is orthogonal and then that $\hat U$ itself is orthogonal (with a tolerance near to the machine espilon).
from numpy import eye
Uhat0 = Uhat.factors(0)
n = Uhat0.shape[0]
((Uhat0@Uhat0.T).toarray() == eye(n)).all()
True
((Uhat@Uhat.T)-eye(n)).norm()
1.7117025144376212e-14
Ignoring the fact that $\hat U$ is approximate and maybe some slight numerical error, yes it is orthogonal.
norm(Uhat@Dhat@Uhat.T-L)/norm(L)
0.15293578101374938
Good! The Laplacian approximation defined by $\hat U$ and $\hat D$ seems not so bad according to the relative error. Do you want a smaller error ? It's time to read the documentation, the eigtj()
's keyword argument nGivens
can help you in that goal.
Is that all? No, we can evaluate the memory saving $\hat U$ brings as a Faust relatively to the matrix U obtained by numpy.linalg.eig()
.
from numpy.linalg import eig
from numpy import diag, count_nonzero
D, U = eig(L)
D = diag(D)
print(norm(U@D@U.T-L)/norm(L))
0.03146820874663113
Uhat.nnz_sum()
51866
count_nonzero(U)
6980164
The memory space to store $\hat U$ is smaller since its number of nonzeros elements is less than that of $U$.
x = rand(n,1)
timeit Uhat@x
127 µs ± 19.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
timeit U@x
15.7 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The execution time for Faust-vector multiplication is really better too (compared to the matrix-vector multiplication) ! Applying the Graph Fourier Transform $\hat U$ on x is faster.
The third notebook is ending here, I hope you'll be interested to dig into pyfaust API and maybe even give some feedback later. The API has made a lot of progress lastly but it remains a work-in-progress and some bugs might appear here or there. Besides, the documentation and the API can always receive some enhancement so don't hesitate, any suggestion is welcome!
Note: this notebook has been executed using the following version of pyfaust.
import pyfaust
pyfaust.version()
'3.2.11'