HP Forums

Full Version: permanent of square matrix
Computation of the permanent of a matrix #7151
0-based numpy npperm(M) translated to 1-based HP Prime cas code

Code:
#cas
permanent(M) :=
BEGIN
 LOCAL n,d,j,s,f,v,p;
 n := dim(M)[j:=s:=1];
 d := makelist(2,1,n);
 f := range(1,n+1);
 v := map(sum, transpose(M));
 p := product(v);
 WHILE j < n DO
  v -= d[j]*M[j];
  d[j] := -d[j];
  s := -s;
  p += s*product(v);
  f[1] := 1;
  f[j] := f[j+1];
  f[j+1] := j+1;
  j := f[1];
 END;
 RETURN p/2**(n-1)
END;
#end

The code does not generate sub-matrices, which makes it very efficient!
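The Prime routine is a 1-based translation of the 0-based numpy npperm. For reference, here is a pure-Python rendering of that same loop (npperm_py is a made-up name; the numpy original used d = ±1 with 2*d[j] in the update, as reproduced here):

```python
def npperm_py(M):
    # pure-Python, 0-based sketch of the numpy npperm the Prime code
    # above was translated from (npperm_py is a made-up name)
    n = len(M)
    d = [1]*n                 # row sign states (+1/-1); update uses 2*d[j]
    f = list(range(n))        # Gray-code bookkeeping
    v = [sum(M[i][j] for i in range(n)) for j in range(n)]  # column sums
    p = 1
    for t in v:
        p *= t
    j, s = 0, 1
    while j < n - 1:          # last row's sign is never flipped
        for i in range(n):
            v[i] -= 2*d[j]*M[j][i]
        d[j] = -d[j]
        s = -s
        t = 1
        for x in v:
            t *= x
        p += s*t
        f[0] = 0
        f[j] = f[j+1]
        f[j+1] = j + 1
        j = f[0]
    return p // 2**(n-1)      # exact: always divisible by 2^(n-1) for integer M
```

For example, npperm_py([[1,2],[3,4]]) gives 1*4 + 2*3 = 10.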

> m := ranm(9,9)
Code:
[[ 86, -97, -82,   7, -27,  26, -89,  63, -49],
 [-86, -64, -30,  70,  22,  42,  56,  -9, -13],
 [ 46,  24,  49,  -6,   0,  81,  18, -49, -33],
 [-70,   8,  63, -64,   2,  62, -37, -80, -23],
 [ 65, -85,  28, -44, -22,  93,  91,  31, -21],
 [ 88,  76, -66,  66,   5, -23,  79, -88,   9],
 [  6, -69,  -8,  31,  89,   2,  97, -92,  80],
 [-24, -17,  41,  64, -49,   0,  89,  18, -51],
 [-42, -65,  22, -94,  76, -10,  16, -63,  39]]

> permanent(m) /* time = 0.008 sec */

−3500631868051794540

Let's try the naive implementation

> per(m)            /* time = 95.68 sec */

−3500631868051794540
Translated above to HP71B BASIC
Code:
10 DESTROY ALL @ OPTION BASE 1 @ INPUT "N? ";N
20 DIM M(N,N),D(N),F(N),V(N) @ MAT INPUT M @ T0=TIME
30 MAT D=(2) @ MAT V=CSUM(M) @ P=1 @ S=1
40 FOR I=1 TO N @ F(I)=I @ P=P*V(I) @ NEXT I
50 WHILE F(1)<N @ I=F(1) @ T=1
60 FOR J=1 TO N @ V(J)=V(J)-D(I)*M(I,J) @ T=T*V(J) @ NEXT J
70 P=P-S*T @ S=-S @ D(I)=-D(I) @ F(1)=1 @ F(I)=F(I+1) @ F(I+1)=I+1
80 END WHILE
90 P=P/2^(N-1) @ DISP "P=";P,TIME-T0

OP 9×9 random matrix example
Code:
N? 9
M(1,1)?  86, -97, -82,   7, -27,  26, -89,  63, -49
M(2,1)? -86, -64, -30,  70,  22,  42,  56,  -9, -13
M(3,1)?  46,  24,  49,  -6,   0,  81,  18, -49, -33
M(4,1)? -70,   8,  63, -64,   2,  62, -37, -80, -23
M(5,1)?  65, -85,  28, -44, -22,  93,  91,  31, -21
M(6,1)?  88,  76, -66,  66,   5, -23,  79, -88,   9
M(7,1)?   6, -69,  -8,  31,  89,   2,  97, -92,  80
M(8,1)? -24, -17,  41,  64, -49,   0,  89,  18, -51
M(9,1)? -42, -65,  22, -94,  76, -10,  16, -63,  39
P=-3.50063186806E18   3.12

Emulator running at 40x --> a physical HP71B would take 3.12*40 ≈ 125 s ≈ 2 min
(01-02-2024 10:41 PM)Albert Chan Wrote: [ -> ]d := makelist(2,1,n);

I am beginning to understand why d cells = ±2

±aij - (±2) * aij = ∓aij

Next time we process the same row, we want to flip it back, so ±2 --> ∓2

∓aij - (∓2) * aij = ±aij
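The toggle can be sanity-checked with arbitrary (hypothetical) numbers: subtracting d times the row with d = +2 flips the row's contribution inside v, and the next visit with d = -2 flips it back:

```python
# hypothetical column sums v0 and matrix row; the numbers are made up
row = [5, -3, 2]
v0 = [10, 7, 4]

d = 2
v1 = [vi - d*ri for vi, ri in zip(v0, row)]   # +row contribution becomes -row
d = -d                                        # flip d for the next visit to this row
v2 = [vi - d*ri for vi, ri in zip(v1, row)]   # -row contribution flips back to +row
assert v2 == v0
```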

> permanent([[a11,a12,a13],[a21,a22,a23],[a31,a32,a33]])
Code:
(
+( a11+a21+a31)*( a12+a22+a32)*( a13+a23+a33)
−( a11-a21+a31)*( a12-a22+a32)*( a13-a23+a33)
+(-a11-a21+a31)*(-a12-a22+a32)*(-a13-a23+a33)
−(-a11+a21+a31)*(-a12+a22+a32)*(-a13+a23+a33)
) / 4

First term = product of (v = column sums)
For the other terms, v -= d[j]*M[j] flips only the sign of the j-th row of M.
This makes it very efficient: v is updated in place instead of being summed from scratch.

The sign flips cause *all* products that repeat a column to cancel (nice!)
We are left with permutations using distinct columns (all with a plus sign!)
The 4 terms generate too many copies, hence the shrink by 2^(3-1) = 4 correction. (*)

> expand(Ans)
Code:
+a11*a22*a33
+a11*a23*a32
+a12*a21*a33
+a12*a23*a31
+a13*a21*a32
+a13*a22*a31
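The 4-term identity above can be checked numerically with a small Python sketch (perm_naive and ryser_3x3 are made-up names) comparing brute-force expansion against the sign-flip formula:

```python
from itertools import permutations
from math import prod

def perm_naive(M):
    # permanent by direct expansion over all permutations
    n = len(M)
    return sum(prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def ryser_3x3(M):
    # the 4-term formula above: rows 1 and 2 take signs (+,+), (+,-), (-,-), (-,+);
    # row 3 is never flipped, and the sum is divided by 2^(3-1) = 4
    def col_prod(s1, s2):
        return prod(s1*M[0][j] + s2*M[1][j] + M[2][j] for j in range(3))
    return (col_prod(1, 1) - col_prod(1, -1)
            + col_prod(-1, -1) - col_prod(-1, 1)) // 4
```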

(*) Because the correction is always a power of 2, we may eliminate the ±1 sign variable s:

t1 - t2 + t3 - t4 = -(t4 - (t3 - (t2 - t1)))

Also, the sign of M's last row is never flipped, so d only needs (n-1) cells.

permanent(M), version 2
Code:
#cas
permanent(M) :=
BEGIN
 LOCAL n,d,j,f,v,p;
 IF (n:=len(M))<2 THEN RETURN M[1][1] END;
 d := makelist(2,2,n);
 f := range(1,n+1);
 v := map(sum,transpose(M));
 p := product(v);
 WHILE (j:=f[1]) < n DO
  v -= d[j]*M[j];
  p := product(v) - p;
  d[j] := -d[j];
  f[1] := 1;
  f[j] := f[j+1];
  f[j+1] := j+1;
 END;
 RETURN -p/2**(n-1)
END;
#end
(01-04-2024 08:06 PM)Albert Chan Wrote: [ -> ]First term = product of (v = column sums)
For the other terms, v -= d[j]*M[j] flips only the sign of the j-th row of M.
This makes it very efficient: v is updated in place instead of being summed from scratch.

Selection of the next j-th row to flip follows the idea of a Gray code.
Quote:Gray Code is an alternative binary representation, cleverly devised so that,
between any two adjacent numbers, only one bit changes at a time.

Let's examine the 5×5 matrix sign-flip pattern, to confirm that it indeed uses a Gray code.
Again, the last row's sign is never flipped, thus only 2^(5-1) = 16 sign-flip patterns.

I reversed the bits to match the Gray code pattern exactly.

lua> n = 5
lua> f = {1,2,3,4,5}
lua> b = {0,0,0,0}
lua> function bits(k) b[k]=1-b[k]; print(table.concat(b)) end
lua> while f[1]<n do j=f[1]; f[1]=1; f[j]=f[j+1]; f[j+1]=j+1; bits(n-j) end
0001
0011
0010
0110
0111
0101
0100
1100
1101
1111
1110
1010
1011
1001
1000
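The same flip sequence can be reproduced in Python without the f array: at step k, the row that flips is one plus the 2-adic valuation of k (gray_flips is a made-up name):

```python
def gray_flips(n):
    # for k = 1 .. 2^(n-1)-1, yield the 1-based row index j whose sign
    # flips next (matches f[1] in the Gray-code bookkeeping above)
    for k in range(1, 2**(n-1)):
        yield (k & -k).bit_length()   # one plus the 2-adic valuation of k
```

For n = 5 this yields the ruler sequence 1,2,1,3,1,2,1,4,... matching the 15 bit patterns printed above.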
(01-04-2024 08:06 PM)Albert Chan Wrote: [ -> ]I am beginning to understand why d cells = ±2

...


permanent(M), version 2
...

That is fantastic! Much faster than any other method that I have seen. Even better, it turns out that we can eliminate the arrays d and f entirely, making the program shorter and a bit faster.

First of all, we can notice that the number of iterations performed by the program is 2^(n-1)-1, so we can change the WHILE loop to a FOR loop and keep track of which iteration we are in.

What f[1], and thus j, are actually computing is which bit is flipped to generate the next Gray code. This value is one more than the 2-adic valuation of k, where k is the index of the current iteration. See A001511 and A007814.

Determining which values of d[j] are negative is a bit harder, but once again the OEIS comes to the rescue. The value of d[j] is negative iff the odd part of k is of the form 4m+3. See A091067.

Both of these values can be found by testing the least significant bit of k and if 0, shifting right until the lsb is a 1. If binary numbers are inconvenient, we can divide by 2 and test the result MOD 2. The number of shifts required is the valuation, and the remaining value after shifting is the odd part. If the odd part MOD 4 = 3, d[j] is negative, else it is positive.
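The shift-and-test procedure just described can be sketched in Python (flip_info is a made-up name; it returns the 1-based row index and the sign of d[j] for iteration k):

```python
def flip_info(k):
    # shift k right until its lsb is 1; the shift count gives the row index,
    # and the remaining odd part decides the sign (negative iff odd part MOD 4 = 3)
    j = 1
    while k % 2 == 0:
        k //= 2
        j += 1
    return j, (-1 if k % 4 == 3 else 1)
```

For example, flip_info(6) = (2, -1): 6 shifts once to odd part 3, and 3 MOD 4 = 3 makes d negative.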

The following program is a rough translation of your Prime program implementing the changes described above. The program converts the matrix to a nested list to take advantage of list processing commands.

Code:

\<< DUP 2 *                            @ Multiply matrix by 2
  AXL DUP SIZE 1.                      @ Convert matrix to nested list
  IF >                                 @ Size > 1?
  THEN SWAP TRN AXL                    @ List of columns of original matrix
  :: \GSLIST DOSUBS                    @ Column sums
  DUP \PILIST UNROT                    @ Product of column sums
  DUP SIZE 2. OVER 1. - ^ \-> n w      @ n = size, w = 2^(n-1)
    \<< 1. w 1. -                      @ 2^(n-1) - 1 iterations
      FOR k 1. k                       @ Row index = 1
        WHILE DUP 2. MOD NOT           @ While even
        REPEAT 2. / SWAP 1. + SWAP     @ Divide by 2 and increment index
        END 4. PICK ROT GET            @ Get matrix row
        SWAP 4. MOD 3. SAME            @ Odd part MOD 4 = 3?
        :: ADD :: - IFTE               @ Add or subtract sums
        DUP \PILIST 4. ROLL - UNROT    @ Update product
      NEXT DROP2                       @ Drop lists
      NEG w R\->I /                    @ Return negative product / w
    \>>
  ELSE DROP 1. GET                     @ Else drop list, get single element
  END
\>>
(01-07-2024 07:47 PM)John Keith Wrote: [ -> ]That is fantastic! Much faster than any other method that I have seen.
Even better, it turns out that we can eliminate the arrays d and f entirely, making the program shorter and a bit faster.

Translated to Lua. Yes, the code is very compact!
Code:
function permanent(M)
    local p,n,v = 1,#M,{}
    if n==1 then return M[1][1] end
    for i = 1,n do 
        local t = 0
        for j = 1,n do t = t+M[j][i] end
        v[i], p = t, p*t
    end
    local terms = 2^(n-1)
    for k = 1,terms-1 do
        local j, t = bit.ffs(k), 1
        local d = bit.test(k,j) and -2 or 2
        for i = 1,n do v[i] = v[i]-d*M[j][i]; t = t*v[i] end
        p = t - p
    end
    return -p/terms
end

bit.ffs(x) internally calls gcc's __builtin_ffs(x)
int __builtin_ffs (int x) Wrote:Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero.
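For comparison, here is a pure-Python sketch of the same array-free variant (permanent_bits is a made-up name; (k & -k).bit_length() plays the role of bit.ffs, and Python's exact integers avoid the float rounding seen in the Lua result below):

```python
def permanent_bits(M):
    # array-free variant: row index and sign of d[j] are recovered
    # from the bits of the iteration counter k
    n = len(M)
    if n == 1:
        return M[0][0]
    v = [sum(M[i][j] for i in range(n)) for j in range(n)]  # column sums
    p = 1
    for t in v:
        p *= t
    for k in range(1, 2**(n-1)):
        j = (k & -k).bit_length() - 1         # 0-based row to flip (ffs minus 1)
        d = -2 if (k >> (j + 1)) & 1 else 2   # negative iff odd part of k MOD 4 = 3
        t = 1
        for i in range(n):
            v[i] -= d*M[j][i]
            t *= v[i]
        p = t - p
    return -p // 2**(n-1)   # exact: division is always by a power of 2
```

With exact integer arithmetic, this reproduces the Prime's exact result for the OP's 9×9 matrix.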

Test with the OP's 9×9 random matrix

lua> m = {
:      { 86, -97, -82, 7, -27, 26, -89, 63, -49},
:      {-86, -64, -30, 70, 22, 42, 56, -9, -13},
:      { 46, 24, 49, -6, 0, 81, 18, -49, -33},
:      {-70, 8, 63, -64, 2, 62, -37, -80, -23},
:      { 65, -85, 28, -44, -22, 93, 91, 31, -21},
:      { 88, 76, -66, 66, 5, -23, 79, -88, 9},
:      { 6, -69, -8, 31, 89, 2, 97, -92, 80},
:      {-24, -17, 41, 64, -49, 0, 89, 18, -51},
:      {-42, -65, 22, -94, 76, -10, 16, -63, 39}}
lua>
lua> permanent(m) /* time = 0.00019 s */
-3500631868051796000
(01-04-2024 01:13 PM)Albert Chan Wrote: [ -> ]Translated above to HP71B BASIC
...

This brings up a problem with this algorithm. The variable p is much larger than the final value, up to 3 digits larger for the sizes of matrices we are dealing with. This limits the size of the largest matrix we can use on the 71, 28 and 48 which are limited to 12-digit numbers. Valentin's program is not as fast, but it does not have this problem. Perhaps there exists a happy medium somewhere, but the math behind these algorithms is way above my head.
(01-09-2024 09:49 PM)John Keith Wrote: [ -> ]The variable p is much larger than the final value ... Perhaps there exists a happy medium somewhere, but the math behind these algorithms is way above my head.

Just do v/2, use dj=+/-1 again and double the result. If all vi are odd, this will not help ;-)

Cheers, Werner
(01-10-2024 08:32 AM)Werner Wrote: [ -> ]Just do v/2, use dj=+/-1 again and double the result. If all vi are odd, this will not help ;-)

The numerical issue is not the 2^(n-1) scaling factor (even for the OP's 9×9 matrix, the factor is only 2^8 = 256).
It is really catastrophic cancellation among the p's during the iterations.

For illustration purposes, I rescaled the Lua code so that, instead of -p/terms, it simply returns p as the final result.
For base 2, this scaling serves no purpose, except to reduce a tiny chance of overflow.
For base 10, this scaling may make the numerical issue worse (some vi may be odd)

Code:
function permanent(M)
    local p,n,v = -2,#M,{}
    if n==1 then return M[1][1] end
    for i = 1,n do
        local t = 0
        for j = 1,n do t = t+M[j][i] end
        v[i] = t/2
        p = p*v[i]
    end
    for k = 1,2^(n-1)-1 do
        local j, t = bit.ffs(k), -2
        local d = bit.test(k,j) and -1 or 1
        for i = 1,n do v[i] = v[i]-d*M[j][i]; t = t*v[i] end
        p = t - p
    end
    return p
end

Debug with print((-1)^k*p) (★) inside the for-k loop, tested with the OP's 9×9 random matrix.

lua> permanent(m)
-5556546744184800
20429002103295824
55255400982040850
31542151598286920
...
314811585639751230
-119381930932063740
-81181015636135740
-81169499916475780
-73135689828461120
-241381971380131900
248282227143724100
...
528128789659814300
524276224665997300
-2888266392290130400
-2753810694416846000
...
-3101014455289180000
-3584102548532368000
-3432517605155301000
-3500631868051796000      -- final p = permanent(m)

This is probably a bad example, with the final p very close to the true result (error = 3 ULP).
But if the true result were tiny, we can see why this algorithm has cancellation issues.

(★) The (-1)^k factor accounts for the p = t - p sign reversal.
In other words, if the rest of the t's contributed nothing, this is the final p we would get.
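The cancellation can also be quantified with a sketch that tracks the largest |p| seen during the iteration (permanent_peak is a made-up name; the ratio peak/|result| shows roughly how many digits the subtractions can eat):

```python
def permanent_peak(M):
    # same array-free iteration as above, but also return the largest
    # intermediate |p|, to illustrate the cancellation issue
    n = len(M)
    v = [sum(M[i][j] for i in range(n)) for j in range(n)]  # column sums
    p = 1
    for t in v:
        p *= t
    peak = abs(p)
    for k in range(1, 2**(n-1)):
        j = (k & -k).bit_length() - 1
        d = -2 if (k >> (j + 1)) & 1 else 2
        t = 1
        for i in range(n):
            v[i] -= d*M[j][i]
            t *= v[i]
        p = t - p
        peak = max(peak, abs(p))
    return -p // 2**(n-1), peak
```

Even for a small sign matrix like [[1,1,1],[1,1,-1],[1,-1,1]], the peak intermediate (8) already exceeds the final permanent (2).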
I noticed an inefficiency in the original Python program which has carried through our subsequent programs. The line v -= 2*d[j]*M[j] results in an array (or list) multiplication in every iteration. If the matrix is multiplied by 2 (after computing the column sums), we can add the currently selected row to v if k MOD 4 = 3, else subtract it. I have updated my RPL program in post #5 to reflect this change, which results in a 33% speedup.
(01-24-2024 04:03 PM)John Keith Wrote: [ -> ]This line, v -= 2*d[j]*M[j] results in an array (or list) multiplication in every iteration.

I guess it depends on the programming language used.

Python was using numpy for the fast v update; rewriting it without the multiply does not gain much.
Note: 2*d[j] is just a number, not a list

For LuaJIT, the cost is smaller still (OP 9×9 matrix, speedup under 5%)
Array access costs a lot more than a multiply; bit operations cost even more.
(this is just relatively speaking ... LuaJIT is FAST!)

The LuaJIT version below is the one I kept.
Code:
function permanent(M)
    local p,n, d,f,v = 1,#M, {},{},{}
    if n==1 then return M[1][1] end
    for j = 1,n do
        d[j] = 2
        f[j] = j
        local t = 0
        for i = 1,n do t = t+M[i][j] end -- sum col
        v[j], p = t, p*t
    end
    while true do
        local i = f[1]
        if i >= n then return -p/2^(n-1) end
        local di, r, t = d[i], M[i], 1
        for j=1,n do v[j] = v[j]-di*r[j]; t = t*v[j] end
        p = t - p
        d[i] = -d[i]
        f[1] = 1
        f[i] = f[i+1]
        f[i+1] = i+1
    end
end

I gave up on the 5% speedup, for cleaner code.

Still, for the OP's 9×9 matrix, it is twice as fast as the bitwise version (post #9).
(01-25-2024 03:49 PM)Albert Chan Wrote: [ -> ]I guess it depends on programming language used.

...

Note: 2*d[j] is just a number, not a list


2*d[j] is just a number, but it is multiplied by row j of the matrix, requiring n multiplies in every iteration for an n×n matrix.

Of course, the amount of improvement depends on the language and the hardware that the program is run on. Considering that RPL is an interpreted language running on an 80's era CPU (real or emulated) one would expect a greater degree of improvement than a compiled language running on a modern CPU. I have not tried the program on the Prime, but I expect that the degree of improvement would be somewhere in between.
(01-25-2024 03:49 PM)Albert Chan Wrote: [ -> ]Note: 2*d[j] is just a number, not a list

Yes, but d[j] is +/-1, and he performs an addition or subtraction accordingly, no list/vector multiply necessary.
In the 42S, I decided not to do that as multiplying the matrix by 2 inevitably duplicates it, and memory is relatively tight.

Werner
(01-26-2024 09:25 AM)Werner Wrote: [ -> ]Yes, but d[j] is +/-1, and he performs an addition or subtraction accordingly, no list/vector multiply necessary.
...

Also worth noting that the column sums are computed from the un-multiplied matrix so p does not get any larger than it does in the original program.

I would think that computing the permanent on a 42S would be glacially slow, but should be fine on Free42 with greatly increased speed and 34-digit precision.
I've found an old (1978) free book that describes how the permanent algorithm was designed.

Combinatorial Algorithms for computers and calculators, 2nd edition, by Herbert S. Wilf, Albert Nijenhuis

Chapter 28, Computation of the Permanent Function Wrote:Observe that a direct application of (3) to an nxn matrix would require about n * n! operations
for the calculation of per(A). We reduce this labor in three steps.

(1) A method due to Ryser evaluates per(A) in about (n^2/2) * 2^n operations.
It requires an average of n^2/2 calculations for each of the 2^n subsets of {1,2,...,n}.
Ryser’s method is derived below and appears in Eqs. (17) and (18).

(2) We reduce the above by a factor of 2 by a method which makes it necessary to process
only the subsets of {1,2,...,n-1}. Thus in Eq. (24) we do about n^2/2 calculations for
each of the 2^(n-1) subsets of {1,2,...,n-1}, or (n^2/4) * 2^n operations altogether.

(3) A further reduction by a factor of n/2 is accomplished by arranging the sequence of subsets
so as to follow a Hamilton walk on an (n-1)-cube (see Chapter 1). If this is done, only a slight
change in the calculation for a set S will yield the result of the calculation for the successor of S.

In this way, our final algorithm PERMAN does about n calculations for each of the 2^(n-1)
subsets of {1,2,...,n-1}, for a total of n * 2^(n-1) operations (multiplication and addition)
to compute the permanent of an nxn matrix.
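The operation counts quoted above can be tabulated with a few lines of Python (rough orders of magnitude, per the book's accounting; op_counts is a made-up name):

```python
from math import factorial

def op_counts(n):
    naive  = n * factorial(n)      # direct expansion of per(A)
    ryser  = (n*n // 2) * 2**n     # Ryser's method: n^2/2 per subset of {1..n}
    perman = n * 2**(n-1)          # PERMAN, with the Gray-code (Hamilton walk) ordering
    return naive, ryser, perman

# compare the three counts for a few sizes
for n in (5, 9, 20):
    print(n, op_counts(n))
```

For the OP's n = 9 this gives roughly 3.3 million, 20480, and 2304 operations respectively, which matches the observed 95.68 s vs 0.008 s gap between the naive and fast programs.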
How come the permanent of a matrix has been kept a virtual secret in matrix math (especially numerical analysis) books? We've all heard about the determinant and how it can tell if a matrix is invertible, but nothing about the permanent of a matrix.

Namir
I don't think that the permanent has been "kept secret", the determinant is just more useful and easier to compute.
(02-15-2024 06:14 PM)John Keith Wrote: [ -> ]I don't think that the permanent has been "kept secret", the determinant is just more useful and easier to compute.

None of my math teachers ever discussed the permanent of a matrix -- unlike the determinant.

Namir