05-09-2021, 08:48 PM
This thread is inspired by thread: Puzzle - RPL and others
We extended the search not just to decimal, but base n, and allowed diigts 1 .. n-1
One of the hot spot is the expensive (mod k) operation.
Updated code unroll loops to reduce expensive modulo.
(this cut down modulo to 1/6th, with reduce n domain, 0 < n ≤ 190)
d = ((((((d*n + X[i])*n + X[i+1])*n + X[i+2])*n + X[i+3])*n + X[i+4])*n + X[i+5]) % k
I just learned another way, by replacing modulo with 2 multiply and shift.
(Python infinite precision integer need masking M step; C code get this for free.)
It go straight for remainder, and not waste time computing quotient.
Below setup is for 0 ≤ x ≤ 2**16 = 65536
>>> M = 0xffffffff
>>> def mod17(x, c = M//17+1): return (x*c & M) * k >> 32
...
>>> print mod17(12345), 12345 % 17
3 3
>polydivisible 2
{ 1 }
>polydivisible 4
{ 1 2 3 }
{ 3 2 1 }
>polydivisible 6
{ 1 4 3 2 5 }
{ 5 4 3 2 1 }
>polydivisible 8
{ 3 2 5 4 1 6 7 }
{ 5 2 3 4 7 6 1 }
{ 5 6 7 4 3 2 1 }
>polydivisible 10
{ 3 8 1 6 5 4 7 2 9 }
>polydivisible 12
>polydivisible 14
{ 9 12 3 10 5 4 7 6 11 8 1 2 13 }
For n=60 (no solution): % take 114 sec., fastmod take 58 sec, 114/58 = 2.0X
Reference:
Faster remainders when the divisor is a constant: beating compilers and libdivide
And, for finer points, More fun with fast remainders when the divisor is a constant
Update: I messed up the code (now fixed)
I had forgotten fastmod required very wide bits to work.
For 64-bits machine, fastmod arguments is limited to 16-bits.
This prevented any unrolling (and still use 1 fastmod per loop)
2*(n-1)*n < 2^16 → Base n ≤ 181
We extended the search not just to decimal, but base n, and allowed diigts 1 .. n-1
One of the hot spot is the expensive (mod k) operation.
Updated code unroll loops to reduce expensive modulo.
(this cut down modulo to 1/6th, with reduce n domain, 0 < n ≤ 190)
d = ((((((d*n + X[i])*n + X[i+1])*n + X[i+2])*n + X[i+3])*n + X[i+4])*n + X[i+5]) % k
I just learned another way, by replacing modulo with 2 multiply and shift.
(Python infinite precision integer need masking M step; C code get this for free.)
It go straight for remainder, and not waste time computing quotient.
Below setup is for 0 ≤ x ≤ 2**16 = 65536
>>> M = 0xffffffff
>>> def mod17(x, c = M//17+1): return (x*c & M) * k >> 32
...
>>> print mod17(12345), 12345 % 17
3 3
Code:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
// note: with 32-bits c, x and k really limited to 16-bits
static uint32_t fastmod(uint32_t x, uint64_t k, const uint32_t c)
{
return (x * c * k) >> 32; // = x % k
}
uint32_t gcd(uint32_t a, uint32_t b)
{
for(;;) {
if (b == 0) return a;
a %= b;
if (a == 0) return b;
b %= a;
}
}
void printx(uint32_t *x, uint32_t n)
{
printf("{ ");
for(uint32_t i=1; i<n; i++) printf("%d ", x[i]);
printf("}\n");
}
void recurse(uint32_t *a, uint32_t n, uint32_t k, uint32_t *x)
{
if (k==n) return printx(x, k);
uint32_t c = a[2*n+k], d = 0, step = k+k;
for(uint32_t i=1; i < k; i++) {
d = fastmod((d + x[i])*n, k, c);
}
d = k - d; // valid (mod k)
if (k%2) { // odd k
if (d%2 == 0) d += k; // k = d = 1 (mod 2)
} else { // even k
if (k%4 == 0) step = k; // step = 0 (mod 4)
else if ((n+d)%4 == 0) d += k; // n+d+k = 0 (mod 4)
}
for(uint32_t bucket = a[n+k]; d < n; d += step) {
if (a[d] && a[n+d] == bucket) {
x[k] = d;
a[d] = 0; // mark taken
recurse(a, n, k+1, x);
a[d] = d; // restore
}
}
}
void puzzle(uint32_t n) { // polydivisible number, base n
if (n%2) return; // n must be even
uint32_t a[3*n], x[n], i;
for(i=1; i < n; i++) {
a[i] = i; // +1 to n-1 = duplicate check
a[n+i] = gcd(n,i); // n+1 to 2n-1 = gcd buckets
a[2*n+i] = (~0U)/i + 1; // 2n+1 to 3n-1 = fastmod factor
}
recurse(a, n, 1, x-1); // arrays all 1-based
}
int main(int argc, char **argv)
{
uint32_t n = 0;
if (argc > 1) n = atoi(argv[1]);
if (n-1 < 181) {puzzle(n); return 0;}
return printf("Usage %s n, 0 < n <= 181\n", argv[0]);
}
>polydivisible 2
{ 1 }
>polydivisible 4
{ 1 2 3 }
{ 3 2 1 }
>polydivisible 6
{ 1 4 3 2 5 }
{ 5 4 3 2 1 }
>polydivisible 8
{ 3 2 5 4 1 6 7 }
{ 5 2 3 4 7 6 1 }
{ 5 6 7 4 3 2 1 }
>polydivisible 10
{ 3 8 1 6 5 4 7 2 9 }
>polydivisible 12
>polydivisible 14
{ 9 12 3 10 5 4 7 6 11 8 1 2 13 }
For n=60 (no solution): % take 114 sec., fastmod take 58 sec, 114/58 = 2.0X
Reference:
Faster remainders when the divisor is a constant: beating compilers and libdivide
And, for finer points, More fun with fast remainders when the divisor is a constant
Update: I messed up the code (now fixed)
I had forgotten fastmod required very wide bits to work.
For 64-bits machine, fastmod arguments is limited to 16-bits.
This prevented any unrolling (and still use 1 fastmod per loop)
2*(n-1)*n < 2^16 → Base n ≤ 181