Threaded Mode | Linear Mode

pier4r · (This post was last modified: 01-18-2018 09:33 PM by pier4r.)

Some of you may have seen the numberphile video about the "sum of squares" problem.

The problem is simple.

One has the following sequence:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

The objective is to rearrange them in a way that every two adjacent numbers, if added, are equal to a square of an integer number. The problem is to use all the numbers.

Example 1,3,6,10,15 (it fails then)

Don't read below if you want to give it a try.

Code:

SPOILER:

I personally solved it when in the video they said "it is possible to solve it". 

How a sentence can change the attitude towards a little problem.

I first listed all the possible working couples, 

then I made a "clock" with the numbers and I started to connect them.

9,7,2,14,11,5,4,12,13,3,6,10,15,1,8

then in the video they said they tested all the sequences up to 299. From 25 to 299 they found a way and likely there will be always a way.

The point is, using only real calculators (surely someone already run some programs until one million on some pc/tablet/smartphone), could we break 299 ?

I guess the 50g, prime, dm42 have good chances to break 299 if programmed in a clever way, given enough time.

TheKaneB · 01-18-2018, 10:58 PM

That's a tricky one! I bet the real challenge would be to find an efficient algorithm that cuts down the number of trials. Trying every possible permutation of 300 numbers doesn't look like a clever way to find a good sequence quickly Big Grin

Didier Lachieze · 01-19-2018, 12:08 AM

I have a program that finds the solution for the sequence 1 to 15 in 2 seconds on my Prime, but then it takes 45 seconds for the sequence 1 to 25.
So it needs a significant optimization to reach 299.

Solution for 1 to 25 is:

Code:

spoiler

{2,23,13,12,24,25,11,14,22,3,1,8,17,19,6,10,15,21,4,5,20,16,9,7,18}

Paul Dale · 01-19-2018, 04:29 AM

If you follow the comments and links from the numberphile video, it looks like the maximum has been pushed up very significantly. Assuming the implementation was correct and addressed the problem in question etc.

Baring an inspired approach, the problem comes down to finding a Hamilton path in a graph. This is a NP complete problem.

Pauli

Thomas Okken · (This post was last modified: 01-19-2018 04:39 AM by Thomas Okken.)

(01-18-2018 09:32 PM)pier4r Wrote: then in the video they said they tested all the sequences up to 299. From 25 to 299 they found a way and likely there will be always a way.

It doesn't say so in so many words, but that made me think that there were no solutions below 25, and that is not the case:

n = 15:
8 1 15 10 6 3 13 12 4 5 11 14 2 7 9

n = 16:
8 1 15 10 6 3 13 12 4 5 11 14 2 7 9 16

n = 23:
2 23 13 12 4 21 15 10 6 19 17 8 1 3 22 14 11 5 20 16 9 7 18

I cheated and wrote a simple brute-force backtracking algorithm in C++:

Code:

#include <stdio.h>

#include <math.h>

#include <time.h>

int main(int argc, char *argv[]) {

    int n;

    if (argc != 2 || sscanf(argv[1], "%d", &n) != 1 || n < 2) {

        fprintf(stderr, "Usage: %s <n>\nwhere n >= 2\n", argv[0]);

        return 1;

    }

    time_t start_time = time(NULL);

    int *seq = new int[n + 1];

    bool *used = new bool[n + 1];

    for (int i = 1; i <= n; i++) {

        seq[i] = 0;

        used[i] = false;

    }

    int i = 1;

    long iter = 0;

    top_loop:

    iter++;

    int a = seq[i];

    used[a] = false;

    if (++a > n)

        goto fail;

    seq[i] = a;

    used[a] = true;

    seq[++i] = 0;

    inner_loop:

    a = seq[i];

    used[a] = false;

    int b;

    b = (int) ceil(sqrt(a + seq[i - 1] + 1));

    a = b * b - seq[i - 1];

    inner_loop_2:

    iter++;

    if (a > n) {

        if (--i == 1)

            goto top_loop;

        else

            goto inner_loop;

    }

    if (used[a]) {

        a += (b << 1) + 1;

        b++;

        goto inner_loop_2;

    }

    used[a] = true;

    seq[i] = a;

    if (i < n) {

        seq[++i] = 0;

        goto inner_loop;

    }

    printf("%d", seq[1]);

    for (i = 2; i <= n; i++)

        printf(" %d", seq[i]);

    printf("\nTime: %lds Iterations: %ld\n", time(NULL) - start_time, iter);

    return 0;

    fail:

    printf("No solution.\n");

    printf("Time: %lds Iterations: %ld\n", time(NULL) - start_time, iter);

    return 2;

}

In a nutshell: the algorithm would have to be a lot smarter to make a dent in this problem. It starts to get painfully slow in the 60s -- on a 1.3 GHz i5, not a speed demon by today's standards, but still, imagine how this would fare on a calculator. The case n = 60 took 6,619,547,574 iterations...

Paul Dale · 01-19-2018, 06:31 AM

Instead of a brute force backtracking algorithm, build a graph with the numbers as nodes and edges where they sum to a square. Then find a Hamilton walk with a backtracking search.

Pauli

Thomas Okken · 01-19-2018, 06:48 AM

(01-19-2018 06:31 AM)Paul Dale Wrote: Instead of a brute force backtracking algorithm, build a graph with the numbers as nodes and edges where they sum to a square. Then find a Hamilton walk with a backtracking search.

Why would that perform any better than my algorithm? For each number, I'm already finding potential successors in O(1) time, and I'm eliminating already-visited numbers with an array lookup. Using a graph *might* be more efficient by some constant, but it won't do anything for the overall time complexity.

Didier Lachieze · 01-19-2018, 12:15 PM

(01-19-2018 12:08 AM)Didier Lachieze Wrote: I have a program that finds the solution for the sequence 1 to 15 in 2 seconds on my Prime, but then it takes 45 seconds for the sequence 1 to 25.
So it needs a significant optimization to reach 299.

I have improved my program and now it takes 2.5 seconds for the sequence 1 to 25, however it's still far from handling 299. Here are some timings in seconds for different values from 25 to 52 on the Virtual Calculator:

pier4r · (This post was last modified: 01-19-2018 12:50 PM by pier4r.)

interesting, some are larger than others even with less nodes (or I read it wrongly)

Didier Lachieze · 01-19-2018, 12:47 PM

Yes, it depends how fast it finds a solution in exploring the possible ones. So even if there is a global trend of timing increase with the increase of the sequence lenght, some timings can be shorter than the previous one(s).

Didier Lachieze · 01-19-2018, 04:26 PM

A few more timings, up to 64:

Thomas Okken · 01-19-2018, 04:33 PM

(01-19-2018 04:26 PM)Didier Lachieze Wrote: A few more timings, up to 64:

Are those timings in seconds?

May we see your code?

Didier Lachieze · 01-19-2018, 04:56 PM

Yes, the timings are in seconds on the HP Prime Virtual Calculator, so it would take significantly more time on the physical Prime.

Here is my code (sorry, there is no comments), I'm using a recursive function to go through the different paths to find the solution. This may not be the most effective way of doing it, but it's quite simple and with recursion the backtracking is "built-in".

SoS(n) is the main function, that you call with the number of terms of the sequence.
rS(l) is the recursive function where most of the time is spent.
tSoS(n) is what I use to time the SoS function, it stores the result in L2.

To optimize the execution, at the beginning of SoS(n), I build a list of n sublists, each sublist being the list of numbers which added to sublist index give a square.

Code:

rS(l)

BEGIN

 LOCAL a,k,l1,l2,l3,p,s;

 s:=SIZE(l);

 a:=l(1);

 IF s==1 THEN RETURN {1,a}; END;

 l1:=l({2,s});

 l2:= INTERSECT(l1,L1(a));

 FOR k FROM 1 TO SIZE(l2) DO

  p:=POS(l1,l2(k));

  IF p>1 THEN l1:=CONCAT(l1({p,s-1}),l1({1,p-1})) END;

  l3:=rS(l1);

  IF l3(1) THEN

   l3(0):=a;

   RETURN l3;

  END;

 END;

 RETURN {0};

END;

EXPORT SoS(n)

BEGIN

 LOCAL j,k,l1,l2,l3,a;

 L1:={};

 FOR j FROM 1 TO n DO

  L1(j):=DIFFERENCE(MAKELIST(I*(FP(√(j+I))==0),I,1,n),0);

 END;

 l1:=MAKELIST(I,I,1,n);

 FOR j FROM 1 TO n DO

  l2:=rS(l1);

  IF l2(1) THEN

   RETURN l2({2,n+1});

  END;

  l1(0):=l1(1);

  l1:=l1({2,n+1});

 END;

 RETURN {0};

END;

EXPORT tSoS(n)

BEGIN

 LOCAL t:=TICKS;

 L2:=SoS(n);

 RETURN (TICKS-t)/1000;

END;

Some results (reversed from what my program returns to start with the lowest number):

Code:

57:

{1,3,6,10,15,21,28,36,45,55,9,16,20,29,35,46,54,27,37,44,56,8,17,19,30,34,2,47,5​3,11,5,4,32,49,51,13,12,52,48,33,31,50,14,22,42,39,25,24,40,41,23,26,38,43,57,7,​18}

58:

{1,3,6,10,15,21,28,36,45,55,9,16,20,5,4,32,49,51,13,12,37,44,56,8,41,40,24,57,43​,38,26,23,58,42,39,25,11,53,47,17,19,30,34,2,7,18,31,33,48,52,29,35,46,54,27,22,​14,50}

59:

{1,3,6,10,15,21,28,36,45,55,9,16,20,29,35,46,54,27,37,44,56,8,17,19,30,34,2,47,5​3,11,5,4,32,49,51,13,12,52,48,33,31,50,14,22,59,41,40,24,25,39,42,58,23,26,38,43​,57,7,18}

60:

{1,3,6,10,15,21,28,36,45,55,9,16,20,29,35,46,54,27,37,44,56,8,41,59,22,14,50,31,​33,48,52,12,13,51,49,32,17,19,30,34,2,47,53,11,5,4,60,40,24,25,39,42,58,23,26,38​,43,57,7,18}

61:

{1,3,6,10,15,21,28,36,45,55,9,16,20,29,35,46,54,27,37,44,56,8,17,19,30,34,2,47,5​3,11,5,59,41,40,24,25,39,61,60,4,32,49,51,13,12,52,48,33,31,50,14,22,42,58,23,26​,38,43,57,7,18}

62:

{1,3,6,10,15,21,28,36,45,55,9,16,20,29,35,46,54,27,37,44,56,8,17,19,30,34,47,53,​11,5,4,32,49,51,13,12,52,48,33,31,50,14,22,42,58,23,26,38,43,57,24,25,39,61,60,4​0,41,59,62,2,7,18}

63:

{1,3,6,10,15,21,28,36,45,55,9,16,20,29,35,46,54,27,37,44,56,8,17,19,30,34,47,53,​11,14,50,31,33,48,52,12,13,51,49,32,4,5,59,22,42,58,63,18,7,2,62,38,43,57,24,25,​39,61,60,40,41,23,26}

64:

{1,3,6,10,15,21,28,36,45,55,9,16,20,29,35,46,54,27,37,44,56,8,17,19,30,34,47,53,​11,5,59,62,2,7,18,63,58,42,22,14,50,31,33,48,52,12,13,51,49,32,4,60,61,39,25,24,​40,41,23,26,38,43,57,64}

Thomas Okken · (This post was last modified: 01-20-2018 05:44 PM by Thomas Okken.)

(01-19-2018 04:29 AM)Paul Dale Wrote: If you follow the comments and links from the numberphile video, it looks like the maximum has been pushed up very significantly. Assuming the implementation was correct and addressed the problem in question etc.

Baring an inspired approach, the problem comes down to finding a Hamilton path in a graph. This is a NP complete problem.

What's NP-complete is the problem of finding Hamilton paths in graphs in general. For certain specific classes of graphs, that doesn't necessarily apply, like graphs where no node connects to more than two edges. The lady in the video, who wrote the program that checked all cases up to 299, must have found something to exploit, because with the apparently roughly exponential way the CPU time in my algorithm increases, it wouldn't get to 299 in a million years, even if I parallellized the heck out of it and ran it on the GPU clusters at work. (I should let it run a bit longer than I have so far to see how the CPU usage evolves, though.)

ggauny@live.fr · 01-20-2018, 06:15 PM

Hi,

May I ask the link of the video ?

Thanks.

John Keith · 01-20-2018, 08:19 PM

(01-20-2018 05:43 PM)Thomas Okken Wrote: What's NP-complete is the problem of finding Hamilton paths in graphs in general. For certain specific classes of graphs, that doesn't necessarily apply, like graphs where no node connects to more than two edges. The lady in the video, who wrote the program that checked all cases up to 299, must have found something to exploit, because with the apparently roughly exponential way the CPU time in my algorithm increases, it wouldn't get to 299 in a million years, even if I parallellized the heck out of it and ran it on the GPU clusters at work. (I should let it run a bit longer than I have so far to see how the CPU usage evolves, though.)

A straightforward backtracking algorithm will eventually bog down no matter how much computational power you throw at it, that's just the nature of combinatorial problems. I would think that Simulated Annealing or one of its many variants would be more likely to succeed. The objective function would be simply the number of pair sums that are perfect squares.

John

Thomas Okken · 01-20-2018, 09:08 PM

(01-20-2018 06:15 PM)ggauny@live.fr Wrote: May I ask the link of the video ?

Part 1: http://youtu.be/G1m7goLCJDY
Part 2: http://youtu.be/7_ph5djCCnM

Paul Dale · 01-20-2018, 10:13 PM

(01-20-2018 05:43 PM)Thomas Okken Wrote: What's NP-complete is the problem of finding Hamilton paths in graphs in general. For certain specific classes of graphs, that doesn't necessarily apply, like graphs where no node connects to more than two edges. The lady in the video, who wrote the program that checked all cases up to 299, must have found something to exploit, because with the apparently roughly exponential way the CPU time in my algorithm increases, it wouldn't get to 299 in a million years, even if I parallellized the heck out of it and ran it on the GPU clusters at work. (I should let it run a bit longer than I have so far to see how the CPU usage evolves, though.)

I think there was mention of trying to link the new node directly into the previous graph. Starting the search from there feels like it would be a win. It kind of makes sense that much of the graph will often be unchanged with the addition of a new node. Still exponential time problems always become infeasible.

Pauli

Thomas Okken · 01-20-2018, 10:31 PM

(01-20-2018 10:13 PM)Paul Dale Wrote: I think there was mention of trying to link the new node directly into the previous graph. Starting the search from there feels like it would be a win. It kind of makes sense that much of the graph will often be unchanged with the addition of a new node. Still exponential time problems always become infeasible.

Or take it a step further and keep track of all the paths found so far, so that if extending the previous one doesn't work, you can try even earlier ones as well. I think something like that may be necessary (barring any deeper insights) because it strikes me in those videos how often two successive paths appear to be completely different. (Bonus question: how does the number of possible paths evolve with rising n?)

Thomas Okken · (This post was last modified: 01-21-2018 03:02 PM by Thomas Okken.)

I let my code run for over twelve hours, by which time it had gotten to 82.
Here's how the CPU time use progressed, in a log plot, N vs CPU time in seconds:

[attachment=5580]

It does indeed look exponential, as expected. I fed the data to Free42 STAT, removing the times of 0.000 seconds, and got an exponential fit with a correlation coefficient of 0.96 and an exponent of 0.281. Extrapolated run time for N=300: 4.78e22 years. Big Grin

The connectedness of the graph of pairs of numbers that sum to squares grows pretty slowly. When you add N to the graph for 1 through N-1, the number of edges added is floor(sqrt(2n-1)) - ceil(sqrt(n+1)) + 1, so, asymptotically, each successive N adds (sqrt(2)-1)*sqrt(N) edges. That's slow, but still fast enough that dumb backtracking isn't useful.