Re: HP42S MiniChallenge: Optimizing ! (final submission) Message #18 Posted by Egan Ford on 5 May 2007, 1:41 p.m., in response to message #1 by Valentin Albillo
Quote:
Enjoy ! :)
I did. This is my final submission for part 3 (speed) of this mini challenge. One of my objectives from the start was to NOT be 42S specific (personal challenge). As for, "Using reasonable, sound heuristics which can be rationalized in a few words is Ok". I'll let you be the judge.
Answer: 798,644 * 798,644 = 637,832,238,736
Runtimes:
Free42, 1.7 GHz Pentium M: <1 sec
Emu42, 1.7 GHz Pentium M: 23 sec
Physical 42S: 1h, 2min
Code suitable for import into Free42 or Emu42: http://sense.net/~egan/pp9.txt.raw.
I broke this problem down into two separate problems: iteration reduction and verification optimization.
Iteration reduction:
My initial brute force approach started at 999999 and counted down until a solution was found. I selected count down vs. count up because I was planning on using DSE (easier than ISE). This approach will make 201,355 attempts to find a solution. Heuristical methods will need to be used to reduce this.
Knowing that perfect squares must end in 0, 1, 4, 5, 6, or 9, I opted to count down from the largest 12 digit square (999999^2) and ignore any that didn't start with 9, 6, 5, 4, or 1. This will cut the number of iterations roughly in half. 89,333 iterations to be exact. Since N*N = the sum of the first N odd numbers, counting down is easy. Just find the last odd: 999999*2  999998*2. Then subtract 2 from the odd in a loop to get the difference to the next perfect square.
Of the perfect squares only a fraction will end in with a digit that matches the first. Is there a way to determine the next first/last matching digit perfect square? Yes, there is. Just sum 10 sequential odd numbers to get a result that ends in 0. Simply find the first perfect square with matching first/last digit by subtracting the odds, then to get 10 odds down, take the next odd, multiple by 10, subtract 90. This approach does have one gotcha, there are 2 series of first/last matching squares, and both have to be searched, e.g.:
999999^2 = 999998000001 <= Start at top. End in 9? No, then subtract odd difference.
999998^2 = 999996000004 <= End in 9? No, then subtract odd difference.
999997^2 = 999994000009 <= End in 9? Yes, 10*odd diff  90 .
999996^2 = 999992000016 
999995^2 = 999990000025 
999994^2 = 999988000036 
999993^2 = 999986000049 <= Opps, missed this one. 10 down. 
999992^2 = 999984000064  
999991^2 = 999982000081  
999990^2 = 999980000100  
999989^2 = 999978000121  
999988^2 = 999976000144  
999987^2 = 999974000169 <'
999986^2 = 999972000196 
999985^2 = 999970000225 
999984^2 = 999968000256 
999983^2 = 999966000289 <'
999982^2 = 999964000324
This happens because there is no guarantee that a downward count of the next odd will always end at 9. If ending at 7, 5, 3, or 1, then the next matching first/last digit will be lest than 10 odds away.
(9 + 7 + 5 + 3 + 1 + 9 + 7 + 5 + 3 + 1) mod 10 = 0
(7 + 5 + 3 + 1 + 9 + 7 + 5 + 3) mod 10 = 0
(5 + 3 + 1 + 9 + 7 + 5) mod 10 = 0
(3 + 1 + 9 + 7) mod 10 = 0
(1 + 9) mod 10 = 0
To get around this problem without too much work I just did two passes. Pass 1 starts with the highest first/last matching square, pass 2 starts with the 2nd largest. Each pass jumps 10 squares at a time. This reduces the number of iterations by ~1/5. 20,271 iterations to be exact.
Lastly I started looking at the 2nd digits. Without much thought you can prove that the 2nd to last digit of all perfect squares is even, unless the last digit is 6.
E.g. take squares ending in 9. Only 3*3 or 7*7 end in 9, so...
x + 3 x + 7
x + 3 x + 7
 
3x + 9 (7x+4) + 9
x*2 + 3x x*2 + (7x+0)
 
x*2 + 6x + 9 x*2 + (14x+4) + 9
2nd digit will be 6x, and even*anything is even. This is true for 14x+4 as well (note the even carry digit). You can do the rest of this exercise in your head (i.e. avoid doing all the theoretical work yourself with paper and pen), just look at the carry digit, e.g. to end in 9 you must have 3*3 or 7*7, the carry digit is even (i.e. 0 or 4), to end in 6 you must have 4*4 or 6*6, the carry digit is 1 or 3 ending with a even + odd for the 2nd digit. IANS, implementing this will reduce the number of iterations by ~1/2. 8,889 iterations to be exact.
3rd digit? Too much brain power was needed, further optimization will need to be done in the verification.
Verification optimization:
The first check is the most important check. Taking a hint from Paul Brogger I cut out the center, divided by 11 and checked for FP = 0:
1E5
×
IP
100
MOD
11
÷
FP
X!=0?
Other attempts that set up an efficient way to check the rest of the digits is a waste. The first digit check will happen at each iteration. The probability of a match is only 1:10 making the 2nd check take place ~1/10 of the iterations. Make the first check fast, make the rest source efficient.
More optimization can be done, but I made my ~1 hour time target.
Finally, I must state how awesome the 42S is. I did all my development using the VI editor and then using txt2raw.pl to convert to raw for import into Free42/Emu42. I was not looking forward to keying this into my 42S for a formal benchmark. After doing so, I must state that there is an attention to detail that I do not think I have seen with any other vertically oriented HP (48GX is close). The orientation of the menus and keys made for very accurate and quick work of entering this code. I applaud the 42S architects.
Source with comments:
00 { 279Byte Prgm }
Main loop. DSE from 9 to 1. If 8,7,3,2 skip
since perfect squares only end in 9,6,5,4,1,0.
If '6' then note the 2nd digits will be odd (start with 9
count down by 2). Evens start with 8 and count down by 2.
This is stored in register 08.
01>LBL "PP9"
02 9
03 STO 09
04>LBL 09
05 8
06 STO 08
07 RCL 09
08 X=Y?
09 GTO 10
10 7
11 RCL 09
12 X=Y?
13 GTO 10
14 3
15 RCL 09
16 X=Y?
17 GTO 10
18 2
19 RCL 09
20 X=Y?
21 GTO 10
22 6
23 RCL 09
24 X!=Y?
25 GTO 11
26 1
27 STO+ 08
28>LBL 11
29 XEQ A
30>LBL 10
31 DSE 09
32 GTO 09
33 STOP
This is the first called subroutine. It DSE counts from
5 to 1 creating a sequence for LBL B with the first 2
digits to start searching form, (e.g. 98, 96, 94, 92, 90,
69, 67, ..., 10). This will also store the mirror image
of the first to digits (REG 13). REG 04 (0 or 1) is used to
select the 2 different series of matching first/last
squares.
34>LBL A
35 5
36 STO 12
37>LBL 12
38 10
39 RCL× 09
40 RCL+ 08
41 STO 03
42 10
43 ÷
44 IP
45 RCL 03
46 10
47 MOD
48 10
49 ×
50 +
51 STO 13
52 0
53 STO 04
54 XEQ B
55 1
56 STO 04
57 XEQ B
58 2
59 STO 08
60 DSE 12
61 GTO 12
62 RTN
This is where all the work is done. Starting from
the largest two digit leading 12 digit number (e.g.
989999999999) find the first or 2nd (check REG 04)
square with matching reversed last 2 digits.
63>LBL B
64 RCL 03
65 1
66 +
67 10
68 10^X
69 ×
70 1
71 
72 SQRT
73 IP
74 X^2
75 STO 05
76 LASTX
77 1
78 
79 X^2
80 
81 STO 06
82>LBL 00
83 RCL 05
84 100
85 MOD
86 RCL 13
87 X=Y?
88 GTO 01
89 RCL 06
90 STO 05
91 2
92 STO 06
93 GTO 00
94>LBL 01
95 RCL 04
96 X=0?
97 GTO 02
98 0
99 STO 04
100 RCL 06
101 STO 05
102 2
103 STO 06
104 GTO 00
105>LBL 02
106 10
107 10^X
108 RCL× 03
109 STO 07
110 GTO 04
This is where ~90% of all the computation takes place.
The first iteration starts at line 119. Center digits
are checked, if no match GTO 03, 10*next odd  90 to get
next square, subtract 20 from next odd to get next odd.
If the two leading digits change, exit loop.
111>LBL 03
112 10
113 RCL× 06
114 90
115 
116 STO 05
117 20
118 STO 06
119>LBL 04
120 RCL 07
121 RCL 05
122 X<Y?
123 RTN
124 1E5
125 ×
126 IP
127 100
128 MOD
129 11
130 ÷
131 FP
132 X!=0?
133 GTO 03
Check last 4 pairs. If the centers match, start
from the outside and work in. Since squares are
deterministically calculated with matching first/last
digits there is no need to check the outside.
At most there are 4 checks.
134 4
135 STO 15
136>LBL 15
137 2
138 RCL× 15
139 1
140 
141 10^X
142 RCL 15
143 5
144 
145 10^X
146 RCL× 05
147 IP
148 2
149 RCL× 15
150 2
151 +
152 10^X
153 MOD
154 ×
155 LASTX
156 10
157 MOD
158 X<>Y
159 IP
160 X!=Y?
161 GTO 03
162 DSE 15
163 GTO 15
164 RCL 05
165 ENTER
166 SQRT
167 STOP
168 .END.
Edited: 5 May 2007, 2:11 p.m.
