Optimization questions

The Museum of HP Calculators

HP Forum Archive 21

Optimization questions - HP-15C
Message #1 Posted by Marcel Samek on 27 Oct 2012, 6:29 p.m.

About a year ago, as a personal challenge to myself, I decided to write a Sudoku solver for the HP-15C. It was quite a task and it took me several months to whittle it down to 34 registers of memory and 31 registers of code. When I did finally cram it into memory earlier this year and get it working, I felt as if I had achieved quite a victory. I thought I would document it, get some satisfaction from posting it to the Articles section and be done with it.

Unfortunately it was somewhat of a Pyrrhic victory. While the program will solve a Sudoku puzzle in a flash if I run it in the HP-15C emulator on my PC, it runs mind numbingly slowly on the actual calculator hardware (even on LE). It does work, but odds are that the calculator will run out of batteries before it has solved any given puzzle.

For a few months I did not touch it, burned out on the project. Now however, I though I would give it one more push to try to optimize the code to run at a more palatable speed.

In order to cram the logic into available memory, I had to resort to using a lot of subroutine calls as a form of common code elimination. A lot! I suspect that the root of the slowness is in the fact that the program is paying a large penalty for all those nested call.

The purpose of this post is to see if someone could give me some basic information on how subroutines are implemented on the hardware and what the performance costs are:

1) Is there a performance penalty for calling a label that is far away (in terms of number of instructions) versus a label that is close. (And if there is, what is it?) The documentation talks about searching for labels which makes it sounds as if there might be. The program is 200 steps and some of the subroutines can be quite far away from the caller.

2) Is there a performance penalty for calling a label that is backwards in the code?

3) Is there a performance penalty for nested calls? During the execution the call stack is almost always several calls deep.

3) Is there some trick for efficiently calling subroutines from tight loops?

Re: Optimization questions - HP-15C
Message #2 Posted by Luiz C. Vieira (Brazil) on 28 Oct 2012, 7:59 a.m.,
in response to message #1 by Marcel Samek

Hi, Marcel.
I once had some performance measurements made in an HP41CX by using its internal stopwatch and found some slightly differences while having backward and forward jumping, and also when they were single GTO's or subroutine calls (XEQ). As expected, the need of at least pushing the return address into a return stack demanded some machine cycles, even when the jump had already been compiled.
I have never measured the HP15C performance because I can only think of doing it by naked eye observation and an actual hand operated stopwatch, and I guess the new HP15C LE will make things even harder because it is too fast. Unless, of course, you can connect some tracking device (logic analyzer) and check results. That would be a precise way of doing it.
All I can think of is that the HP15C 'O.S.' seems to search for labels in a 'forward first' scheme, so if the label is backwards, the O.S. will always go ahead till the last existing program step then looping to the beginning of the program in order to search for the corresponding label. This way, based on what is written in the HP15C manual, backwards jumping will always consume more time than forward, even if the backward label is just a few steps back. So, questions #1), #2) and #4) sound coherent to me.
About question #3): again, I have no actual evidence to reason about, but if the stack memory has fixed size - and also based on the fact that the HP15C stores only 7 return addresses - then its addresses are possibly fixed as well, and if this is taken into account when pushing and popping return addresses in and out of it, I'd guess that there is no extra penalty (added after last edit:) for each nested subroutine call in this case.
But these are just conjectures of mine, cannot affirm any of these based on concrete evidences.
Sorry not helping the way you need, though. I guess that if you deal with photography, slightly changes in time measuring while capturing or revealing images are critical, right?
Cheers.
Luiz (Brazil)

Edited: 28 Oct 2012, 10:28 a.m.

Re: Optimization questions - HP-15C
Message #3 Posted by Marcel Samek on 28 Oct 2012, 12:11 p.m.,
in response to message #2 by Luiz C. Vieira (Brazil)

Hi Luiz,

The functions are called many times and so even minor changes in timing will make a big difference. Because the program is manipulating individual integers that are packed into the registers, it uses the modulus operator frequently; since the 15c does not have a modulus operator built in, that subroutine is one of my worst offenders.

I guess I will have to resort to writing some tests to see how much of a difference the distance of the jump makes. I was thinking of just writing a loop with a counter that calls a subroutine. With a stopwatch I can let it run for a minute and then interrupt it and check the counter to see how many it did in that time. While that won't give me the exact cost of a subroutine call, since the overhead of the loop is built into that time, it will allow me to compare the results of different distance jumps. I was hoping someone had already done something like that and kept the data.

Marcel

Re: Optimization questions - HP-15C
Message #4 Posted by Luiz C. Vieira (Brazil) on 28 Oct 2012, 10:23 a.m.,
in response to message #1 by Marcel Samek

In time: will you share the code? Please? 8^)
Luiz (Brazil)

Re: Optimization questions - HP-15C
Message #5 Posted by Marcel Samek on 28 Oct 2012, 12:31 p.m.,
in response to message #4 by Luiz C. Vieira (Brazil)

Sure. I haven't wanted to post it to the articles section because it is not good enough yet. Here is a link to a spreadsheet with the current code on SkyDrive:
http://sdrv.ms/UVrmXh
To actually use it:
1) Dimension your memory to use 34 registers
2) Manually stuff the rows of the input sudoku into registers 8-16. Each row is input with as one number, with 0 representing a blank.
3) GSB A to run the program
4) When the program finishes, the solution will be found in registers 17-25, which you will unfortunately have to manually use the indirect register to retrieve.
I know this is not particularly user friendly. I would like to squeeze the program down by a couple of registers in order to be able to put in an input and output loop. I have some ideas on how to do that but thought I would deal with the speed issue first.


Re: Optimization questions - HP-15C
Message #6 Posted by Marcus von Cube, Germany on 28 Oct 2012, 2:00 p.m.,
in response to message #5 by Marcel Samek

Why not use the matrix functionality to input and output the data?

Re: Optimization questions - HP-15C
Message #7 Posted by Marcel Samek on 28 Oct 2012, 6:41 p.m.,
in response to message #6 by Marcus von Cube, Germany

Yes, I will have to look into that. Whatever takes the least memory. Right now it has exactly one byte free, so some more memory needs to be freed up before I can even think about I/O.

Re: Optimization questions - HP-15C
Message #8 Posted by Thomas Klemm on 29 Oct 2012, 10:30 p.m.,
in response to message #5 by Marcel Samek

The XML-file below can be loaded into the nonpareil emulator so you don't have to type in the program. I was curious how long it would take your program to solve a simple example:

The program is still ^running.
I'm very much impressed by your effort!
Thomas

<?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE state SYSTEM "nonpareil.dtd"> <state version="1.00" model="15C" platform="voyager" arch="nut"> <ui/> <chip name="Nut"> <registers> <reg name="a" data="000000fffff000"/> <reg name="b" data="000000fffff000"/> <reg name="c" data="00000000000eae"/> <reg name="m" data="00000000000000"/> <reg name="n" data="00000000000000"/> <reg name="g" data="04"/> <reg name="p" data="c"/> <reg name="q" data="4"/> <reg name="q_sel" data="0"/> <reg name="fo" data="00"/> <reg name="s" data="0800"/> <reg name="pc" data="0000"/> <reg name="stack" index="0" data="006d"/> <reg name="stack" index="1" data="0042"/> <reg name="stack" index="2" data="0000"/> <reg name="stack" index="3" data="0000"/> <reg name="decimal" data="0"/> <reg name="carry" data="0"/> <reg name="awake" data="0"/> <reg name="pf_addr" data="00"/> <reg name="ram_addr" data="007"/> <reg name="active_bank" index="0" data="0"/> <reg name="active_bank" index="1" data="0"/> <reg name="active_bank" index="2" data="0"/> <reg name="active_bank" index="3" data="0"/> <reg name="active_bank" index="4" data="0"/> <reg name="active_bank" index="5" data="0"/> <reg name="active_bank" index="6" data="0"/> <reg name="active_bank" index="7" data="0"/> <reg name="active_bank" index="8" data="0"/> <reg name="active_bank" index="9" data="0"/> <reg name="active_bank" index="a" data="0"/> <reg name="active_bank" index="b" data="0"/> <reg name="active_bank" index="c" data="0"/> <reg name="active_bank" index="d" data="0"/> <reg name="active_bank" index="e" data="0"/> <reg name="active_bank" index="f" data="0"/> </registers> </chip> <chip name="Voyager LCD"> <registers> <reg name="enable" data="1"/> <reg name="blink" data="0"/> </registers> </chip> <memory as="ram"> <loc addr="000" data="00000000000000"/> <loc addr="001" data="00000000000000"/> <loc addr="002" data="00000000000000"/> <loc addr="003" data="00000000000000"/> <loc addr="004" data="000000fffff000"/> <loc addr="005" data="00000000000008"/> <loc addr="006" data="0000000000000c"/> <loc addr="007" data="00000000000eae"/> <loc addr="008" data="00000000000000"/> <loc addr="009" data="2faf8befbe2280"/> <loc addr="00a" data="00000000000000"/> <loc addr="010" data="00000000000000"/> <loc addr="011" data="00000000000000"/> <loc addr="012" data="00000000000000"/> <loc addr="013" data="00000000000000"/> <loc addr="014" data="00000000000000"/> <loc addr="015" data="c0d2d2d2d2d2d2"/> <loc addr="016" data="000000000006e1"/> <loc addr="017" data="00000000000000"/> <loc addr="018" data="00000000000000"/> <loc addr="019" data="00000000000000"/> <loc addr="01a" data="00000000a00000"/> <loc addr="0c0" data="00000000000000"/> <loc addr="0c1" data="00000000000000"/> <loc addr="0c2" data="00000000000000"/> <loc addr="0c3" data="00000000000000"/> <loc addr="0c4" data="00000000000000"/> <loc addr="0c5" data="00000000000000"/> <loc addr="0c6" data="00000000000000"/> <loc addr="0c7" data="00000000000000"/> <loc addr="0c8" data="00000000000000"/> <loc addr="0c9" data="00000000000000"/> <loc addr="0ca" data="00000000000000"/> <loc addr="0cb" data="00000000000000"/> <loc addr="0cc" data="00000000000000"/> <loc addr="0cd" data="00000000000000"/> <loc addr="0ce" data="00000000000000"/> <loc addr="0cf" data="00000000000000"/> <loc addr="0d0" data="00000000000000"/> <loc addr="0d1" data="00000000000000"/> <loc addr="0d2" data="00000000000000"/> <loc addr="0d3" data="00000000000000"/> <loc addr="0d4" data="00000000000000"/> <loc addr="0d5" data="00000000000000"/> <loc addr="0d6" data="00000000000000"/> <loc addr="0d7" data="00000000000000"/> <loc addr="0d8" data="00000000000000"/> <loc addr="0d9" data="00000000000000"/> <loc addr="0da" data="00000000000000"/> <loc addr="0db" data="00000000000000"/> <loc addr="0dc" data="00000000000000"/> <loc addr="0dd" data="00000000000000"/> <loc addr="0de" data="00000000000000"/> <loc addr="0df" data="00000000000000"/> <loc addr="0e0" data="00000000000000"/> <loc addr="0e1" data="00b225faf9c4c4"/> <loc addr="0e2" data="32ff7120f2ebfd"/> <loc addr="0e3" data="30210919ff43ff"/> <loc addr="0e4" data="2d33ff361c7126"/> <loc addr="0e5" data="c3f10c1e2d3619"/> <loc addr="0e6" data="ff52ff29342933"/> <loc addr="0e7" data="293242ff253627"/> <loc addr="0e8" data="faf11c7536f909"/> <loc addr="0e9" data="ff271e713726f1"/> <loc addr="0ea" data="b27531f0f80e41"/> <loc addr="0eb" data="c3f110ff7631f0"/> <loc addr="0ec" data="f82d7137273726"/> <loc addr="0ed" data="f100ff41c3f143"/> <loc addr="0ee" data="ff42ff0ab220f0"/> <loc addr="0ef" data="f1ebfd35862304"/> <loc addr="0f0" data="b24724f8324624"/> <loc addr="0f1" data="f7f13245ccfb33"/> <loc addr="0f2" data="f844fafcf3ebfd"/> <loc addr="0f3" data="f332ebfdf34320"/> <loc addr="0f4" data="f93142ebfdf931"/> <loc addr="0f5" data="81df06b296fafc"/> <loc addr="0f6" data="35fb30368623f7"/> <loc addr="0f7" data="f1324056ef07b2"/> <loc addr="0f8" data="25faf9c49623f6"/> <loc addr="0f9" data="f2c1c5fac353ff"/> <loc addr="0fa" data="30210bb22b342b"/> <loc addr="0fb" data="332b32250db280"/> <loc addr="0fc" data="cdc5f2fbf14005"/> <loc addr="0fd" data="b2c497fa03b286"/> <loc addr="0fe" data="23f6f2c1c101b2"/> <loc addr="0ff" data="b5fca3c5b1fd00"/> </memory> </state>

001 - 42,21, 0 LBL 0 002 - 10 � 003 - 43 36 LASTx 004 - 34 x<>y 005 - 42 44 FRAC 006 - 20 � 007 - 43 34 RND 008 - 43 32 RTN 009 - 42,21, 1 LBL 1 010 - 36 ENTER 011 - 36 ENTER 012 - 2 2 013 - 6 6 014 - 32 3 GSB 3 015 - 45 24 RCL (i) 016 - 43 32 RTN 017 - 42,21, 3 LBL 3 018 - 40 + 019 - 44 25 STO I 020 - 33 R-v 021 - 43 32 RTN 022 - 42,21, 5 LBL 5 023 - 44 0 STO 0 024 - 1 1 025 - 30 - 026 - 2 2 027 - 34 x<>y 028 - 14 y^x 029 - 42, 4, 0 x<> 0 030 - 43 32 RTN 031 - 42,21,14 LBL D 032 - 32 5 GSB 5 033 - 45 2 RCL 2 034 - 32 12 GSB B 035 - 45 3 RCL 3 036 - 32 12 GSB B 037 - 45 4 RCL 4 038 - 32 12 GSB B 039 - 43 32 RTN 040 - 42,21,12 LBL B 041 - 32 1 GSB 1 042 - 45 0 RCL 0 043 - 43, 6, 3 F? 3 044 - 16 CHS 045 - 40 + 046 - 34 x<>y 047 - 36 ENTER 048 - 2 2 049 - 6 6 050 - 32 3 GSB 3 051 - 44 24 STO (i) 052 - 33 R-v 053 - 9 9 054 - 40 + 055 - 32 5 GSB 5 056 - 43 32 RTN 057 - 42,21, 7 LBL 7 058 - 42, 4, 6 x<> 6 059 - 44 0 STO 0 060 - 45 2 RCL 2 061 - 1 1 062 - 7 7 063 - 32 3 GSB 3 064 - 45 24 RCL (i) 065 - 45 6 RCL 6 066 - 45 0 RCL 0 067 - 30 - 068 - 45 5 RCL 5 069 - 20 � 070 - 40 + 071 - 44 24 STO (i) 072 - 43 32 RTN 073 - 42,21, 6 LBL 6 074 - 44,40, 1 STO + 1 075 - 45 1 RCL 1 076 - 9 9 077 - 10 � 078 - 43 44 INT 079 - 44 2 STO 2 080 - 45 1 RCL 1 081 - 9 9 082 - 32 0 GSB 0 083 - 44 3 STO 3 084 - 3 3 085 - 10 � 086 - 43 44 INT 087 - 45 2 RCL 2 088 - 3 3 089 - 10 � 090 - 43 44 INT 091 - 3 3 092 - 20 � 093 - 40 + 094 - 44 4 STO 4 095 - 8 8 096 - 45 3 RCL 3 097 - 30 - 098 - 13 10^x 099 - 44 5 STO 5 100 - 45 2 RCL 2 101 - 1 1 102 - 7 7 103 - 32 4 GSB 4 104 - 44 6 STO 6 105 - 45 2 RCL 2 106 - 8 8 107 - 32 4 GSB 4 108 - 44 7 STO 7 109 - 43 32 RTN 110 - 42,21, 4 LBL 4 111 - 32 3 GSB 3 112 - 45 24 RCL (i) 113 - 45 5 RCL 5 114 - 10 � 115 - 43 44 INT 116 - 1 1 117 - 0 0 118 - 32 0 GSB 0 119 - 43 32 RTN 120 - 42,21,11 LBL A 121 - 43, 5, 2 CF 2 122 - 43, 5, 3 CF 3 123 - 1 1 124 - 16 CHS 125 - 44 1 STO 1 126 - 42,42,.0 LBL .0 127 - 1 1 128 - 32 6 GSB 6 129 - 45 7 RCL 7 130 - 32 7 GSB 7 131 - 45 7 RCL 7 132 - 43,30, 1 TEST 1 133 - 32 14 GSB D 134 - 8 8 135 - 0 0 136 - 45 1 RCL 1 137 - 43,30, 6 TEST 6 138 - 22 .0 GTO .0 139 - 1 1 140 - 16 CHS 141 - 44 1 STO 1 142 - 42,21,15 LBL E 143 - 8 8 144 - 0 0 145 - 45 1 RCL 1 146 - 43,30, 5 TEST 5 147 - 43 32 RTN 148 - 1 1 149 - 32 6 GSB 6 150 - 45 7 RCL 7 151 - 43,30, 1 TEST 1 152 - 22 15 GTO E 153 - 32 7 GSB 7 154 - 42,42,.9 LBL .9 155 - 9 9 156 - 45 6 RCL 6 157 - 43,30, 5 TEST 5 158 - 22 13 GTO C 159 - 1 1 160 - 40 + 161 - 32 7 GSB 7 162 - 45 6 RCL 6 163 - 32 5 GSB 5 164 - 43, 5, 2 CF 2 165 - 45 2 RCL 2 166 - 32 9 GSB 9 167 - 45 3 RCL 3 168 - 32 9 GSB 9 169 - 45 4 RCL 4 170 - 32 9 GSB 9 171 - 43, 6, 2 F? 2 172 - 22 .9 GTO .9 173 - 45 6 RCL 6 174 - 32 14 GSB D 175 - 22 15 GTO E 176 - 42,21,13 LBL C 177 - 1 1 178 - 16 CHS 179 - 32 6 GSB 6 180 - 43,30, 1 TEST 1 181 - 22 13 GTO C 182 - 45 6 RCL 6 183 - 43, 4, 3 SF 3 184 - 32 14 GSB D 185 - 43, 5, 3 CF 3 186 - 22 .9 GTO .9 187 - 42,21, 9 LBL 9 188 - 32 1 GSB 1 189 - 45 0 RCL 0 190 - 10 � 191 - 43 44 INT 192 - 2 2 193 - 32 0 GSB 0 194 - 43,30, 1 TEST 1 195 - 43, 4, 2 SF 2 196 - 33 R-v 197 - 33 R-v 198 - 9 9 199 - 40 + 200 - 32 5 GSB 5 201 - 43 32 RTN


Re: Optimization questions - HP-15C
Message #9 Posted by Marcel Samek on 29 Oct 2012, 11:55 p.m.,
in response to message #8 by Thomas Klemm

I have never used nonpareil. I tried the example below in the HP emulator (that came on CD with LE) and it took less than 1 second to solve.
One thing I happened to notice is that in the key codes that you posted below, the numbered labels that are above 9 show an odd key code: For example, "LBL .9" shows up as "42,42,.9" instead of "42,21,.9" - is that acceptable for nonpareil or could it be a bug that is impacting the execution of the code?


Re: Optimization questions - HP-15C
Message #10 Posted by Marcel Samek on 30 Oct 2012, 12:08 a.m.,
in response to message #9 by Marcel Samek

An easy way to make sure the program is working correctly is to input data of all 0. Thus, do a CLEAR REG and a GSB A.
The program will return the first solution it finds, which in this case will be:

123456789 456789123 789123456 214365897 365897214 897214365 531642978 642978531 978531642
It finds this solution very quickly compared to even a simple sudoku puzzle.
If the nonpareil version does not ever return a proper solution, let me know and I will give it a whirl. I have become pretty good at debugging this program :)


Re: Optimization questions - HP-15C
Message #11 Posted by Thomas Klemm on 30 Oct 2012, 5:49 a.m.,
in response to message #9 by Marcel Samek

Hi Marcel
You found a bug in the script that generates the listing. For further details, please consult the following article: Effective Computer-aided Calculator Programming - Part 1 - Voyager
The following patch can be applied to fix the script:
2812,2821c2812,2821 < ff00 | LBL .0 | - 42,42,.0 | < ff01 | LBL .1 | - 42,42,.1 | < ff02 | LBL .2 | - 42,42,.2 | < ff03 | LBL .3 | - 42,42,.3 | < ff04 | LBL .4 | - 42,42,.4 | < ff05 | LBL .5 | - 42,42,.5 | < ff06 | LBL .6 | - 42,42,.6 | < ff07 | LBL .7 | - 42,42,.7 | < ff08 | LBL .8 | - 42,42,.8 | < ff09 | LBL .9 | - 42,42,.9 | --- > ff00 | LBL .0 | - 42,21,.0 | > ff01 | LBL .1 | - 42,21,.1 | > ff02 | LBL .2 | - 42,21,.2 | > ff03 | LBL .3 | - 42,21,.3 | > ff04 | LBL .4 | - 42,21,.4 | > ff05 | LBL .5 | - 42,21,.5 | > ff06 | LBL .6 | - 42,21,.6 | > ff07 | LBL .7 | - 42,21,.7 | > ff08 | LBL .8 | - 42,21,.8 | > ff09 | LBL .9 | - 42,21,.9 |
As far as I know the HP emulator is based on Nonpareil so it might be that you can save the state oft the calculator to a file and retrieve it from there. Nonpareil uses a gziped XML-file but you can just use the plain XML-file as well. The script "vcomp" allows to compile programs for the voyagers to this format and to list them as well. And there are some other features like macros.

Meanwhile your program has finished and produced the correct result! Wow, I'm impressed! Congratulations!.
Cheers
Thomas

This is the listing of your sudoku-program after correcting the script:
001 - 42,21, 0 LBL 0 042 - 45 0 RCL 0 083 - 44 3 STO 3 124 - 16 CHS 165 - 45 2 RCL 2 002 - 10 / 043 - 43, 6, 3 F? 3 084 - 3 3 125 - 44 1 STO 1 166 - 32 9 GSB 9 003 - 43 36 LSTx 044 - 16 CHS 085 - 10 / 126 - 42,21,.0 LBL .0 167 - 45 3 RCL 3 004 - 34 x<>y 045 - 40 + 086 - 43 44 INT 127 - 1 1 168 - 32 9 GSB 9 005 - 42 44 FRAC 046 - 34 x<>y 087 - 45 2 RCL 2 128 - 32 6 GSB 6 169 - 45 4 RCL 4 006 - 20 x 047 - 36 ENTER 088 - 3 3 129 - 45 7 RCL 7 170 - 32 9 GSB 9 007 - 43 34 RND 048 - 2 2 089 - 10 / 130 - 32 7 GSB 7 171 - 43, 6, 2 F? 2 008 - 43 32 RTN 049 - 6 6 090 - 43 44 INT 131 - 45 7 RCL 7 172 - 22 .9 GTO .9 009 - 42,21, 1 LBL 1 050 - 32 3 GSB 3 091 - 3 3 132 - 43,30, 1 TEST 1 173 - 45 6 RCL 6 010 - 36 ENTER 051 - 44 24 STO (i) 092 - 20 x 133 - 32 14 GSB D 174 - 32 14 GSB D 011 - 36 ENTER 052 - 33 Rv 093 - 40 + 134 - 8 8 175 - 22 15 GTO E 012 - 2 2 053 - 9 9 094 - 44 4 STO 4 135 - 0 0 176 - 42,21,13 LBL C 013 - 6 6 054 - 40 + 095 - 8 8 136 - 45 1 RCL 1 177 - 1 1 014 - 32 3 GSB 3 055 - 32 5 GSB 5 096 - 45 3 RCL 3 137 - 43,30, 6 TEST 6 178 - 16 CHS 015 - 45 24 RCL (i) 056 - 43 32 RTN 097 - 30 - 138 - 22 .0 GTO .0 179 - 32 6 GSB 6 016 - 43 32 RTN 057 - 42,21, 7 LBL 7 098 - 13 10^x 139 - 1 1 180 - 43,30, 1 TEST 1 017 - 42,21, 3 LBL 3 058 - 42, 4, 6 x<>6 099 - 44 5 STO 5 140 - 16 CHS 181 - 22 13 GTO C 018 - 40 + 059 - 44 0 STO 0 100 - 45 2 RCL 2 141 - 44 1 STO 1 182 - 45 6 RCL 6 019 - 44 25 STO I 060 - 45 2 RCL 2 101 - 1 1 142 - 42,21,15 LBL E 183 - 43, 4, 3 SF 3 020 - 33 Rv 061 - 1 1 102 - 7 7 143 - 8 8 184 - 32 14 GSB D 021 - 43 32 RTN 062 - 7 7 103 - 32 4 GSB 4 144 - 0 0 185 - 43, 5, 3 CF 3 022 - 42,21, 5 LBL 5 063 - 32 3 GSB 3 104 - 44 6 STO 6 145 - 45 1 RCL 1 186 - 22 .9 GTO .9 023 - 44 0 STO 0 064 - 45 24 RCL (i) 105 - 45 2 RCL 2 146 - 43,30, 5 TEST 5 187 - 42,21, 9 LBL 9 024 - 1 1 065 - 45 6 RCL 6 106 - 8 8 147 - 43 32 RTN 188 - 32 1 GSB 1 025 - 30 - 066 - 45 0 RCL 0 107 - 32 4 GSB 4 148 - 1 1 189 - 45 0 RCL 0 026 - 2 2 067 - 30 - 108 - 44 7 STO 7 149 - 32 6 GSB 6 190 - 10 / 027 - 34 x<>y 068 - 45 5 RCL 5 109 - 43 32 RTN 150 - 45 7 RCL 7 191 - 43 44 INT 028 - 14 y^x 069 - 20 x 110 - 42,21, 4 LBL 4 151 - 43,30, 1 TEST 1 192 - 2 2 029 - 42, 4, 0 x<>0 070 - 40 + 111 - 32 3 GSB 3 152 - 22 15 GTO E 193 - 32 0 GSB 0 030 - 43 32 RTN 071 - 44 24 STO (i) 112 - 45 24 RCL (i) 153 - 32 7 GSB 7 194 - 43,30, 1 TEST 1 031 - 42,21,14 LBL D 072 - 43 32 RTN 113 - 45 5 RCL 5 154 - 42,21,.9 LBL .9 195 - 43, 4, 2 SF 2 032 - 32 5 GSB 5 073 - 42,21, 6 LBL 6 114 - 10 / 155 - 9 9 196 - 33 Rv 033 - 45 2 RCL 2 074 - 44,40, 1 STO+ 1 115 - 43 44 INT 156 - 45 6 RCL 6 197 - 33 Rv 034 - 32 12 GSB B 075 - 45 1 RCL 1 116 - 1 1 157 - 43,30, 5 TEST 5 198 - 9 9 035 - 45 3 RCL 3 076 - 9 9 117 - 0 0 158 - 22 13 GTO C 199 - 40 + 036 - 32 12 GSB B 077 - 10 / 118 - 32 0 GSB 0 159 - 1 1 200 - 32 5 GSB 5 037 - 45 4 RCL 4 078 - 43 44 INT 119 - 43 32 RTN 160 - 40 + 201 - 43 32 RTN 038 - 32 12 GSB B 079 - 44 2 STO 2 120 - 42,21,11 LBL A 161 - 32 7 GSB 7 039 - 43 32 RTN 080 - 45 1 RCL 1 121 - 43, 5, 2 CF 2 162 - 45 6 RCL 6 040 - 42,21,12 LBL B 081 - 9 9 122 - 43, 5, 3 CF 3 163 - 32 5 GSB 5 041 - 32 1 GSB 1 082 - 32 0 GSB 0 123 - 1 1 164 - 43, 5, 2 CF 2

Edited: 30 Oct 2012, 5:53 a.m.

Re: Optimization questions - HP-15C
Message #12 Posted by Egan Ford on 30 Oct 2012, 9:22 a.m.,
in response to message #11 by Thomas Klemm

Quote:
You found a bug in the script that generates the listing.
Fixed. Thanks for the bug report.

Re: Optimization questions - HP-15C
Message #13 Posted by Thomas Klemm on 30 Oct 2012, 1:28 p.m.,
in response to message #12 by Egan Ford

Thanks a lot for the quick fix!

Re: Optimization questions - HP-15C
Message #14 Posted by Marcel Samek on 30 Oct 2012, 12:50 p.m.,
in response to message #11 by Thomas Klemm

Hi Thomas,
Thanks for the reference to the article. I wish I had known about vcomp when I started on this project. It would have saved me some time.
I'm actually relatively new to the world of HP calculators. I had a 32sII for years but used it infrequently and never wrote any meaningful programs for it. It was only in the last year when on a whim I bought a 15c LE that I decided to take on a challenging program as a way of familiarizing myself with it.
Where can I download nonpareil for the windows with the 15c rom? I poked around google for a bit and could not find it.


Re: Optimization questions - HP-15C
Message #15 Posted by Thomas Klemm on 30 Oct 2012, 1:24 p.m.,
in response to message #14 by Marcel Samek

Quote:
Where can I download nonpareil for the windows with the 15c rom?
High-Fidelity Calculator Simulator Download page in the Wayback Machine
If the first direct link doesn't work you mighty search for "http://nonpareil.brouhaha.com/download" and go back to Jan. 16th 2007. You need revision 0.78 or earlier.
Have fun
Thomas

Edited: 30 Oct 2012, 1:26 p.m.

Re: Optimization questions - HP-15C
Message #16 Posted by Marcel Samek on 31 Oct 2012, 5:25 p.m.,
in response to message #15 by Thomas Klemm

I really wanted to try the nonpareil emulator. It seems like a cool workflow with the compiler.
I had no end of problems getting the windows version working (a lot of GTK issues) and I eventually gave up.
So I built the .78 code on a linux box and that went smoothly. It took me a while to figure out how to load a file, so in the end I replaced the 15c.nst file with the XML file you had posted. That seemed to do the trick and when I fired up nonpareil 15c, the program seemed to be in memory in tact.
I ran a simple test case, however, and it failed after several minutes.
During execution, nonpareil would occasionally print out "warning: stray READ RAM 0bf PF 00 reg 0". The numbers on the warning were always the same.
Since you were able to run a test case in non-pareil, I am assuming that something in my configuration is wrong.
Does anyone have any thoughts on what my issue might be? Any common pitfalls to bringing a build of nonpareil up and running?
p.s
The one piece of information I left out is that I originally did an scons and "scons install" on the newest version of nonpareil. Only then did I notice that it did not come with 15c ROM. I then got version .7 and did a scons and scons install with it. Is there any chance that having had a newer version installed previously is somehow installing the version .78?


Re: Optimization questions - HP-15C
Message #17 Posted by Thomas Klemm on 31 Oct 2012, 9:24 p.m.,
in response to message #16 by Marcel Samek

Quote:
It took me a while to figure out how to load a file
File -> Open

Quote:
warning: stray READ RAM 0bf PF 00 reg 0
I get that warning as well. I didn't care too much. It might be that we initialize something wrong in 'vcomp'. Does it happen only after loading a compiled nst-file?

Quote:
I am assuming that something in my configuration is wrong.
Probably not. Or at least I'm having the same wrong configuration.
I think you can just copy the 15c ROM and the other corresponding files (kml, image) from release 0.78 to the newest version of nonpareil. Just in case you want to start from scratch again.
Cheers
Thomas


Re: Optimization questions - HP-15C
Message #18 Posted by Marcel Samek on 31 Oct 2012, 9:28 p.m.,
in response to message #17 by Thomas Klemm

I honestly don't see a file menu anywhere.....

Re: Optimization questions - HP-15C
Message #19 Posted by Thomas Klemm on 31 Oct 2012, 11:41 p.m.,
in response to message #18 by Marcel Samek

That's strange.


Re: Optimization questions - HP-15C
Message #20 Posted by Andr�s C. Rodr�guez (Argentina) on 1 Nov 2012, 8:15 a.m.,
in response to message #19 by Thomas Klemm

If the low "r" character is what puzzles you, some time ago it was mentionesd here as sort of a signature from Nonpareil. It has no further effects. HTH

Re: Optimization questions - HP-15C
Message #21 Posted by Thomas Klemm on 1 Nov 2012, 8:54 a.m.,
in response to message #20 by Andr�s C. Rodr�guez (Argentina)

No, the strange thing is that Marcel's emulator seems to lack the File-menu.
Edited: 1 Nov 2012, 8:55 a.m.

Re: Optimization questions - HP-15C
Message #22 Posted by Marcel Samek on 1 Nov 2012, 12:25 p.m.,
in response to message #21 by Thomas Klemm

I figured it out. As I had mentioned, I could not get things working under windows and so I installed it under Linux.
Although I am an old unix hack, I almost never use the Linux desktop. I was using the latest Ubuntu distribution and the desktop not only puts the application's menu to a global position at the top of the screen (instead of at each window), but it also auto-hides it so that it does not show up unless you mouse over it. It was this latter behavior that was confusing me.
As soon as I found some time to try to do something else on the system besides just run nonpareil, I came to the realization that all applications were missing their menu and I eventually figured out what was going on.
So.... sorry - my ignorance.

Re: Optimization questions - HP-15C
Message #23 Posted by Marcel Samek on 1 Nov 2012, 1:32 p.m.,
in response to message #17 by Thomas Klemm

Ok. I finally worked out the issue
When I had loaded your file into nonpareil, I had assumed that it would take care of setting the memory partition, and I never bothered to check it.
Once I manually did the DIM 34, everything started working and I am ready to start focusing on the program and everyone suggestions.
I really wanted to get nonpareil working because it seems suitable for performing timing tests. For development I having been using the HP15C emulator that came from HP with the LE. It is great for testing if things work because it is blindingly fast. Test cases that take hours on the hardware will take less than 1 second in that emulator. Unfortunately, for the same reason, it is is not practical for running timing tests.

Re: Optimization questions - HP-15C
Message #24 Posted by uhmgawa on 1 Nov 2012, 5:35 p.m.,
in response to message #16 by Marcel Samek

Quote:
During execution, nonpareil would occasionally print out "warning: stray READ RAM 0bf PF 00 reg 0". The numbers on the warning were always the same.
There are quirks in at least the 15c firmware where it will perform read access of address 0xbf. When I discovered this developing KEMU, I found the reads to be innocuous and IIRC occur in more than one routine in the firmware.
My dusty notes indicate it is possible to force an access to 0xbf manually via the key sequence:

ON ENTER Rv (roll down)


Re: Optimization questions - HP-15C
Message #25 Posted by Eric Smith on 30 Oct 2012, 4:32 p.m.,
in response to message #11 by Thomas Klemm

Quote:
As far as I know the HP emulator is based on Nonpareil
To the best of my knowledge the HP emulator is not based on any code from Nonpareil, though I did provide Cyrille with some minor assistance a few times as the new 12C and 15C LE were in development.
I have no idea what HP's state file format is, but it's probably not compatible with Nonpareil.

Re: Optimization questions - HP-15C
Message #26 Posted by Kiyoshi Akima on 30 Oct 2012, 10:38 p.m.,
in response to message #11 by Thomas Klemm

I haven't taken the time to take a good look at the program, but just by eyeballing I came across the following:
At steps 55, 118, and 200, there's a GSB immediately followed by a RTN. They can be replaced by a GTO.
Unless they're needed for the stack lift, the RCL 0 ; - at 66/67 can be replaced by RCL - 0, the RCL 5 ; x at 68/69 by RCL x 5, the RCL 3 ; - at 96/97 by RCL - 3, the RCL 5 ; / at 113/114 by RCL / 5, the RCL 0 ; / at 189/190 by RCL / 0.
Probably won't make any appreciable difference, but every byte and every step may help.

Re: Optimization questions - HP-15C
Message #27 Posted by Thomas Klemm on 30 Oct 2012, 11:28 p.m.,
in response to message #26 by Kiyoshi Akima

Quote:
At steps 55, 118, and 200, there's a GSB immediately followed by a RTN. They can be replaced by a GTO.
For the same reason lines 038 and 039 can be removed. First they can be replaced by a GTO B which isn't used since the next step is LBL B.
My guess is a lot of performance could be gained by simply rearinging the code to avoid backward-jumps.
Kind regards
Thomas

Re: Optimization questions - HP-15C
Message #28 Posted by Marcel Samek on 30 Oct 2012, 11:44 p.m.,
in response to message #26 by Kiyoshi Akima

Thanks very much for taking the time to look at the code!

Quote:
At steps 55, 118, and 200, there's a GSB immediately followed by a RTN. They can be replaced by a GTO.
The C++ developer in me had not thought of that (and cringes at the thought). However, that's a great suggestion.

Quote:
Unless they're needed for the stack lift, the RCL 0 ; - at 66/67 can be replaced by RCL - 0, the RCL 5 ; x at 68/69 by RCL x 5, the RCL 3 ; - at 96/97 by RCL - 3, the RCL 5 ; / at 113/114 by RCL / 5, the RCL 0 ; / at 189/190 by RCL / 0.
Unfortunately, I believe that the suggested replacements are all two byte instructions, so the memory usage would remain the same.

Re: Optimization questions - HP-15C
Message #29 Posted by Thomas Klemm on 31 Oct 2012, 12:15 a.m.,
in response to message #28 by Marcel Samek

Quote:
the memory usage would remain the same
But it's probably still faster since it's only one instruction. You might run a benchmark test to find out.

Re: Optimization questions - HP-15C
Message #30 Posted by Marcel Samek on 31 Oct 2012, 5:27 p.m.,
in response to message #29 by Thomas Klemm

You are absolutely correct. Based on Namir's performance document it seems that stack lift/drop is also expensive and reducing it should improve performance.

Re: Optimization questions - HP-15C
Message #31 Posted by Thomas Klemm on 31 Oct 2012, 1:48 a.m.,
in response to message #26 by Kiyoshi Akima

This is the call-chain of your program:

I tried to rearrange your subroutines to avoid backward-calls. In addition to the other suggestions I've renamed LBL .0 to LBL 2 and LBL .9 to LBL 8. These use only 1 byte instead of 2.
If the return value of sub-routine 7 isn't used you could replace
078 RCL+ (i) 079 STO (i)
by a simple statement
078 STO+ (i)
And then I suggest to rename LBL {B, D} <-> {2, 8}. So all the alphanumeric labels are used for loops. But that's for aesthetic reasons only.
Just keep in mind: I made these changes without understanding a single line of your code.
Cheers
Thomas
PS: The program is currently running with previous clearing the registers.

001 - 42,21,11 LBL A 039 - 22 13 GTO C 077 - 45,20, 5 RCLx 5 115 - 9 9 153 - 1 1 002 - 43, 5, 2 CF 2 040 - 1 1 078 - 45,40,24 RCL+ (i) 116 - 40 + 154 - 7 7 003 - 43, 5, 3 CF 3 041 - 40 + 079 - 44 24 STO (i) 117 - 42,21, 5 LBL 5 155 - 32 4 GSB 4 004 - 1 1 042 - 32 7 GSB 7 080 - 43 32 RTN 118 - 44 0 STO 0 156 - 44 6 STO 6 005 - 16 CHS 043 - 45 6 RCL 6 081 - 42,21,14 LBL D 119 - 1 1 157 - 45 2 RCL 2 006 - 44 1 STO 1 044 - 32 5 GSB 5 082 - 32 5 GSB 5 120 - 30 - 158 - 8 8 007 - 42,21, 2 LBL 2 045 - 43, 5, 2 CF 2 083 - 45 2 RCL 2 121 - 2 2 159 - 32 4 GSB 4 008 - 1 1 046 - 45 2 RCL 2 084 - 32 12 GSB B 122 - 34 x<>y 160 - 44 7 STO 7 009 - 32 6 GSB 6 047 - 32 9 GSB 9 085 - 45 3 RCL 3 123 - 14 y^x 161 - 43 32 RTN 010 - 45 7 RCL 7 048 - 45 3 RCL 3 086 - 32 12 GSB B 124 - 42, 4, 0 x<>0 162 - 42,21, 4 LBL 4 011 - 32 7 GSB 7 049 - 32 9 GSB 9 087 - 45 4 RCL 4 125 - 43 32 RTN 163 - 32 3 GSB 3 012 - 45 7 RCL 7 050 - 45 4 RCL 4 088 - 42,21,12 LBL B 126 - 42,21, 6 LBL 6 164 - 45 24 RCL (i) 013 - 43,30, 1 TEST 1 051 - 32 9 GSB 9 089 - 32 1 GSB 1 127 - 44,40, 1 STO+ 1 165 - 45,10, 5 RCL/ 5 014 - 32 14 GSB D 052 - 43, 6, 2 F? 2 090 - 45 0 RCL 0 128 - 45 1 RCL 1 166 - 43 44 INT 015 - 8 8 053 - 22 8 GTO 8 091 - 43, 6, 3 F? 3 129 - 9 9 167 - 1 1 016 - 0 0 054 - 45 6 RCL 6 092 - 16 CHS 130 - 10 / 168 - 0 0 017 - 45 1 RCL 1 055 - 32 14 GSB D 093 - 40 + 131 - 43 44 INT 169 - 42,21, 0 LBL 0 018 - 43,30, 6 TEST 6 056 - 22 15 GTO E 094 - 34 x<>y 132 - 44 2 STO 2 170 - 10 / 019 - 22 2 GTO 2 057 - 42,21,13 LBL C 095 - 36 ENTER 133 - 45 1 RCL 1 171 - 43 36 LSTx 020 - 1 1 058 - 1 1 096 - 2 2 134 - 9 9 172 - 34 x<>y 021 - 16 CHS 059 - 16 CHS 097 - 6 6 135 - 32 0 GSB 0 173 - 42 44 FRAC 022 - 44 1 STO 1 060 - 32 6 GSB 6 098 - 32 3 GSB 3 136 - 44 3 STO 3 174 - 20 x 023 - 42,21,15 LBL E 061 - 43,30, 1 TEST 1 099 - 44 24 STO (i) 137 - 3 3 175 - 43 34 RND 024 - 8 8 062 - 22 13 GTO C 100 - 33 Rv 138 - 10 / 176 - 43 32 RTN 025 - 0 0 063 - 45 6 RCL 6 101 - 9 9 139 - 43 44 INT 177 - 42,21, 1 LBL 1 026 - 45 1 RCL 1 064 - 43, 4, 3 SF 3 102 - 40 + 140 - 45 2 RCL 2 178 - 36 ENTER 027 - 43,30, 5 TEST 5 065 - 32 14 GSB D 103 - 32 5 GSB 5 141 - 3 3 179 - 36 ENTER 028 - 43 32 RTN 066 - 43, 5, 3 CF 3 104 - 43 32 RTN 142 - 10 / 180 - 2 2 029 - 1 1 067 - 22 8 GTO 8 105 - 42,21, 9 LBL 9 143 - 43 44 INT 181 - 6 6 030 - 32 6 GSB 6 068 - 42,21, 7 LBL 7 106 - 32 1 GSB 1 144 - 3 3 182 - 32 3 GSB 3 031 - 45 7 RCL 7 069 - 42, 4, 6 x<>6 107 - 45,10, 0 RCL/ 0 145 - 20 x 183 - 45 24 RCL (i) 032 - 43,30, 1 TEST 1 070 - 44 0 STO 0 108 - 43 44 INT 146 - 40 + 184 - 43 32 RTN 033 - 22 15 GTO E 071 - 45 2 RCL 2 109 - 2 2 147 - 44 4 STO 4 185 - 42,21, 3 LBL 3 034 - 32 7 GSB 7 072 - 1 1 110 - 32 0 GSB 0 148 - 8 8 186 - 40 + 035 - 42,21, 8 LBL 8 073 - 7 7 111 - 43,30, 1 TEST 1 149 - 45,30, 3 RCL- 3 187 - 44 25 STO I 036 - 9 9 074 - 32 3 GSB 3 112 - 43, 4, 2 SF 2 150 - 13 10^x 188 - 33 Rv 037 - 45 6 RCL 6 075 - 45 6 RCL 6 113 - 33 Rv 151 - 44 5 STO 5 189 - 43 32 RTN 038 - 43,30, 5 TEST 5 076 - 45,30, 0 RCL- 0 114 - 33 Rv 152 - 45 2 RCL 2


Re: Optimization questions - HP-15C
Message #32 Posted by Luiz C. Vieira (Brazil) on 31 Oct 2012, 4:28 a.m.,
in response to message #31 by Thomas Klemm

Hi, Thomas.
Just out of curiosity: did you use any specific graph-generator program for the call-chain representation or it is just hand-made using a regular graphic program? I'm asking about this because the final view is very harmonic and easy to understand.
Cheers.
Luiz (Brazil)

Re: Optimization questions - HP-15C
Message #33 Posted by Thomas Klemm on 31 Oct 2012, 5:40 a.m.,
in response to message #32 by Luiz C. Vieira (Brazil)

I used "dot" to create the image "sudoku.png":
dot -Tpng -o sudoku.png < sudoku.dot
You need a configuration file "sudoku.dot":

digraph config { size ="10,13"; rankdir = "LR"; ranksep =.5; nodesep = .1; ratio = "fill"; subgraph SUDOKU { node [style=filled, color="#bccec2", shape=rectangle, fontname=Verdana, fontsize=24]; edge [style=dashed]; label=GTO; "A" -> ".0"; "A" -> "E"; "A" -> "C"; "A" -> ".9"; edge [style=solid]; label=GSB; "1" -> "3"; "D" -> "5"; "D" -> "B"; "D" -> "B"; "D" -> "B"; "B" -> "1"; "B" -> "3"; "B" -> "5"; "7" -> "3"; "6" -> "0"; "6" -> "4"; "6" -> "4"; "4" -> "3"; "4" -> "0"; "A" -> "6"; "A" -> "7"; "A" -> "D"; "A" -> "6"; "A" -> "7"; "A" -> "7"; "A" -> "5"; "A" -> "9"; "A" -> "9"; "A" -> "9"; "A" -> "D"; "A" -> "6"; "A" -> "D"; "9" -> "1"; "9" -> "0"; "9" -> "5"; } }

It's just part of my Linux distribution so I don't know where to get it.
HTH
Thomas

Re: Optimization questions - HP-15C
Message #34 Posted by Luiz C. Vieira (Brazil) on 31 Oct 2012, 5:48 a.m.,
in response to message #33 by Thomas Klemm

Very interesting!
Thanks a lot!
In time: which Linux distribution/version do you use?
Luiz (Brazil)

Re: Optimization questions - HP-15C
Message #35 Posted by Thomas Klemm on 31 Oct 2012, 6:09 a.m.,
in response to message #34 by Luiz C. Vieira (Brazil)

/etc/SuSE-release:
openSUSE 11.3 (i586) VERSION = 11.3
That's rather old I guess but not my decision.
Cheers
Thomas


Re: Optimization questions - HP-15C
Message #36 Posted by Thomas Klemm on 31 Oct 2012, 6:20 a.m.,
in response to message #33 by Thomas Klemm

Meanwhile I found this article: DOT language
It appears that dot is a filter based on Graphviz for drawing directed graphs.

Re: Optimization questions - HP-15C
Message #37 Posted by Marcel Samek on 31 Oct 2012, 5:45 p.m.,
in response to message #31 by Thomas Klemm

Thanks for taking the time with the code!! That's a cool visualization.
The labeling is a mess. As I mentioned earlier, I spent a long time trying to squeeze the code down and during that time I went through many iterations of labeling.
One consideration that I dealt with was looking at the number of GSB/GTOs targeting a particular label and then using that to determine whether to use a two byte label or a single byte one.
I am going to incorporate the various changes that have been suggested and that you have incorporated into the code below, but I was thinking of doing it step by step and running some timing tests as I do that. I am curious as to how much of a performance increase each of the changes gets me on a real world test rather than a benchmark.
I'm trying to get nonpareil working for me so I can try using it to run the timing tests. (All I'm interested in is relative times, not actual times, so an emulator seems fine for that).


Re: Optimization questions - HP-15C
Message #38 Posted by Thomas Klemm on 31 Oct 2012, 9:56 p.m.,
in response to message #37 by Marcel Samek

In your utility subroutine for setting flag matrix values (LBL B) the ENTER is dispensable:

XY SWITCH ; bring the row index back into x ; ENTER ; duplicate it 2 ; 26 is the starting register for the flag matrix 6
ENTER disables the stack lift so the consecutive 26 will just overwrite the value X of the stack.

Quote:
The labeling is a mess.
Rest assured, your code is well structured. The main problem I see is that you declared the subroutines before using them.
Kind regards
Thomas

Re: Optimization questions - HP-15C
Message #39 Posted by Thomas Klemm on 31 Oct 2012, 11:17 p.m.,
in response to message #37 by Marcel Samek

Just another suggestion for change(x):
Original:

LBL 6 STO+ 1 ; x holds +1 or -1; Register 1 is the current index
RCL 1 ; get the current index (0 to 80) 9 ; integer divide by 9 to get the row index (0 to 8) / ; no integer divide on 15c so do a floating point divide INT ; use the INT operator to finish of the integer divide STO 2 ; register 2 contains the current row index
RCL 1 ; start with the current index again 9 ; we are going to do a modulus 9 to get the column index GSB 0 ; do the modulus operation STO 3 ; register 3 holds the current column index
Suggestion:

LBL 6 STO+ 1 ; x holds +1 or -1; Register 1 is the current index
RCL 1 ; get the current index (0 to 80) RCL 1 ; copy to calculate col later 9 ; integer divide by 9 to get the row index (0 to 8) / ; no integer divide on 15c so do a floating point divide INT ; use the INT operator to finish of the integer divide STO 2 ; register 2 contains the current row index
9 ; * ; - ; col = index - 9 * row STO 3 ; register 3 holds the current column index
This avoids the call to mod(x, y).
Did you consider to just keep row and col? The drawback is that you have to handle over- and underflow of col. But most of the time it's just to increment or decrement col. That might be faster than the calculation of row and col from index in all cases.
In the utility routine to extract the value at the current column from the matrix indirectly specified by x&y, I suggest to reverse the use of INT and FRAC:

LBL 4 GSB 3 ; set the indirect register based on x & y RCL (i) ; get the row from the matrix passed in RCL/ 5 ; get the power of 10 value for the current position ; divide the complete row value by the power of 10 (equivalent to a shift down) FRAC ; trim off the digits on the left of the decimal 1 ; now shift the first digit to the left of the decimal 0 * INT ; cut off what is left on the right of the decimal
I assume you'd have to use 9 instead of 8 in the calculation of the power of 10 of current column index:

9 ; now calculate the power of 10 of the current column RCL- 3 ; Get the current column index into x ; subtract 9 because we want to go right to left for integer math to work
I haven't tested that though.
Let's see how we can get rid of the call to mod(x, y) in the subroutine 9:
LBL 9 ; note that this subroutine label is reused GSB 1 ; get the appropriate row (x) from the flag matrix RCL/ 0 ; get the temp register stuffed by the caller ; do the integer divide INT 2 ; do a modulus 2 to see if the bit is set / FRAC TEST 1 ; if the bit is set, set user flag 2

So in the end the subroutine mod(x, y) can be removed altogether. This saves more bytes than we added but most of all it saves an expensive subroutine call which is repeated often.
Kind regards
Thomas

Re: Optimization questions - HP-15C
Message #40 Posted by Marcel Samek on 1 Nov 2012, 7:08 p.m.,
in response to message #39 by Thomas Klemm

Thomas,
Thank you very much for your all your truly excellent suggestions. I am not sure you realize how much you helped!
I ran some instrumented profiles on the algorithm earlier today and it turns out that roughly 80% of the subroutine calls to mod() were coming from the main loop (LBL 9) and the other 20% from the change() subroutine (LBL 6).
Your suggestions for removing the calls to MOD in the LBL 9 subroutine immediately eliminated roughly 80% of the calls to MOD
Furthermore, the few bytes of memory saved by yours and others various suggestions allowed me to put in an optimization in the main loop that reduced the amount of work that it performs by roughly 40%. Previously, it would check whether a digit was found in the current row, column, and block with each of those checks setting a flag if it was found. Only after all three checks had been made, was there a conditional to check the flag and try the next digit if the one being tried was already found somewhere. With a few extra bytes of memory available, I was able to put that conditional check after each check, so, for example, it will not do the work to check the column or the block, if the digit was already found in the row.
I have not included your reordering of the subroutines yet. When I did the instrumented profiles I got data on how often each subroutine is called from each different place in the code. I am now pondering how to use that data to optimally arrange the subroutines that are called from multiple places.
I have not yet included your suggestion on changing the LBL 4 subroutine yet, but I will investigate it.
I am attaching a new annotated listing and a nonpareil state file which reflect the current state of the code I am working with.
One question: When I use nonpareil, I am always having to issue the DIM command. It never seems to be sticky. Is there something I can do about that or do I have to live with it?
Current annotated listing:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; mod(x, y) ; Returns y % x. Note the RND appears necessary because of precision issues. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 0 / ; y / x � start with the floating point division LASTX ; put x back onto the stack X<>Y ; switch x & y so that we can operate on y FRAC ; take the fractional part of y * ; and multiply it with the original x to get the modulus RND ; have to round on the 15c to ensure it's an integer RTN
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; getPart(x) ; Returns the integer representing the entire Xth row of the flag matrix ; Row numbers start at 0. ; returns value in x - input parameter x ends up in y ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 1 ENTER ENTER ; duplicate the parameter so we can leave it for the caller 2 ; 26 is the starting register for the flag matrix 6 GSB 3 ; set the indirect register to the row specified by x RCL (i) ; retrieve the entire row from the flag matrix RTN
; subroutine to set the indirect register and remove the parameters from the stack LBL 3 + ; x+y is the memory offset we want STO I ; put it in the indirect register RDN ; get rid of the sum from the stack RTN
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; setPow2(x) ; Sets the utility temp register to 2^(x-1). Leaves x in place. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 5 STO 0 ; store the input X in the temp register 1 ; we want to subtract 1 from the exponent - ; calculate x-1 2 ; set the base as 2 X<>Y ; the y^x function wants x and y reversed y^x ; calculate the value X<>0 ; stuff the result in the temp register and restore the input x RTN
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; setU(x) ; Sets/clears the flag matrix values (used to indicate that x is set at the current row/column/block) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL D GSB 5 ; calculate the bit value (for the row) we need to set or clear in the existing values RCL 2 ; Get the current row index into x GSB B ; set the appropriate flag matrix values and calculate the new bit value for the column RCL 3 ; Get the current column index into x GSB B ; set the appropriate flag matrix values and calculate the new bit value for the block RCL 4 ; Get the current block index into x
; MUST IMMEDIATELY FOLLOW PRECEEDING SUBROUTINE ; utility subroutine for setting flag matrix values
LBL B GSB 1 ; get the current flag matrix row at index x
RCL 0 ; get the temp register which holds the bit value we will be setting F? 3 ; flag 3 indicates if we are setting or clearing the flag CHS ; if we are clearing, we will do a subtraction instead + ; add the values which is equivalent to a boolean operation either setting or clearing the flag
X<>Y ; bring the row index back into x 2 ; 26 is the starting register for the flag matrix 6 GSB 3 ; set the indirect register so that we are ready to store the new value STO (i) ; store the new value into the flag matrix RDN ; get rid of the new value to restore the stack 9 ; just for convenience, the next bit value will be 9 bits to the left + ; set the next bit index GTO 5 ; calculate the value with that bit set ; we GTO instead of GSB and it will take care of returning for us
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; putA(x) ; Set the value x into the current row/column in the trial solution. ; Does it by subtracting the previous value and adding the new one. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 7 X <> 6 ; swap new value with register that holds current value STO 0 ; store the old value in the temp register RCL 2 ; Get the current row index into x 1 ; 17 is the starting register for the current trial solution 7 GSB 3 ; Set the indirect register RCL (i) ; Get the current value for the entire row RCL 6 ; Get the new value RCL- 0 ; subtract the old value from the new value RCL* 5 ; shift the power of 10 to the appropriate column + ; add to the old value STO (i) ; store the new row value from where we got it RTN
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; change(x) ; Increments or decrements the current position in the trial solution. ; Updates the registers containing the current row, column and block index, ; as well as those containing the power of 10 factor for the current column and others ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 6 STO+ 1 ; x holds +1 or -1; Register 1 is the current index
RCL 1 ; get the current index (0 to 80) RCL 1 ; get the current index (0 to 80) 9 ; integer divide by 9 to get the row index (0 to 8) / ; no integer divide on 15c so do a floating point divide INT ; use the INT operator to finish of the integer divide STO 2 ; register 2 contains the current row index
9 * - ; col = index � 9 * row STO 3 ; register 3 contains the current column index
3 ; calculate the block index from the row & column indexes / ; block = INT(col/3) + INT(row/3)*3 INT ; it would be nice to save a couple of bytes in this section of code RCL 2 3 / INT 3 * + STO 4 ; register 4 holds the block index
8 ; now calculate the power of 10 of the current column RCL- 3 ; Get the digit (from right) based on the column 10^X ; calculate the exponent STO 5 ; save in register 5 which is used throughout the code
RCL 2 ; get the current row 1 ; 17 is the start register of the current trial solution 7 GSB 4 ; extract the value at the current column STO 6 ; register 6 contains the current trial value at the current row/column
RCL 2 ; get the current row 8 ; 8 is the start register of the input data from the user GSB 4 ; extract the value at the current column STO 7 ; register 7 contains the starting value at the current row/column (0 if none) RTN
; Utility routing to extract the value at the current column from the matrix indirectly specified by x&y LBL 4 GSB 3 ; set the indirect register based on x & y RCL (i) ; get the row from the matrix passed in RCL / 5 ; shift the row to the right so that digit we are interested in is least significant INT ; trim off the digits shifted to the right of the decimal 1 ; we will do a modulus 10 to extract the last digit 0 GTO 0 ; do the modulus operation ; do a GTO instead of GSB and it will return for us
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; main() ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL A CF 2 ; make sure flag 2 is unset � CLR REG does not do this CF 3 ; make sure flag 3 is unset � CLR REG does not do this 1 ; start with a index in register 1 of -1 (0 to 80) CHS ; that way we can start with an increment operation STO 1 ; and actually start at 0 where we want.
LBL 2 ; we are going to loop through all input values and set the flags to show they are set 1 ; go forward one position at a time GSB 6 ; go to the next position in the trial solution and set all temp variables
RCL 7 ; get the starting input value at this row/col GSB 7 ; set the value in the trial solution RCL 7 ; get the starting input value again because the last call destroyed it TEST 1 ; if it is > 0 then the user actually input a value for this row/col GSB D ; set the flags to indicate this value is set
8 ; 80 is the upper bound of the indexes (9x9 = 80 = 0:80) 0 RCL 1 ; get the current index TEST 6 ; if the current index hasn't reached 80 GTO 2 ; do the next value 1 ; reset the starting value CHS ; to -1 as we did at the beginning of the program STO 1 ; register 1 holds the current index
LBL E ; m1 ; main solution loop 8 ; when we reach the last index (80) we are done 0 RCL 1 ; register 1 holds the current index TEST 5 ; see if we are at the end RTN ; finished ; woohoo � we are done! 1 ; Go forward one spot GSB 6 ; Do the position increment RCL 7 ; get the starting input value at this row/col TEST 1 ; if it's > 0, the user specified a value here GTO E ; go forward, since this value was explicitly specified by the user and doesn't need to be solved for GSB 7 ; Set the value in the trial solution
LBL 8 ; m2 9 ; we check the possible digits in order 1-9. If we are on nine, we've tried them all at this location RCL 6 ; Get the current trial solution value TEST 5 ; Check to see if it is 9 GTO C ; If it is, backup one step 1 ; We weren't at 9 yet, so increment the value by 1 + GSB 7 ; Set the value in the trial solution
RCL 6 ; Get the current trial solution value GSB 5 ; Calc 2^x-1 to get the bit mask CF 2 ; Clear the flag we will use to see if the test value has already been used in row/column/block RCL 2 ; Get the current row index into x GSB 9 ; see if the current trial solution value has already been used in the row F? 2 ; If the number has been used in the block, try the next value GTO 8 RCL 3 ; Get the current column index into x GSB 9 ; see if the current trial solution value has already been used in the column F? 2 ; If the number has been used in the block, try the next value GTO 8 RCL 4 ; Get the current block index into x GSB 9 ; see if the current trial solution value has already been used in the block F? 2 ; If the number has been used in the block, try the next value GTO 8 RCL 6 ; Get the current trial solution value GSB D ; set the flags to indicate this value is set GTO E ; move on to the next position in the puzzle
LBL C ; m3 ; Come here to back up to the previous position 1 ; We will go one spot backwards CHS GSB 6 ; Set the new current position and all temp values TEST 1 ; the previous call leaves the starting value at this position in X GTO C ; if that value is > 0, meaning the value was set, backup one more spot RCL 6 ; Get the current trial solution value
SF 3 ; Set the 3 flag to indicate we will be clearing the flag matrix bits, not setting them. GSB D ; Set/Clear the flag matrix bits CF 3 ; unset the 3 flag GTO 8 ; check the next digit
LBL 9 GSB 1 ; get the appropriate row (x) from the flag matrix RCL/ 0 ; get the temp register stuffed by the caller INT 2 ; if the bit is set, there will be a remainder when dividing by two / FRAC TEST 1 ; if the bit is set, set user flag 2 which is used as a return value SF 2 RDN ; move the stack down to prepare the caller for the next call RDN ; move the stack down to prepare the caller for the next call 9 ; the bits flags for row/column/block are shifted by 9 from each other + ; calculates the appropriate bit offset for the next call GTO 5 ; calc 2^x-1 to get the bit mask ; do a GTO instead of GSB and it will return for us

Nonpareil state file:

<?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE state SYSTEM "nonpareil.dtd"> <state version="1.00" model="15C" platform="voyager" arch="nut"> <ui/> <chip name="Nut"> <registers> <reg name="a" data="00000000000000"/> <reg name="b" data="00000000000000"/> <reg name="c" data="00000000000eae"/> <reg name="m" data="00000000000000"/> <reg name="n" data="00000000000000"/> <reg name="g" data="09"/> <reg name="p" data="c"/> <reg name="q" data="4"/> <reg name="q_sel" data="0"/> <reg name="fo" data="00"/> <reg name="s" data="0800"/> <reg name="pc" data="0000"/> <reg name="stack" index="0" data="009b"/> <reg name="stack" index="1" data="0042"/> <reg name="stack" index="2" data="0000"/> <reg name="stack" index="3" data="0000"/> <reg name="decimal" data="0"/> <reg name="carry" data="0"/> <reg name="awake" data="0"/> <reg name="pf_addr" data="00"/> <reg name="ram_addr" data="007"/> <reg name="active_bank" index="0" data="0"/> <reg name="active_bank" index="1" data="0"/> <reg name="active_bank" index="2" data="0"/> <reg name="active_bank" index="3" data="0"/> <reg name="active_bank" index="4" data="0"/> <reg name="active_bank" index="5" data="0"/> <reg name="active_bank" index="6" data="0"/> <reg name="active_bank" index="7" data="0"/> <reg name="active_bank" index="8" data="0"/> <reg name="active_bank" index="9" data="0"/> <reg name="active_bank" index="a" data="0"/> <reg name="active_bank" index="b" data="0"/> <reg name="active_bank" index="c" data="0"/> <reg name="active_bank" index="d" data="0"/> <reg name="active_bank" index="e" data="0"/> <reg name="active_bank" index="f" data="0"/> </registers> </chip> <chip name="Voyager LCD"> <registers> <reg name="enable" data="1"/> <reg name="blink" data="0"/> </registers> </chip> <memory as="ram"> <loc addr="000" data="00000000000000"/> <loc addr="001" data="00000000000000"/> <loc addr="002" data="00000000000000"/> <loc addr="003" data="00000000000000"/> <loc addr="004" data="00000000000000"/> <loc addr="005" data="00000000000008"/> <loc addr="006" data="0000000000000c"/> <loc addr="007" data="00000000000eae"/> <loc addr="008" data="00000000000000"/> <loc addr="009" data="2faf8befbe2280"/> <loc addr="00a" data="bef282bcbcaf80"/> <loc addr="010" data="00000000000000"/> <loc addr="011" data="00000000000000"/> <loc addr="012" data="00000000000000"/> <loc addr="013" data="00000000000000"/> <loc addr="014" data="00000000000000"/> <loc addr="015" data="c0d2d2d2d2d2d2"/> <loc addr="016" data="00000000000000"/> <loc addr="016" data="000000000003e1"/> <loc addr="017" data="00000000000000"/> <loc addr="018" data="00000000000000"/> <loc addr="019" data="00000000000000"/> <loc addr="01a" data="00000000a00000"/> <loc addr="0c0" data="00000000000000"/> <loc addr="0c1" data="00000000000000"/> <loc addr="0c2" data="00000000000000"/> <loc addr="0c3" data="00000000000000"/> <loc addr="0c4" data="00000000000000"/> <loc addr="0c5" data="00000000000000"/> <loc addr="0c6" data="00000000000000"/> <loc addr="0c7" data="00000000000000"/> <loc addr="0c8" data="00000000000000"/> <loc addr="0c9" data="00000000000000"/> <loc addr="0ca" data="00000000000000"/> <loc addr="0cb" data="00000000000000"/> <loc addr="0cc" data="00000000000000"/> <loc addr="0cd" data="00000000000000"/> <loc addr="0ce" data="00000000000000"/> <loc addr="0cf" data="00000000000000"/> <loc addr="0d0" data="00000000000000"/> <loc addr="0d1" data="00000000000000"/> <loc addr="0d2" data="00000000000000"/> <loc addr="0d3" data="00000000000000"/> <loc addr="0d4" data="00000000000000"/> <loc addr="0d5" data="00000000000000"/> <loc addr="0d6" data="00000000000000"/> <loc addr="0d7" data="00000000000000"/> <loc addr="0d8" data="00000000000000"/> <loc addr="0d9" data="00000000000000"/> <loc addr="0da" data="00000000000000"/> <loc addr="0db" data="00000000000000"/> <loc addr="0dc" data="00000000000000"/> <loc addr="0dd" data="00000000000000"/> <loc addr="0de" data="00000000000000"/> <loc addr="0df" data="00000000000000"/> <loc addr="0e0" data="00000000000000"/> <loc addr="0e1" data="0000000015faf9"/> <loc addr="0e2" data="c4c432ff71a3fd"/> <loc addr="0e3" data="f2ebe0cf210918"/> <loc addr="0e4" data="43ff2d33ff361c"/> <loc addr="0e5" data="7126c3f10c1e2d"/> <loc addr="0e6" data="361852ff293418"/> <loc addr="0e7" data="52ff29331852ff"/> <loc addr="0e8" data="293242ff253627"/> <loc addr="0e9" data="faf11c7536f908"/> <loc addr="0ea" data="271e713726f1b2"/> <loc addr="0eb" data="7531f0f80e41c3"/> <loc addr="0ec" data="f1127631f0f82d"/> <loc addr="0ed" data="7137273726f102"/> <loc addr="0ee" data="41c3f143ff42ff"/> <loc addr="0ef" data="0a10f0f1ebe5cf"/> <loc addr="0f0" data="862304b24724f8"/> <loc addr="0f1" data="324624f7f13245"/> <loc addr="0f2" data="cca3cff844fafc"/> <loc addr="0f3" data="f3ebfdf332ebfd"/> <loc addr="0f4" data="f343fbfcf942eb"/> <loc addr="0f5" data="fdf9313181df06"/> <loc addr="0f6" data="b296fac5cfa0cf"/> <loc addr="0f7" data="368623f7f13240"/> <loc addr="0f8" data="56ef0715faf9c4"/> <loc addr="0f9" data="9623f6f2c5fac3"/> <loc addr="0fa" data="53ff30210b342b"/> <loc addr="0fb" data="332b32250db280"/> <loc addr="0fc" data="cdc5f2fbf14005"/> <loc addr="0fd" data="b2c497fa03b286"/> <loc addr="0fe" data="23f6f2c1c101b2"/> <loc addr="0ff" data="b5fca3c5b1fd00"/> </memory> </state>


Re: Optimization questions - HP-15C
Message #41 Posted by Thomas Klemm on 1 Nov 2012, 8:49 p.m.,
in response to message #40 by Marcel Samek

Quote:
One question: When I use nonpareil, I am always having to issue the DIM command. It never seems to be sticky. Is there something I can do about that or do I have to live with it?
If you don't specify the option -new when compiling the program with vcomp the nst-file that is already there is used instead of the template of the script. So you could save the state after changing the dimension and use this file as template. I don't know whether that works properly and can't test that at the moment.
cf. Effective Computer-aided Calculator Programming - Part 1 - Voyager
Quote:
When compiling you have the option of creating a new out-of-the-box state file or inserting your code into an existing state file preserving the contents of the stack and storage registers.
NOTE: Inserting into an existing state file is experimental. Backup your data first.
NOTE: New state files are not exactly "out-of-the-box", the 11C, 12C, and 15C are out-of-the-box + FIX 9.
I don't know where the DIM-information is stored within the nst-file. It could be "pf_addr" or "ram_addr" but I might be completely wrong. You could try to find out by comparing different nst-files after only changing the dimension and modify the template that is used for the 15C model. Or if you prefer add an additional option to specify that.
HTH
Thomas

Re: Optimization questions - HP-15C
Message #42 Posted by Thomas Klemm on 2 Nov 2012, 6:34 a.m.,
in response to message #40 by Marcel Samek

Meanwhile I found out how to do it. Replace line 3036 of vcomp:
<loc addr="015" data="c0d2d2d2d2d2d2"/>
with:
<loc addr="015" data="c0e1e1e1e1e1e1"/>
The formula is: e1₁₆ = bf₁₆ + 34₁₀.
Cheers
Thomas

Re: Optimization questions - HP-15C
Message #43 Posted by Gerson W. Barbosa on 2 Nov 2012, 1:04 p.m.,
in response to message #40 by Marcel Samek

Quote:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; mod(x, y) ; Returns y % x. Note the RND appears necessary because of precision issues. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 0 / ; y / x � start with the floating point division LASTX ; put x back onto the stack X<>Y ; switch x & y so that we can operate on y FRAC ; take the fractional part of y * ; and multiply it with the original x to get the modulus RND ; have to round on the 15c to ensure it's an integer RTN
This will not always work. For instance,
1234567891 ENTER 123 GSB 0 --> 40.59000000 (FIX 8) or 41. (FIX 0)
Correct answer: 40.00000000
For other applications, I would suggest either

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; mod(x, y) ; Returns y % x. ; (x, y: positive integers) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 0 X<>Y ; STO 0 ; X<>Y ; / ; LASTX ; X<>Y ; INT ; * ; RCL- 0 ; CHS ; RTN
or
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ; mod(x, y) ; Returns y % x. ; (x, y: positive integers) ; (t is not preserved) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; LBL 0 X<>Y ; ENTER ; R^ ; R^ ; X<>Y ; Rv ; / ; LASTX ; X<>Y ; INT ; * ; - ; RTN
Regards,
Gerson.

Re: Optimization questions - HP-15C
Message #44 Posted by Marcel Samek on 2 Nov 2012, 3:48 p.m.,
in response to message #43 by Gerson W. Barbosa

Gerson,
Thanks for pointing that out. I had stuck that RND in a long time ago without really thinking it through.
In my quest for speed, I have actually removed the need for a mod() subroutine entirely. However it's good to know that the code was flawed so that I don't reuse it anywhere.
Marcel

Re: Optimization questions - HP-15C
Message #45 Posted by David Hayden on 29 Oct 2012, 3:23 p.m.,
in response to message #1 by Marcel Samek

Out of curiosity, I ran some tests. My main program looped 1000 times. Each loop does a GSB 2 and label 2 simply returns. Then I inserted different numbers of ENTER instructions between the main program and LBL 2 to increase the distance between the two.
I ran the program in a 15C LE and timed the execution with the second hand of my watch. To my surprise, there was essentially no difference in execution time, whether LBL 2 was 4 instructions away or 75, the program always took about 4 seconds to run.
I suspect that 15C reserves room in the GSB instruction to store the actual location of the target label. The first time you run the program, it probably stores the location so subsequent GSB's will jump right to the correct spot.
This is total conjecture of course.

Re: Optimization questions - HP-15C
Message #46 Posted by Marcel Samek on 29 Oct 2012, 4:07 p.m.,
in response to message #45 by David Hayden

Thanks! I had been planning on running the same kind of tests when I found a bit of spare time.
This is really interesting (and, unfortunately, not particularly encouraging).
If your 1000 item loop has a similar overhead to the one that Namir used in his testing in his great article on performance (http://tinyurl.com/9xtok7k) then the loop took about 2.5 seconds and the GSB & RTN took the remaining 1.5 seconds.
I may have to spend my energies on algorithmic optimization - which is quite difficult given the tight memory. In fact, one could argue that I went through a process of intentional algorithmic deoptimization just to get the code small enough....


Re: Optimization questions - HP-15C
Message #47 Posted by Marcel Samek on 29 Oct 2012, 8:15 p.m.,
in response to message #45 by David Hayden

I just ran a test which did not involve a GSB, but a GTO and it did seem to show that there is a speed difference depending on the distance.
The calculator had a 200 step program in it. I deleted 10 steps and inserted the following at the very beginning of memory:

LBL 1 1 STO+ 0 GTO 1
I did a CLEAR REG followed by a GSB 1 and I let it run for 30 seconds and then did a R/S and examined register 0.
I did a couple runs, averaging the results which were consistent and averaged about 215 loops per second.
Because I had inserted the program at the beginning of memory, there were about 190 program steps including about 15 labels after this loop.
I then deleted all the program steps after the above loop, so that it was the only thing in memory and reran it several times.
Once again, the results were relatively consistent but showd a speed of about 565 loops per second.
This seems to imply that the distance did have some impact on the performance. It also implies that the if you have any loops, the larger your program, the slower all your loops will be...

Edited: 29 Oct 2012, 8:16 p.m.

Re: Optimization questions - HP-15C
Message #48 Posted by Luiz C. Vieira (Brazil) on 29 Oct 2012, 9:10 p.m.,
in response to message #47 by Marcel Samek

Hi.
Your results agree with the way the O.S. is described as for searching labels, i.e., always forward. No compilation, as it happens with the HP41. I am not sure if you know about it, but in the HP41 the first execution of a program is the slowest one if it has many backward jumps, as the one you show here. At the first jump, the O.S. identifies the direction and # of bytes and stores it in the GTO itself (modifies the bytes composing the GTO instruction). There are some particularities as for short-range and long-range jump, but this is the main 'modus operandi'.
I liked your approach for measuring the calculator speed.
Cheers.
Luiz (Brazil)

Re: Optimization questions - HP-15C
Message #49 Posted by David Hayden on 8 Nov 2012, 7:03 a.m.,
in response to message #48 by Luiz C. Vieira (Brazil)

Quote:
At the first jump, the O.S. identifies the direction and # of bytes and stores it in the GTO itself (modifies the bytes composing the GTO instruction).
It occurs to me that this has implications for programs stored in ROM and, to a lesser extent, programs stored on cards or tape. If the program is in ROM then the OS can't modify the GTO instruction, so you'd want all those GTO/GSB instructions to be modified already before copying them to ROM. But how would OS know which GTO/GSB targets are part of the program being copied and thus safe to modify?
If programs in ROM don't have the GTO/GSB instructions modified, then they might run faster if copied to RAM first.
Does anyone know how this works?
Dave

Re: Optimization questions - HP-15C
Message #50 Posted by Luiz C. Vieira (Brazil) on 8 Nov 2012, 7:45 a.m.,
in response to message #49 by David Hayden

Hi.
You are correct in your assumptions, but consider that the HP41-based, user-coded programs stored in ROM modules cannot be changed once recorded, so they are pre-compiled and the distance (# of bytes and direction) from GTO/XEQ (GSB) to their corresponding LBL's is already stored accordingly. So they are executed in their 'best speed' anytime.
This is not true with programs stored in recordable media (magnetic cards or tape). Even if you have your program already compiled in RAM (after being executed as may times as needed to compute all distances OR with specific programs to compile them), these data are lost when they are copied into cards or magnetic tape.
Cheers.
Luiz (Brazil)

[ Return to Index | Top of Index ]

Go back to the main exhibit hall