(50g) Savage SysRPL Revisited
11-11-2018, 07:29 PM (This post was last modified: 01-08-2019 03:39 AM by DavidM.)
 DavidM Senior Member Posts: 821 Joined: Dec 2013

(Edit: the SysRPL and Saturn assembly code included below has been edited for compatibility with the built-in MASD assembler [ASM] available on the 49G-50g calculators)

A straightforward UserRPL implementation of the Savage benchmark on a 50g usually looks something like this:
Code:
\<<
   RAD
   1.
   DUP 2499. START
      DUP * \v/
      LN EXP
      ATAN TAN
      1. +
   NEXT
\>>

Result: 2499.99948647
Accumulated Error: 0.00051353
Avg. Run Time (50g): 64.639s

Translating the above into SysRPL turns out to be very easy, given that most of those commands have nearly identical counterparts in a SysRPL context:
Code:
!NO CODE
!RPL
::
   SETRAD
   %1
   2499 ZERO_DO
      DUP %* %SQRT
      %LN %EXP
      %ATAN %TAN
      %1+
   LOOP
;
@

Result: 2499.99948647
Accumulated Error: 0.00051353
Avg. Run Time (50g): 52.427s

Recall that the emphasis of this particular benchmark is on the computation speed of the functions used, i.e. x*x, SQRT, LN, EXP, ATAN, TAN, and x+1. The vast majority of processing time for both of the above versions is spent in the numerical computation of those functions, so the SysRPL version gains only about a 19% performance advantage (52.427s vs. 64.639s) in this case. That time savings results from two general optimizations in the SysRPL version: no type checking of data, and a slightly faster looping construct. While those two features can sometimes yield decent savings in SysRPL run times, their effect is minimal compared to the time spent "number crunching" in these two programs.
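For anyone who wants to try the same chain of functions off-calculator, here is a minimal Python sketch of the loop. The function name `savage` is my own for illustration, and it runs in binary IEEE double precision rather than the calculator's decimal arithmetic, so the result and accumulated error will not match the 50g figures quoted in this post.

```python
import math


def savage(iterations: int = 2499) -> float:
    """Run the Savage loop in IEEE double precision, angles in radians."""
    a = 1.0
    for _ in range(iterations):
        # Same order as the RPL programs: x*x, SQRT, LN, EXP, ATAN, TAN, +1
        a = math.tan(math.atan(math.exp(math.log(math.sqrt(a * a))))) + 1.0
    return a


if __name__ == "__main__":
    result = savage()
    print("result:", result)
    print("accumulated error:", abs(2500.0 - result))
```

With exact arithmetic each pass would simply add 1, so the ideal final value is 2500; whatever differs from that is accumulated rounding error.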

One of the other potential advantages of SysRPL implementations is the ability to perform calculations using the full 15-digit internal representation of real values. Chain calculations such as those used in this benchmark are more likely to receive a benefit from this kind of treatment, so it makes sense to re-implement the SysRPL version with this in mind. As is the case for the first SysRPL implementation, translating the UserRPL code to an extended real SysRPL version is fairly simple, except for one particular function: there is no defined ATAN function for extended reals. Each of the other commands has direct counterparts, but a work-around has to be used for ATAN. In this case, the argument to that function is always positive, so an alternative can be used to compute ATAN using arctan(x)=arccos(1/sqrt(1+x^2)) (source: this comp.sys.hp48 post):
Code:
!NO CODE
!ASM
   DC %%1\2B_        27012
!RPL
::
   %%1
   2499 ZERO_DO
      DUP %%* %%SQRT
      %%LN %%EXP
      %%1 SWAPDUP %%* %%1+_ %%SQRT %%/ %%ACOSRAD
      %%TANRAD
      %%1+_
   LOOP
   ;
@

Result: 2499.99999106989
Accumulated Error: 0.00000893011
Avg. Run Time (50g): 73.000s
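The identity behind the work-around comes from a right triangle with legs 1 and x: the hypotenuse is sqrt(1+x^2), the angle opposite the side of length x has tangent x, and its cosine is 1/sqrt(1+x^2). A small Python sketch to check it numerically (the helper name `atan_via_acos` is hypothetical, and this uses double precision rather than extended reals):

```python
import math


def atan_via_acos(x: float) -> float:
    """arctan(x) for x >= 0, computed as arccos(1 / sqrt(1 + x^2))."""
    if x < 0.0:
        # arccos only returns angles in [0, pi], so the sign of x is lost
        raise ValueError("this form of the identity only holds for x >= 0")
    return math.acos(1.0 / math.sqrt(1.0 + x * x))


if __name__ == "__main__":
    # The benchmark only ever passes values from 1 up to 2499 to ATAN
    for x in (1.0, 2.0, 50.0, 2499.0):
        print(x, atan_via_acos(x), math.atan(x))
```

The restriction to non-negative arguments is why this substitution is safe in the benchmark loop but would not be a general replacement for ATAN.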

As expected, this version reduces the accumulated "error" of the final result, but unfortunately takes even longer than the UserRPL version to finish due to the extra computations required for the ATAN work-around. This is frustrating, especially since even the standard precision real functions actually carry out their computations to full 15-digit precision internally before rounding the result. The lack of an ATAN function for extended reals thus limits the ability to measure the true performance of the calculator.

Edit: Since my original posting of the following programs, I've subsequently learned that I'm merely the latest participant in a party that started almost 20 years ago! Jonathan Busby had already gone through the same thought processes and come up with similar Saturn solutions to what I've presented below. His code targets the 48-series as opposed to the 49-50, but if you look you'll see that we essentially used the same approach (and nearly the exact same code). All credit is due to Jonathan for these ideas (though I promise I had not seen them prior to my post!).

To remedy this, I propose the following alternative version of an extended real SysRPL implementation:
Code:
!NO CODE
!ASM
   DC %%1\2B_        27012
   DC POP1%%         3089B
   DC ATANF          310C8
   DC PUSH%%LOOP     308B7
!RPL
::
   %%1
   2499 ZERO_DO
      DUP %%* %%SQRT
      %%LN %%EXP
      CODE
         GOSBVL POP1%%
         ST=1 9
         ST=0 4
         GOSBVL ATANF
         GOVLNG PUSH%%LOOP
      ENDCODE
      %%TANRAD
      %%1+_
   LOOP
;
@

Result: 2499.99999106989
Accumulated Error: 0.00000893011
Avg. Run Time (50g): 52.084s

This version uses a Saturn code object that simulates what an %%ATANRAD function would do if it existed. The entry points it uses are NOT supported, but they are at least at the same addresses on both a v1.19-6 49G and a v2.15 50g. They stand a good chance of being in the same fixed locations on intermediate firmware versions, but I haven't attempted to verify that since I don't have any calculators with those firmware versions to test on.

The final result is of course the same as the previous extended real version. But note the execution time: nearly identical to the standard real SysRPL version. This is because the actual computations occurring in both programs are all carried out to 15 digits internally (though obviously with different intermediate values). The very slight performance improvement of the extended real version is due to the intermediate results not having to be rounded to 12 digits at each step. This rounding in the standard real SysRPL version takes a small but measurable amount of time (compared to the overall computation time).
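To get a feel for what rounding to 12 digits at every step does to a chain calculation, here is a rough Python sketch using the standard `decimal` module. It is only an analogy, not the calculator's algorithm: `decimal` has no ATAN or TAN, so the loop is reduced to x*x, SQRT, LN, EXP and +1, the rounding mode is an assumption on my part, and the names `WORK`, `USER` and `reduced_loop` are made up for this example.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Assumed stand-ins: 15-digit working precision, 12-digit user-level reals.
# The rounding mode is a guess, not taken from the calculator's firmware.
WORK = Context(prec=15, rounding=ROUND_HALF_EVEN)
USER = Context(prec=12, rounding=ROUND_HALF_EVEN)


def reduced_loop(round_each_step: bool, iterations: int = 2499) -> Decimal:
    """Reduced Savage-style loop (no ATAN/TAN) in decimal arithmetic."""

    def finish(value: Decimal) -> Decimal:
        # Optionally round the 15-digit result to 12 digits, the way a
        # standard real function would before returning its result
        return USER.plus(value) if round_each_step else value

    x = Decimal(1)
    for _ in range(iterations):
        x = finish(WORK.multiply(x, x))
        x = finish(WORK.sqrt(x))
        x = finish(WORK.ln(x))
        x = finish(WORK.exp(x))
        x = finish(WORK.add(x, Decimal(1)))
    return x


if __name__ == "__main__":
    for label, flag in (("12-digit steps", True), ("15-digit steps", False)):
        result = reduced_loop(flag)
        print(label, result, "error:", abs(Decimal(2500) - result))
```

Both variants do the same 15-digit work per function; the 12-digit variant just adds one extra rounding after each call, which is the small extra cost described above.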

To satisfy my curiosity, I also implemented one last version of the benchmark, this time entirely in Saturn assembly:
Code:
!NO CODE
!ASM
   % Unsupported entry point declarations
   % (valid for v2.15 50g, v1.19-6 49g)
   DC PUSH%%LOOP     308B7
   DC SQRT%%         317D2
   DC LN%%           3107B
   DC EXP%%          31089
   DC SIN%%          310B3
   DC ATANF          310C8
   DC TANF           310C1
!RPL
CODE
   % save RPL registers and set CPU to decimal mode
   SAVE
   SETDEC
   % R4.A = loop counter
   LC 02498
   R4=C A
   % x = 1 (A/B registers)
   A=0 W
   A+1 S
   ASR W
   B=A W
   {
      % x = x^2
      C=B W
      D=C W
      C=A W
      GOSBVL =MULTF
      % x = sqrt(x)
      GOSBVL SQRT%%
      % x = ln(x)
      GOSBVL LN%%
      % x = e^x
      GOSBVL EXP%%
      % x = atan(x)
      ST=1 9
      ST=0 4
      GOSBVL ATANF
      % x = tan(x)
      ST=1 9
      ST=0 4
      GOSBVL TANF
      % x = x + 1
      GOSBVL =RADD1
      % loop
      C=R4 A
      C-1 A
      R4=C A
      UPNC
   }
   GOVLNG PUSH%%LOOP
ENDCODE
@

Result: 2499.99999106989
Accumulated Error: 0.00000893011
Avg. Run Time (50g): 42.451s

This version is perhaps the best one to show how much time is spent performing actual numerical calculation for the benchmark (at least at the Saturn emulation level). Stack manipulation only happens once, and is limited to the very last step. Loop overhead is minimal and happens at Saturn speed, and all intermediate calculations are simply performed on the value currently stored in the CPU's A/B registers. This means that very little processing is performed outside the realm of numerical computation in this version, giving the purest view of how much time is spent on the calculations themselves (as opposed to stack manipulation and loop overhead).
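For a quick side-by-side of the timings quoted in this post, the following Python sketch computes each version's change relative to the UserRPL run and the average time per loop pass. The run times are copied from the listings above; the dictionary labels and helper names are just ones I picked for the example.

```python
# Average run times (seconds) as quoted for each version in this post
RUN_TIMES_S = {
    "UserRPL": 64.639,
    "SysRPL, reals": 52.427,
    "SysRPL, extended reals (ACOS work-around)": 73.000,
    "SysRPL, extended reals (ATANF code object)": 52.084,
    "Saturn assembly": 42.451,
}


def percent_change(baseline_s: float, other_s: float) -> float:
    """Signed change of other_s relative to baseline_s (negative = faster)."""
    return 100.0 * (other_s - baseline_s) / baseline_s


def per_iteration_ms(total_s: float, iterations: int = 2499) -> float:
    """Average time per loop pass, in milliseconds."""
    return 1000.0 * total_s / iterations


if __name__ == "__main__":
    baseline = RUN_TIMES_S["UserRPL"]
    for name, seconds in RUN_TIMES_S.items():
        print(f"{name}: {seconds:.3f}s, "
              f"{percent_change(baseline, seconds):+.1f}% vs UserRPL, "
              f"{per_iteration_ms(seconds):.2f} ms per pass")
```

Keep in mind these are averages of a handful of runs, so small differences between versions shouldn't be over-interpreted.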

I believe these versions of the code give additional insight into the Savage benchmark running on a standard 50g, allowing better comparisons to be made with other platforms and configurations.

(Note: all run times listed are an average of 5 runs of the specified code on my 50g)