(50g) Savage SysRPL Revisited
11-11-2018, 07:29 PM (This post was last modified: 01-08-2019 03:39 AM by DavidM.)
(See this page for more information regarding the Savage benchmark on HP calculators)

(Edit: the SysRPL and Saturn assembly code included below has been edited for compatibility with the built-in MASD assembler [ASM] available on the 49G-50g calculators)

A straightforward UserRPL implementation of the Savage benchmark on a 50g usually looks something like this:
Code:
\<<
   RAD
   1.
   DUP 2499. START
      DUP * \v/
      LN EXP
      ATAN TAN
      1. +
   NEXT
\>>

Result: 2499.99948647
Accumulated Error: 0.00051353
Avg. Run Time (50g): 64.639s
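For comparison off the calculator, the same loop can be written in Python using IEEE 754 double-precision floats. This is only a host-side sketch. Binary doubles carry roughly 16 significant decimal digits and round differently from the 50g's 12-digit BCD reals, so the result will not match the figures above. The function name `savage` is an illustrative choice.

```python
import math


def savage(iterations: int = 2499) -> float:
    """Savage benchmark loop in IEEE 754 double precision (host-side sketch)."""
    a = 1.0
    for _ in range(iterations):
        # Each function is undone by its inverse, so ideally a grows by exactly 1.
        # Python's math functions work in radians, matching RAD on the calculator.
        a = math.tan(math.atan(math.exp(math.log(math.sqrt(a * a))))) + 1.0
    return a


if __name__ == "__main__":
    result = savage()
    print(f"Result: {result:.12f}")
    print(f"Accumulated Error: {abs(2500.0 - result):.3e}")
```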

Translating the above into SysRPL turns out to be very easy, given that most of those commands have nearly identical counterparts in a SysRPL context:
Code:
!NO CODE
!RPL
::
   SETRAD
   %1
   2499 ZERO_DO
      DUP %* %SQRT
      %LN %EXP
      %ATAN %TAN
      %1+
   LOOP
;
@

Result: 2499.99948647
Accumulated Error: 0.00051353
Avg. Run Time (50g): 52.427s

Recall that the emphasis of this particular benchmark is to determine the computation speed of the specific functions used, i.e. x*x, SQRT, LN, EXP, ATAN, TAN, and x+1. The vast majority of processing time for both of the above versions is spent in the numerical computations of those functions, so in this case the SysRPL version only reduces the run time by about 19%. That savings comes from two general optimizations in the SysRPL version: no type checking of arguments, and a slightly faster looping construct. Although those two features can sometimes yield a decent savings in SysRPL run times, their effect here is minimal compared to the time spent "number crunching" in these two programs.
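The "about 19%" figure is the fraction of run time saved. Expressed as a speed ratio, the same two averages correspond to roughly a 1.23x speedup. The short Python sketch below makes that distinction explicit. The helper names are illustrative, and the inputs are the averaged 50g timings quoted above.

```python
def time_saved(baseline_s: float, improved_s: float) -> float:
    """Fraction of the baseline run time that the improved version saves."""
    return (baseline_s - improved_s) / baseline_s


def speedup(baseline_s: float, improved_s: float) -> float:
    """How many times faster the improved version runs."""
    return baseline_s / improved_s


userrpl_s = 64.639  # average UserRPL run time on the 50g
sysrpl_s = 52.427   # average standard real SysRPL run time on the 50g

print(f"time saved: {time_saved(userrpl_s, sysrpl_s):.1%}")  # time saved: 18.9%
print(f"speedup:    {speedup(userrpl_s, sysrpl_s):.3f}x")    # speedup:    1.233x
```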

Another potential advantage of SysRPL implementations is the ability to perform calculations using the full 15-digit internal representation of real values (extended reals). Chain calculations such as the one used in this benchmark are more likely to benefit from this kind of treatment, so it makes sense to re-implement the SysRPL version with this in mind. As with the first SysRPL implementation, translating the UserRPL code to an extended real SysRPL version is fairly simple, except for one function: there is no ATAN function defined for extended reals. Each of the other commands has a direct counterpart, but ATAN requires a work-around. Because the argument to that function is always positive here, ATAN can instead be computed using arctan(x)=arccos(1/sqrt(1+x^2)) (source: this comp.sys.hp48 post):
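The identity and its restriction to positive arguments can be checked numerically on a host machine. Because arccos returns values in [0, pi] and 1/sqrt(1+x^2) always lies in (0, 1], the arccos form can only produce results in [0, pi/2). It therefore equals |arctan(x)| and loses the sign for negative x. In this benchmark the argument is exp(ln(sqrt(x^2))), which is approximately x >= 1, so the restriction does not matter. The Python sketch below uses binary floats rather than the calculator's extended reals, and the function name `atan_via_acos` is illustrative.

```python
import math


def atan_via_acos(x: float) -> float:
    """arctan(x) computed as arccos(1/sqrt(1 + x^2)); valid only for x >= 0."""
    return math.acos(1.0 / math.sqrt(1.0 + x * x))


for x in (0.5, 1.0, 7.389, 2499.0):
    print(f"x={x:>8}: atan={math.atan(x):.15f}  via acos={atan_via_acos(x):.15f}")

# For negative x the arccos form returns |atan(x)| instead of atan(x).
print(atan_via_acos(-1.0), math.atan(-1.0))
```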
Code:
!NO CODE
!ASM
   DC %%1\2B_        27012
!RPL
::
   %%1
   2499 ZERO_DO
      DUP %%* %%SQRT
      %%LN %%EXP

      %%1 SWAPDUP %%* %%1+_ %%SQRT %%/ %%ACOSRAD

      %%TANRAD
      %%1+_
   LOOP
;
@

Result: 2499.99999106989
Accumulated Error: 0.00000893011
Avg. Run Time (50g): 73.000s

As expected, this version reduces the accumulated "error" of the final result. Unfortunately, it also takes even longer than the UserRPL version to finish, because of the extra computations required for the ATAN work-around. This is frustrating, especially since even the standard precision real functions carry out their internal computations to full 15-digit precision before rounding the result. The lack of an ATAN function for extended reals therefore makes it harder to measure the true computational performance of the calculator.
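The effect of rounding every intermediate result can be roughly modeled on a host machine. The Python sketch below is an approximation and does not reproduce the 50g's arithmetic. It computes each function in binary double precision, then rounds the result to a chosen number of significant decimal digits: 12 to mimic standard reals and 15 to mimic extended reals. Double precision is only slightly wider than 15 digits, so the 15-digit case is especially crude. The helper names are hypothetical, and the printed errors will not match the calculator's figures. The point of the sketch is the relative size of the two errors.

```python
import math


def round_sig(x: float, digits: int) -> float:
    """Round x to `digits` significant decimal digits (hypothetical helper)."""
    return float(f"{x:.{digits - 1}e}")


def savage_rounded(digits: int, iterations: int = 2499) -> float:
    """Savage loop with every intermediate rounded to `digits` significant digits."""
    steps = (
        lambda v: v * v,   # x^2
        math.sqrt,         # SQRT
        math.log,          # LN
        math.exp,          # EXP
        math.atan,         # ATAN (radians)
        math.tan,          # TAN (radians)
        lambda v: v + 1.0, # x + 1
    )
    a = 1.0
    for _ in range(iterations):
        for step in steps:
            a = round_sig(step(a), digits)
    return a


for digits in (12, 15):
    result = savage_rounded(digits)
    print(f"{digits} digits: result={result!r}  error={abs(2500.0 - result):.3e}")
```

Near x = 2500, tan(atan(x)) amplifies an absolute error in the angle by about x^2, or roughly 6.25e6. This amplification is why the three extra digits matter so much in this chained calculation.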

Edit: Since my original posting of the following programs, I've subsequently learned that I'm merely the latest participant in a party that started almost 20 years ago! Jonathan Busby had already gone through the same thought processes and come up with similar Saturn solutions to what I've presented below. His code targets the 48-series as opposed to the 49-50, but if you look you'll see that we essentially used the same approach (and nearly the exact same code). All credit is due to Jonathan for these ideas (though I promise I had not seen them prior to my post!).

To remedy this, I propose the following alternative version of an extended real SysRPL implementation:
Code:
!NO CODE
!ASM
   DC %%1\2B_ 27012
   DC POP1%%         3089B
   DC ATANF          310C8
   DC PUSH%%LOOP     308B7
!RPL
::
   %%1
   2499 ZERO_DO
      DUP %%* %%SQRT
      %%LN %%EXP
      CODE
         GOSBVL POP1%%
         ST=1 9
         ST=0 4
         GOSBVL ATANF
         GOVLNG PUSH%%LOOP
      ENDCODE
      %%TANRAD
      %%1+_
   LOOP
;
@

Result: 2499.99999106989
Accumulated Error: 0.00000893011
Avg. Run Time (50g): 52.084s

This version uses a Saturn code object to provide the equivalent of an %%ATANRAD function, had one existed. The entry points it uses are NOT supported, but they are at least identical on a v1.19-6 49G and a v2.15 50g. They stand a good chance of being at the same fixed locations on intermediate firmware versions. I haven't verified that, though, since I don't have any calculators with those firmware versions to test on.

The final result is, of course, the same as that of the previous extended real version. But note the execution time: it is nearly identical to that of the standard real SysRPL version (52.084s vs. 52.427s). This is because the actual computations in both programs are carried out to 15 digits internally, though obviously with different intermediate values. The extended real version is very slightly faster because its intermediate results do not have to be rounded to 12 digits at each step. In the standard real SysRPL version, that rounding takes a small but measurable amount of time compared to the overall computation time.

To satisfy my curiosity, I also implemented one last version of the benchmark, this time entirely in Saturn assembly:
Code:
!NO CODE
!ASM
   % Unsupported entry point declarations
   % (valid for v2.15 50g, v1.19-6 49G)
   DC PUSH%%LOOP     308B7
   DC SQRT%%         317D2
   DC LN%%           3107B
   DC EXP%%          31089
   DC SIN%%          310B3
   DC ATANF          310C8
   DC TANF           310C1
!RPL
CODE
   % save RPL registers and set CPU to decimal mode
   SAVE
   SETDEC

   % R4.A = loop counter
   LC 02498
   R4=C A

   % x = 1 (A/B registers)
   A=0 W
   A+1 S
   ASR W
   B=A W
   {
      % x = x^2
      C=B W
      D=C W
      C=A W
      GOSBVL =MULTF

      % x = sqrt(x)
      GOSBVL SQRT%%

      % x = ln(x)
      GOSBVL LN%%

      % x = e^x
      GOSBVL EXP%%

      % x = atan(x)
      ST=1 9
      ST=0 4
      GOSBVL ATANF

      % x = tan(x)
      ST=1 9
      ST=0 4
      GOSBVL TANF

      % x = x + 1
      GOSBVL =RADD1

      % loop
      C=R4 A
      C-1 A
      R4=C A
      UPNC
   }
   GOVLNG PUSH%%LOOP
ENDCODE
@

Result: 2499.99999106989
Accumulated Error: 0.00000893011
Avg. Run Time (50g): 42.451s

This version is perhaps the best one to show how much time is spent performing actual numerical calculation for the benchmark (at least at the Saturn emulation level). Stack manipulation only happens once, and is limited to the very last step. Loop overhead is minimal and happens at Saturn speed, and all intermediate calculations are simply performed on the value currently stored in the CPU's A/B registers. This means that very little processing is performed outside the realm of numerical computation in this version, giving the purest view of how much time is spent on the calculations themselves (as opposed to stack manipulation and loop overhead).
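The loop bounds in the Saturn version can be checked by hand. The counter in R4.A starts at 02498, and UPNC only branches back while C-1 A produces no borrow. The body therefore runs once for each counter value from 2498 down to 0, which is 2499 times and matches 2499 ZERO_DO in the RPL versions. Starting from x = 1, that gives the ideal final value of 2500. The Python sketch below models only that control flow, not the Saturn CPU. It treats the A field as a five-digit decimal counter, on the assumption that decrementing 00000 in decimal mode (after SETDEC) wraps to 99999 and sets carry. The function name is illustrative.

```python
def count_iterations(start: int, field_digits: int = 5) -> int:
    """Count loop-body executions for a Saturn-style countdown loop.

    Models:  { body  C=R4 A  C-1 A  R4=C A  UPNC }
    The A field is treated as a decimal counter of `field_digits` digits.
    """
    modulus = 10 ** field_digits
    counter = start
    iterations = 0
    while True:
        iterations += 1                    # one pass through the benchmark body
        carry = counter == 0               # C-1 A borrows only when C.A is 0
        counter = (counter - 1) % modulus  # 00000 wraps to 99999
        if carry:                          # UPNC falls through once carry is set
            break
    return iterations


print(count_iterations(2498))  # 2499
```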

I believe these versions of the code give additional insight into the Savage benchmark running on a standard 50g, allowing better comparisons to be made with other platforms and configurations.


(Note: all run times listed are an average of 5 runs of the specified code on my 50g)