HP Forums

Full Version: [SOLVED] DSRN (dog slow roman numerals)
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2 3
This program seems like it takes much longer than it should to run. How long should it take? I have no idea, but not as long as it is taking. I have another program (not shown) that does some string substitution (replacement) and the seat of the pants feeling is it's faster than it should be. I would have thought that scanning for substrings and replacing them with unlike-sized substrings would be one of the more expensive operations. But this simple piece of code that just concatenates a few strings can't get out of its own way.

As you can see there is not a whole lot going on in this code. 1000 tests of converting 2014 (5 while loop executions) to roman numerals in a loop take 970.3 seconds on a 50g on USB power.

Code:
%%HP: T(3)A(R)F(.);
@
@ ( real or int ==> roman numerals )
@
\<< "" SWAP
  WHILE DUP 0 > REPEAT
     CASE 
        DUP 1000 \>= THEN 1000 "M"  END
        DUP  900 \>= THEN  900 "CM" END
        DUP  500 \>= THEN  500 "D"  END 
        DUP  400 \>= THEN  400 "CD" END 
        DUP  100 \>= THEN  100 "C"  END
        DUP   90 \>= THEN   90 "XC" END
        DUP   50 \>= THEN   50 "L"  END
        DUP   40 \>= THEN   40 "XL" END
        DUP   10 \>= THEN   10 "X"  END
        DUP    9 \>= THEN    9 "IX" END
        DUP    5 \>= THEN    5 "V"  END
        DUP    4 \>= THEN    4 "IV" END
        DUP    1 \>= THEN    1 "I"  END
    END
    ROT ROT -
    ROT ROT +
    SWAP
  END DROP

What is the source of the expression "dog slow?" The dogs I had were anything but slow. And even a slow dog is faster than a man can run.
First observation: make sure you use reals rather than ints (e.g. 1000. rather than 1000) as they're handled much faster.

I'll play with it on my 48 and see if I can spot anything else. I have a couple hunches for what to look at.
Thank you, Dave. I knew that. But for some reason (playing around with CAS today) I left it in = mode and that seems to cause numbers typed in to default to ints. I did not notice this happened until you mentioned it. Will run the test again and update!

YOWZA! 142.6449 seconds with reals. Couldn't see the forest for the trees! Thanks again.
Huh, I figured the WHILE/CASE anti-pattern would be slowing it down, but I've tried several different approaches to pulling it apart, and all the locally defined subroutine tricks are actually slower than the extraneous CASE evaluations.

I've got a couple other weird ideas to try, though.
Alright, this was as fast as I could get it. It runs in about half the time, but it's also about 140 bytes larger. Anything I tried involving local variables and subroutines actually ran slower. :/ I'm guessing it could be even faster if you get rid of the A subroutine and unroll it wherever it's used, but then the program will get even more bloated.

Code:
\<<
  \<< ROT ROT - ROT ROT + SWAP \>> \-> A
  \<< "" SWAP
    WHILE DUP 1000 \>= REPEAT 1000 "M" A EVAL END
    WHILE DUP 900 \>= REPEAT 900 "CM" A EVAL END
    WHILE DUP 500 \>= REPEAT 500 "D" A EVAL END
    WHILE DUP 400 \>= REPEAT 400 "CD" A EVAL END
    WHILE DUP 100 \>= REPEAT 100 "C" A EVAL END
    WHILE DUP 90 \>= REPEAT 90 "XC" A EVAL END
    WHILE DUP 50 \>= REPEAT 50 "L" A EVAL END
    WHILE DUP 40 \>= REPEAT 40 "XL" A EVAL END
    WHILE DUP 10 \>= REPEAT 10 "X" A EVAL END
    WHILE DUP 9 \>= REPEAT 9 "IX" A EVAL END
    WHILE DUP 5 \>= REPEAT 5 "V" A EVAL END
    WHILE DUP 4 \>= REPEAT 4 "IV" A EVAL END
    WHILE DUP 1 \>= REPEAT 1 "I" A EVAL END
    DROP
  \>>
\>>

I tried it with IF instead of WHILE for the multi-letter terms (since they can only be appended once), but it was slightly slower. I can't explain that. Maybe extra parsing overhead looking for an ELSE or something.
(06-11-2014 08:23 PM)Dave Britten Wrote: [ -> ]I'm guessing it could be even faster if you get rid of the A subroutine and unroll it wherever it's used, but then the program will get even more bloated.

That was actually my first version since I develop on the calc and don't have enough screen real estate and not enough experience with RPL to hit a solution I like on the first try. Then I went through it and factored out the redundant stuff, and moved it outside the CASE structure. Unfortunately, I didn't time it. I just knew it was redundant and got rid of it. It wasn't exactly the same as the subroutine, actually it was less instructions because i could do things more in order, closer to where I had the numerical value and symbol. Next time I'll be a little more careful, since it's obvious I can't tell a good performing RPL program from a bad one just by looking at it.

Dave Britten Wrote:the extraneous CASE evaluations.

Are you talking about inside the CASE statement where the debugger appears to be stepping over all the unselected clauses one by one? If so, I noticed this and I don't understand it. Obviously CASE does work, and once a clause is selected the other clauses are certainly not selected. But sometimes (always?) the debugger appears to be testing all the clauses when you STEP.

Thanks for looking at this. Some interesting stuff for me in what you did. I had not seen a subroutine made local like you mentioned possibly trying earlier and showed in your example here. And I didn't think about getting rid of the while/case and using a bunch of whiles. I'll play around with this a little more.
(06-12-2014 11:44 AM)HP67 Wrote: [ -> ]Are you talking about inside the CASE statement where the debugger appears to be stepping over all the unselected clauses one by one? If so, I noticed this and I don't understand it. Obviously CASE does work, and once a clause is selected the other clauses are certainly not selected. But sometimes (always?) the debugger appears to be testing all the clauses when you STEP.

Yeah, the CASE statement would have to evaluate each condition sequentially until it finds the correct block to execute. As the remaining value gets smaller, you'll see it taking longer to work its way down to the bottom conditions. The 48 CASE statement is really more if an IF/ELSE IF/ELSE structure than the case structure from C, etc. In other words, you're getting a lot of extra DUP >= checks when the remaining value is small.
Just moved the WHILE-loop to the subroutine A. But I didn't check whether this is faster or slower.
Code:
\<<
  "" SWAP
  \<< \-> k r
    \<<
      WHILE k DUP2 \>=
      REPEAT - SWAP r + SWAP
      END DROP
    \>>
  \>> \-> A
  \<<
    1000 "M" A EVAL
    900 "CM" A EVAL
    500 "D" A EVAL
    400 "CD" A EVAL
    100 "C" A EVAL
    90 "XC" A EVAL
    50 "L" A EVAL
    40 "XL" A EVAL
    10 "X" A EVAL
    9 "IX" A EVAL
    5 "V" A EVAL
    4 "IV" A EVAL
    1 "I" A EVAL
  \>>
  DROP
\>>

Cheers
Thomas
(06-12-2014 01:48 PM)Thomas Klemm Wrote: [ -> ]Just moved the WHILE-loop to the subroutine A. But I didn't check whether this is faster or slower.
Code:
\<<
  "" SWAP
  \<< \-> k r
    \<<
      WHILE k DUP2 \>=
      REPEAT - SWAP r + SWAP
      END DROP
    \>>
  \>> \-> A
  \<<
    1000 "M" A EVAL
    900 "CM" A EVAL
    500 "D" A EVAL
    400 "CD" A EVAL
    100 "C" A EVAL
    90 "XC" A EVAL
    50 "L" A EVAL
    40 "XL" A EVAL
    10 "X" A EVAL
    9 "IX" A EVAL
    5 "V" A EVAL
    4 "IV" A EVAL
    1 "I" A EVAL
  \>>
  DROP
\>>

Cheers
Thomas

I think I tried that too (I did about 6 different modified versions), and it was a tiny bit slower, despite the code being cleaner.

Also, doing it with binary ints rather than reals was a little bit slower as well.
(06-12-2014 01:53 PM)Dave Britten Wrote: [ -> ]Also, doing it with binary ints rather than reals was a little bit slower as well.

Sorry, I did that on an HP-48.
(06-12-2014 02:04 PM)Thomas Klemm Wrote: [ -> ]
(06-12-2014 01:53 PM)Dave Britten Wrote: [ -> ]Also, doing it with binary ints rather than reals was a little bit slower as well.

Sorry, I did that on an HP-48.

Me too; I meant, e.g., #1000d vs. 1000. Sorry, wasn't directly referring to your code there, just throwing out something else I tried.

I'm not too surprised that ran slower, given how the Saturn and RPL are built for BCD. I just tried it because of a vague, nagging memory that I had something that ran faster with binary rather than real.
(06-12-2014 12:23 PM)Dave Britten Wrote: [ -> ]
(06-12-2014 11:44 AM)HP67 Wrote: [ -> ]Are you talking about inside the CASE statement where the debugger appears to be stepping over all the unselected clauses one by one? If so, I noticed this and I don't understand it. Obviously CASE does work, and once a clause is selected the other clauses are certainly not selected. But sometimes (always?) the debugger appears to be testing all the clauses when you STEP.

Yeah, the CASE statement would have to evaluate each condition sequentially until it finds the correct block to execute. As the remaining value gets smaller, you'll see it taking longer to work its way down to the bottom conditions. The 48 CASE statement is really more if an IF/ELSE IF/ELSE structure than the case structure from C, etc. In other words, you're getting a lot of extra DUP >= checks when the remaining value is small.

Oh, that's not what I meant. When debugging CASE statements in other code, the debugger seemed to be evaluating other clauses after the clause that was selected. It's easy to prove the code doesn't do that but in the debugger it sometimes looks like it does.

As far as what I think you're talking about now, having to DUP DUP DUP did bother me because it doesn't look very sano. But at the same time DUP is cheap, probably one of the cheapest ops in RPL so in practice it shouldn't do anything bad. And, if the code has to check for a certain condition, CASE and IF/THEN/ELSE are all the same. There are always going to be a bunch of redunant checks until you get to condition you want. The only way around it would be to code a very complicated CASE or IF/THEN/ELSE that would repeat only relevant tests. I thought about it, but didn't want to try it badly enough to actually code it for this.

Thanks Dave, Thomas for your comments and code.
Dave, your replacement with the local subroutine screams on my 50g. Time is 102 seconds average for two runs of 1,000 conversions. Nice work. Lots to be said for local subroutines.

I'll try Thomas' and post the 50g time later.

Ok, Thomas' version takes 157 seconds. Although as is usual for him the code is very clean I would guess calling the subroutine whether you need to or not accounts for the extra time. I think the fact that Dave turned my CASE into a bunch of descending whiles was pure genius. Thanks for the insights guys!
Yet another variant without using a subroutine:
Code:
\<< \-> n
  \<<
    {
      { ""      n }
      { "M"  1000 }
      { "CM"  900 }
      { "D"   500 }
      { "CD"  400 }
      { "C"   100 }
      { "XC"   90 }
      { "L"    50 }
      { "XL"   40 }
      { "X"    10 }
      { "IX"    9 }
      { "V"     5 }
      { "IV"    4 }
      { "I"     1 }
    }
    \<<
      + OBJ\-> DROP
      \-> r k
      \<<
        WHILE DUP k \>=
        REPEAT
          k - SWAP
          r + SWAP
        END
        2 \->LIST
      \>>
    \>>
    STREAM HEAD
  \>>
\>>
But I doubt that it will be faster with all that creating and destroying of lists.

Cheers
Thomas

PS: I tried using DOLIST but I failed. However I noticed something I didn't know:
{ 1 2 3 }
1
\<< DUP \>>
DOLIST

This produces:
{ 1 1 2 2 3 3 }

Whatever the program produces will be added to the list. I thought this will lead to:
1
2
3
{ 1 2 3 }
(06-12-2014 09:17 PM)Thomas Klemm Wrote: [ -> ]PS: I tried using DOLIST but I failed. However I noticed something I didn't know:
{ 1 2 3 }
1
\<< DUP \>>
DOLIST

This produces:
{ 1 1 2 2 3 3 }

Whatever the program produces will be added to the list. I thought this will lead to:
1
2
3
{ 1 2 3 }

Yeah, DOLIST (and I think maybe DOSUBS) is a little quirky. If the program leaves anything on the stack, the outputs will be collected in a list. Otherwise DOLIST doesn't have any final output.


This example produces just {1 2 3}:

Code:
{ 1 2 3 }
1
\<< \>>
DOLIST

The empty program does nothing and just leaves its one input as an output, which DOLIST obediently collects back into a list.
(06-12-2014 09:17 PM)Thomas Klemm Wrote: [ -> ]Yet another variant without using a subroutine:
...
PS: I tried using DOLIST...

I kept thinking there would be a DOLIST option for this as well. I'm not sure how the following compares with the rest speed-wise, but I did finally find a way to shoehorn a DOLIST (and a sigmaLIST) into the mix.

Code:

\<<
    { } SWAP
    { 1000 900 500 400 100 90 50 40 10 9 5 4 1 }    
        
    1 13 FOR n
        DUP n GET
        ROT SWAP IDIV2
        ROT ROT 4 ROLL
        SWAP + UNROT
    NEXT
    DROP2
    
    { "M" "CM" "D" "CD" "C" "XC" "L" "XL" "X" "IX" "V" "IV" "I" } SWAP
    2 \<< NDUPN DROP \>> DOLIST
    \GSLIST
\>>
(06-13-2014 01:01 AM)DavidM Wrote: [ -> ]I did finally find a way to shoehorn a DOLIST (and a sigmaLIST) into the mix.

Very nice! My only suggestion is to use a local variable L for the list. This makes the FOR-loop simpler:
Code:
\<<
  { 1000 900 500 400 100 90 50 40 10 9 5 4 1 } \-> L
  \<<
    { } SWAP
    1 13 FOR n
      L n GET
      IDIV2 UNROT + SWAP
    NEXT
    DROP
  \>>
  { "M" "CM" "D" "CD" "C" "XC" "L" "XL" "X" "IX" "V" "IV" "I" } SWAP
  2 \<< NDUPN DROP \>> DOLIST
  \GSLIST
\>>

However I can't test this at the moment.

Cheers
Thomas
(06-13-2014 08:12 AM)Thomas Klemm Wrote: [ -> ]
(06-13-2014 01:01 AM)DavidM Wrote: [ -> ]I did finally find a way to shoehorn a DOLIST (and a sigmaLIST) into the mix.

Very nice! My only suggestion is to use a local variable L for the list. This makes the FOR-loop simpler:
Code:
\<<
  { 1000 900 500 400 100 90 50 40 10 9 5 4 1 } \-> L
  \<<
    { } SWAP
    1 13 FOR n
      L n GET
      IDIV2 UNROT + SWAP
    NEXT
    DROP
  \>>
  { "M" "CM" "D" "CD" "C" "XC" "L" "XL" "X" "IX" "V" "IV" "I" } SWAP
  2 \<< NDUPN DROP \>> DOLIST
  \GSLIST
\>>

However I can't test this at the moment.

Cheers
Thomas

Very nice. IDIV2 isn't present on the 48, however. I think that was new on the 49. You could probably get away with some combination of DUP2/MOD/IP since this program would only need to deal with positive numbers.

EDIT: Simple numeric IDIV2 replacement for anybody that wants to run it on a 48.

Code:
\<< DUP2 / FLOOR SWAP OVER * ROT SWAP - \>>
(06-13-2014 08:12 AM)Thomas Klemm Wrote: [ -> ]My only suggestion is to use a local variable L for the list. This makes the FOR-loop simpler...
However I can't test this at the moment.

The simplification of the "stackrobatics" (I love that word, Dave) made up for the performance penalty of the local var, and it brought the overall run time down slightly.

I was curious as to how these various algorithms performed, so I did some testing and came up with the following for cycle timings:

HP1 (HP67's original): 0.11592 sec
DB1: 0.07968 sec
TK1: 0.13363 sec
TK2: 5.02521 sec
DM1: 0.91066 sec
TK3: 0.89567 sec

These were all timed on a 50g, converting the range of numbers from 1000-1499. The one exception was TK2 - I ran out of patience and reduced it to 10 iterations just to get a reasonable estimate.

Dave's algorithm appears to be the fastest, with HP67's and Thomas' first attempts being not too far behind. The algorithms (mine and Thomas') that use list processing are, at best, an order of magnitude slower. At worst, well we might have to find a different animal...

While doing the timings, I also found a bug in my app. If the resulting string only has one character (as in "M" for 1000), \GSLIST will error out proclaiming an invalid dimension. So I had to insert an empty string into the list ("" +) just before \GSLIST was called to make sure it always had something to add.

- David

P.S. My first attempt at this app was actually a SysRPL version, which is more in my comfort zone. I'd be happy to share it if anyone's interested, but it seemed to be off-topic so I didn't want to introduce it to the mix.
(06-13-2014 05:15 PM)DavidM Wrote: [ -> ]My first attempt at this app was actually a SysRPL version, which is more in my comfort zone. I'd be happy to share it if anyone's interested, but it seemed to be off-topic so I didn't want to introduce it to the mix.

Please do share it. The folks that know SysRPL will enjoy it (and likely attempt to improve it) and the folks that don't will get a taste of what SysRPL is like, to compare. Also, please post the analogous performance numbers as an additional incentive for the non-SysRPL folks.

Interesting thread to watch the techniques evolve, my thanks to contributors. It seems that using DOLIST always has the exact same 2 effects; makes code much cleaner/shorter and arguably easier to follow, and always slows the run-times.
Pages: 1 2 3
Reference URL's